|
I've seen in Erlang promotional materials some rather impressive claims about how cheap Erlang processes are, and how many of them one can spawn. Which is pretty cool. But, what Erlang programs take advantage of that kind of power? Are there any examples of programs which use huge numbers of processes in interesting ways? (I am the local Erlang fancier. I got challenged on that point, and didn't have a very good answer.) |
|
In the products we build, we tend to hit memory and bandwidth
limits long before we reach millions (or even hundreds of thousands) of processes. This is partly because many of the processes do network IO, and because our products are physically constrained by fairly tight space, cost and power budgets. Physical constraints are subject to change, of course...

The way we choose to program nowadays is, however, inspired by the knowledge that we don't have to worry about the number of processes per se, even if we'd end up with several hundred thousand of them.

BR, Ulf W

_______________________________________________
erlang-questions mailing list
[hidden email]
http://www.erlang.org/mailman/listinfo/erlang-questions |
|
In reply to this post by Bard Bloom
Bard Bloom wrote on 23.09.2008 15:22:
> I've seen in Erlang promotional materials some rather impressive claims
> about how cheap Erlang processes are, and how many of them one can
> spawn. Which is pretty cool. But, what Erlang programs take advantage of
> that kind of power? Are there any examples of programs which use huge
> numbers of processes in interesting ways? (I am the local Erlang
> fancier. I got challenged on that point, and didn't have a very good
> answer.)

You can use tsung to simulate millions of users to do load/stress testing. It uses an Erlang process for each simulated user.

I tried to simulate ~1.3 million users, distributed on ~30 nodes, with one SMP beam per node, to see if it works. It does :)

--
Nicolas |
|
Nicolas Niclausse wrote:
> You can use tsung to simulate millions of users to do load/stress testing.
> It uses an erlang process for each simulated user.
>
> I tried to simulate ~1.3 million users, distributed on ~30 nodes, with one
> smp beam per node, to see if it works. it does :)

Yeah, but that's just some 43k processes per node then. (:

Don't you also run into the problem that practically every process in tsung does network IO?

BR, Ulf W |
|
In reply to this post by Bard Bloom
We've got a couple applications that use thousands of processes per
node. If those were pthreads, we'd be out of RAM before actually doing anything. |
|
In reply to this post by Ulf Wiger (TN/EAB)
Ulf Wiger (TN/EAB) wrote on 23.09.2008 17:14:
> Yeah, but that's just some 43k processes per node then. (:

Yes, I didn't have enough memory on the nodes, so I had to use many of them :) (4GB per node with a 64-bit Erlang VM)

> Don't you also run into the problem that practically every
> process in tsung does network IO?

No. Each node had a gigabit ethernet link, and not all users (processes) were doing network IO simultaneously in the test (because of "thinktimes" in the scenario); it was something like 10% of them. The cumulative bandwidth was "only" 1Gbit/s.

--
Nicolas |
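The figures quoted in this exchange can be turned into a rough per-process budget. The sketch below is only back-of-the-envelope arithmetic on the numbers given above (~1.3 million users, ~30 nodes, 4GB per node, about 10% of users active, 1Gbit/s cumulative); the function name `tsung_budget` is made up for illustration, and it assumes all of a node's RAM is available to processes and that the 1Gbit/s is the total across all nodes, neither of which the thread confirms.

```python
# Back-of-the-envelope budget from the figures quoted in the thread.
# Assumptions (not confirmed by the thread): all node RAM is usable by
# processes, and the 1 Gbit/s figure is the total across all nodes.
USERS = 1_300_000
NODES = 30
RAM_PER_NODE_BYTES = 4 * 1024**3
ACTIVE_FRACTION = 0.10
TOTAL_BANDWIDTH_BPS = 1_000_000_000


def tsung_budget(users=USERS, nodes=NODES, ram=RAM_PER_NODE_BYTES,
                 active=ACTIVE_FRACTION, bandwidth=TOTAL_BANDWIDTH_BPS):
    """Return rough per-node and per-process figures for the test run."""
    per_node = users / nodes
    active_users = users * active
    return {
        "processes_per_node": per_node,
        # Upper bound: the VM, code and buffers also live in this RAM.
        "ram_per_process_bytes": ram / per_node,
        "active_users": active_users,
        "bits_per_sec_per_active_user": bandwidth / active_users,
    }


if __name__ == "__main__":
    for name, value in tsung_budget().items():
        print(f"{name}: {value:,.0f}")
```

Under those assumptions each simulated user gets somewhat under 100KB of RAM at most, which is consistent with memory, rather than the process count itself, being the reason for spreading the run over many nodes.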
|
In reply to this post by Bob Ippolito
I'm no Linux expert, but

http://en.wikipedia.org/wiki/Native_POSIX_Thread_Library

"The Native POSIX Thread Library (NPTL) is a software feature that enables the Linux kernel to run programs written to use POSIX Threads fairly efficiently. In tests, NPTL succeeded in starting 100,000 threads on a IA-32 in two seconds. In comparison, this test under a kernel without NPTL would have taken around 15 minutes."

I guess a future Erlang VM will offer some more generic MxN threading model, i.e. M Erlang user-level processes implemented on N "schedulers" (native threads). Today SMP Erlang has only limited support (i.e. command line options) for specifying the number of schedulers, and no programmatic support for the affinity of schedulers to cores or of Erlang processes to schedulers.

Zvi
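The MxN idea mentioned here (M lightweight tasks multiplexed onto N native threads) can be sketched in a few lines. This is an illustration of the general model only, not of how the Erlang VM is implemented: the names `task` and `run_m_on_n` are hypothetical, the tasks are Python generators that yield cooperatively, and there is no isolation or preemption.

```python
import queue
import threading


def task(task_id, steps):
    """A lightweight task: each `yield` hands the scheduler thread back."""
    for _ in range(steps):
        yield  # cooperative yield point (an assumption of this sketch)
    return task_id


def run_m_on_n(m_tasks, n_schedulers, steps=3):
    """Run m_tasks generator tasks on n_schedulers native threads."""
    run_queue = queue.Queue()
    for i in range(m_tasks):
        run_queue.put(task(i, steps))

    finished = []
    lock = threading.Lock()

    def scheduler():
        while True:
            try:
                current = run_queue.get_nowait()
            except queue.Empty:
                return  # nothing runnable is queued: this scheduler stops
            try:
                next(current)           # run the task up to its next yield
                run_queue.put(current)  # still alive: back on the run queue
            except StopIteration as done:
                with lock:
                    finished.append(done.value)

    threads = [threading.Thread(target=scheduler) for _ in range(n_schedulers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished


if __name__ == "__main__":
    done = run_m_on_n(m_tasks=10_000, n_schedulers=4)
    print(len(done), "tasks completed on 4 scheduler threads")
```

The point of the sketch is that the number of tasks (M) and the number of native threads (N) are chosen independently; the cost of a task is a small heap object rather than a kernel thread with its own stack.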
|
|
NPTL is fast, but AFAIK uses a minimum stack size of 8KB per thread (the minimum heap size for Erlang processes seems to be 932 bytes on a 32-bit system). There also seem to be other limits, making it very difficult in practice to reach anywhere near 100,000 threads, and it's not encouraged either.

http://nptl.bullopensource.org/Tests/NPTL-limits.html

In the Linux kernel FAQ, the philosophy on threads is explained thus:

"Avoid the temptation to create large numbers of threads in your application. Threads should only be used to take advantage of multiple processors or for specialised applications (i.e. low-latency real-time), not as a way of avoiding programmer effort (writing a state machine or an event callback system is quite easy). A good rule of thumb is to have up to 1.5 threads per processor and/or one thread per RT input stream. On a single processor system, a normal application would have at most two threads, over 10 threads is seriously flawed and hundreds or thousands of threads is progressively more insane.

A common request is to modify the Linux scheduler to better handle large numbers of running processes/threads. This is always rejected by the kernel developer community because it is, frankly, stupid to have large numbers of threads. Many noted and respected people will extol the virtues of large numbers of threads. They are wrong. Some languages and toolkits create a thread for each object, because it fits into a particular ideology. A thread per object may be appealing in the abstract, but is in fact inefficient in the real world. Linux is not a good computer science project. It is, however, good engineering. Understand the distinction, and you will understand why many widely acclaimed ideas in computer science are held with contempt in the Linux kernel developer community."

http://www.kernel.org/pub/linux/docs/lkml/#s7-21

BR, Ulf W |
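To put the two minimum sizes quoted above side by side, here is a small calculation. It uses only the figures given in the post (8KB minimum stack per NPTL thread, 932 bytes minimum heap per Erlang process on a 32-bit system, both stated there with "AFAIK" and "seems to be"), so treat the result as a lower bound under those assumptions; real threads and processes need more. The helper name `footprint_mib` is invented for the example.

```python
# Minimum per-unit sizes as quoted in the post (both hedged there).
THREAD_STACK_MIN_BYTES = 8 * 1024
ERLANG_HEAP_MIN_BYTES = 932


def footprint_mib(count, per_unit_bytes):
    """Minimum memory, in MiB, for `count` units of `per_unit_bytes` each."""
    return count * per_unit_bytes / (1024 * 1024)


if __name__ == "__main__":
    n = 100_000
    threads = footprint_mib(n, THREAD_STACK_MIN_BYTES)
    processes = footprint_mib(n, ERLANG_HEAP_MIN_BYTES)
    print(f"{n:,} threads:   at least {threads:.1f} MiB of stack")
    print(f"{n:,} processes: at least {processes:.1f} MiB of heap")
    print(f"ratio: about {threads / processes:.1f}x")
```

On these numbers, 100,000 threads need roughly 780 MiB of stack before doing any work, against roughly 89 MiB of heap for the same number of Erlang processes.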
|
In reply to this post by Zvi-2
On 9/23/08, Zvi <[hidden email]> wrote:
> I'm no Linux expert, but
>
> http://en.wikipedia.org/wiki/Native_POSIX_Thread_Library
>
> "The Native POSIX Thread Library (NPTL) is a software feature that enables
> the Linux kernel to run programs written to use POSIX Threads fairly
> efficiently.
> In tests, NPTL succeeded in starting 100,000 threads on a IA-32 in two
> seconds. In comparison, this test under a kernel without NPTL would have
> taken around 15 minutes."

This test was done 6-7 years ago. As a result of this work, have you seen any Linux apps that make use of huge numbers of pthreads, thereby noticeably advancing the state of the art for whatever application domain they address? I haven't.

Also, I wonder what the default thread stack size was set to for this test? I'm guessing it was set to be artificially small -- too small to do any real work, in fact -- just to get to the 100000 mark.

I have to side with Bob's comment on this one.

--steve |
|
In reply to this post by Ulf Wiger (TN/EAB)
also, aren't Erlang processes safer than Linux threads (not processes)? |
|
In reply to this post by Ulf Wiger (TN/EAB)
lol, this reminds me that one of the first PL/1 textbooks said: "please avoid using subroutines, since they are very slow" (not an exact citation, but something like this). I guess back then the subroutine was a state-of-the-art computer science concept, and spaghetti code was called "good engineering" :-)

I thought Linux is a server-side OS, so a simple "thread-per-connection" model is bad for Linux. Now I understand why Apache forks itself.

Zvi |
|
> I guess, then subroutine was State-of-the-Art Computer Science concept and
> spaghetti-code was called a "good engineering" :-)

I see premature, or sometimes sadly required, optimization every working day. Pretty depressing. |
|
Raoul, I would not be depressed, there has always been a tradeoff between elegance and performance at the bleeding edge. On my first computer we taped bits of 5-hole paper tape into our programs because the plug-panel sine and cosine were too slow. Nothing has changed...

/s/ Bill

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Raoul Duke
Sent: Tuesday, September 23, 2008 3:16 PM
To: [hidden email]
Subject: Re: [erlang-questions] Millions of processes?

> I guess, then subroutine was State-of-the-Art Computer Science concept and
> spaggethi-code was called a "good engineering" :-)

i see premature or sometimes sadly required optimization every working day. pretty depressing. |
|
In reply to this post by Ulf Wiger (TN/EAB)
On Tue, Sep 23, 2008 at 07:09:30PM +0200, Ulf Wiger (TN/EAB) wrote:
} In the Linux kernel FAQ, the philosophy on threads is
} explained thus:
}
} "Avoid the temptation to create large numbers of threads in your
} application. Threads should only be used to take advantage of multiple
} processors or for specialised applications (i.e. low-latency real-time),
} not as a way of avoiding programmer effort (writing a state machine or
} an event callback system is quite easy).

Contrast that to the Erlang Programming Rules and Conventions:

  5.4 Assign exactly one parallel process to each true concurrent
      activity in the system

	-Vance |
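The contrast drawn here, one lightweight unit per concurrent activity instead of a hand-written state machine, can be illustrated outside Erlang as well. The sketch below uses Python's `asyncio` tasks as a stand-in; they are not Erlang processes (no isolation, cooperative scheduling on a single thread), and the names `session` and `run_sessions` are hypothetical. It only shows the modeling style: each activity is written as straight-line sequential code, and the runtime interleaves them.

```python
import asyncio


async def session(user_id, inbox, log):
    """One task per concurrent activity, written as plain sequential code."""
    greeting = await inbox.get()      # wait for the first message
    log.append((user_id, greeting))
    await asyncio.sleep(0)            # stand-in for "think time"
    farewell = await inbox.get()      # wait for the second message
    log.append((user_id, farewell))


async def _main(n_users):
    log = []
    inboxes = [asyncio.Queue() for _ in range(n_users)]
    tasks = [asyncio.create_task(session(i, inboxes[i], log))
             for i in range(n_users)]
    for box in inboxes:
        box.put_nowait("hello")
    for box in inboxes:
        box.put_nowait("bye")
    await asyncio.gather(*tasks)
    return log


def run_sessions(n_users):
    """Start n_users independent sessions and return everything they logged."""
    return asyncio.run(_main(n_users))


if __name__ == "__main__":
    events = run_sessions(10_000)
    print(len(events), "events handled by 10,000 concurrent sessions")
```

No session contains any bookkeeping about the other sessions or about where it was when it last blocked; that bookkeeping is what a state machine or callback system would have to make explicit.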
|
On Tue, 23 Sep 2008, Vance Shipley wrote:
> On Tue, Sep 23, 2008 at 07:09:30PM +0200, Ulf Wiger (TN/EAB) wrote:
> } In the Linux kernel FAQ, the philosophy on threads is
> } explained thus:
> }
> } "Avoid the temptation to create large numbers of threads in your
> } application. Threads should only be used to take advantage of multiple
> } processors or for specialised applications (i.e. low-latency real-time),
> } not as a way of avoiding programmer effort (writing a state machine or
> } an event callback system is quite easy).
>
> Contrast that to the Erlang Programming Rules and Conventions:
>
> 5.4 Assign exactly one parallel process to each true concurrent
> activity in the system

Don't forget that in Erlang's case the above-mentioned "programmer effort" was already done by the developers of the Erlang VM...

Bye,
NAR
--
"Beware of bugs in the above code; I have only proved it correct, not tried it." |
|
On Wed, Sep 24, 2008 at 10:36 AM, <[hidden email]> wrote:
Also:

"... because it is, frankly, stupid to have large numbers of threads. "

Dear oh dear. |
|
Alex Arnon wrote:
> Also:
>
> "... because it is, frankly, stupid to have large numbers of threads. "
>
> Dear oh dear.

...but let's all remember that there's a big difference between Erlang processes and POSIX threads. Threads share memory, so there's no isolation at all. I'd hate to debug a misbehaving application where hundreds of thousands of threads all have unconstrained access to all data.

BR, Ulf W |
|
In reply to this post by Ulf Wiger (TN/EAB)
Greetings,
It is not only _ideas_ in computer science that are held in contempt (as quoted below). Ages ago (before 2000) I read an article about Linux in embedded environments. The article quoted Linus Torvalds on why not to use micro kernels. The reasons were that they are:

1 Experimental
2 Complex
3 Slow

After looking around for a while I found plenty of articles about commercial micro kernels, and benchmarks showing micro kernels running workloads faster than monolithic kernels. So 1 and 3 seemed to be incorrect. I submitted these findings to the magazine, which prompted an answer from Mr Torvalds. He asserted that all three were true, but did not discuss what I had found.

So (IMHO) it is also facts that are not held in very high regard.

bengt

On Tue, 2008-09-23 at 19:09 +0200, Ulf Wiger (TN/EAB) wrote:
> [...]
> Linux is not a good computer science project. It is, however, good
> engineering. Understand the distinction, and you will understand why
> many widely acclaimed ideas in computer science are held with contempt
> in the Linux kernel developer community. " |
|
In reply to this post by Ulf Wiger (TN/EAB)
I do not understand *why* we even compare ERLANG processes with POSIX
threads. What is the connection??

V.

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Ulf Wiger (TN/EAB)
Sent: 24 September 2008 11:39 AM
To: Alex Arnon
Cc: [hidden email]
Subject: Re: [erlang-questions] Millions of processes?

...but let's all remember that there's a big difference between erlang processes and POSIX threads. Threads share memory, so there's no isolation at all. I'd hate to debug a misbehaving application where hundreds of thousand threads all have unconstrained access to all data.

BR, Ulf W |
|
On Wed, Sep 24, 2008 at 3:32 PM, Valentin Micic <[hidden email]> wrote:
> I do not understand *why* do we even compare ERLANG processes with POSIX
> threads? What is the connection??

It's valuable to spread Erlang mindshare. What leaks out about Erlang to outsiders is that Erlang is good for making use of multi-core computers and distributing computations. And while Erlang has some great examples of applications that scale very well with the number of cores available, it's not really a representative picture that you go to Erlang to write your ray-tracer if you want it to run 16 times faster on a 16-core machine than on a 1-core machine.

The advice in the Linux FAQ mentioned above probably comes from people who are worried about using the optimal number of threads for their computations on a specific hardware platform, to get the most out of it.

I'd like to call the distinction: using threads for "modeling" reasons, and using them for "technical" reasons.

In Erlang we use Erlang processes for modeling reasons. It is simpler to program if you map each concurrent activity in a system to a process, so the code for each process only has a single sequential job to focus on (and to get right).

Technical reasons to use threads or processes are those that are unrelated to making the modeling simpler; often they make the model more complex. Performing kernel convolutions on large images is typical of something that is very simple to express as a single sequential job. It's basically four nested loops: for each pixel in the image, you sum the products of the kernel coefficients with the pixels in a region around the current pixel. This means each pixel in the result image only depends on the kernel and an NxN region around that pixel in the input image.

There is potential for huge gains in speed in having 8 cores perform the convolution concurrently, each for 1/8th of the pixels in the image. But you need to do it right: you must make sure each core makes the most of the data it gets into its cache lines, otherwise you risk having your data bus become the bottleneck, and you won't get an 8-times speedup.

A lot of people coming to Erlang wonder where the libraries are for splitting up a job into parts executed in multiple processes, so they can make this "technical" use of multi-core computers. However, most use of Erlang is in domains where you have embarrassingly simple parallelism. A web application can easily have 100 concurrent requests to process, and when each is modelled as an isolated process, you can already potentially make use of a 96-core machine, which most of us can only wish for in a year or two right now. |
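The "four nested loops" description of a kernel convolution can be written out directly. The sketch below is a plain-Python illustration under a few assumptions that are not in the post: the image is a list of rows, the kernel is square with odd size, pixels outside the image count as zero, and the kernel is applied without flipping (which is identical to a true convolution for symmetric kernels). The function names are invented for the example, and the bands are computed one after another here; it only demonstrates that each band of output rows depends on nothing but the input image and the kernel, so the bands could be handed to separate cores.

```python
def convolve_rows(image, kernel, row_start, row_end):
    """Compute output rows [row_start, row_end) with four nested loops."""
    height, width = len(image), len(image[0])
    size = len(kernel)        # kernel is size x size, size assumed odd
    half = size // 2
    out = []
    for y in range(row_start, row_end):        # loop 1: image rows
        row = []
        for x in range(width):                 # loop 2: image columns
            acc = 0
            for ky in range(size):             # loop 3: kernel rows
                for kx in range(size):         # loop 4: kernel columns
                    sy, sx = y + ky - half, x + kx - half
                    # Pixels outside the image are treated as zero.
                    if 0 <= sy < height and 0 <= sx < width:
                        acc += kernel[ky][kx] * image[sy][sx]
            row.append(acc)
        out.append(row)
    return out


def convolve_split(image, kernel, parts):
    """Split the rows into `parts` bands and compute each independently."""
    height = len(image)
    bounds = [height * i // parts for i in range(parts + 1)]
    result = []
    for start, end in zip(bounds, bounds[1:]):
        # Each call reads only `image` and `kernel`, never another band's
        # output, so in principle every band could run on its own core.
        result.extend(convolve_rows(image, kernel, start, end))
    return result


if __name__ == "__main__":
    image = [[1] * 4 for _ in range(4)]
    box = [[1] * 3 for _ in range(3)]
    for row in convolve_split(image, box, parts=2):
        print(row)
```

Splitting by rows is the simplest partition; the cache concern raised in the post is about how each core walks its band, which this sketch does not attempt to optimise.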
