Why erlang's computing performance is enormously less than c++


谈广云
I compared Erlang's computing performance with C++.

Erlang runs test_sum_0 100000000 times:

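%% Base case added so the loop stops after N iterations.
test_sum_0(0) ->
    ok;
test_sum_0(N) ->
    bp_eva_delta([1,2,3,4], [3,4,5,6], []),
    test_sum_0(N-1).


%% Computes S * O * (1 - O) pairwise and returns the results in order.
bp_eva_delta([], _, L) ->
    lists:reverse(L);
bp_eva_delta([O|Output], [S|Sigma], L) ->
    bp_eva_delta(Output, Sigma, [S * O * (1-O) | L]).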




C++ runs a similar function the same number of times (100000000):

for (int i = 0; i < 100000000; ++i)
{
    double b[5] = {1,2,3,4,5};
    double s[5] = {6,7,8,9,10};
    double o[5];
    for (int j = 0; j < 5; ++j)
    {
        o[j] = s[j] * b[j] * (1 - b[j]);
    }
}

Erlang took 29 s, and C++ took 2.78 s.

Why is Erlang so much slower than C++?

Or have I not configured the right parameters?




 


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Why erlang's computing performance is enormously less than c++

Raimo Niskanen-2
On Fri, Nov 11, 2016 at 03:26:30PM +0800, 谈广云 wrote:

> Erlang took 29 s, and C++ took 2.78 s.
>
> Why is Erlang so much slower than C++?
>

You are comparing apples with pears.

For starters, your Erlang code probably spends most of its time allocating new
memory, garbage collecting and freeing memory, while your C++ code just
reads and writes the same stack memory locations.

Both examples produce nothing, but in different ways.
This is a very synthetic and unfair comparison.

>
>  Or have I not configured the right parameters?

What are you trying to measure?


--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
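
A minimal sketch, not part of the original message, of the difference described above: the same arithmetic done once into a buffer the caller reuses, and once with a fresh allocation per call. The function names delta_into and delta_alloc are made up for illustration, and the heap version is only a rough stand-in for what building a new Erlang list costs.

```c
#include <stddef.h>
#include <stdlib.h>

/* Write results into a caller-supplied buffer: no allocation per call,
   much like the original C++ loop reusing the same stack array. */
void delta_into(const double *b, const double *s, double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = s[i] * b[i] * (1 - b[i]);
}

/* Allocate a fresh result on every call, which is closer to what
   building a new list each time implies. The caller must free() it. */
double *delta_alloc(const double *b, const double *s, size_t n)
{
    double *out = malloc(n * sizeof *out);
    if (out == NULL)
        return NULL;
    delta_into(b, s, out, n);
    return out;
}
```

Timing a loop around delta_alloc (plus free) against a loop around delta_into should give a feel for how much of a benchmark like this is allocation rather than arithmetic.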

Re: Why erlang's computing performance is enormously less than c++

Richard Carlsson-3
In reply to this post by 谈广云
You are comparing a native-compiled C++ program that works on small arrays of raw numbers with an interpreted Erlang program that traverses linked lists of tagged numbers. The only surprise is that the difference is _only_ a factor of 10. (And if the C code were using integers instead of double precision floats, it would be even faster.)


        /Richard


Re: Why erlang's computing performance is enormously less than c++

Tony Rogvall-2
The original program loops over [1,2,3,4] when it should loop over [1,2,3,4,5] as
in the C program. After fixing that, making sure the program uses floating-point
numbers, and adding a variable that calculates a result, the difference is (on my Mac):

Erlang: 57s
C: 3.7s

That is 15 times slower, which is not that bad considering :-)
But adding the -O3 flag to the C compilation increases that ratio to 110 times slower.
Just tossing in a -native flag did not lead to any significant change, but...
things changed when using the still forgotten loop unrolling directives, inline sizes and friends.
I used this (WARNING! Not to be used in production code yet, I guess?):

-compile(native).
-compile(inline).
-compile({inline_size,1000}).
-compile({inline_effort,2000}).
-compile({inline_unroll,6}).

Erlang: 4.3s

Which is nearly the same as unoptimized C code and
just 8 times slower than -O3 optimized C code,
and that is just amazing!

/Tony
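
A minimal sketch, not part of the original message, of what "adding a variable that calculates a result" could look like on the C side. The benchmark program Tony actually ran was not posted, so the function name run_kernel and the checksum approach are assumptions made for illustration.

```c
/* Runs the 5-element kernel `iterations` times and returns the sum of
   every computed value, so the work feeds an observable result. */
double run_kernel(long iterations)
{
    const double b[5] = {1, 2, 3, 4, 5};
    const double s[5] = {6, 7, 8, 9, 10};
    double checksum = 0.0;

    for (long n = 0; n < iterations; ++n) {
        double o[5];
        for (int i = 0; i < 5; ++i)
            o[i] = s[i] * b[i] * (1 - b[i]);
        for (int i = 0; i < 5; ++i)
            checksum += o[i];
    }
    return checksum;
}
```

Printing the returned value from main gives the compiler less room to drop the loop as dead code, although an optimizer may still simplify a computation on constant inputs like this one.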


Re: Why erlang's computing performance is enormously less than c++

Vans S
In reply to this post by 谈广云
Please write a C NIF containing this loop:

for (int i = 0; i < 100000000; ++i)
{
    double b[5] = {1,2,3,4,5};
    double s[5] = {6,7,8,9,10};
    double o[5];
    for (int j = 0; j < 5; ++j)
    {
        o[j] = s[j] * b[j] * (1 - b[j]);
    }
}


Change the Erlang code to:

test_sum_0(N) ->
    call_c_nif(N).

Then try the test again.




Re: Why erlang's computing performance is enormously less than c++

谈广云
In reply to this post by Tony Rogvall-2
Can you tell me why the native flag gives such an improvement?

Adding the native flag is not the best approach, since it is not considered safe.

Is there any other way available?






Re: Why erlang's computing performance is enormously less than c++

Max Lapshin-2
Tony, you've just exploded my brain.

I will spend the next 2 weeks in a frenzy applying HiPE flags to our Flussonic =)


Re: Why erlang's computing performance is enormously less than c++

Pierre Fenoll-2
Tony, could you write a blog post describing how & when to use
"the still forgotten loop unrolling directives, inline sizes and friends"?

Pretty please


Cheers,
-- 
Pierre Fenoll



Re: Why erlang's computing performance is enormously less than c++

Sergej Jurečko
In reply to this post by Tony Rogvall-2

On Nov 11, 2016 6:40 PM, "Tony Rogvall" <[hidden email]> wrote:

> I used this ( WARNING! not to be used in production code yet, I guess? )

Are these flags new?

Sergej



Re: Why erlang's computing performance is enormously less than c++

Richard Carlsson-3
New as of 2001 or so. :-) Inlining is documented towards the bottom of this page: http://erlang.org/doc/man/compile.html

However, it only describes the inline_size option. There is also the inline_effort limit, which can be increased from the default 150 at the expense of compile time (its purpose is to ensure that the automatic inliner does not get bogged down in any particular part of the code). And then there's the slightly experimental inline_unroll, which is actually more of a side effect of the normal inlining behaviour if you just allow it to repeat itself on loops.

The interaction between the unroll limit and the size/effort limit is not obvious (and could maybe be improved - I haven't looked at that code for 15 years). In particular, the size limit seems to need bumping from the default 24 to about 200 or more for unrolling to happen, depending on the size of the loop body, and the effort limit also needs raising to at least 500 or 1000. I suggest you use the 'to_core' option and inspect the result until you find settings that work for your program. If you want to use unrolling, you should probably put that code in a separate module and use custom compiler options for that module, rather than applying the same limits to your whole code base.

See comments in https://github.com/erlang/otp/blob/maint/lib/compiler/src/cerl_inline.erl for details. (It's a wonderful algorithm, if you're into that sort of thing, but can take a while to get your head around. It's basically just constant propagation and folding, treating functions like any other constants, and handling local functions and funs in the same way. I'd revisit it if I had the time.)

Note that if you use the option {inline,[{Name,Arity},...]} instead of just 'inline', then an older, simpler inliner is used, which _only_ inlines those functions you listed, ignoring any size limits.

        /Richard


Re: Why erlang's computing performance is enormously less than c++

Tony Rogvall-2
Thank you for the pointers and some insights.
I noticed that the unroll directive was needed for this case. Actually, size, effort and unroll were all needed to get the desired effect.
I have been nagging about this before. Some effort should be made to do these optimizations automatically; they can be really hard to do manually. Also consider how "funny" the code would look if you unrolled it yourself ;-)

/Tony
"typed while walking!"


Re: Why erlang's computing performance is enormously less than c++

Richard A. O'Keefe-2
In reply to this post by 谈广云


On 11/11/16 8:26 PM, 谈广云 wrote:

> I compared Erlang's computing performance with C++.
>
> Erlang runs test_sum_0 100000000 times:
>
> test_sum_0(N) ->
>   bp_eva_delta([1,2,3,4],[3,4,5,6],[]),
>   test_sum_0(N-1).
>
>
> bp_eva_delta([],_,L) ->
> lists:reverse(L);
> bp_eva_delta([O|Output],[S|Sigma],L) ->
> bp_eva_delta(Output,Sigma,[S * O * (1-O) |L]).

Here are some times I got.
Erlang (native compilation)  :  10.1 seconds.
Erlang (unrolled loop)       :   2.8 seconds.
Standard ML                  :   2.7 seconds.
Clean (default lazy lists)   :   8.3 seconds.
Clean (unrolled strict data) :   3.0 seconds.
A fair comparison in C       : 118.4 seconds.

The thing is that the C and Erlang code may be computing
the same function (technically they aren't), but they are
not doing it the same WAY, so the comparison is not a
comparison of LANGUAGES but a comparison of
*list processing* in one language with
*array processing* in another language.
When you compare Erlang with statically typed languages
doing the same thing (well, not quite) the same *way*
you find the numbers pleasantly close.

A list is made up of pairs.
A fairer analogue of this in C would be

     #include <stdio.h>
     #include <stdlib.h>
     #include <time.h>

     struct Node {
         struct Node *next;
         int          item;
     };
     struct Node dummy = {0,0};

     struct Node *revloop(
         struct Node *L,
         struct Node *R
     ) {
         while (L != &dummy) {
             struct Node *N = malloc(sizeof *N);
             N->next = R, N->item = L->item;
             R = N, L = L->next;
         }
         return R;
     }

     struct Node *reverse(
         struct Node *L
     ) {
         return revloop(L, &dummy);
     }

     struct Node *bp_eva_delta(
         struct Node *Output,
         struct Node *Sigma
     ) {
         struct Node *L = &dummy;
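         while (Output != &dummy && Sigma != &dummy) {
             int O = Output->item, S = Sigma->item;
             struct Node *N = malloc(sizeof *N);
             N->next = L,
             N->item = S * O * (1 - O);
             L = N;
             /* Advance both lists; without this the loop never ends. */
             Output = Output->next, Sigma = Sigma->next;
         }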
         return reverse(L);
     }

     struct Node *cons(
         int item,
         struct Node *next
     ) {
         struct Node *N = malloc(sizeof *N);
         N->next = next, N->item = item;
         return N;
     }

     void test_sum_0(
         void
     ) {
         struct Node *Output =
             cons(1, cons(2, cons(3, cons(4, &dummy))));
         struct Node *Sigma =
             cons(3, cons(4, cons(5, cons(6, &dummy))));
         struct Node *R;
         int N;
         for (N = 100*1000*1000; N > 0; N--) {
             R = bp_eva_delta(Output, Sigma);
         }
     }

     int main(void) {
         clock_t t0, t1;
         t0 = clock();
         test_sum_0();
         t1 = clock();
         printf("%g\n", (t1-t0)/(double)CLOCKS_PER_SEC);
         return 0;
     }


> Erlang took 29 s, and C++ took 2.78 s.
>
> Why is Erlang so much slower than C++?

On the contrary, why is C so staggeringly slow compared
with Erlang, Clean, and SML?  (On my desktop machine, that
is.  On my laptop, it ran for a LONG time and then other
things started dying.  Hint: no GC.)


There are at least five differences between your C++
and Erlang examples:

(1) List processing vs array processing.
(2) Memory allocation costs (malloc() can be  S  L  O  W).
(3) Static type system.
(4) Truncating arithmetic.
(5) Loop unrolling.
and there may be an issue of
(6) native code compilation vs emulated code.

The SML, Clean, C, and C++ programs use *truncating*
integer arithmetic.  The Erlang program uses unbounded
integer arithmetic, with no prospect of overflow.  It
takes extra time to be ready for that.
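
A minimal sketch, not part of the original message, of what truncating arithmetic means in C. It uses unsigned types because unsigned overflow is defined to wrap around, whereas signed overflow is undefined behaviour. The function names are made up for illustration.

```c
#include <stdint.h>

/* n! in 32-bit unsigned arithmetic: the result wraps modulo 2^32
   as soon as the true value no longer fits. */
uint32_t factorial_u32(unsigned n)
{
    uint32_t acc = 1;
    for (unsigned k = 2; k <= n; ++k)
        acc *= k;
    return acc;
}

/* The same computation in 64 bits, which still fits 13!. */
uint64_t factorial_u64(unsigned n)
{
    uint64_t acc = 1;
    for (unsigned k = 2; k <= n; ++k)
        acc *= k;
    return acc;
}
```

12! fits in 32 bits, but 13! = 6227020800 does not, so factorial_u32(13) silently returns 1932053504. An Erlang integer would instead grow into a bignum, and the run-time checks needed to make that possible are the extra cost mentioned above.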

The fast Erlang code doesn't use a list, it uses an
*unrolled* list:
  -type urlist(T) :: {T,T,T,T,urlist(T)}
    | {T,T,T} | {T,T} | {T} | {}.

For example, {1,2,3,4,{5,6,7,8,{}}}.
The Erlang (unrolled data) code does this, with manual
loop unrolling.  I have library code for unrolled
strict lists in Haskell, Clean, and SML, but not (yet)
for Erlang.
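
A minimal sketch, not part of the original message, of the same idea in C: an unrolled linked list whose nodes each carry up to four items, so that one pointer hop is paid per four elements rather than per element. The struct layout and the ur_ names are assumptions for illustration and do not reproduce the library code mentioned above; unlike the urlist type, a short final node here still carries a (NULL) next pointer.

```c
#include <stddef.h>
#include <stdlib.h>

enum { CHUNK = 4 };              /* items per node, as in the urlist type */

struct UNode {
    size_t        count;         /* valid entries in item[], 1..CHUNK */
    int           item[CHUNK];
    struct UNode *next;          /* NULL plays the role of {} */
};

/* Build an unrolled list holding a[0..n-1] in order. */
struct UNode *ur_from_array(const int *a, size_t n)
{
    struct UNode  *head = NULL;
    struct UNode **tail = &head;

    while (n > 0) {
        struct UNode *u = malloc(sizeof *u);
        if (u == NULL)
            abort();
        u->count = n < CHUNK ? n : CHUNK;
        for (size_t k = 0; k < u->count; ++k)
            u->item[k] = a[k];
        u->next = NULL;
        *tail = u;               /* append at the end */
        tail = &u->next;
        a += u->count;
        n -= u->count;
    }
    return head;
}

/* Sum all items: one pointer dereference per node, not per item. */
long ur_sum(const struct UNode *u)
{
    long total = 0;
    for (; u != NULL; u = u->next)
        for (size_t k = 0; k < u->count; ++k)
            total += u->item[k];
    return total;
}

/* Number of nodes in the list. */
size_t ur_node_count(const struct UNode *u)
{
    size_t nodes = 0;
    for (; u != NULL; u = u->next)
        ++nodes;
    return nodes;
}

/* Release every node. */
void ur_free(struct UNode *u)
{
    while (u != NULL) {
        struct UNode *next = u->next;
        free(u);
        u = next;
    }
}
```

Eight items occupy two nodes, which mirrors the {1,2,3,4,{5,6,7,8,{}}} example.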

Thinking about unrolling is fair because this is something
that C and C++ compilers routinely do these days.
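
A minimal sketch, not part of the original message, of what loop unrolling does to the array kernel from this thread. The unrolled version processes four elements per iteration and finishes any leftovers in a cleanup loop. The function names are made up for illustration, and an optimizing compiler may well produce something similar from the plain loop on its own.

```c
#include <stddef.h>

/* Straightforward loop: one element per iteration. */
void delta_loop(const double *b, const double *s, double *o, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        o[i] = s[i] * b[i] * (1 - b[i]);
}

/* Hand-unrolled by four, followed by a cleanup loop for the remainder. */
void delta_unrolled(const double *b, const double *s, double *o, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        o[i]     = s[i]     * b[i]     * (1 - b[i]);
        o[i + 1] = s[i + 1] * b[i + 1] * (1 - b[i + 1]);
        o[i + 2] = s[i + 2] * b[i + 2] * (1 - b[i + 2]);
        o[i + 3] = s[i + 3] * b[i + 3] * (1 - b[i + 3]);
    }
    for (; i < n; ++i)
        o[i] = s[i] * b[i] * (1 - b[i]);
}
```

Both functions compute the same values; the unrolled one simply spends fewer iterations on loop bookkeeping.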
>
>  Or have I not configured the right parameters?

Assuming we are using similar machines, it is possible that
your Erlang code was running emulated, not native.

