Pitiful benchmark performance

James Hague
(I hit "send" by accident.  Apologies.)

> I've been browsing the Doug Bagley Shootout pages:
> http://www.bagley.org/~doug/shootout/ and Erlang does
> spectacularly badly in
> quite a few areas where it shouldn't really be quite that bad.

Some of the benchmarks are pretty convoluted in Erlang.  I'm thinking
specifically of one of the Hash benchmarks, which does a lot of conversion
between lists and atoms inside an inner loop.  To wit:

doinserts1(10000, _) -> ok;
doinserts1(I, H) ->
    Key = list_to_atom(lists:append(
        "foo_", integer_to_list(I))),
    ets:insert(H, { Key, I }),
    doinserts1(I+1, H).

This is much more than a test of hashing.
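
For comparison, here is a minimal sketch (my own illustration, not code from
the shootout) of the same loop with the per-iteration key construction taken
out, using the integer itself as the ETS key.  It no longer does what the
benchmark spec asks for, but it shows how much of the inner loop is string
and atom handling rather than hashing:

%% Hypothetical variant: the same number of ets:insert/2 calls, but no
%% lists:append/2 or list_to_atom/1 work per iteration.
doinserts2(10000, _) -> ok;
doinserts2(I, H) ->
    ets:insert(H, {I, I}),
    doinserts2(I+1, H).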

James


Pitiful benchmark performance

Sebastian Strollo
James Hague <jamesh> writes:

> Some of the benchmarks are pretty convoluted in Erlang.  I'm thinking
> specifically of one of the Hash benchmarks, which does a lot of conversion
> between lists and atoms inside an inner loop.
...

Hmm, further... This way:

  server_loop(Sock, Bytes + length(binary_to_list(Packet)));

of counting the number of bytes received in the echo test (instead of
just size(Packet)) creates quite an overhead...
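
The cheaper form would be something along these lines (size/1 simply reads
the byte count stored in the binary, while binary_to_list/1 first builds a
list of N integers just so length/1 can count them):

  server_loop(Sock, Bytes + size(Packet));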

/Sebastian


Pitiful benchmark performance

Sebastian Strollo
matthias writes:

> Sebastian Strollo writes:
>
>  > of counting the number of bytes received in the echo test (instead of
>  > just size(Packet)) creates quite an overhead...
>
> Yes, that's exactly what I thought.
>
> So I changed the program and ran tests comparing the original and
> changed ones and there's no big difference.

Hm, you are quite right. I guess when it comes to 19 bytes the
difference is insignificant. So I kept that change, made the constant a
binary (using <<"...">>), changed the sockets to active mode and reran
the tests. This gained me 10-20%, which still leaves Erlang way behind
the other implementations. One thought I had was that the other tests
are forking and are not being charged for both processes' CPU
time. *But* I ran the C, Perl and Erlang versions of the test on
the same machine and just the difference in wall clock time is really
big. I don't know what is going on. Any takers?

/Sebastian

Results from a Linux machine:

  % /usr/bin/time ./a.out 100000
  server processed 1900000 bytes
  0.32user 6.68system 0:07.07elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (95major+27minor)pagefaults 0swaps

  % /usr/bin/time ./echo.perl 100000
  server processed 1900000 bytes
  4.90user 7.91system 0:12.84elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (270major+225minor)pagefaults 0swaps
 
  % /usr/bin/time erl -noshell -noinput -s echo main 100000
  server processed 1900000 bytes
  32.98user 11.39system 0:45.78elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (632major+1099minor)pagefaults 0swaps
 
  % /usr/bin/time erl -noshell -noinput -s echo3 main 100000
  server processed 1900000 bytes
  26.44user 10.39system 0:38.45elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (633major+1106minor)pagefaults 0swaps


--- echo.erl Sat Jun  9 00:12:09 2001
+++ echo3.erl Sat Jun  9 01:14:18 2001
@@ -4,10 +4,10 @@
 
 %%% TBD - need to add check for valid response.
 
--module(echo).
+-module(echo3).
 -export([main/0, main/1, client/2, server/1]).
 
--define(DATA, "Hello there sailor\n").
+-define(DATA, <<"Hello there sailor\n">>).
 
 main() -> main(['1']).
 main([Arg]) ->
@@ -18,7 +18,7 @@
     init:stop().
 
 create_server_sock() ->
-    {ok, LSock} = gen_tcp:listen(0, [binary, {packet, 0}, {active, false}]),
+    {ok, LSock} = gen_tcp:listen(0, [binary]),
     LSock.
 
 socket_port(Sock) ->
@@ -26,17 +26,16 @@
     Port.
 
 client(N, ServerPort) ->
-    {ok, Sock} = gen_tcp:connect("localhost", ServerPort,
-                                 [binary, {packet, 0}, {active, false}]),
+    {ok, Sock} = gen_tcp:connect("localhost", ServerPort, [binary]),
     client_loop(N, Sock),
     gen_tcp:close(Sock).
 
 client_loop(0, Sock) -> ok;
 client_loop(N, Sock) ->
     ok = gen_tcp:send(Sock, ?DATA),
-    case gen_tcp:recv(Sock, 0) of
-        {ok, Packet} -> client_loop(N-1, Sock);
-        {error, closed} -> ok
+    receive
+        {tcp, Sock, _} -> client_loop(N-1, Sock);
+        {tcp_closed, Sock} -> ok
     end.
 
 server(LSock) ->
@@ -45,11 +44,11 @@
     gen_tcp:close(LSock).
 
 server_loop(Sock, Bytes) ->
-    case gen_tcp:recv(Sock, 0) of
-        {ok, Packet} ->
+    receive
+        {tcp, Sock, Packet} ->
             ok = gen_tcp:send(Sock, Packet),
-            server_loop(Sock, Bytes + length(binary_to_list(Packet)));
-        {error, closed} ->
+            server_loop(Sock, Bytes + size(Packet));
+        {tcp_closed, Sock} ->
             io:format("server processed ~w bytes~n", [Bytes]),
             gen_tcp:close(Sock)
     end.


Pitiful benchmark performance

Ulf Wiger
On 9 Jun 2001, Sebastian Strollo wrote:


>*But* I ran the C, perl and the Erlang version of the test on
>the same machine and just the difference in wall clock is really
>big. I don't know what is going on? Any takers?

I don't know, but I've never had the impression that Erlang was
very quick to initialize. What happens if the clock is started
by triggering it inside an already running VM?
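
For example, something like this from an already started shell would take
VM startup out of the measurement (just a sketch; it assumes the
init:stop() call at the end of main/1 is removed so that the node
survives the run):

  %% timer:tc/3 returns {Microseconds, Result}.
  {Micros, _} = timer:tc(echo, main, [['100000']]),
  io:format("~w seconds inside an already running VM~n",
            [Micros div 1000000]).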

Also, I don't know how much it affects the benchmark, but sockets
basically behave like low-priority processes. This seems to work
OK in a system that handles large volumes of I/O and does something
significant with most of it (e.g. connection handling).

/Uffe
--
Ulf Wiger                                    tfn: +46  8 719 81 95
Senior System Architect                      mob: +46 70 519 81 95
Strategic Product & System Management    ATM Multiservice Networks
Data Backbone & Optical Services Division      Ericsson Telecom AB



Pitiful benchmark performance

Ulf Wiger
On Sat, 9 Jun 2001, Ulf Wiger wrote:


>I don't know, but I've never had the impression that Erlang was
>very quick to initialize. What happens if the clock is started
>by triggering it inside an already running VM?

(Since I can barely understand what I wrote myself:)
By "triggering it", I meant first starting the VM, and then
running the benchmark from within Erlang. At least, this could
give a perspective on how much of the 19 sec lies in just
starting Erlang. Of course, clocking by hand on my old Pentium
90, this still only saves you 2-3 seconds...

What would happen if you e.g. change INPUT_REDUCTIONS in erl_vm.h
from (2* CONTEXT_REDS) to CONTEXT_REDS, doubling the I/O poll
frequency?  (CONTEXT_REDS is set to 1000)

If basically all that happens in the Erlang VM is that some
process is waiting for I/O, ports may not be polled with any
predictably high frequency (then again, this is only wild
guessing; perhaps it effectively gives the ports even higher
priority...)


/Uffe
--
Ulf Wiger                                    tfn: +46  8 719 81 95
Senior System Architect                      mob: +46 70 519 81 95
Strategic Product & System Management    ATM Multiservice Networks
Data Backbone & Optical Services Division      Ericsson Telecom AB