Erlang read file benchmark

Evans, Matthew
Sorry if this is a duplicate email.

I can understand Erlang being a bit slower than Perl for this. Can't see an excuse for such a difference though.

http://agentzh.org/#ErlangFileReadLineBenchmark

Matt

Sent from my iPhone
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Erlang read file benchmark

Michael Truog
He only showed the results from the command line. It would be nice to see results that show the runtime without the startup/teardown overhead of the Erlang VM, since it has a lot more going on than the Perl interpreter. I know he briefly mentioned that the difference seemed minimal, but he posted no results to show that.
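
One way to get such a number is to time the work from inside an already running VM. The following is a minimal sketch, not the benchmark author's method; the module name bench_timing and the function time_ms/1 are hypothetical names chosen for illustration.

```erlang
-module(bench_timing).
-export([time_ms/1]).

%% Run Fun once inside the running VM and return {ElapsedMs, Result}.
%% timer:tc/1 measures wall-clock time in microseconds, so VM startup
%% and teardown are not part of the measurement.
time_ms(Fun) when is_function(Fun, 0) ->
    {Micros, Result} = timer:tc(Fun),
    {Micros / 1000, Result}.
```

From the Erlang shell this could be called as bench_timing:time_ms(fun() -> my_reader:count_lines("data.log") end), where my_reader:count_lines/1 stands in for whatever line-counting function is being measured.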

On 07/09/2011 12:15 PM, Evans, Matthew wrote:

> Sorry if this is a duplicate email.
>
> I can understand Erlang being a bit slower than Perl for this. Can't see an excuse for such a difference though.
>
> http://agentzh.org/#ErlangFileReadLineBenchmark
>
> Matt

Re: Erlang read file benchmark

Bob Ippolito
file:read_line does some pretty awful things, so I'd expect it to be very
slow. That said, there should be a much faster yet still easy way to
do this, but there isn't one baked into OTP that I know of.

On Saturday, July 9, 2011, Michael Truog <[hidden email]> wrote:

> He only showed the results on the command-line.  It would be nice to see results that show runtime without the startup/teardown overhead that the Erlang VM has, since it has a lot more going on than the perl interpreter.  I know he briefly mentioned that the difference seemed minimal, but he posted no results to show that.

Re: Erlang read file benchmark

Kenny Stone
Why is it awful?

On Sat, Jul 9, 2011 at 6:07 PM, Bob Ippolito <[hidden email]> wrote:
> file:read_line does some pretty awful things, I'd expect it to be very
> slow. That said, there should be a much faster yet still easy way to
> do this quickly but there isn't one baked into OTP that I know of.

On Saturday, July 9, 2011, Michael Truog <[hidden email]> wrote:
> He only showed the results on the command-line.  It would be nice to see results that show runtime without the startup/teardown overhead that the Erlang VM has, since it has a lot more going on than the perl interpreter.  I know he briefly mentioned that the difference seemed minimal, but he posted no results to show that.
>
> On 07/09/2011 12:15 PM, Evans, Matthew wrote:
>> Sorry if this is a duplicate email.
>>
>> I can understand Erlang being a bit slower than Perl for this. Can't see an excuse for such a difference though.
>>
>> http://agentzh.org/#ErlangFileReadLineBenchmark
>>
>> Matt
>>
>> Sent from my iPhone
>> _______________________________________________
>> erlang-questions mailing list
>> [hidden email]
>> http://erlang.org/mailman/listinfo/erlang-questions
>>
>
> _______________________________________________
> erlang-questions mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-questions
>
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Erlang read file benchmark

Bob Ippolito
I think what most people want (especially for benchmarks) is something
that doesn't care about encodings and doesn't have a lot of
indirection. The current solution is VERY flexible, which comes at a
severe cost to performance in this case.

If you read the source code you'll see that file:read_line/1 calls
into io:request/2 which eventually (in another process) ends up in
file_io_server:io_request/2 and ends up reading either 128 bytes or
8 KB at a time, doing some Unicode junk, and ends up calling
io_lib:collect_line/4 to collect each line a chunk at a time.

If I was trying to win a benchmark I'd probably go directly to
prim_file, do my own buffering, and use erlang:decode_packet/3 or the
binary module to split on the newlines. If I wanted to make a nicer
API I'd put that in a process to manage the buffering.
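
The approach described above (raw reads, manual buffering, erlang:decode_packet/3 to find line ends) can be sketched roughly as follows. This is an illustration under assumptions, not code from the thread: it uses file:open/2 in raw mode rather than calling prim_file directly, and the module name line_buffer, the 64 KB chunk size, and the function names are hypothetical.

```erlang
-module(line_buffer).
-export([count_file/1, count_chunk/2]).

-define(CHUNK_SIZE, 65536).  % assumed read size, not taken from the thread

%% Count the lines in Path, reading raw binary chunks and keeping any
%% incomplete trailing line in a buffer until the next read.
%% Read errors are not handled; they crash with a case_clause error.
count_file(Path) ->
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    try
        loop(Fd, <<>>, 0)
    after
        file:close(Fd)
    end.

loop(Fd, Buffer, Count) ->
    case file:read(Fd, ?CHUNK_SIZE) of
        {ok, Data} ->
            {Rest, Count1} = count_chunk(<<Buffer/binary, Data/binary>>, Count),
            loop(Fd, Rest, Count1);
        eof when Buffer =:= <<>> ->
            Count;
        eof ->
            %% A final line without a trailing newline still counts.
            Count + 1
    end.

%% Strip complete lines from Bin one at a time.
%% Returns {IncompleteTail, UpdatedCount}.
count_chunk(Bin, Count) ->
    case erlang:decode_packet(line, Bin, []) of
        {ok, _Line, Rest} -> count_chunk(Rest, Count + 1);
        {more, _} -> {Bin, Count}
    end.
```

Carrying the incomplete tail over to the next read is what keeps a line that straddles two chunks from being counted twice.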

On Sat, Jul 9, 2011 at 4:08 PM, Kenny Stone <[hidden email]> wrote:

> Why is it awful?

Re: Erlang read file benchmark

Evans, Matthew
Using prim_file and the binary module produces:

[mevans@scream ~]$  time perl reader.pl
Found 6032291 lines.

real    0m1.368s
user    0m1.295s
sys     0m0.068s
[mevans@scream ~]$ time erl  -noshell -s reader main
Found 6048490 lines.

real    0m2.090s
user    0m1.965s
sys     0m0.111s


The Erlang code reports more lines because I am double counting cases where a line is split between reads. Of course, this would be easy to handle; I just wanted a proof of concept:

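-module(reader).
-export([main/0]).
-define(NL, 10).  % ASCII newline

%% Entry point for: erl -noshell -s reader main
main() ->
    {ok, File} = prim_file:open("data.log", [read, binary]),
    Lines = count_lines(File, 0),
    io:format("Found ~w lines.~n", [Lines]),
    halt().

%% Read 8 KB chunks and add up the pieces produced by splitting on newlines.
%% Splitting yields one more piece than there are newlines, so every chunk
%% adds one extra to the total (the overcount mentioned above).
count_lines(File, Count) ->
    case prim_file:read(File, 8192) of
        {ok, Chunk} ->
            TC = length(binary:split(Chunk, [<<?NL>>], [global])),
            count_lines(File, Count + TC);
        _ ->
            Count
    end.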
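
A possible way to remove the overcount is to count newline bytes in each chunk instead of counting the pieces returned by binary:split/3. The sketch below is an assumption-laden variant, not the code that produced the timings above: it swaps prim_file for file:open/2 in raw mode, and the module name newline_count is hypothetical.

```erlang
-module(newline_count).
-export([count_file/1, count_chunk/1]).

%% Count newline bytes in Path, reading 8 KB raw chunks.
%% A final line with no trailing newline is not counted.
count_file(Path) ->
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    try
        loop(Fd, 0)
    after
        file:close(Fd)
    end.

loop(Fd, Count) ->
    case file:read(Fd, 8192) of
        {ok, Chunk} -> loop(Fd, Count + count_chunk(Chunk));
        eof -> Count
    end.

%% Number of newline bytes in one chunk. binary:matches/2 returns one
%% {Position, Length} tuple per match, so a line split across two reads
%% contributes exactly one newline in total.
count_chunk(Chunk) ->
    length(binary:matches(Chunk, <<"\n">>)).
```

Because only the newline bytes are counted, no buffer has to be carried between reads.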


Re: Erlang read file benchmark

Martin Dimitrov
In reply to this post by Bob Ippolito

> If you read the source code you'll see that file:read_line/1 calls
> into io:request/2 which eventually (in another process) ends up in
> file_io_server:io_request/2 and ends up reading either 128 bytes or
> 8kb at a time, doing some unicode junk, and ends up calling
> io_lib:collect_line/4 to collect each line chunk at a time.
>
Isn't the Perl interpreter going through the same or similar steps?

Re: Erlang read file benchmark

Bob Ippolito
On Sat, Jul 9, 2011 at 11:01 PM, Martin Dimitrov <[hidden email]> wrote:
>
>> If you read the source code you'll see that file:read_line/1 calls
>> into io:request/2 which eventually (in another process) ends up in
>> file_io_server:io_request/2 and ends up reading either 128 bytes or
>> 8kb at a time, doing some unicode junk, and ends up calling
>> io_lib:collect_line/4 to collect each line chunk at a time.
>>
> Isn't the Perl interpreter going through the same or similar steps?

No, it's not.

-bob

Re: Erlang read file benchmark

Evans, Matthew
In reply to this post by Evans, Matthew
I don't think anyone would disagree with using the right tool for the job. My complaint is that in any sufficiently large project some form of file I/O is going to happen. If Erlang were, say, 50% slower than Perl I wouldn't care, but an order of magnitude is really off the chart. As I showed, you can roll your own code using prim_file that is good enough, but most people looking at Erlang for a solution wouldn't do that. Instead they would see the linked webpage, laugh, and never look at Erlang again.

Just food for thought

Matt

Sent from my iPhone

On Jul 10, 2011, at 2:30 AM, "Thomas Heller" <[hidden email]> wrote:

> A few years ago there was a very interesting discussion on this topic
> and some very smart people created some really smart solutions.
>
> Google: Tim Bray and "Wide Finder"
>
> http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder
> http://www.tbray.org/ongoing/When/200x/2008/05/01/Wide-Finder-2
>
> The conclusion was that Erlang is not the best language for this type of
> work, but with some clever tricks it becomes "good enough".
>
> http://www.duomark.com/erlang/proposals/gen_stream.html
>
> This has been cooking in OTP for a while now. Jay Nelson gave a
> talk on it, which you can find here:
>
> http://www.erlang-factory.com/conference/SFBay2011/speakers/JayNelson
>
> I haven't used this myself. Choose the right tool for the job: if you
> can write it in 20 lines of Perl, don't try writing it in 300 lines of
> Erlang unless you really, really want or need to.
>
> HTH,
> /thomas

Re: Erlang read file benchmark

Joe Armstrong-2
In reply to this post by Bob Ippolito
How large is the file you want to read lines from?

All the files I want to process are small (i.e. in relation to memory).
I don't think I've ever called read_line to read a file, since I know
it's rather slow.
I call file:read_file to read the entire file into a binary, then chunk it
into lines later.

/Joe
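
The read-everything-then-split approach described in this message might look like the sketch below. It is an illustration only; the module name whole_file and its function names are hypothetical, and it assumes the file fits comfortably in memory.

```erlang
-module(whole_file).
-export([lines/1, split_lines/1]).

%% Read the whole file into one binary, then split it into lines.
lines(Path) ->
    {ok, Bin} = file:read_file(Path),
    split_lines(Bin).

%% Split on newline bytes. The trim option drops empty parts at the end,
%% so a trailing newline does not produce an extra empty line.
split_lines(Bin) ->
    binary:split(Bin, <<"\n">>, [global, trim]).
```

The number of lines is then length(whole_file:lines("data.log")), at the cost of holding the entire file in memory at once.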

