Sorry if this is a duplicate email.
I can understand Erlang being a bit slower than Perl for this. Can't see an excuse for such a difference though.

http://agentzh.org/#ErlangFileReadLineBenchmark

Matt

Sent from my iPhone
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
|
He only showed the results on the command-line. It would be nice to see results that show runtime without the startup/teardown overhead that the Erlang VM has, since it has a lot more going on than the perl interpreter. I know he briefly mentioned that the difference seemed minimal, but he posted no results to show that.

On 07/09/2011 12:15 PM, Evans, Matthew wrote:
> Sorry if this is a duplicate email.
>
> I can understand Erlang being a bit slower than Perl for this. Can't see an excuse for such a difference though.
>
> http://agentzh.org/#ErlangFileReadLineBenchmark
>
> Matt
>
> Sent from my iPhone
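
One way to get the kind of number asked for here is to time the work from inside the VM, so that startup and teardown are excluded. The following is only a sketch: the module name `bench`, the function `time_lines/1`, and the newline-counting workload are illustrative assumptions, not code from the linked benchmark.

```erlang
-module(bench).
-export([time_lines/1]).

%% Hypothetical helper: count newline bytes in Path and report how long
%% the work took. timer:tc/1 measures only the fun, so VM startup and
%% shutdown are not part of the figure.
time_lines(Path) ->
    {Micros, Count} = timer:tc(fun() -> count_newlines(Path) end),
    io:format("Found ~w lines in ~w microseconds.~n", [Count, Micros]),
    {Micros, Count}.

%% Assumed workload: read the whole file and count the newline bytes.
count_newlines(Path) ->
    {ok, Bin} = file:read_file(Path),
    length(binary:matches(Bin, <<"\n">>)).
```

Comparing the reported time with the `real` figure from `time erl -noshell ...` would show how much of the wall-clock time is VM overhead rather than file reading.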
|
file:read_line does some pretty awful things, so I'd expect it to be very slow. That said, there should be a much faster yet still easy way to do this, but there isn't one baked into OTP that I know of.

On Saturday, July 9, 2011, Michael Truog <[hidden email]> wrote:
> He only showed the results on the command-line. It would be nice to see results that show runtime without the startup/teardown overhead that the Erlang VM has, since it has a lot more going on than the perl interpreter. I know he briefly mentioned that the difference seemed minimal, but he posted no results to show that.
|
Why is it awful?

On Sat, Jul 9, 2011 at 6:07 PM, Bob Ippolito <[hidden email]> wrote:
> file:read_line does some pretty awful things, I'd expect it to be very
> slow. That said, there should be a much faster yet still easy way to
> do this quickly but there isn't one baked into OTP that I know of.
|
I think what most people want (especially for benchmarks) is something that doesn't care about encodings and doesn't have a lot of indirection. The current solution is VERY flexible, which comes at a severe cost to performance in this case.

If you read the source code you'll see that file:read_line/1 calls into io:request/2, which eventually (in another process) ends up in file_io_server:io_request/2. That reads either 128 bytes or 8 KB at a time, does some unicode junk, and then calls io_lib:collect_line/4 to collect each line one chunk at a time.

If I were trying to win a benchmark I'd probably go directly to prim_file, do my own buffering, and use erlang:decode_packet/3 or the binary module to split on the newlines. If I wanted to make a nicer API I'd put that in a process to manage the buffering.

On Sat, Jul 9, 2011 at 4:08 PM, Kenny Stone <[hidden email]> wrote:
> Why is it awful?
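
The "do my own buffering and use erlang:decode_packet/3" idea can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the thread: the module name `line_fold`, the function `fold_lines/3`, and the 64 KB chunk size are made up, and it opens the file through the public `file` module with the `raw` option rather than calling prim_file directly.

```erlang
-module(line_fold).
-export([fold_lines/3]).

-define(CHUNK, 65536).  % assumed read size, not taken from the thread

%% Fold Fun over every line of Path. Fun gets each line as a binary
%% (trailing newline included when the file has one) plus the accumulator.
fold_lines(Fun, Acc0, Path) ->
    %% raw mode bypasses the io server process, so no encoding handling.
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    try loop(Fd, <<>>, Fun, Acc0)
    after file:close(Fd)
    end.

loop(Fd, Buf, Fun, Acc) ->
    case erlang:decode_packet(line, Buf, []) of
        {ok, Line, Rest} ->
            %% A complete line is in the buffer: hand it over, keep the rest.
            loop(Fd, Rest, Fun, Fun(Line, Acc));
        {more, _} ->
            %% No full line buffered: read another chunk and append it.
            case file:read(Fd, ?CHUNK) of
                {ok, Data} -> loop(Fd, <<Buf/binary, Data/binary>>, Fun, Acc);
                eof when Buf =:= <<>> -> Acc;
                eof -> Fun(Buf, Acc)  % last line had no newline
            end
    end.
```

Counting lines then becomes `line_fold:fold_lines(fun(_Line, N) -> N + 1 end, 0, "data.log")`. Read errors are not handled in this sketch and would crash the caller.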
|
Using prim_file and the binary module produces:

[mevans@scream ~]$ time perl reader.pl
Found 6032291 lines.

real 0m1.368s
user 0m1.295s
sys 0m0.068s

[mevans@scream ~]$ time erl -noshell -s reader main
Found 6048490 lines.

real 0m2.090s
user 0m1.965s
sys 0m0.111s

The Erlang code is reporting more lines because I am double counting cases where a line is split between reads. Of course, this would be easy to handle - I just wanted a proof of concept:

-module(reader).
-export([main/0]).

-define(NL,10).

main() ->
    {ok, File} = prim_file:open("data.log", [read,binary]),
    Lines = count_lines(File, 0),
    io:format("Found ~w lines.~n", [Lines]),
    halt().

count_lines(File, Count) ->
    case prim_file:read(File,8192) of
        {ok, Chunk} ->
            %% binary:split/3 returns one more piece than there are
            %% newlines in the chunk, so every read adds one extra.
            TC = length(binary:split(Chunk,<<?NL>>,[global])),
            count_lines(File, Count+TC);
        _ ->
            Count
    end.

________________________________________
From: [hidden email] [[hidden email]] On Behalf Of Bob Ippolito [[hidden email]]
Sent: Saturday, July 09, 2011 7:33 PM
To: Kenny Stone
Cc: erlang-questions Questions
Subject: Re: [erlang-questions] Erlang read file benchmark

I think what most people want (especially for benchmarks) is something that doesn't care about encodings and doesn't have a lot of indirection. The current solution is VERY flexible, which comes at a severe cost to performance in this case.

If you read the source code you'll see that file:read_line/1 calls into io:request/2 which eventually (in another process) ends up in file_io_server:io_request/2 and ends up reading either 128 bytes or 8kb at a time, doing some unicode junk, and ends up calling io_lib:collect_line/4 to collect each line chunk at a time.

If I was trying to win a benchmark I'd probably go directly to prim_file, do my own buffering, and use erlang:decode_packet/3 or the binary module to split on the newlines. If I wanted to make a nicer API I'd put that in a process to manage the buffering.
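
The over-count in the proof of concept can be avoided by counting newline bytes in each chunk instead of counting the pieces produced by a split. The version below is a sketch, not the poster's code: the module name `reader2` is hypothetical, and it swaps prim_file for `file:open/2` with the `raw` option, which is the documented way to bypass the io server.

```erlang
-module(reader2).
-export([main/0, count_lines/1]).

main() ->
    io:format("Found ~w lines.~n", [count_lines("data.log")]),
    halt().

%% Count newline bytes in Path, reading 8192 bytes at a time as in the
%% original proof of concept.
count_lines(Path) ->
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    Count = count_lines(Fd, 0),
    ok = file:close(Fd),
    Count.

count_lines(Fd, Count) ->
    case file:read(Fd, 8192) of
        {ok, Chunk} ->
            %% binary:matches/2 returns one {Pos, Len} per newline, so a
            %% line that straddles two reads is counted exactly once.
            count_lines(Fd, Count + length(binary:matches(Chunk, <<"\n">>)));
        eof ->
            Count
    end.
```

Note that this counts newline characters, so a final line without a trailing newline is not included; a Perl `while (<>)` loop would count it.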
|
In reply to this post by Bob Ippolito

> If you read the source code you'll see that file:read_line/1 calls
> into io:request/2 which eventually (in another process) ends up in
> file_io_server:io_request/2 and ends up reading either 128 bytes or
> 8kb at a time, doing some unicode junk, and ends up calling
> io_lib:collect_line/4 to collect each line chunk at a time.

Isn't the Perl interpreter going over the same or similar steps?
|
On Sat, Jul 9, 2011 at 11:01 PM, Martin Dimitrov <[hidden email]> wrote:
>> If you read the source code you'll see that file:read_line/1 calls
>> into io:request/2 which eventually (in another process) ends up in
>> file_io_server:io_request/2 and ends up reading either 128 bytes or
>> 8kb at a time, doing some unicode junk, and ends up calling
>> io_lib:collect_line/4 to collect each line chunk at a time.
>
> Isn't Perl interpreter going over the same or similar steps?

No, it's not.

-bob
|
In reply to this post by Evans, Matthew
I don't think anyone would disagree with using the right tool for the job. My complaint is that in any sufficiently large project some form of file I/O is going to happen. If Erlang were, say, 50% slower than Perl I wouldn't care, but an order of magnitude is really off the chart. As I showed, you can roll your own code using prim_file that is good enough, but most people looking at Erlang for a solution wouldn't do that. Instead they would see the linked webpage, laugh, and never look at Erlang again.

Just food for thought

Matt

Sent from my iPhone

On Jul 10, 2011, at 2:30 AM, "Thomas Heller" <[hidden email]> wrote:

> A few years ago there was a very interesting discussion on this topic
> and some very smart people created some really smart solutions.
>
> Google: Tim Bray and "Wide Finder"
>
> http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder
> http://www.tbray.org/ongoing/When/200x/2008/05/01/Wide-Finder-2
>
> Conclusion was that Erlang is not the best language for this type of
> work but with some clever tricks becomes "good enough".
>
> http://www.duomark.com/erlang/proposals/gen_stream.html
>
> This has been cooking in OTP for a while now, Jay Nelson gave a
> talk on it which you can find here:
>
> http://www.erlang-factory.com/conference/SFBay2011/speakers/JayNelson
>
> Haven't used this myself. Choose the right tool for the job, so if you
> can write it in 20 lines of Perl, don't try writing it in 300 lines of
> Erlang unless you really really want/need to.
>
> HTH,
> /thomas
|
In reply to this post by Bob Ippolito
How large is the file you want to read lines from?

All the files I want to process are small (i.e. in relation to memory).

I don't think I've ever called read_line to read a file, since I know it's rather slow. I call file:read_file to read the entire file into a binary, then chunk it into lines later.

/Joe

On Sun, Jul 10, 2011 at 1:33 AM, Bob Ippolito <[hidden email]> wrote:
> I think what most people want (especially for benchmarks) is something
> that doesn't care about encodings and doesn't have a lot of
> indirection. The current solution is VERY flexible, which comes at a
> severe cost to performance in this case.
>
> If you read the source code you'll see that file:read_line/1 calls
> into io:request/2 which eventually (in another process) ends up in
> file_io_server:io_request/2 and ends up reading either 128 bytes or
> 8kb at a time, doing some unicode junk, and ends up calling
> io_lib:collect_line/4 to collect each line chunk at a time.
>
> If I was trying to win a benchmark I'd probably go directly to
> prim_file, do my own buffering, and use erlang:decode_packet/3 or the
> binary module to split on the newlines. If I wanted to make a nicer
> API I'd put that in a process to manage the buffering.
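
The read-everything-then-split approach described in this message might look like the sketch below. The module name `slurp` and function `lines/1` are hypothetical, and the approach assumes, as the message says, that the file fits comfortably in memory.

```erlang
-module(slurp).
-export([lines/1]).

%% Read the whole file into one binary, then split it on newlines.
%% The trim option drops the empty piece left by a trailing newline;
%% empty lines in the middle of the file are kept.
lines(Path) ->
    {ok, Bin} = file:read_file(Path),
    binary:split(Bin, <<"\n">>, [global, trim]).
```

A line count is then just `length(slurp:lines("data.log"))`, at the cost of holding the whole file and the list of lines in memory at once.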
