Performance question


Performance question

Stu Bailey
I found 

binary:replace(BinChunk,<<"\n">>,<<>>,[global]).

significantly slower than 

remove_pattern(BinChunk,<<>>,<<"\n">>).

with

remove_pattern(<<>>, Acc, _BinPat) ->
    Acc;
remove_pattern(Bin, Acc, BinPat) ->
    <<Byte:1/binary, Rest/binary>> = Bin,
    case Byte == BinPat of
        true  -> remove_pattern(Rest, Acc, BinPat);
        false -> remove_pattern(Rest, <<Acc/binary, Byte/binary>>, BinPat)
    end.

That was surprising to me.  The built-in binary:replace/4 was much, much slower for a large BinChunk with lots of <<"\n">> sprinkled through.

Thoughts?

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Performance question

Tyler Margison
I'm not sure whether this affects performance, but I believe binary:replace handles variable-sized patterns, whereas your remove_pattern function only works for one-byte patterns. What if the pattern is longer than one byte? It definitely works for <<"\n">>, but I don't think it would work for <<"\r\n">>. Perhaps there is some cost in handling arbitrary pattern lengths? Again, I'm not sure whether that's part of the performance problem, but the two functions you propose are not quite equivalent.
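For what it's worth, a variant that handles patterns of any length could try to match the whole pattern at the head of the binary before falling back to a single byte — a rough sketch (remove_pattern2 is a hypothetical name; a non-empty pattern is assumed, since an empty one would loop forever):

```erlang
%% Hypothetical generalization of remove_pattern/3: removes every
%% occurrence of BinPat (any length >= 1) from Bin.
remove_pattern2(Bin, BinPat) ->
    remove_pattern2(Bin, <<>>, BinPat, byte_size(BinPat)).

remove_pattern2(Bin, Acc, BinPat, PatLen) ->
    case Bin of
        %% BinPat is bound, so this segment matches only if the next
        %% PatLen bytes equal the pattern; skip them without copying.
        <<BinPat:PatLen/binary, Rest/binary>> ->
            remove_pattern2(Rest, Acc, BinPat, PatLen);
        %% No match at this position: keep one byte and advance.
        <<Byte:1/binary, Rest/binary>> ->
            remove_pattern2(Rest, <<Acc/binary, Byte/binary>>, BinPat, PatLen);
        <<>> ->
            Acc
    end.
```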

On Thu, Nov 6, 2014 at 5:33 PM, Stu Bailey <[hidden email]> wrote:

Re: Performance question

Loïc Hoguin-3
In reply to this post by Stu Bailey
binary:split and binary:replace, unlike other functions of the binary
module, are normal Erlang functions. They also process a list of options
before doing the actual work, so there's an obvious overhead compared to
not doing that. In addition, as has been pointed out, your code is more
specialized, so that helps too.
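As a side note on that per-call overhead: when the same pattern is searched for repeatedly, the stdlib lets you precompile it once with binary:compile_pattern/1 and pass the compiled pattern to binary:split — a small sketch, assuming BinChunk is bound to your data:

```erlang
%% Precompile the pattern once; each subsequent split skips
%% recompiling it. Joining the split parts removes the newlines.
BinChunk = <<"line1\nline2\nline3">>,
Pat = binary:compile_pattern(<<"\n">>),
Parts = binary:split(BinChunk, Pat, [global]),
Cleaned = iolist_to_binary(Parts).
```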

On 11/07/2014 03:33 AM, Stu Bailey wrote:


--
Loïc Hoguin
http://ninenines.eu

Re: Performance question

Stu Bailey
I'm not planning to spend a lot of time on this right now, but binary:replace(...) was chewing a tremendous amount of system-time CPU (and in fact never finished before I got frustrated and killed it), while my function reported ~99% user time (not system time) and finished in a reasonable time.  I assume the high system-time usage for binary:replace(...) means it is doing something manic with system calls for memory management?


On Fri, Nov 7, 2014 at 1:44 AM, Loïc Hoguin <[hidden email]> wrote:

Re: Performance question

Stu Bailey
FYI, if you want to try to replicate it: I was processing ~80 chunks of binary, each about 250,000,000 bytes.  I think you'll see the difference on just one chunk.  I happen to be running on an 8-core MacBook Pro with 16 GB of RAM, so I spawned a process per chunk to grab all the resources on all the cores.  With the hand-written function it worked like a charm... yay Erlang! :-)  I love seeing a few lines of code effectively use all the processing power available.  Heats the machine up quite a bit too. :-)
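For anyone trying to reproduce this at a smaller scale, a rough timing harness might look like the following (the input size is made up and scaled down; it assumes the remove_pattern/3 function from the first post is compiled in scope):

```erlang
%% Build a ~10 MB binary with many newlines, then time both approaches.
Chunk = binary:copy(<<"some data\n">>, 1000000),
{UsBif, CleanedBif} =
    timer:tc(fun() -> binary:replace(Chunk, <<"\n">>, <<>>, [global]) end),
{UsHand, CleanedHand} =
    timer:tc(fun() -> remove_pattern(Chunk, <<>>, <<"\n">>) end),
io:format("binary:replace: ~p us, hand-rolled: ~p us~n", [UsBif, UsHand]).
```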

On Fri, Nov 7, 2014 at 9:22 AM, Stu Bailey <[hidden email]> wrote:

Re: Performance question

Loïc Hoguin-3
Based on the code at

https://github.com/erlang/otp/blob/maint/lib/stdlib/src/binary.erl#L268

It does a lot of splitting, and then a lot more splitting, and then calls
iolist_to_binary. It looks very inefficient.

Your solution is the fastest way to do it. You also benefit from the match
context optimization, so your code is very fast. The only thing that
could make it faster is if memory were allocated only once for the
resulting binary (instead of reallocating a few times)... but maybe there's
already an optimization like this?
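For reference, the effect of that code path (a [global] replace with an empty replacement) can be reproduced in essentially one line, which makes the intermediate work visible — a sketch, with remove_newlines as a made-up name:

```erlang
%% Equivalent of binary:replace(Bin, <<"\n">>, <<>>, [global]):
%% split produces one sub-binary per segment (each referencing the
%% original), and iolist_to_binary then copies them all into one
%% freshly allocated result binary.
remove_newlines(Bin) ->
    iolist_to_binary(binary:split(Bin, <<"\n">>, [global])).
```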

On 11/07/2014 07:33 PM, Stu Bailey wrote:


--
Loïc Hoguin
http://ninenines.eu

Re: Performance question

Stu Bailey
Thank you for the feedback.  That's very helpful.


On Fri, Nov 7, 2014 at 9:44 AM, Loïc Hoguin <[hidden email]> wrote:

Re: Performance question

dmkolesnikov
Hello,

The problem here is the 'global' option. It turns out that processing a large binary chunk in one pass is not very efficient with binary:replace or binary:split; a custom algorithm, like the one you've shown us, works better.

However, if you are pipelining the binary operation with something else (e.g. parsing, counting, what-not), then the binary BIFs win. I can't speak for binary:replace, as I've never used it in my code, but binary:split without global rocks: it was able to out-perform custom accumulator functions.
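To illustrate that pipelining point, binary:split/2 without the global option can drive a one-segment-at-a-time fold, so the full list of parts is never materialized — a sketch, with count_lines as a made-up example:

```erlang
%% Fold over a binary one line at a time using binary:split/2
%% (no [global]): each step peels off one line and recurses on the
%% rest. Counts segments, so the final segment after the last
%% newline is counted too.
count_lines(Bin) -> count_lines(Bin, 0).

count_lines(Bin, N) ->
    case binary:split(Bin, <<"\n">>) of
        [_Line, Rest] -> count_lines(Rest, N + 1);
        [_Last]       -> N + 1
    end.
```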

Best Regards, 
Dmitry


On 07 Nov 2014, at 19:50, Stu Bailey <[hidden email]> wrote:

Thank you for the feedback.  That's very helpful.

