Quantcast

BEAM micro-optimization: term comparison (> 0) versus pattern-matching

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

BEAM micro-optimization: term comparison (> 0) versus pattern-matching

Mikael Pettersson-5
Suppose a frequently called function returns a positive integer (always a fixnum)
and a boolean "success" or "failure" indicator (even in the failure case the return
value is significant and will be used).  Which is better from a performance
perspective:

1. Tag the return value, {success, X} or {failure, Y}, and have callers pattern-match
   on that

or,

2. Indicate failure by negating Y, and have callers match on

      X when X > 0 -> success case;
      MinusY -> failure case % MinusY =< 0 is implicit

Option 2 avoids the consing overheads of option 1, but I'm worried that the X > 0
guard may be less optimized than a plain structural pattern-match.  The last time
I checked, term-comparisons would check inline for identity, and otherwise call
the general cmp() function which would then do a lot of type-casing.

In my pure Erlang code I'm using option 1, but I'm reimplementing the function
as a NIF, and would like to avoid the consing overheads is that's indeed better.


/Mikael
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: BEAM micro-optimization: term comparison (> 0) versus pattern-matching

John Doe
Just've checked, the first one is approx 2 times faster on 19.1.6
1> L = lists:seq(1, 1000000).
2> N = -1.
3> T = {error, -1}.
4> F1 = fun() -> lists:foreach(fun(_) -> case T of {success, Z} -> Z; {error, Z} -> Z end end, L) end.
5> F2 = fun() -> lists:foreach(fun(_) -> case N of X when X > 0 -> X; _ -> N end end, L) end.
6> timer:tc(F1).      
{1089950,ok}
7>      
7> timer:tc(F2).
{
1870456,ok}

2017-04-15 21:13 GMT+03:00 Mikael Pettersson <[hidden email]>:
Suppose a frequently called function returns a positive integer (always a fixnum)
and a boolean "success" or "failure" indicator (even in the failure case the return
value is significant and will be used).  Which is better from a performance
perspective:

1. Tag the return value, {success, X} or {failure, Y}, and have callers pattern-match
   on that

or,

2. Indicate failure by negating Y, and have callers match on

      X when X > 0 -> success case;
      MinusY -> failure case % MinusY =< 0 is implicit

Option 2 avoids the consing overheads of option 1, but I'm worried that the X > 0
guard may be less optimized than a plain structural pattern-match.  The last time
I checked, term-comparisons would check inline for identity, and otherwise call
the general cmp() function which would then do a lot of type-casing.

In my pure Erlang code I'm using option 1, but I'm reimplementing the function
as a NIF, and would like to avoid the consing overheads is that's indeed better.


/Mikael
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: BEAM micro-optimization: term comparison (> 0) versus pattern-matching

José Valim-2
John, did you measure in the shell?

Code in the shell is evaluated and it won't give the same result as if the code would be compiled as part of a module.

Ideally you want to define a module, compile it, and then execute a function that runs the benchmarking.

José Valim
Skype: jv.ptec
Founder and Director of R&D

On Sun, Apr 16, 2017 at 1:53 AM, John Doe <[hidden email]> wrote:
Just've checked, the first one is approx 2 times faster on 19.1.6
1> L = lists:seq(1, 1000000).
2> N = -1.
3> T = {error, -1}.
4> F1 = fun() -> lists:foreach(fun(_) -> case T of {success, Z} -> Z; {error, Z} -> Z end end, L) end.
5> F2 = fun() -> lists:foreach(fun(_) -> case N of X when X > 0 -> X; _ -> N end end, L) end.
6> timer:tc(F1).      
{1089950,ok}
7>      
7> timer:tc(F2).
{
1870456,ok}

2017-04-15 21:13 GMT+03:00 Mikael Pettersson <[hidden email]>:
Suppose a frequently called function returns a positive integer (always a fixnum)
and a boolean "success" or "failure" indicator (even in the failure case the return
value is significant and will be used).  Which is better from a performance
perspective:

1. Tag the return value, {success, X} or {failure, Y}, and have callers pattern-match
   on that

or,

2. Indicate failure by negating Y, and have callers match on

      X when X > 0 -> success case;
      MinusY -> failure case % MinusY =< 0 is implicit

Option 2 avoids the consing overheads of option 1, but I'm worried that the X > 0
guard may be less optimized than a plain structural pattern-match.  The last time
I checked, term-comparisons would check inline for identity, and otherwise call
the general cmp() function which would then do a lot of type-casing.

In my pure Erlang code I'm using option 1, but I'm reimplementing the function
as a NIF, and would like to avoid the consing overheads is that's indeed better.


/Mikael
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions



_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: BEAM micro-optimization: term comparison (> 0) versus pattern-matching

Mikael Pettersson-5
In reply to this post by Mikael Pettersson-5
Mikael Pettersson writes:
 > Suppose a frequently called function returns a positive integer (always a fixnum)
 > and a boolean "success" or "failure" indicator (even in the failure case the return
 > value is significant and will be used).  Which is better from a performance
 > perspective:
 >
 > 1. Tag the return value, {success, X} or {failure, Y}, and have callers pattern-match
 >    on that
 >
 > or,
 >
 > 2. Indicate failure by negating Y, and have callers match on
 >
 >       X when X > 0 -> success case;
 >       MinusY -> failure case % MinusY =< 0 is implicit
 >
 > Option 2 avoids the consing overheads of option 1, but I'm worried that the X > 0
 > guard may be less optimized than a plain structural pattern-match.  The last time
 > I checked, term-comparisons would check inline for identity, and otherwise call
 > the general cmp() function which would then do a lot of type-casing.
 >
 > In my pure Erlang code I'm using option 1, but I'm reimplementing the function
 > as a NIF, and would like to avoid the consing overheads is that's indeed better.

I put together a micro-benchmark module (appended below), and have
the following observations:

- Option 2 above is about 15-25% faster than option 1 above.
- Adding an is_integer/1 guard before the X > 0 in option 2 makes the code slower.
- A major cost in option 1 is having to do case analysis on two atoms (because the
  runtime representation of an atom isn't known until the code is loaded into a
  running VM).  However, replacing the atoms with the fixnums 0 and 1 makes the
  code much slower.
- The best version I've found so far (function 'k' in the bechmark module) eliminates
  the tags, represents one case by wrapping the integer in a cons cell [X | []], and
  the other case as the plain integer.  [X | []] is slightly better than {X} since the
  latter requires an additional load and test in the pattern-matching path.

On my desktop box (OTP-19.3, Ivy Bridge @ 3.5 GHz), typical output from the benchmark is:

> test:all(100000000).
[{f,{2564769,2}},
 {g,{1806006,2}},
 {h,{1999038,2}},
 {i,{3295790,2}},
 {j,{1779385,2}},
 {k,{1504119,2}},
 {l,{1526827,2}}]

f is option 1 above, g is option 2.  h is g with added is_integer/1 check.
i is f but with integer tags not atoms.  j wraps the two cases as [X|[]] and {Y}.
k is j but with Y left unwrapped.  l is j but swaps the order and checks with
is_integer/1 before matching on [Y|_].

/Mikael

==snip==
-module(test).
-export([all/1, f/1, g/1, h/1, i/1, j/1, k/1, l/1]).

all(N) ->
  L = [ {f, fun f/1}
      , {g, fun g/1}
      , {h, fun h/1}
      , {i, fun i/1}
      , {j, fun j/1}
      , {k, fun k/1}
      , {l, fun l/1}
      ],
  [{A, F(N)} || {A, F} <- L].

f(N) -> timer:tc(fun() -> f2(N, [], []) end).
g(N) -> timer:tc(fun() -> g2(N, [], []) end).
h(N) -> timer:tc(fun() -> h2(N, [], []) end).
i(N) -> timer:tc(fun() -> i2(N, [], []) end).
j(N) -> timer:tc(fun() -> j2(N, [], []) end).
k(N) -> timer:tc(fun() -> k2(N, [], []) end).
l(N) -> timer:tc(fun() -> l2(N, [], []) end).

%% f: tag with {success, X} or {failure, Y}

f2(0, X, _Y) -> X;
f2(N, X, Y) ->
  case f3(N) of
    {success, A} -> f2(N - 1, A, Y);
    {failure, B} -> f2(N - 1, X, B)
  end.

f3(N) ->
  case N band 1 of
    0 -> {success, N};
    _ -> {failure, N}
  end.

%% g: indicate success or failure via the sign

g2(0, X, _Y) -> X;
g2(N, X, Y) ->
  case g3(N) of
    A when A > 0 -> g2(N - 1, A, Y);
    MinusB -> g2(N - 1, X, -MinusB)
  end.

g3(N) ->
  case N band 1 of
    0 -> N;
    _ -> -N
  end.

%% h: like g, but check is_integer before checking > 0

h2(0, X, _Y) -> X;
h2(N, X, Y) ->
  case g3(N) of
    A when is_integer(A), A > 0 -> h2(N - 1, A, Y);
    MinusB -> h2(N - 1, X, -MinusB)
  end.

%% i: like f, but tag with integers 0 and 1 instead

i2(0, X, _Y) -> X;
i2(N, X, Y) ->
  case i3(N) of
    {0, A} -> i2(N - 1, A, Y);
    {1, B} -> i2(N - 1, X, B)
  end.

i3(N) ->
  case N band 1 of
    0 -> {0, N};
    _ -> {1, N}
  end.

%% j: like f but drop the tags, instead tag as [X|[]] for success or {Y} for failure

j2(0, X, _Y) -> X;
j2(N, X, Y) ->
  case j3(N) of
    [A | _] -> j2(N - 1, A, Y);
    {B} -> j2(N - 1, X, B)
  end.

j3(N) ->
  case N band 1 of
    0 -> [N | []];
    _ -> {N}
  end.

%% k: like j, but only tag the success case not the failure case

k2(0, X, _Y) -> X;
k2(N, X, Y) ->
  case k3(N) of
    [A | _] -> k2(N - 1, A, Y);
    B -> k2(N - 1, X, B)
  end.

k3(N) ->
  case N band 1 of
    0 -> [N | []];
    _ -> N
  end.

%% l: like k but the other way around: tag the failure case [Y|[]],
%% check for success with is_integer/1

l2(0, X, _Y) -> X;
l2(N, X, Y) ->
  case l3(N) of
    A when is_integer(A) -> l2(N - 1, A, Y);
    [B | _] -> l2(N - 1, X, B)
  end.

l3(N) ->
  case N band 1 of
    0 -> N;
    _ -> [N | []]
  end.
==snip==
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: BEAM micro-optimization: term comparison (> 0) versus pattern-matching

John Doe
In reply to this post by José Valim-2
You are right. In the compiled code comparison is slightly faster than matching.

-module(test).

-export([test/2]).

test(N, T) ->
 L = lists:seq(1, 10000000),
 F1 = fun() -> lists:foreach(fun(_) -> case T of {success, Z}  
-> Z; {error, Z} -> Z end end, L) end,
 erlang:display(timer:tc(F1)),
 F2 = fun() -> lists:foreach(fun(_) -> case N of X when X > 0  
-> X; _ -> N end end, L) end,
 erlang:display(timer:tc(F2)).

1> c(test).
{ok,test}
2> test:test(-1, {error, -1}).
{292472,ok}
{250206,ok}








2017-04-16 13:58 GMT+03:00 José Valim <[hidden email]>:
John, did you measure in the shell?

Code in the shell is evaluated and it won't give the same result as if the code would be compiled as part of a module.

Ideally you want to define a module, compile it, and then execute a function that runs the benchmarking.

José Valim
Skype: jv.ptec
Founder and Director of R&D

On Sun, Apr 16, 2017 at 1:53 AM, John Doe <[hidden email]> wrote:
Just've checked, the first one is approx 2 times faster on 19.1.6
1> L = lists:seq(1, 1000000).
2> N = -1.
3> T = {error, -1}.
4> F1 = fun() -> lists:foreach(fun(_) -> case T of {success, Z} -> Z; {error, Z} -> Z end end, L) end.
5> F2 = fun() -> lists:foreach(fun(_) -> case N of X when X > 0 -> X; _ -> N end end, L) end.
6> timer:tc(F1).      
{1089950,ok}
7>      
7> timer:tc(F2).
{
1870456,ok}

2017-04-15 21:13 GMT+03:00 Mikael Pettersson <[hidden email]>:
Suppose a frequently called function returns a positive integer (always a fixnum)
and a boolean "success" or "failure" indicator (even in the failure case the return
value is significant and will be used).  Which is better from a performance
perspective:

1. Tag the return value, {success, X} or {failure, Y}, and have callers pattern-match
   on that

or,

2. Indicate failure by negating Y, and have callers match on

      X when X > 0 -> success case;
      MinusY -> failure case % MinusY =< 0 is implicit

Option 2 avoids the consing overheads of option 1, but I'm worried that the X > 0
guard may be less optimized than a plain structural pattern-match.  The last time
I checked, term-comparisons would check inline for identity, and otherwise call
the general cmp() function which would then do a lot of type-casing.

In my pure Erlang code I'm using option 1, but I'm reimplementing the function
as a NIF, and would like to avoid the consing overheads is that's indeed better.


/Mikael
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions




_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: BEAM micro-optimization: term comparison (> 0) versus pattern-matching

Richard A. O'Keefe-2
In reply to this post by Mikael Pettersson-5

> On 16/04/2017, at 6:13 AM, Mikael Pettersson <[hidden email]> wrote:
>
> Suppose a frequently called function returns a positive integer (always a fixnum)
> and a boolean "success" or "failure" indicator (even in the failure case the return
> value is significant and will be used).  Which is better from a performance
> perspective:

Measure it.  I will be astonished if it makes any practical difference.
You are probably likely to get more benefit from inlining, or caching,
or somehow reducing the number of calls.

>
> 1. Tag the return value, {success, X} or {failure, Y}, and have callers pattern-match
>   on that
>
> or,
>
> 2. Indicate failure by negating Y, and have callers match on
>
>      X when X > 0 -> success case;
>      MinusY -> failure case % MinusY =< 0 is implicit

You have omitted option 3:

   frequently_called(Inputs, Success, Failure) ->
      ... success(Fixnum) ...
    ;
      ... failure(Other_Fixnum) ...

And option 4:

  - Have the success case(s) return the usual number.
  - Have the failure case(s) raise an exception containing the other number.

And option 5:
  - Have the success case return the usual number.
  - Have the failure case return {error,X} or {X} or something else

And options 6 to 8 that I haven't thought of yet...

>
> Option 2 avoids the consing overheads of option 1, but I'm worried that the X > 0
> guard may be less optimized than a plain structural pattern-match.  The last time
> I checked, term-comparisons would check inline for identity, and otherwise call
> the general cmp() function which would then do a lot of type-casing.

I am much more worried about the possibility of things going wrong.
I'm worried about returning zero or the wrong sign.
I'm worried about making it harder to get the most out of the Dialyzer.

Note that what is optimised and how well is something that *changes* over time.

Just to give you an idea of what's possible here, an optimisation I looked into
(for a different language) once goes like this:

If there are many calls to f(...) in contexts like
case f(...) of {t1,...} -> ... | {tn,...} -> ... end
then make two copies of f.  In one of the copies,
... {tk,E1,...,Ex}
turns into R1,...,Rx := E1,...,Ex; return k
and the case turns into
    switch (f(...)) { ... case k: /* data are in R1,...,Rx */ ... }
This also applies to
   {tk,X1,...,Xx} = f(...)
which is a degenerate 'case',
Use the other copy when the call is not in such a context.
This was inspired by the compiler for Janus, IIRC and *partly*
motivated by multiple value returns in Common Lisp and Scheme.

I'm not saying that this will ever be done by an Erlang compiler,
but it *could* be done by an Erlang compiler, and it would make the
tuple version the cheapest available.

> In my pure Erlang code I'm using option 1, but I'm reimplementing the function
> as a NIF, and would like to avoid the consing overheads is that's indeed better.

I presume that under "consing overheads" you are including the cost of
cleaning up the dead tuples in garbage collection.

But seriously, there is no substitute for measurement.
If the function is costly enough to be worth reimplementing as a NIF,
this aspect of it is likely to be quite unimportant, but only
measurement can tell you for sure.

You have given us no idea what this function does.
Have you profiled the arguments it is given to see if it is often
called with similar enough arguments to make some kind of caching
attractive?


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Loading...