How does a match context of an erlang bitstring work?

classic Classic list List threaded Threaded
2 messages Options
ami
Reply | Threaded
Open this post in threaded view
|

How does a match context of an erlang bitstring work?

ami
Hi all

I'd read [efficiency guide][1] and [erlang-questions mailing list archive][2] &
all of the available books in erlang. But I haven't found the precise description of efficient
binaries pattern matching. Though, I haven't read sources yet :) But I hope
that people, who already have read them, would read this post. Here are my questions.


 1. How many match contexts does an erlang binary have?

 a) if we match parts of a binary sequentially and just once

        A = <<1,2,3,4>>.
        <<A1,A2,A3,A4>> = A.

 Do we have just one binary match context(moving from the beginning of A to the end), or four?

 b) if we match parts of a binary sequentially from the beginning to the end for the first time
    and(sequentially again) from the beginning to the end for the second time

        B = <<1,2,3,4>>.
        <<B1,B2,B3,B4>> = B.
        <<B11,B22,B33,B44>> = B.

  Do we have just a single match context, which is moving from the beginning of B to the end
  of B and then moving again from the beginning of B to the end of B,

  or

  we have 2 match contexts, one is moving from the beginning of B to the end of B,
  and another - again from the beginning of B to the end of B (as first can't move
  to the beginning again)

  or we have 8 match contexts?


 2. According to [documentation][3], if I write:

        my_binary_to_list(<<H,T/binary>>) ->
            [H|my_binary_to_list(T)];
        my_binary_to_list(<<>>) -> [].

 there will be only 1 match context for the whole recursion tree, even though, this
 function isn't tail-recursive.

 a) Did I get it right, that there would be only 1 match context in this case?

 b) Did I get right, that if I match an erlang binary sequentially(from the beginning to
 the end), it doesn't matter which recursion type(tail or body) has to be used?(from the binary-matching efficiency point of view)

 c) What if I'm going to process erlang binary NOT sequentially, say, I'm travelling through
   a binary - first I match first byte, then 1000th, then 5th, then 10001th, then 10th...

   In this case,
   
   d1) If I used body-recursion, how many matching contexts for this binary would I have -
   one or >1?
   
   d2) if I used tail-recursion, how many matching contexts for this binary would I have -
   one or >1?


 3. If I pass a large binary(say 1 megabyte) via tail recursion, Will all the 1 megabyte data be copied? Or only a some kind of pointer to the beginning of this binary is being passed between calls?

 4. Does it matter which binary I'm matching - big or small - a match context will be created for binary of any size or only for large ones?
 

  [1]: http://www.erlang.org/doc/efficiency_guide/users_guide.html
  [2]: http://erlang.org/pipermail/erlang-questions/
  [3]: http://www.erlang.org/doc/efficiency_guide/binaryhandling.html#id65745


Thanks a lot in advance  & kind regards,
Alex

Reply | Threaded
Open this post in threaded view
|

How does a match context of an erlang bitstring work?

Björn Gustavsson-3
On Thu, Jun 12, 2014 at 4:47 PM, ami <m3oucat> wrote:

> Hi all
>
> I'd read [efficiency guide][1] and [erlang-questions mailing list archive][2] &
> all of the available books in erlang. But I haven't found the precise description of efficient
> binaries pattern matching. Though, I haven't read sources yet :) But I hope
> that people, who already have read them, would read this post. Here are my questions.
>
>
>  1. How many match contexts does an erlang binary have?
>

In general, a match context is created for every "<<"
and destroyed at ">>".

The exception is if the last thing matched is a binary,
in which case the match context can be retained in
certain cases (delayed sub-binary creation). For
example:

<<..., T/binary>> = Var,
<<...>> = T

>  a) if we match parts of a binary sequentially and just once
>
>         A = <<1,2,3,4>>.
>         <<A1,A2,A3,A4>> = A.
>
>  Do we have just one binary match context(moving from the beginning of A to the end), or four?

One.

>  b) if we match parts of a binary sequentially from the beginning to the end for the first time
>     and(sequentially again) from the beginning to the end for the second time
>
>         B = <<1,2,3,4>>.
>         <<B1,B2,B3,B4>> = B.
>         <<B11,B22,B33,B44>> = B.
>
>   Do we have just a single match context, which is moving from the beginning of B to the end
>   of B and then moving again from the beginning of B to the end of B,

No.

>   or
>
>   we have 2 match contexts, one is moving from the beginning of B to the end of B,
>   and another - again from the beginning of B to the end of B (as first can't move
>   to the beginning again)

Yes.

>   or we have 8 match contexts?

No.

>
>  2. According to [documentation][3], if I write:
>
>         my_binary_to_list(<<H,T/binary>>) ->
>             [H|my_binary_to_list(T)];
>         my_binary_to_list(<<>>) -> [].
>
>  there will be only 1 match context for the whole recursion tree, even though, this
>  function isn't tail-recursive.
>
>  a) Did I get it right, that there would be only 1 match context in this case?
>
Yes.

>  b) Did I get right, that if I match an erlang binary sequentially(from the beginning to
>  the end), it doesn't matter which recursion type(tail or body) has to be used?(from the binary-matching efficiency point of view)

Yes.

>
>  c) What if I'm going to process erlang binary NOT sequentially, say, I'm travelling through
>    a binary - first I match first byte, then 1000th, then 5th, then 10001th, then 10th...
>
>    In this case,
>
>    d1) If I used body-recursion, how many matching contexts for this binary would I have -
>    one or >1?
>
>    d2) if I used tail-recursion, how many matching contexts for this binary would I have -
>    one or >1?

It depends on your code. Potentially one match context could be enough.

Use the bin_opt_info option to find out.

>
>  3. If I pass a large binary(say 1 megabyte) via tail recursion, Will all the 1 megabyte data be copied? Or only a some kind of pointer to the beginning of this binary is being passed between calls?

Only a pointer. It doesn't matter whether you use tail-recursion or not.

In general, binaries are not copied, especially not big binaries.
(Binaries no more than 64 bytes in size may be stored on the
process heap and they will be copied when sent to another
process or stored in ETS tables.)

>  4. Does it matter which binary I'm matching - big or small - a match context will be created for binary of any size or only for large ones?

It doesn't matter.

/Bjorn

--
Bj?rn Gustavsson, Erlang/OTP, Ericsson AB