Bit Syntax and compiler Erlang/OTP R21.1

Valentin Micic
Hi all,

Recently I’ve made a silly mistake. I wrote:

case Payload of
   <<_:4/binary-unit:8, _:255, _:7/binary-unit:8, 0:16>>   -> Payload;
   _                                                       -> throw( drop )
end

Instead of:

case Payload of
   <<_:4/binary-unit:8, 255:8, _:7/binary-unit:8, 0:16>>   -> Payload;
   _                                                       -> throw( drop )
end


Considering that the overall pattern (which erroneously references a 255-bit field instead of an octet with the value 255) is not aligned to an 8-bit boundary, is it unreasonable to expect the compiler to report this as a potential problem, or at least to generate a warning (support for bit fields notwithstanding)?

What am I missing here?

V/


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Bit Syntax and compiler Erlang/OTP R21.1

Fred Hebert
On Tue, Aug 20, 2019 at 8:26 AM Valentin Micic <[hidden email]> wrote:
> Hi all,
>
> Recently I’ve made a silly mistake. I wrote:
>
> case Payload of
>    <<_:4/binary-unit:8, _:255, _:7/binary-unit:8, 0:16>>   -> Payload;
>    _                                                       -> throw( drop )
> end
>
> Considering that the overall pattern (which erroneously references a 255-bit field instead of an octet with the value 255) is not aligned to an 8-bit boundary, is it unreasonable to expect the compiler to report this as a potential problem, or at least to generate a warning (support for bit fields notwithstanding)?
>
> What am I missing here?

There are informally two kinds of binaries: 8-bit aligned binaries (regular ones) are those people call 'binaries', and then you have bitstrings. Bitstrings don't need any alignment whatsoever. Your pattern can be made to work by using any fitting bitstring. For example:

1> <<_:4/binary-unit:8, _:255, _:7/binary-unit:8, 0:16>> = <<0:(32+255+7*8+16)>>.
<<0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,...>>
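The arithmetic behind this: the original pattern needs 4*8 + 255 + 7*8 + 16 = 359 bits, and since 359 rem 8 is 7, no byte-aligned binary (which is what a socket or file hands you) can ever match it, so every payload lands in the throw( drop ) clause. A minimal sketch comparing the two patterns; the module name payload_check is hypothetical, and it returns the atom drop instead of throwing so it is easy to try in the shell:

```erlang
-module(payload_check).
-export([buggy/1, fixed/1, pattern_bits/0]).

%% The mistaken pattern: `_:255` is an anonymous 255-bit integer field,
%% so the whole pattern needs 359 bits and no byte-aligned binary fits.
buggy(Payload) ->
    case Payload of
        <<_:4/binary-unit:8, _:255, _:7/binary-unit:8, 0:16>> -> Payload;
        _ -> drop
    end.

%% The intended pattern: one octet whose value is 255 (112 bits = 14 bytes).
fixed(Payload) ->
    case Payload of
        <<_:4/binary-unit:8, 255:8, _:7/binary-unit:8, 0:16>> -> Payload;
        _ -> drop
    end.

%% Bits required by each pattern, plus how far the buggy one is off a byte boundary.
pattern_bits() ->
    Buggy = 4*8 + 255 + 7*8 + 16,
    Fixed = 4*8 + 8 + 7*8 + 16,
    {Buggy, Fixed, Buggy rem 8}.
```

For example, payload_check:fixed(<<0:32, 255:8, 0:56, 0:16>>) returns its 14-byte argument, while payload_check:buggy/1 returns drop for it; buggy/1 only accepts a 359-bit bitstring such as <<0:359>>.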

But more generally, the binary/bitstring distinction will make sense when pattern matching:

3> <<_:4/binary-unit:8, _/binary>> = <<0:(32+255+7*8+16)>>.                      
** exception error: no match of right hand side value <<0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                                                        0,...>>
4> <<_:4/binary-unit:8, _/bitstring>> = <<0:(32+255+7*8+16)>>.
<<0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,...>>
5> <<_:4/binary-unit:8, _/bits>> = <<0:(32+255+7*8+16)>>.  
<<0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,...>>

Note that 'bits' is shorthand for 'bitstring'. Also note that you can always cheat the pattern match by specifying units:

6> <<_:4/binary-unit:8, _/binary-unit:1>> = <<0:(32+255+7*8+16)>>.
<<0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,...>>
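A small sketch of the same distinction in function heads (the module and function names are hypothetical): clauses are tried in order, so a tail that is a whole number of bytes is reported as byte_aligned, and anything else that still has the 4-byte prefix falls through to the _/bits clause.

```erlang
-module(tail_kind).
-export([tail_kind/1]).

%% `_/binary` (unit 8) only matches a tail whose length is a multiple of 8,
%% while `_/bits` (bitstring, unit 1) accepts a tail of any bit length.
tail_kind(<<_:4/binary, _/binary>>) -> byte_aligned;
tail_kind(<<_:4/binary, _/bits>>) -> unaligned;
tail_kind(_) -> too_short.
```

With the 359-bit value used above, tail_kind(<<0:(32+255+7*8+16)>>) returns unaligned, while tail_kind(<<1,2,3,4,5>>) returns byte_aligned.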

Nothing in Erlang actually mandates exact alignment; it's just that the default widths of the various types impact pattern matching. Dialyzer, however, does enforce some of these semantics. Here's a sample module:

-module(chk).
-export([f/0, f/1, g/0, g/1]).

f() -> f(<<0:17>>).
g() -> g(<<0:17>>).

-spec f(binary()) -> ok.
f(Bin) ->
    <<_/binary>> = Bin,
    ok.

-spec g(binary()) -> ok.
g(Bin) ->
    <<_/bits>> = Bin,
    ok.

If you run dialyzer on it, you'll find out the following:

chk.erl:4: Function f/0 has no local return
chk.erl:4: The call chk:f(<<_:17>>) will never return since the success typing is (binary()) -> 'ok' and the contract is (binary()) -> 'ok'
chk.erl:5: Function g/0 has no local return
chk.erl:5: The call chk:g(<<_:17>>) breaks the contract (binary()) -> 'ok'

The type binary(), to Dialyzer, implies the 8-bit alignment you're looking for. The bitstring() type does not care about alignment. This is because Dialyzer supports defining binary types as:
 <<>>             %% empty binary
 <<_:M>>          %% fixed-size binary, where M is a positive integer
 <<_:_*N>>        %% variable-size binary with an alignment on N
 <<_:M, _:_*N>>   %% binary of at least M size, with a variable-sized tail aligned on N
Essentially, binary() is defined as <<_:_*8>> and bitstring() is defined as <<_:_*1>>. This lets you encode whatever check semantics you'd like within type specifications, and Dialyzer can try to figure it out for you. But nothing, by default, would necessarily warrant compiler warnings since alignment on 8 bits is not mandated by the runtime.
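As an illustration of that syntax, here are two hypothetical types written with it. The compiler and runtime ignore these specs; only Dialyzer could use them, for instance to flag a caller passing a 20-bit value where header() is expected.

```erlang
-module(bit_types).
-export([header_len/1, septet_count/1]).

%% Hypothetical types using the Dialyzer binary type syntax.
-type header() :: <<_:32, _:_*8>>.   % at least 32 bits, then whole bytes
-type septets() :: <<_:_*7>>.        % any number of 7-bit units

-spec header_len(header()) -> non_neg_integer().
header_len(<<Len:32, _/binary>>) -> Len.

-spec septet_count(septets()) -> non_neg_integer().
septet_count(Bits) when bit_size(Bits) rem 7 =:= 0 ->
    bit_size(Bits) div 7.
```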



Re: Bit Syntax and compiler Erlang/OTP R21.1

Valentin Micic
Hi Fred,

I should have used the more appropriate Erlang term “bitstring” instead of “bit field” in my email; thank you for correcting this.

I’ve used bitstrings in the past, though not really as a bunch of “loose” bits, but always as part of some other binary pattern aligned to an 8-bit boundary.

If you cannot write 17 loose bits to a file or, better yet, send 13 loose bits over a socket, one has to wonder how useful non-aligned bitstrings (and by this I mean “loose” bits) really are.

And it gets worse. Consider this:

(tsdb_1_1@macbookv-3)433> term_to_binary( <<0:8>> ).
<<131,109,0,0,0,1,0>>
(tsdb_1_1@macbookv-3)434> term_to_binary( <<0:1>> ).
<<131,77,0,0,0,1,1,0>>


It follows that it takes more space to store 1 loose bit than 8 aligned bits, at least in the external term format: 8 bytes versus 7.

And just to prove that if it walks like a duck, and quacks like a duck, it's probably… an elephant.


(tsdb_1_1@macbookv-3)449> is_binary( <<0:8>> ).
true
(tsdb_1_1@macbookv-3)448> is_binary( <<0:1>> ).
false


But, if you put two elephants next to each other, you get — a duck!

is_binary( <<0:5, 0:3>> ).
true


Given all this, why would anyone find bitstrings useful?

But the above notwithstanding, I understand your point that the runtime does not mandate any kind of alignment, hence the compiler has nothing to report.
Makes sense, thank you.

V/

On 20 Aug 2019, at 15:13, Fred Hebert <[hidden email]> wrote:
> There are informally two kinds of binaries: 8-bit aligned binaries (regular ones) are those people call 'binaries', and then you have bitstrings. Bitstrings don't need any alignment whatsoever.

Re: Bit Syntax and compiler Erlang/OTP R21.1

empro2
On Tue, 20 Aug 2019 17:04:26 +0200
Valentin Micic <[hidden email]> wrote:

> (tsdb_1_1@macbookv-3)448> is_binary( <<0:1>> ).
>
> false

"is elephant?" is encoded in Erlang as: is_bitstring/1

75> is_bitstring(<<0:1>>).
true


> But, if you put two elephants next to each other, you get
> — a duck!
>
> is_binary( <<0:5, 0:3>> ).
>
> true

Read "is_binary/1" as
"is_bitstring/1 and (bitlength_of(Elephants) mod 8 == 0)"

:-)
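That reading of is_binary/1 can be written down directly with bit_size/1 and rem (Erlang's integer remainder operator). A sketch; the name is_duck/1 is hypothetical:

```erlang
-module(duck).
-export([is_duck/1]).

%% Same answer as is_binary/1: a bitstring whose bit length is a multiple of 8.
%% andalso short-circuits, so bit_size/1 is never called on a non-bitstring.
is_duck(X) ->
    is_bitstring(X) andalso bit_size(X) rem 8 =:= 0.
```

It agrees with is_binary/1 on the values from this thread: true for <<0:8>> and <<0:5, 0:3>>, false for <<0:1>>.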


> Given all this, why would anyone find bitstrings useful?

That is why the default bitstring is a bytestring or
octetstring or binary:

98> is_binary(<<"blah">>) and is_binary(<<1, 2, 3>>).
true

As long as you do not tell the compiler, with that ":",
to take bites of memory that are not bytes, the two of you
seem to agree there.

Might help when extracting flag bits? (or when talking to
some 9-bit byte PDP ... or was that 11? ... ;-) Anyway I
would rather deprecate and obsolete away dot notation for
record elements ...

/Michael

--

Normality is merely a question of quantity,
not of quality.

Re: Bit Syntax and compiler Erlang/OTP R21.1

Valentin Micic

On 20 Aug 2019, at 18:55, [hidden email] wrote:

> On Tue, 20 Aug 2019 17:04:26 +0200
> Valentin Micic <[hidden email]> wrote:
>
>> (tsdb_1_1@macbookv-3)448> is_binary( <<0:1>> ).
>>
>> false
>
> "is elephant?" is encoded in Erlang as: is_bitstring/1

I see… nevertheless, I prefer is_duck/1, but thank you :-)

> Might help when extracting flag bits? (or when talking to
> some 9-bit byte PDP ... or was that 11? ... ;-) Anyway I
> would rather deprecate and obsolete away dot notation for
> record elements …

Yes, you could use bit syntax to extract flags; however, I doubt they will ever be presented to Erlang as a stream of bits (well, at least not nowadays), unless, of course, Erlang runs on top of hardware that natively supports 9-bit bytes. And how many of those are available to us today?

By contrast, the record dot notation is just syntactic sugar, not a data type. You could choose not to use it (I never do, so no love lost if, or when, they remove it). Also, a defective use of record dot syntax is unlikely to introduce a semantic problem by accident, as the compiler would definitely help you identify the problematic syntax (well, I know, strictly speaking this is not quite true, but go with it: it holds for record syntax in general, if not for dot notation as such). That is unlike the situation I’ve described, where I accidentally introduced a semantic error, but not a syntax error (for the reasons Fred explained).

Oh, please, don’t get me wrong, I like the ability to write <<World:2, Group:2, User:2, _:2>> = <<SomeFlags>>.
With this construct you can still extract your flags, and it does not require any new data type (such as bitstring): e.g. is_binary(<<World:2, Group:2, User:2, _:2>> = <<SomeFlags>>) will merrily report true.

Contrast this with <<World:2, Group:2, User:2>> = Some6BitBitstring,

which, in order to work, requires Some6BitBitstring to be exactly a six-bit bitstring. Such a value could only be created from within the Erlang runtime, which makes it a fairly artificial construct with respect to the environment (OS, sockets, files, etc.).

Thus, my assertion is that bitstring is a pretty pointless data type. There.
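A sketch contrasting the two constructs described above (module and function names are hypothetical):

```erlang
-module(perm_flags).
-export([from_byte/1, from_six_bits/1]).

%% Byte-aligned form: the two trailing padding bits are matched and dropped.
from_byte(SomeFlags) when is_integer(SomeFlags), SomeFlags >= 0, SomeFlags =< 255 ->
    <<World:2, Group:2, User:2, _:2>> = <<SomeFlags>>,
    {World, Group, User}.

%% Six-bit form: matches only a bitstring that is exactly 6 bits long.
from_six_bits(<<World:2, Group:2, User:2>>) ->
    {World, Group, User}.
```

A six-bit bitstring can also be matched out of an ordinary byte: after <<Six:6/bits, _:2>> = <<2#01101100>>, perm_flags:from_six_bits(Six) returns {1,2,3}, the same as perm_flags:from_byte(2#01101100).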

> /Michael
>
> --
>
> Normality is merely a question of quantity,
> not of quality.


But, according to Hegelian dialectics, quantity transforms into quality and vice-versa :-)

Re: Bit Syntax and compiler Erlang/OTP R21.1

Fred Hebert

On Tue, Aug 20, 2019 at 2:44 PM Valentin Micic <[hidden email]> wrote:

> Contrast this with <<World:2, Group:2, User:2>> = Some6BitBitstring,
>
> which, in order to work, requires Some6BitBitstring to be exactly a six-bit bitstring. Such a value could only be created from within the Erlang runtime, which makes it a fairly artificial construct with respect to the environment (OS, sockets, files, etc.).
>
> Thus, my assertion is that bitstring is a pretty pointless data type. There.

Well, let's take a TCP packet header, for example. If you follow the current Wikipedia definition, which includes experimental RFCs, there are 3 reserved bits followed by all the flags, set on 9 bits. You could pattern match all the flags individually, but thanks to Erlang and bitstrings you can also write a function that accepts a pattern like <<..., _Reserved:3, Flags:9/bits, ...>> and then ask decode_flags(Flags) to return a list of atoms for all the values that are set.

This might actually be desirable because some flags change meaning according to other flags: the ECE flag can mean three different things, depending on whether it is 0, or whether it is 1 while the SYN flag is set (1) or clear (0).

You can obviously work around this, but if you do it C-style, you're likely to have to gobble the reserved bits along with the flags, and then extract the experimental flag from the reserved bits, and carry it as an extra value somewhere.

Mostly, there's interesting stuff that you can do with bitstrings that are matched-out sections of 8-bit aligned binaries, regardless of where you are reading them from and what the initial machine alignment was.
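A sketch of the decode_flags idea, assuming the 9-flag layout described above (NS, CWR, ECE, URG, ACK, PSH, RST, SYN, FIN after the 3 reserved bits); the function bodies and the atoms returned by ece_meaning/1 are illustrative, not from any library:

```erlang
-module(tcp_flags).
-export([flags/1, decode_flags/1, ece_meaning/1]).

%% Match the fixed part of a TCP header down to the flags: ports, sequence
%% and acknowledgement numbers, 4-bit data offset, 3 reserved bits, then the
%% 9 flag bits, kept as a 9-bit bitstring.
flags(<<_SrcPort:16, _DstPort:16, _Seq:32, _Ack:32,
        _DataOffset:4, _Reserved:3, Flags:9/bits, _/binary>>) ->
    decode_flags(Flags).

%% Return the names of the flags that are set, in header order.
decode_flags(<<NS:1, CWR:1, ECE:1, URG:1, ACK:1, PSH:1, RST:1, SYN:1, FIN:1>>) ->
    Named = [{ns, NS}, {cwr, CWR}, {ece, ECE}, {urg, URG}, {ack, ACK},
             {psh, PSH}, {rst, RST}, {syn, SYN}, {fin, FIN}],
    [Name || {Name, 1} <- Named].

%% ECE's meaning depends on SYN, so match both bits of the same bitstring.
ece_meaning(<<_:2, 0:1, _:6>>) -> not_set;
ece_meaning(<<_:2, 1:1, _:4, 1:1, _:1>>) -> ecn_capable;
ece_meaning(<<_:2, 1:1, _:4, 0:1, _:1>>) -> congestion_experienced.
```

For a header whose flag bits are 2#000010010, tcp_flags:flags/1 returns [ack, syn].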



Re: Bit Syntax and compiler Erlang/OTP R21.1

Steve Strong
In reply to this post by Valentin Micic-6
Or take SDI, where each word is 10 bits... bitstring is invaluable.
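A sketch of why this helps, assuming nothing more than a stream of 10-bit words (real SDI framing, such as timing reference signals, is not modelled; the module name is hypothetical): bitstring comprehensions pack and unpack the words without any manual shifting.

```erlang
-module(ten_bit).
-export([pack/1, unpack/1]).

%% Pack a list of 10-bit words (0..1023) into a bitstring, 10 bits each.
pack(Words) ->
    << <<W:10>> || W <- Words >>.

%% Unpack a bitstring back into 10-bit words; trailing bits that do not
%% form a complete word are ignored by the generator.
unpack(Bits) ->
    [W || <<W:10>> <= Bits].
```

pack([1023, 0, 512, 1]) is 40 bits, so it happens to be a binary; pack([1, 2, 3]) is 30 bits and is_binary/1 returns false for it.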


> On 20 Aug 2019, at 20:21, Fred Hebert <[hidden email]> wrote:
>

Re: Bit Syntax and compiler Erlang/OTP R21.1

Dániel Szoboszlay
Bit strings can come in very handy for compression algorithms too, say Huffman coding a byte stream. Yes, at the end you may have to pad the bitstring to get whole bytes that you can save to a file or send over the network, but until then it's very useful that your buffer can have an arbitrary length. Bitstrings are rarely needed, but when you need them, it's a godsend that the language has excellent support for this data type!
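A sketch of the encoding side, assuming a prefix-code table has already been built (building it from symbol frequencies is not shown, and the table in the usage note is made up): the buffer grows by an arbitrary number of bits per symbol and is padded only once, at the end.

```erlang
-module(huff).
-export([encode/2, pad/1]).

%% Append each symbol's code to a growing bitstring. Table maps a symbol to
%% {Code, Length}, where Code is an integer holding Length bits.
encode(Symbols, Table) ->
    lists:foldl(fun(S, Acc) ->
                        {Code, Len} = maps:get(S, Table),
                        <<Acc/bits, Code:Len>>
                end, <<>>, Symbols).

%% Zero-pad to the next byte boundary so the result can be written to a file
%% or socket; a real format must also record how many bits are padding.
pad(Bits) ->
    PadLen = (8 - bit_size(Bits) rem 8) rem 8,
    <<Bits/bits, 0:PadLen>>.
```

With the hypothetical table #{a => {0, 1}, b => {2#10, 2}, c => {2#11, 2}}, encode([a, b, c, a], Table) is the 6-bit bitstring <<2#010110:6>>, and pad/1 turns it into <<88>>.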

On Tue, 20 Aug 2019 at 21:24, Steve Strong <[hidden email]> wrote:
> Or take SDI, where each word is 10 bits... bitstring is invaluable.

Re: Bit Syntax and compiler Erlang/OTP R21.1

Valentin Micic

On 20 Aug 2019, at 22:13, Dániel Szoboszlay <[hidden email]> wrote:

> Bit strings can come in very handy for compression algorithms too, say Huffman coding a byte stream. Yes, at the end you may have to pad the bitstring to get whole bytes that you can save to a file or send over the network, but until then it's very useful that your buffer can have an arbitrary length. Bitstrings are rarely needed, but when you need them, it's a godsend that the language has excellent support for this data type!

For what it's worth, I agree: compression algorithm implementations may benefit from bitstrings.

Re: Bit Syntax and compiler Erlang/OTP R21.1

Richard O'Keefe
In reply to this post by Valentin Micic-6
Valentin Micic wrote
  
:> If you cannot write 17 loose bits to a file, or,
:> better yet, if you cannot send 13 loose bits over a socket,
:> one has to wonder how useful are non-aligned bitstrings

This is a very odd thing to say.  PL/I has had bit strings
since about 1965.  Common Lisp has bit strings.  Some Scheme
implementations have bit strings.  APL has bit arrays of any
size and shape.  SQL 92 had BIT(n) and BIT(n) VARYING just
like PL/I -- surprise surprise -- but SQL 2003 dropped them,
while Postgres still supports them.  Ada has bit strings,
in the guise of packed arrays of Boolean, replacing Pascal's
sets (which are fixed size bit strings). Do I need to point
to bit string support in Java and C#?

You may not be able to send 13 loose bits over a socket, but
you *can* have a 13-bit field in a packet, and why should it
be hard to construct that 13-bit field or to pack it?  And of
course if you are running Erlang on a Raspberry Pi, you can
send or receive a message of *any* number of bits through the
Pi's GPIO pins (with the aid of a NIF).
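One familiar 13-bit field is the IPv4 Fragment Offset, which shares a 16-bit word with the 3-bit Flags field (reserved bit, DF, MF) in RFC 791. A sketch of building and parsing that word with the bit syntax (the module name and map keys are hypothetical):

```erlang
-module(ipv4_frag).
-export([build/3, parse/1]).

%% Build the 16-bit word: 1 reserved bit (always 0), DF, MF, then the
%% 13-bit fragment offset (measured in units of 8 octets).
build(DF, MF, Offset)
  when DF >= 0, DF =< 1, MF >= 0, MF =< 1, Offset >= 0, Offset =< 8191 ->
    <<0:1, DF:1, MF:1, Offset:13>>.

%% Split the same word back into its fields.
parse(<<_Reserved:1, DF:1, MF:1, Offset:13>>) ->
    #{dont_fragment => DF, more_fragments => MF, offset => Offset}.
```

For example, ipv4_frag:build(0, 1, 185) returns <<32, 185>>.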




Re: Bit Syntax and compiler Erlang/OTP R21.1

Valentin Micic

On 21 Aug 2019, at 02:06, Richard O'Keefe <[hidden email]> wrote:

> You may not be able to send 13 loose bits over a socket, but
> you *can* have a 13-bit field in a packet, and why should it
> be hard to construct that 13-bit field or to pack it?

Well, as I’ve said, I never had anything against a 13-bit field in a packet… I was struggling to find a situation where one would benefit from being able to write <<Field:13>> instead of <<Field:13, _:3>>, that is, until Daniel Szoboszlay mentioned compression. Another example from my direct experience (oh, the irony), which came back to me only after Daniel's observation, is the 7-bit character encoding used by SMS… this would have been way, way easier to implement using bitstrings.
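A sketch of the SMS case, assuming GSM 7-bit default-alphabet codes that are already integers (no extension table, no user data header; the module name is hypothetical). GSM 03.38 packs septets so that the first one occupies the low 7 bits of the first octet, which is the same as taking the septets last-to-first, concatenating them as a bitstring, padding at the front to a byte boundary and reversing the bytes.

```erlang
-module(gsm7).
-export([pack/1, unpack/2]).

%% Pack septets (integers 0..127) into octets, first septet in the low bits
%% of the first octet.
pack(Septets) ->
    Bits = << <<C:7>> || C <- lists:reverse(Septets) >>,
    Pad = (8 - bit_size(Bits) rem 8) rem 8,
    %% Front padding ends up in the high bits of the last octet after reversal.
    BigEndian = <<0:Pad, Bits/bits>>,
    list_to_binary(lists:reverse(binary_to_list(BigEndian))).

%% Reverse the process. N is the number of septets, needed because the
%% padding bits could otherwise be mistaken for an extra character.
unpack(Octets, N) ->
    Reversed = list_to_binary(lists:reverse(binary_to_list(Octets))),
    Pad = bit_size(Reversed) - 7 * N,
    <<_:Pad, Bits/bits>> = Reversed,
    lists:reverse([C || <<C:7>> <= Bits]).
```

For the lowercase letters of "hellohello" the GSM codes coincide with ASCII, so gsm7:pack("hellohello") gives the octets E8 32 9B FD 46 97 D9 EC 37, and gsm7:unpack/2 with N = 10 recovers the original string.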

I’ve seen the light, thank you.
Life moves on.
 
V/