Dialyzer and numeric range values

Dialyzer and numeric range values

zxq9-2
If I give a typespec

-type foo() :: 10..14.

Dialyzer will show me

10 | 11 | 12 | 13 | 14

I can see this is inclusive.

If I give a typespec like

4..65

Dialyzer will show me

1..255

If I give a typespec

3..500

Dialyzer will show me

1..1114111

Which is hugely different than what I want.

This also means that 0..16#ffff, which should be the legal range for ports, ends up being interpreted as type char(), which is 0..16#10ffff. That is totally bizarre.

In fact, the docs for inet show me this definition:

ip6_address() =
    {0..65535,
     0..65535,
     0..65535,
     0..65535,
     0..65535,
     0..65535,
     0..65535,
     0..65535}

( http://erlang.org/doc/man/inet.html )

which Dialyzer shows me as

{char(), char(), char(), char(), char(), char(), char(), char()}

and I can see the actual definition of char in the docs is indeed

-type char() :: 0..16#10ffff.

( erlang.org/documentation/doc-5.8/doc/reference_manual/typespec.html )
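Because the 0..65535 range is reported as char(), the exact upper bound can only be relied on if it is checked at runtime. A minimal sketch of such a check follows; the module name addr_check and both function names are hypothetical and are not part of inet:

```erlang
-module(addr_check).
-export([in_u16/1, is_ip6_address/1]).

%% Hypothetical helper: true only for integers in the exact range 0..16#ffff.
in_u16(N) ->
    is_integer(N) andalso N >= 0 andalso N =< 16#ffff.

%% Hypothetical helper: true only for 8-tuples whose elements all pass in_u16/1.
is_ip6_address(Addr) when is_tuple(Addr), tuple_size(Addr) =:= 8 ->
    lists:all(fun in_u16/1, tuple_to_list(Addr));
is_ip6_address(_) ->
    false.
```

With this, in_u16(16#10000) is rejected even though it is a valid char().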

What is going on here? Is this a new thing with R20's Dialyzer, or have I just never noticed these multi-byte leaps in value range resolution?

-Craig
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Dialyzer and numeric range values

Fred Hebert-2
On 11/01, zxq9 wrote:
>What is going on here? Is this a new thing with R20's Dialyzer, or have
>I just never noticed these multi-byte leaps in value range resolution?
>

You never noticed it. Dialyzer kind of keeps the right to expand ranges
arbitrarily. You can see in the source code that any set of types
above 12 gets merged into one:
https://github.com/erlang/otp/blob/fe1df7fc6bf050cb6c9bbd99eb9393c426b62f67/lib/hipe/cerl/erl_types.erl#L1997-L2011

The same kind of stuff also happens with atoms or even types in general
(https://github.com/erlang/otp/blob/fe1df7fc6bf050cb6c9bbd99eb9393c426b62f67/lib/hipe/cerl/erl_types.erl#L4860-L4866)
which can be merged into the any() type.

It's frankly a bit of a bummer but I figure it's something Dialyzer does
to be less memory-hungry, at the cost of accuracy.
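
The widening reported at the top of the thread can be modelled roughly as follows. This is a simplified sketch and not the code from erl_types.erl: the module name range_sketch, the return terms, and the value of SET_LIMIT are assumptions, and the handling of negative bounds is a guess at the most natural fallback.

```erlang
-module(range_sketch).
-export([widen/2]).

%% Assumed constants. The names echo the macros mentioned in the thread;
%% the SET_LIMIT value in particular is this sketch's assumption.
-define(SET_LIMIT, 13).
-define(MAX_BYTE, 255).
-define(MAX_CHAR, 16#10ffff).

%% widen(Lo, Hi) returns a term describing how the range Lo..Hi would be
%% reported: small ranges stay enumerated, larger ones snap to a coarser type.
widen(Lo, Hi) when is_integer(Lo), is_integer(Hi), Lo =< Hi ->
    if
        Hi - Lo < ?SET_LIMIT     -> {set, lists:seq(Lo, Hi)};
        Hi < 0                   -> neg_integer;
        Lo < 0                   -> integer;
        Lo >= 1, Hi =< ?MAX_BYTE -> {range, 1, ?MAX_BYTE};
        Hi =< ?MAX_BYTE          -> byte;
        Lo >= 1, Hi =< ?MAX_CHAR -> {range, 1, ?MAX_CHAR};
        Hi =< ?MAX_CHAR          -> char;
        Lo >= 1                  -> pos_integer;
        true                     -> non_neg_integer
    end.
```

Under these assumptions widen(10, 14) stays an enumerated set, widen(4, 65) becomes {range, 1, 255}, widen(3, 500) becomes {range, 1, 1114111}, and widen(0, 16#ffff) becomes char, which matches the output quoted in the original question.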

Re: Dialyzer and numeric range values

zxq9-2
On Wednesday, 1 November 2017 07:27:35 Fred Hebert wrote:

> On 11/01, zxq9 wrote:
> >What is going on here? Is this a new thing with R20's Dialyzer, or have
> >I just never noticed these multi-byte leaps in value range resolution?
>
> You never noticed it. Dialyzer kind of keeps the right to expand ranges
> arbitrarily. You can see in the source code that any set of types
> above 12 gets merged into one:
> https://github.com/erlang/otp/blob/fe1df7fc6bf050cb6c9bbd99eb9393c426b62f67/lib/hipe/cerl/erl_types.erl#L1997-L2011
>
> The same kind of stuff also happens with atoms or even types in general
> (https://github.com/erlang/otp/blob/fe1df7fc6bf050cb6c9bbd99eb9393c426b62f67/lib/hipe/cerl/erl_types.erl#L4860-L4866)
> which can be merged into the any() type.
>
> It's frankly a bit of a bummer but I figure it's something Dialyzer does
> to be less memory-hungry, at the cost of accuracy.

Hah! Wow. That's a pretty darn steep range curve on numerics!

Incidentally, also one of the only apt uses of an `if` I've seen in a while.

I've stayed away from Dialyzer internals for a reason (I get sort of obsessed about such things) but I wonder if it wouldn't be possible to establish numeric ranges as test ranges instead of enumerations?

It seems like this is the case, actually, but obviously I am missing something if they can't set ?int_range(X, Y) directly and must instead only ever set either ?int_range(1, ?MAX_BYTE) or ?int_range(1, ?MAX_CHAR).
https://github.com/erlang/otp/blob/fe1df7fc6bf050cb6c9bbd99eb9393c426b62f67/lib/hipe/cerl/erl_types.erl#L342

The only place this approach seems to hold is called t_from_range_unsafe/2.
https://github.com/erlang/otp/blob/fe1df7fc6bf050cb6c9bbd99eb9393c426b62f67/lib/hipe/cerl/erl_types.erl#L2018
I assume "unsafe" for a reason, though the expected tests actually do exist.
https://github.com/erlang/otp/blob/fe1df7fc6bf050cb6c9bbd99eb9393c426b62f67/lib/hipe/cerl/erl_types.erl#L2101

Hm. So I'm just puzzled at the reasoning behind the need to enumerate ranges, and then the presence of code that doesn't need ranges to be enumerated. Even more to the point, why not a list of arbitrary ranges? Too slow? Hard to imagine speed is the primary concern. Perhaps simply some work that never quite got polished off because, well, Dialyzer is already super useful in 90% of cases?

Anyway, thanks for pointing that out. Quite interesting.

-Craig

Re: Dialyzer and numeric range values

Tobias Lindahl-4
Types need to collapse to a higher level of abstraction at some point or the fixpoint iteration of the analysis might not terminate. Note that there are an infinite number of ranges.

In order for the analysis to keep track of when to collapse the ranges instead of expanding them, the structure of the analysis would have to take into account some kind of history of range expansion (e.g., expand the range X number of times, then collapse).

I spent quite some time on trying to use the ranges in Dialyzer in a better way, but as I recall it, the non-termination was the reason for collapsing early.

We used a more precise range analysis in the HiPE compiler for optimization, and it is certainly possible to get good results for it, but it is not as straightforward as it might seem.
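
The termination problem can be illustrated with a toy fixpoint iteration. This is only a sketch of the "expand X times, then collapse" idea described above, not how Dialyzer or HiPE is implemented; the module widen_sketch, the analysed loop, and the representation of types are all hypothetical.

```erlang
-module(widen_sketch).
-export([analyse/1]).

%% Abstract values in this toy model: {range, Lo, Hi} or the collapsed
%% type non_neg_integer.

%% One analysis round for a hypothetical loop `loop(N) -> loop(N + 1)`
%% entered with N = 0: the argument range grows by one on every round.
step(non_neg_integer) -> non_neg_integer;
step({range, Lo, Hi}) -> {range, Lo, Hi + 1}.

%% Iterate until the abstract value stops changing. Budget is the number
%% of expansions allowed before the range is collapsed.
analyse(Budget) when is_integer(Budget), Budget >= 0 ->
    iterate({range, 0, 0}, Budget, 0).

iterate(Type, Budget, Rounds) ->
    case step(Type) of
        Type ->
            %% Nothing changed: fixpoint reached.
            {Type, Rounds};
        _Wider when Budget =:= 0 ->
            %% Budget used up: collapse instead of expanding again.
            iterate(non_neg_integer, 0, Rounds + 1);
        Wider ->
            iterate(Wider, Budget - 1, Rounds + 1)
    end.
```

Without the collapsing clause the iteration would keep producing {range, 0, 1}, {range, 0, 2}, and so on forever. With it, analyse(3) stops after four rounds with non_neg_integer, at the price of having to carry the budget through the analysis.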



