Quantcast

Dialyzer underspecs and overspecs options

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Dialyzer underspecs and overspecs options

Dmitry Kakurin-2

Hi all,

I've started using Dialyzer and tried to enable as many warnings as possible.

Two of them that seem especially useful are underspecs and overspecs.

But after struggling with warnings produced by them for a while I've realized that what I really need is a subset of each.

In the example below (sorry, I'm using Elixir syntax but hope it's still obvious) the 2 warnings I want to enable are under_in and over_out. They are the ones that will break callers relying on the spec. While over_in and under_out warnings are more of a hints to the function author about current implementation and are much less serious.

The current Dialyzer options allow me to enable either both over_* warnings or both under_* warnings. But that's not useful.

Is there a way to configure Dialyzer the way I want?


Here is the demo code:


  # suboptimal implementation, fine for caller
  @spec over_in(x :: integer) :: nil
  def over_in(x) when is_number(x) do
    nil
  end

  # broken caller contract, could return unexpected value
  @spec over_out(x :: number) :: integer
  def over_out(x) when is_number(x) do
    x
  end

  # broken caller contract, rejects input allowed by the spec
  @spec under_in(x :: number) :: nil
  def under_in(x) when is_integer(x) do
    nil
  end

  # current implementation detail, does not concern caller
  @spec under_out(x :: integer) :: number
  def under_out(x) when is_integer(x) do
    x
  end

and here are the warnings:


demo.ex:4: Type specification
'Elixir.Virt':over_in(x::integer()) -> 'nil' is a subtype of the success typing: 
'Elixir.Virt':over_in(number()) -> 'nil'

demo.ex:9: Type specification
'Elixir.Virt':over_out(x::number()) -> integer() is a subtype of the success typing:
'Elixir.Virt':over_out(number()) -> number()

demo.ex:14: Type specification
'Elixir.Virt':under_in(x::number()) -> 'nil' is a supertype of the success typing:
'Elixir.Virt':under_in(integer()) -> 'nil'

demo.ex:19: The specification for 'Elixir.Virt':under_out/1 states that the function might also return
float() but the inferred return is
integer()

Thanks, Dmitry.

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Dialyzer underspecs and overspecs options

Alex S.
What you want is provided as a default behaviour. Overspecs you shouldn’t really enable for production needs, it’s «debug the Dialyzer» feature.


24 марта 2017 г., в 21:42, Dmitry Kakurin <[hidden email]> написал(а):

Hi all,
I've started using Dialyzer and tried to enable as many warnings as possible.
Two of them that seem especially useful are underspecs and overspecs.
But after struggling with warnings produced by them for a while I've realized that what I really need is a subset of each.
In the example below (sorry, I'm using Elixir syntax but hope it's still obvious) the 2 warnings I want to enable are under_in and over_out. They are the ones that will break callers relying on the spec. While over_in and under_out warnings are more of a hints to the function author about current implementation and are much less serious.
The current Dialyzer options allow me to enable either both over_* warnings or both under_* warnings. But that's not useful.
Is there a way to configure Dialyzer the way I want?

Here is the demo code:

  # suboptimal implementation, fine for caller
  @spec over_in(x :: integer) :: nil
  def over_in(x) when is_number(x) do
    nil
  end

  # broken caller contract, could return unexpected value
  @spec over_out(x :: number) :: integer
  def over_out(x) when is_number(x) do
    x
  end

  # broken caller contract, rejects input allowed by the spec
  @spec under_in(x :: number) :: nil
  def under_in(x) when is_integer(x) do
    nil
  end

  # current implementation detail, does not concern caller
  @spec under_out(x :: integer) :: number
  def under_out(x) when is_integer(x) do
    x
  end

and here are the warnings:

demo.ex:4: Type specification
'Elixir.Virt':over_in(x::integer()) -> 'nil' is a subtype of the success typing: 
'Elixir.Virt':over_in(number()) -> 'nil'

demo.ex:9: Type specification
'Elixir.Virt':over_out(x::number()) -> integer() is a subtype of the success typing:
'Elixir.Virt':over_out(number()) -> number()

demo.ex:14: Type specification
'Elixir.Virt':under_in(x::number()) -> 'nil' is a supertype of the success typing:
'Elixir.Virt':under_in(integer()) -> 'nil'

demo.ex:19: The specification for 'Elixir.Virt':under_out/1 states that the function might also return
float() but the inferred return is
integer()

Thanks, Dmitry.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Dialyzer underspecs and overspecs options

Jesper Louis Andersen-2
In reply to this post by Dmitry Kakurin-2
On Fri, Mar 24, 2017 at 11:37 PM Dmitry Kakurin <[hidden email]> wrote:
Two of them that seem especially useful are underspecs and overspecs.

If you think about it, an overspecification is not really a problem with the code. The dialyzer uses inference analysis and makes a best effort at determining the type of a given function. Your contract specifies your usage of the function. When a overspec happens, it is because your contract is more restrictive than the inference. This is usually not a problem. A small change in your function body might change the inferred type entirely. But unless you want to go change your contract all the time, it may be better to keep it more restrictive. Lots of functions are more general than their intended use. Another problem is when the dialyzer fails to make a precise inference, so falls back to type any(). Enabling overspecs will make the system complain all the time in this situation, but it may not be a problem.

Underspecs, on the other hand, could spell trouble. Here the inference is more restrictive, and thus there are inputs to the function which is within the contract, but won't make the code terminate as expected according to the inference scheme. If your code never makes use of the extended contract, there is no problem, but as soon as some user tries to do so, things will go bad. The dialyzer will usually catch such a use quickly, but it is nicer to be strict in the contract if possible.

There are, however, situations where underspecs isn't that desirable as a property either. Suppose you are returning a map, such as #{ a => 37 }. The dialyzer can infer its success type to be #{ a := pos_integer() }[0]. But you may have written a specification where you say a map() is returned. This is more general than the inferred type, and as such it is an underspec violation. Your contract isn't wrong, but on the other hand, it isn't precise either. A common problem that also hits a lot are when you are declaring a binary(), but the dialyzer is able to figure out a more precise type scheme, for instance <<_:64, _:_*24>>. You will perhaps be more happy with just saying "this is a binary".

My usual recommendation is to start with the default set of warnings. Remove every error there, because the slogan is

   The dialyzer is never wrong!

and this is absolutely true. Once you have a dialyzer clean piece of code with the default warnings, you start turning on underspecs, no_return, and so on. But often this is more for understanding if there is anything that sticks out as being too general.


[0] This is strictly not true. If you *always* return 37, then it figures out the type is exactly the same response every time and tags its type as #{ a := 37 }. But for this exposition, I'm assuming your code returns "some positive integer" where 37 is an example.


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Dialyzer underspecs and overspecs options

Dmitry Kakurin-2
In reply to this post by Alex S.

What you want is provided as a default behaviour.


Hi Alex,


My observations don't match your statement above:

* Removing -Wunderspecs switch makes warning in both under_ methods disappear

* Removing -Woverspecs switch makes warning in both over_ methods disappear

* Removing both -W makes all 4 warnings disappear


And I'll reply to Jasper's email on usefulness of both underspecs and overspecs.


Thank you, Dmitry.

From: Alex S. <[hidden email]>
Sent: Monday, March 27, 2017 1:09 AM
To: Dmitry Kakurin
Cc: [hidden email]
Subject: Re: [erlang-questions] Dialyzer underspecs and overspecs options
 
What you want is provided as a default behaviour. Overspecs you shouldn’t really enable for production needs, it’s «debug the Dialyzer» feature.


24 марта 2017 г., в 21:42, Dmitry Kakurin <[hidden email]> написал(а):

Hi all,
I've started using Dialyzer and tried to enable as many warnings as possible.
Two of them that seem especially useful are underspecs and overspecs.
But after struggling with warnings produced by them for a while I've realized that what I really need is a subset of each.
In the example below (sorry, I'm using Elixir syntax but hope it's still obvious) the 2 warnings I want to enable are under_in and over_out. They are the ones that will break callers relying on the spec. While over_in and under_out warnings are more of a hints to the function author about current implementation and are much less serious.
The current Dialyzer options allow me to enable either both over_* warnings or both under_* warnings. But that's not useful.
Is there a way to configure Dialyzer the way I want?

Here is the demo code:

  # suboptimal implementation, fine for caller
  @spec over_in(x :: integer) :: nil
  def over_in(x) when is_number(x) do
    nil
  end

  # broken caller contract, could return unexpected value
  @spec over_out(x :: number) :: integer
  def over_out(x) when is_number(x) do
    x
  end

  # broken caller contract, rejects input allowed by the spec
  @spec under_in(x :: number) :: nil
  def under_in(x) when is_integer(x) do
    nil
  end

  # current implementation detail, does not concern caller
  @spec under_out(x :: integer) :: number
  def under_out(x) when is_integer(x) do
    x
  end

and here are the warnings:

demo.ex:4: Type specification
'Elixir.Virt':over_in(x::integer()) -> 'nil' is a subtype of the success typing: 
'Elixir.Virt':over_in(number()) -> 'nil'

demo.ex:9: Type specification
'Elixir.Virt':over_out(x::number()) -> integer() is a subtype of the success typing:
'Elixir.Virt':over_out(number()) -> number()

demo.ex:14: Type specification
'Elixir.Virt':under_in(x::number()) -> 'nil' is a supertype of the success typing:
'Elixir.Virt':under_in(integer()) -> 'nil'

demo.ex:19: The specification for 'Elixir.Virt':under_out/1 states that the function might also return
float() but the inferred return is
integer()

Thanks, Dmitry.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Dialyzer underspecs and overspecs options

Dmitry Kakurin-2
In reply to this post by Jesper Louis Andersen-2

Hi Jesper,


We need to separate what constitutes a specified contract violation for function inputs and function outputs.

I have reworked my demo example (below) and produced warnings (also below) to make it absolutely clear.


Let SpecIn be a set of @spec inputs and RealIn be a set of inputs as inferred by Dialyzer from real code, then:

* The Input Contract is satisfied when SpecIn <= RealIn (where <= is a non-strict subset operation). See over_in in demo code below.

* The Input Contract violation is detected by -Wunderspecs option when SpecIn > RealIn. See under_in below.

It is easy to see in the code: 

* It's OK for over_in to declare that it only accepts :a and :b while it also happens to accept :c. Maybe suboptimal, but fine.

* It's NOT OK for under_in to claim that it accepts :a, :b and :c and break if :c is passed. Rejecting :c would break the caller.


Let SpecOut be a set of @spec outputs and RealOut be a set of outputs as inferred by Dialyzer from real code, then:

* The Output Contract is satisfied when SpecOut >= RealOut (where >= is a non-strict superset operation). See under_out below.

* The Output Contract violation is detected by -Woverspecs option when SpecOut < RealOut. See over_out below.

It is easy to see in the code:

* It's OK for under_out to declare that it returns :a, :b and :c, while currently it only returns :a and :b. Maybe future implementations will return :c as well.
* It's NOT OK for over_out to declare that it returns :a and :b, but to also return :c sometimes. Returning :c would break the caller.

We can see the perfect asymmetry for inputs and outputs.

These are pretty-much covariance and contravariance rules: you cannot accept less than you claim and you cannot produce more than you claim.


Also we see how:

* -Wunderspecs option detects Input Contract violations and produces false positives for acceptable Output contracts

* -Woverspecs option detects Output Contract violations and produces false positives for acceptable Input contracts


That's why I'd like to enable underspec_in and overspec_out options (if they existed), as they are the 2 options detecting real problems (out of 4 possible combinations).


  # suboptimal implementation, fine for caller
  @spec over_in(x :: :a | :b) :: nil
  def over_in(x) when x == :a or x == :b or x == :c do
    nil
  end

  # BROKEN caller contract, rejects input allowed by the spec
  @spec under_in(x :: :a | :b | :c) :: nil
  def under_in(x) when x == :a or x == :b do
    nil
  end

  # current implementation detail, does not concern caller
  @spec under_out() :: :a | :b | :c
  def under_out() do
    produce_ab()
  end

  # BROKEN caller contract, could return unexpected value
  @spec over_out() :: :a | :b
  def over_out() do
    produce_abc()
  end

  # helpers
  
  @spec produce_ab() :: :a | :b
  def produce_ab() do
    case :rand.uniform(2do
      1 -> :a
      _ -> :b
    end
  end

  @spec produce_abc() :: :a | :b | :c
  def produce_abc() do
    case :rand.uniform(3do
      1 -> :a
      2 -> :b
      _ -> :c
    end
  end

and the warnings are:


demo.ex:6: Type specification
'Elixir.Virt':over_in('a' | 'b') -> 'nil' is a subtype of the success typing:
'Elixir.Virt':over_in('a' | 'b' | 'c') -> 'nil'

demo.ex:12: Type specification
'Elixir.Virt':under_in('a' | 'b' | 'c') -> 'nil' is a supertype of the success typing:
'Elixir.Virt':under_in('a' | 'b') -> 'nil'

demo.ex:18: The specification for
'Elixir.Virt':under_out/0 states that the function might also return
'c' but the inferred return is
'a' | 'b'

demo.ex:24: Type specification
'Elixir.Virt':over_out() -> 'a' | 'b' is a subtype of the success typing:
'Elixir.Virt':over_out() -> 'a' | 'b' | 'c'

Thank you, Dmitry


From: Jesper Louis Andersen <[hidden email]>
Sent: Tuesday, March 28, 2017 8:46:00 AM
To: Dmitry Kakurin; [hidden email]
Subject: Re: [erlang-questions] Dialyzer underspecs and overspecs options
 
On Fri, Mar 24, 2017 at 11:37 PM Dmitry Kakurin <[hidden email]> wrote:
Two of them that seem especially useful are underspecs and overspecs.

If you think about it, an overspecification is not really a problem with the code. The dialyzer uses inference analysis and makes a best effort at determining the type of a given function. Your contract specifies your usage of the function. When a overspec happens, it is because your contract is more restrictive than the inference. This is usually not a problem. A small change in your function body might change the inferred type entirely. But unless you want to go change your contract all the time, it may be better to keep it more restrictive. Lots of functions are more general than their intended use. Another problem is when the dialyzer fails to make a precise inference, so falls back to type any(). Enabling overspecs will make the system complain all the time in this situation, but it may not be a problem.

Underspecs, on the other hand, could spell trouble. Here the inference is more restrictive, and thus there are inputs to the function which is within the contract, but won't make the code terminate as expected according to the inference scheme. If your code never makes use of the extended contract, there is no problem, but as soon as some user tries to do so, things will go bad. The dialyzer will usually catch such a use quickly, but it is nicer to be strict in the contract if possible.

There are, however, situations where underspecs isn't that desirable as a property either. Suppose you are returning a map, such as #{ a => 37 }. The dialyzer can infer its success type to be #{ a := pos_integer() }[0]. But you may have written a specification where you say a map() is returned. This is more general than the inferred type, and as such it is an underspec violation. Your contract isn't wrong, but on the other hand, it isn't precise either. A common problem that also hits a lot are when you are declaring a binary(), but the dialyzer is able to figure out a more precise type scheme, for instance <<_:64, _:_*24>>. You will perhaps be more happy with just saying "this is a binary".

My usual recommendation is to start with the default set of warnings. Remove every error there, because the slogan is

   The dialyzer is never wrong!

and this is absolutely true. Once you have a dialyzer clean piece of code with the default warnings, you start turning on underspecs, no_return, and so on. But often this is more for understanding if there is anything that sticks out as being too general.


[0] This is strictly not true. If you *always* return 37, then it figures out the type is exactly the same response every time and tags its type as #{ a := 37 }. But for this exposition, I'm assuming your code returns "some positive integer" where 37 is an example.


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Dialyzer underspecs and overspecs options

Jesper Louis Andersen-2
On Tue, Mar 28, 2017 at 9:03 PM Dmitry Kakurin <[hidden email]> wrote:
These are pretty-much covariance and contravariance rules: you cannot accept less than you claim and you cannot produce more than you claim.


Ah of course.

There is no way to specify the "cross" of the two options as of now I think. What you want is basically to check for contravariance in inputs and covariance in outputs if I read what you say correctly. This is normal errors for a subtype hierarchy where you have functions as first class values. That is the relation is that

    S1 -> T1 <: S2 -> T2

Given that

    T1 <: T2 (covariance)
    S2 <: S1 (contravariance)

Do note that the dialyzer is an approximate analysis however. Consider these functions:

f(a) -> true;
f(b) -> false;
f(c) -> true.
 
g(a) -> true;
g(b) -> true;
g(c) -> false;
g(d) -> true;
g(e) -> false;
g(f) -> true;
g(g) -> true;
g(h) -> true;
g(i) -> false;
g(j) -> false;
g(k) -> true;
g(l) -> true;
g(m) -> false;
g(n) -> true;
g(o) -> false.

The dialyzer infers the following types for these:

-spec f('a' | 'b' | 'c') -> boolean().
-spec g(atom()) -> boolean().

The point is that 'g' contains "too much" information so the dialyzer just moves one up in the type lattice rather than running with a precise specification. Clearly g/1 fails if called as 'g(x)' for instance, but this is not represented in the type specification. Rather, the dialyzer has approximated that the function is probably from atom() to boolean().

This is what usually makes underspecs/overspecs a bit like a false positive in the system. As long as you travel along a subtyping chain in the lattice, you may be witnessing an approximation.

Where we are sure of a mistake is if you suddenly have an input that doesn't match the type lattice at all. Say we called 'g(<<>>)' which would obviously be wrong since g doesn't accept binary() as input.

It may also be worth mentioning that a success type f : A -> B is defined in a specific manner:

  Suppose f(X) reduces to a value V. Then V is among type B and X is among type A.

It is subtly different from a standard static type system. Note that 'g/1' above fulfills this notion: if it terminates with a boolean() value, then its input was definitely an atom(). Also note that the succes type any() -> any is valid for 'g/1'. But it doesn't establish a relation of being well-typed and thus having certain desirable meta-theoretic properties.

On the other hand, the key aspect is that while it over-approximates, we can say something about a function application such as g(<<>>) because we know a priori it will fail. This is program defect and we better report such defects!

If you are interested, Lindahl and Sagonas has a paper on the implementation, and its exposition is better than mine given here:

https://it.uu.se/research/group/hipe/papers/succ_types.pdf

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Dialyzer underspecs and overspecs options

Dmitry Kakurin-2

There is no way to specify the "cross" of the two options as of now I think. 


How realistic do you think it would be to add this functionality to Dialyzer?


Thank you, Dmitry.

From: Jesper Louis Andersen <[hidden email]>
Sent: Tuesday, March 28, 2017 4:20:47 PM
To: Dmitry Kakurin; [hidden email]
Subject: Re: [erlang-questions] Dialyzer underspecs and overspecs options
 
On Tue, Mar 28, 2017 at 9:03 PM Dmitry Kakurin <[hidden email]> wrote:
These are pretty-much covariance and contravariance rules: you cannot accept less than you claim and you cannot produce more than you claim.


Ah of course.

There is no way to specify the "cross" of the two options as of now I think. What you want is basically to check for contravariance in inputs and covariance in outputs if I read what you say correctly. This is normal errors for a subtype hierarchy where you have functions as first class values. That is the relation is that

    S1 -> T1 <: S2 -> T2

Given that

    T1 <: T2 (covariance)
    S2 <: S1 (contravariance)

Do note that the dialyzer is an approximate analysis however. Consider these functions:

f(a) -> true;
f(b) -> false;
f(c) -> true.
 
g(a) -> true;
g(b) -> true;
g(c) -> false;
g(d) -> true;
g(e) -> false;
g(f) -> true;
g(g) -> true;
g(h) -> true;
g(i) -> false;
g(j) -> false;
g(k) -> true;
g(l) -> true;
g(m) -> false;
g(n) -> true;
g(o) -> false.

The dialyzer infers the following types for these:

-spec f('a' | 'b' | 'c') -> boolean().
-spec g(atom()) -> boolean().

The point is that 'g' contains "too much" information so the dialyzer just moves one up in the type lattice rather than running with a precise specification. Clearly g/1 fails if called as 'g(x)' for instance, but this is not represented in the type specification. Rather, the dialyzer has approximated that the function is probably from atom() to boolean().

This is what usually makes underspecs/overspecs a bit like a false positive in the system. As long as you travel along a subtyping chain in the lattice, you may be witnessing an approximation.

Where we are sure of a mistake is if you suddenly have an input that doesn't match the type lattice at all. Say we called 'g(<<>>)' which would obviously be wrong since g doesn't accept binary() as input.

It may also be worth mentioning that a success type f : A -> B is defined in a specific manner:

  Suppose f(X) reduces to a value V. Then V is among type B and X is among type A.

It is subtly different from a standard static type system. Note that 'g/1' above fulfills this notion: if it terminates with a boolean() value, then its input was definitely an atom(). Also note that the succes type any() -> any is valid for 'g/1'. But it doesn't establish a relation of being well-typed and thus having certain desirable meta-theoretic properties.

On the other hand, the key aspect is that while it over-approximates, we can say something about a function application such as g(<<>>) because we know a priori it will fail. This is program defect and we better report such defects!

If you are interested, Lindahl and Sagonas has a paper on the implementation, and its exposition is better than mine given here:

https://it.uu.se/research/group/hipe/papers/succ_types.pdf

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Loading...