Must and May convention

Must and May convention

Joe Armstrong-2
For several years I've been using a convention in my hobby
projects. It's what I call the must-may convention.

I'm wondering if it should be widely used.

What is it?

There are two commonly used conventions for handling bad arguments to
a function: we can return {ok, Val} or {error, Reason}, or we can
return a plain value when the arguments are correct and raise an
exception otherwise.

The problem is that when I read code and see a function like
'foo:bar(a,12)' I have no idea if it obeys one of these conventions or
does something completely different. I have to read the code to find
out.

My convention is to prefix the function name with 'must_' or 'may_'.

If I see must_ in the name I know the arguments to the function must
be correct, and that if they are not an exception will be generated,
with meaningful data in the exception so I can see what went wrong.

If I see may_ in the name I know the call may fail in an expected way:
the function will return {ok, Val} | {error, Why}.

If a function follows neither of these conventions then it can do
whatever it feels like.

        - example -

  I have a library of interface functions,
  something like:

    must_read_file(F) ->
        case file:read_file(F) of
            {ok, Bin} ->
                Bin;
            {error, _} ->
                io:format("Could not read file ~p~n", [F]),
                exit({must, violation, read_file, F})
        end.

    may_read_file(F) -> file:read_file(F).

Now my source code becomes more readable:

    test(F) ->
        Bin = must_read_file(F),
        ...

or

    test(F) ->
        case may_read_file(F) of
            {ok, B} ->
                ...;
            {error, _} ->
                ...
        end.

This turns out to be very convenient - I read many files
in my programs, so it's nice to know that must_read_file
will print a nice error message and terminate
if I give it a bad filename.

Note: I can get the program to crash by writing

   {ok, B} = file:read_file(F)

But I don't get a nice error message telling me the filename.
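To make the contrast concrete, here is a minimal sketch (the module and function names are mine, purely illustrative) showing both failure modes side by side:

```erlang
-module(must_demo).
-export([bare/1, wrapped/1]).

%% Bare match: on a missing file this crashes with
%% {badmatch, {error, enoent}}, which never mentions F.
bare(F) ->
    {ok, B} = file:read_file(F),
    B.

%% must_-style wrapper: still crashes, but the exit reason
%% names the operation and the offending filename.
wrapped(F) ->
    case file:read_file(F) of
        {ok, B} ->
            B;
        {error, _} ->
            io:format("Could not read file ~p~n", [F]),
            exit({must, violation, read_file, F})
    end.
```

Calling wrapped("bad.txt") exits with {must, violation, read_file, "bad.txt"}, so the crash report tells you which file was the problem.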

Any takers?

Cheers

/Joe
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Must and May convention

Attila Rajmund Nohl

My problem with all kinds of very common prefixes is that they break
function-name autocompletion, or at least make it harder to use when I
have dozens/hundreds/thousands of function names starting with the
same 3-5 characters: either I have to type those 3-5 characters every
time, plus enough further characters to reach a unique prefix, or I
have to autocomplete twice (once for the may/must distinction, once
for the actual function name). Eclipse (for Java) helpfully shows the
method signature including the return value and thrown exceptions - I
think this problem should be solved by IDEs and not by naming
conventions.

Re: Must and May convention

Dmytro Lytovchenko
In older and more advanced languages this is done with function attributes. They don't interfere with naming, and are available for the IDE and the compiler to see. Something like:

    attr([throw]) fun read_file(Arg) -> ... exit(file_not_found).
    attr([nothrow, ok_error]) fun read_file2(Arg) -> {error, file_not_found}.


Re: Must and May convention

Karlo Kuna
In reply to this post by Attila Rajmund Nohl
Could we have something in the type specifications that indicates the throw case?

    -spec f_name( Args ) -> .....; throw(TE | TE1 | ....).


Re: Must and May convention

Roman Galeev
IMO you can do it with:

    must(M, F, A) ->
        case erlang:apply(M, F, A) of
            {ok, Res} ->
                Res;
            Err ->
                io:format("Error ~p:~p(~p) ~p~n", [M, F, A, Err]),
                exit({must, F, A})
        end.

In this case you don't need to make new functions out of existing ones, and you still get a nice error message.
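A self-contained sketch of this wrapper idea (the module name and the maps:find/2 example are mine; the trailing comma in the original exit tuple is dropped so it compiles), exercised with a function that follows the {ok, Value} convention:

```erlang
-module(must_wrap).
-export([must/3]).

%% Generic "must": unwrap {ok, Value}, or exit with a
%% descriptive reason that records what was called.
must(M, F, A) ->
    case erlang:apply(M, F, A) of
        {ok, Res} ->
            Res;
        Err ->
            io:format("Error ~p:~p(~p) ~p~n", [M, F, A, Err]),
            exit({must, F, A})
    end.
```

For example, must_wrap:must(maps, find, [a, #{a => 1}]) returns 1, while a missing key exits with {must, find, [b, #{}]}.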

--
With best regards,
     Roman Galeev,
     +420 702 817 968


Re: Must and May convention

Joe Armstrong-2
But the source would be ugly

I want to write 

    must_read_file(F)

and not

    must(file,read_file,[F])

/Joe


Re: Must and May convention

Roman Galeev
Well, Erlang sources aren't the prettiest anyway, IMO mostly due to function composition syntax.


Re: Must and May convention

Valentin Micic-2
In reply to this post by Attila Rajmund Nohl


Maybe you could solve this by using a suffix instead of a prefix?
Thus, instead of "must" or "may", you could just use the suffix "throws" when a function raises an exception:

    read( … )        for no exception
    read_throws(…)   for functions raising an exception
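A minimal sketch of this suffix convention (the function bodies and the exit reason are my assumptions, not from the post), wrapping file:read_file/1:

```erlang
-module(suffix_demo).
-export([read/1, read_throws/1]).

%% Plain name: returns {ok, Bin} | {error, Reason}, never raises.
read(F) ->
    file:read_file(F).

%% "_throws" suffix: returns the value directly, raises on failure.
read_throws(F) ->
    case file:read_file(F) of
        {ok, Bin} -> Bin;
        {error, Reason} -> exit({read_throws, F, Reason})
    end.
```

The suffix keeps the common stem first, so autocompletion on "read" still finds both variants.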

V/


Re: Must and May convention

Karlo Kuna
In reply to this post by Roman Galeev
> But the source would be ugly

Parse transformations could help.

Anyway, I am against using prefixes (or suffixes), as they encode information that might be useful for tools as well as for humans.

I view code as an interface to the tools, not to the humans. Ideally one should write a piece of code with tests and documentation, and after a review period it should be used and forgotten, with only the interface from the documentation being consulted.

So I would prefer to put information about throws in the function specifications as well.


Re: Must and May convention

Loïc Hoguin-3
In reply to this post by Joe Armstrong-2

I've been debating this in my head for a long time. I came to the
conclusion that 99% of the time I do not want to handle errors.
Therefore 99% of the functions should not return an error.

What happens for the 1% of the time where I do want to handle an error
and the function doesn't allow it? Well, I catch the exception. And
that's why I started using more meaningful exceptions for these cases.
For example, Cowboy 2.0 has the following kind of code when it fails to
validate input:

     try
         cow_qs:parse_qs(Qs)
     catch _:_ ->
         erlang:raise(exit, {request_error, qs,
             'Malformed query string; application/x-www-form-urlencoded expected.'
         }, erlang:get_stacktrace())
     end.

99% of the time I don't care about it because Cowboy will properly
notice it's an input error and will return a 400 automatically (instead
of 500 for other crashes). It still contains the full details of the
error should I wish to debug it, and if it is necessary to provide more
details to the user I can catch it and do something with it.

(The exception probably won't make it as a documented feature in 2.0 due
to lack of time but I will rectify this in future releases.)

This strategy also helps with writing clearer code because I don't need
to have nested case statements, I can just have one try/catch with
multiple catch clauses to identify the errors I do want to catch, and
let the others go through.

     try
         Qs = cowboy_req:parse_qs(Req),
         Cookies = cowboy_req:parse_cookies(Req),
         doit(Qs, Cookies)
     catch
         exit:{request_error, qs, _} ->
             bad_qs(Req);
         exit:{request_error, {header, <<"cookie">>}, _} ->
             bad_cookie(Req)
     end

Write for the happy path and handle all errors I care about in the same
place. Goodbye nested cases for error handling!

I have also been using exceptions in a "different" way for parsing
Asciidoc files. Asciidoc input is *always* correct, there can not be a
malformed Asciidoc file (as far as parsing is concerned). When input
looks wrong it's a paragraph.

I can therefore in that case simply write functions for parsing each
possible element, and try them one by one on the input until I find a
parsing function that doesn't crash. If it doesn't crash, then that
means I have found the type of block for the input. If it crashes, I try
the next type of block.

So I have a function like this defining the block types:

block(St) ->
     skip(fun empty_line/1, St),
     oneof([
         fun eof/1,
         %% Section titles.
         fun section_title/1,
         fun long_section_title/1,
         %% Block macros.
         fun block_id/1,
         fun comment_line/1,
         fun block_macro/1,
         %% Lists.
         fun bulleted_list/1,
         fun numbered_list/1,
...

And then one of those parse functions would be like this for example:

comment_line(St) ->
     <<"//", C, Comment0/bits>> = read_line(St),
     true = ?IS_WS(C),
     Comment = trim(Comment0),
     %% Good!
     {comment_line, #{}, Comment, ann(St)}.

If it crashes, then it's not a comment line!

The oneof function is of course defined like this:

oneof([], St) ->
     throw({error, St}); %% @todo
oneof([Parse|Tail], St=#state{reader=ReaderPid}) ->
     Ln = asciideck_line_reader:get_position(ReaderPid),
     try
         Parse(St)
     catch _:_ ->
         asciideck_line_reader:set_position(ReaderPid, Ln),
         oneof(Tail, St)
     end.

This allows me to do some parsec-like parsing by abusing exceptions. But
the great thing about it is that I don't need to worry about error
handling here again, I just try calling parse functions until one
doesn't crash.

So to go back to the topic at hand, I would say forget about the
distinction between must and may, and truly embrace "happy path"
programming and make smart use of exceptions. Deal with errors in one
place instead of having nested cases/many functions. There are of course
other ways to do this, but only exceptions let you do this both in the
local process and in a separate process, depending on your needs.

(I will now expect horrified replies from purists. Do not disappoint.)

Cheers,

--
Loïc Hoguin
https://ninenines.eu

Re: Must and May convention

Stephen Han

Personally, I have been using a 'safe_' prefix for what Joe calls 'must_'.

For 'may_', I just use the regular function name. For example, safe_format_message vs format_message.

However, those 'safe_' functions are private functions in the module, since the prefix looks ugly. If some of the functions are required to do strict error checking, then I put a wrapper around the 'safe_' function.

I wasn't much in favor of a generic exception handler, because sometimes I prefer to let the program crash before the code goes GA.


Re: Must and May convention

zxq9-2
In reply to this post by Loïc Hoguin-3
On 2017年09月27日 水曜日 12:46:19 Loïc Hoguin wrote:

> I've been debating this in my head for a long time. I came to the
> conclusion that 99% of the time I do not want to handle errors.
> Therefore 99% of the functions should not return an error.

Taking this observation a step further...

I've got a guideline that has never made it into English yet (along with some coding guidelines and a few other things I should re-write...) that states that programs must always be refactored iteratively to aggregate side effects where possible and leave as much code functionally pure as can be managed.

The Rules:
- Pure functions are always crashers.
- Side-effecty functions return the type `{ok, Value} | {error, Reason}`
- A side effect is anything that touches a resource external to the current function.

Some programs are full of side effects -- doing lots of network and file system I/O while running a GUI. Others are not so side-effecty. The case where you REALLY get viral side-effect proliferation is use of ETS tables (shared binaries is actually another case, but not included in the rule because the abstraction generally holds well enough). But even in these cases we can usually break the pure bits out somewhat cleanly, at least once we understand what the program really needs to do.

That bit right there, "understand what the program really needs to do", is the truly hard part of getting any of this right. Or anything right.

When a project starts from scratch you don't understand the details yet, otherwise typing speed would equate to development time and that's just never the case. So we start out with a very high proportion of {ok, V} | {error, R} type functions initially because we don't know anything about anything and side effects wind up getting scattered about because we just didn't have a very clear view of what was going on. When inheriting messy, legacy code you understand even LESS because you don't understand what the program should do and you also don't understand whatever it is currently doing until you diddle with it a bit.

And that's totally OK.

But only at first.

That's just to get us over the hump so that something works, the task is handled, and if a bus hit That One Guy tomorrow then we could continue along and at least have something running.

To avert a lifetime of nightmares, lost hair and broken marriages due to a death-march-style maintenance cycle, though, we pre-emptively attack the program again with a refactoring effort aimed specifically at unifying types and side-effect hygiene. It is common that you'll have two flavors of basically the same thing in different areas, especially if you've got more than two people working on the project. That's got to get fixed. It is also common, as noted above, that side effects are scattered about for various reasons.

Once we've shaken the easy bits out we sometimes add a list of pure functions to the top of each module as a module attribute:

-pure([foo/1, bar/2, baz/0]).

Those should not only be pure, provable, and excellent targets for the initial property testing effort to get warmed up on, but are also known to crash when they get bad inputs. And of course everything should, by this point, Dialyze cleanly. Also, it isn't impossible to write a tool that keeps track of impure calls and checks that the pure functions only ever make calls to other pure functions (vast swathes of the stdlib define abstract data types, and nearly all of these calls are pure).

What are the impure functions?

The service loop is always impure -- it deals with the mailbox. Socket handling functions (which may be the service loop as well). Anything that writes to a file. Anything that sends a message. Interface functions that wrap message passing. Anything that talks to wx or console I/O. Etc.

The outcome is that side effects traditionally get collected in:
- Interface functions
- Service loops (and mini-service loops / service states)
- File I/O wrapper
- Socket I/O wrapper
- User interface I/O wrapper

The last three are "wrappers" because by writing a wrapper we give ourselves a place to aggregate side effecty calls instead of making them in deeper code (out at the edges of the call graph). A message may come over a socket or into the service loop that requires some processing and then writing to a file, for example, but this doesn't mean that we write to the file out there at the bottom of the call graph. Instead, we call to get the processing done, then return the value back into the service loop (or somewhere close to it, like a message interpreter function), and then the next line will be a call to either write the file or a call to a file writer that is known to be side-effecty.

Just about everything else can be pure. Most of the time. (Of course, "processing a value" may involve network communication, or RPC, or asking some other stateful process to do the processing, and any of these can prevent a function from being pure. But it is rare that these are the majority of functions in a program.) That means almost everything can be written as a crashable function -- because the ones that return {ok, V} | {error, R} should have already known what they were dealing with before they called the pure functions.
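A minimal sketch of that split (the function names and the temperature example are mine, purely illustrative): the pure function returns a naked value and simply crashes on bad input, while the side-effecty wrapper that touches the file system returns tagged tuples:

```erlang
%% Pure crasher: naked return value; badly-typed input just crashes
%% (badarg/function_clause), which is exactly what we want here.
parse_temp(Bin) when is_binary(Bin) ->
    binary_to_integer(Bin).

%% Side-effecty wrapper: touches a resource external to the function,
%% so it returns {ok, Value} | {error, Reason} and lets the caller
%% decide what a missing file means.
read_temp(Path) ->
    case file:read_file(Path) of
        {ok, Bin}       -> {ok, parse_temp(string:trim(Bin))};
        {error, Reason} -> {error, Reason}
    end.
```

Note that by the time parse_temp/1 is called, read_temp/1 has already dealt with the only legitimate source of surprise, so the pure function has no excuse to return a wrapped value.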

One side effect of this overall process is that, at least in writing customer facing software, we discover errors straight away and fix them. Most of the bugs are the really simple kind:

"If I enter a number for a name, the window disappears and reappears with empty fields."
(The windows process crashed and restarted back where it was.)

or, more often

"If I enter a number as a name the name disappears after I click the 'submit' button."
(Something deeper in the system crashed and the final update to the GUI was never sent.)

We IMMEDIATELY know that we didn't type check there properly and some other part of the code died with the "bad" data once it was noticed -- and the user just saw a momentary hiccup and fixed whatever was wrong on their own. So this wasn't the end of the world or a big scary red X showing up on the screen with mysterious numbers and inscribed error messages or whatever. But it WAS bad and unexpected behavior for the most important person in the program's universe. A quick check of the crash log bears out what we thought, and that problem is from then on handled properly and never heard from again.

Where this sort of problem becomes really confusing to debug is in the cases where we've gotten too fancy with exception handling and played loose with types. That input value may have traveled quite far into the system before something died, and figuring it out is a bit trickier than it would be with a dead-obvious crash message letting you know about it.

Blah blah blah...

We are all looking at roughly the same things here. Joe likes to prefix function names. That's probably a good system, but it doesn't work well for people who use autocompletion (people still do that?). Is that a tooling conflict? Aren't Joe's function names THEMSELVES a sort of tool? How about the -pure declaration? That's great -- but what we really want, actually, is a way to declare a function pure so that Dialyzer could know about it, as part of the -spec for a function. That would be awesome. What happens for us is that functions near the top of a module tend to be side-effecty and functions at the bottom tend to be pure -- so we just sort of know what terrain we are navigating because we know the layout that results as an outcome of following our little side-effect focused refactoring. Also, in documentation we know the difference immediately because of our own return typing convention: anything that returns naked values is a crasher, period.

It looks like none of the approaches is particularly perfect, though. I really wish Dialyzer accepted (and checked) explicit declarations of purity. I don't know what syntax would be good for this, but it's something I would like to have. Also -- it would allow people to use their pure functions in guards, which is a frequent request I hear come up quite a bit.

-Craig

Re: Must and May convention

Vans S
I don't think it's been mentioned: Elixir does this with the ! suffix.

{ok, Bin} | {error, Reason} = File.read("my_file")

Bin | raises File.Error = File.read!("my_file")


Exactly as you said Mr. Armstrong, the former is clearer off the bat, but the latter gives you a nice error (with the filename!).

Which do I prefer? It seems both are useful in certain cases, and one should not replace the other as the absolute truth. If an absolute truth were to be arrived at, then I would vote for the option to have both: a way to call any STDLIB function and have it either return a tuple or raise an exception.
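Short of new syntax, much of that "both ways" effect is available today with a small wrapper in plain Erlang. A hypothetical sketch (the must/1 name is mine, echoing Joe's convention):

```erlang
%% Hypothetical helper: turn any {ok, V} | {error, R} call into a
%% "must"-style call that returns the naked value or exits loudly,
%% carrying the reason in the exit term.
must(F) when is_function(F, 0) ->
    case F() of
        {ok, Value}     -> Value;
        {error, Reason} -> exit({must_violation, Reason})
    end.

%% Usage: tagged style and crashing style from the same stdlib function.
%%   Tagged:   {ok, Bin} = file:read_file("my_file")
%%   Crashing: Bin = must(fun() -> file:read_file("my_file") end)
```

This gets the crashing variant for every tuple-returning function at once, at the cost of wrapping the call in a fun rather than having a dedicated name for it.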




Re: Must and May convention

zxq9-2
On Thursday, 28 September 2017 at 05:43:33, you wrote:

> I dont think its been mentioned, elixir does this with the ! postfix.
>
> {ok, Bin} | {error, Reason} = File:read("my_file")
>
> Bin | throw/1 = File:read!("my_file")
>
>
> Exactly as you said Mr. Armstrong, the former is more clear off the bat but the latter gives you a nice error (with the filename!).
>
> Which do I prefer?  It seems both are useful in certain cases, and one should not replace the other as the absolute truth. If
> an absolute truth were to be arrived at, then I would vote for the option to have both! A way to call any STDLIB function and have it return a tuple, or throw/exception.

Elixir may do that, but I think adding a magical glyph to the syntax is almost universally a mistake. Not because this one case is right or wrong, but because the thought process underlying it causes languages to eventually suffocate themselves -- as you may have begun noticing... And no, adding a bang at the end of a function call is NOT the same as naming two separate functions, unless bangs are acceptable characters to include in function names (are they?).

If they are, some fun names might result:

  launch!the!missiles!(BadGuys)

  clean_your_room!dammit!(Kids)

If they are legal characters, though, then what you are referring to is a mere coding convention -- arguably of less utility than matching wrapped values VS receiving naked ones, which has a concrete runtime effect (and is understood by the type system).

If they are not legal characters and are actually a part of the core language (separate from the type system language) then... that's something I think should be contemplated for a HUGE amount of time before dropping into the language (things like that should be argued for years on end before just being picked as some new idea -- unless it is a research language, then, meh).

That said, the point I was making is not that we should have both because everything is all the same and all functions should be treated equally and we really really need more diversity in the way we handle function call returns. The point I was making is that some functions are CLEARLY PURE and that is just an innate part of their nature. Other functions have side effects and, systems using runtime prayers/monadic returns aside, there is just no way to make a side effecty function pure.

On that note, in a practical sense, when we return {Reply, NewState} in a gen_server we are sort of doing what IO monads do for Haskell. There is NO GUARANTEE from the perspective of the called function that the Reply value will actually be returned to the caller. That is obviously the intent, but it is by no means clear. This property also makes it very convenient to hook such functions up to a property tester. On the other hand, the tuple returned itself can be viewed as a naked value.
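To illustrate that point with a sketch (hypothetical callback, not from any particular project): the callback only *constructs* the reply tuple; gen_server performs the actual send, which is what makes the pure work easy to property-test by calling the callback directly:

```erlang
%% The callback itself does no messaging: it returns {reply, Reply,
%% NewState} and the gen_server machinery delivers Reply to the caller.
%% The pure computation crashes on bad input, as a pure function should.
handle_call({sum, Numbers}, _From, State) ->
    Reply = lists:sum(Numbers),   %% pure: crashes on non-numeric input
    {reply, {ok, Reply}, State}.
```

In a test you can feed handle_call/3 generated inputs and a dummy state and check the returned tuple, without any process or mailbox involved.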

Returning naked values mandates purity or total crashability of the system. Nothing in between. Unless you want to play "confuse the new guy".

When we have side effects, such as file operations, there is no difference between calling

  {ok, Bin} = file:read_file(Path)

and

  Bin = file!:read_file(Path)

That's a frivolous reason to add a glyph to the language. The assertion at the time of return already crashes on an error.

And no, you don't "need both". That's WHY the return value is matched in the first place -- to leave it up to the caller how they want to handle a surprise {error, Reason}.

Knowing that you are going to get a wrapped value is exactly how you know you are dealing with side effects somewhere, rendering both the return value issue moot and the question of whether the thing you're calling is pure or has a side effect.

-Craig

Re: Must and May convention

Karlo Kuna
In reply to this post by zxq9-2
> I really wish Dialyzer accepted (and checked) explicit declarations of purity.

I could not agree more; that would be a useful feature and an amazing time saver!

I am currently working on a tool that creates a DB of function properties, and the motivation was exactly finding non-pure functions in any given project.

On Thu, Sep 28, 2017 at 3:50 AM, zxq9 <[hidden email]> wrote:
On 2017年09月27日 水曜日 12:46:19 Loïc Hoguin wrote:
> On 09/27/2017 11:08 AM, Joe Armstrong wrote:
> > For several years I've been using a convention in my hobby
> > projects. It's what I call the must-may convention.
> >
> > I'm wondering if it should be widely used.
> >
> > What is it?
> >
> > There are two commonly used conventions for handling bad arguments to
> > a function. We can return {ok, Val} or {error, Reason} or we can
> > return a value if the arguments are correct, and raise an exception
> > otherwise.
> >
> > The problem is that when I read code and see a function like
> > 'foo:bar(a,12)' I have no idea if it obeys one of these conventions or
> > does something completely different. I have to read the code to find
> > out.
> >
> > My convention is to prefix the function name with 'must_' or 'may_'
>
> I've been debating this in my head for a long time. I came to the
> conclusion that 99% of the time I do not want to handle errors.
> Therefore 99% of the functions should not return an error.

Taking this observation a step further...

I've got a guideline that has never made it into English yet (along with some coding guidelines and a few other things I should re-write...) that states that programs must always be refactored iteratively to aggregate side effects where possible and leave as much code functionally pure as can me managed.

The Rules:
- Pure functions are always crashers.
- Side-effecty functions retun the type `{ok, Value} | {error, Reason}`
- A side effect is anything that touches a resource external to the current function.

Some programs are full of side effects -- doing lots of network and file system I/O while running a GUI. Others are not so side-effecty. The case where you REALLY get viral side-effect proliferation is use of ETS tables (shared binaries is actually another case, but not included in the rule because the abstraction generally holds well enough). But even in these cases we can usually break the pure bits out somewhat cleanly, at least once we understand what the program really needs to do.

That bit right there, "understand what the program really needs to do", is the truly hard part of getting any of this right. Or anything right.

When a project starts from scratch you don't understand the details yet, otherwise typing speed would equate to development time and that's just never the case. So we start out with a very high proportion of {ok, V} | {error, R} type functions initially because we don't know anything about anything and side effects wind up getting scattered about because we just didn't have a very clear view of what was going on. When inheriting messy, legacy code you understand even LESS because you don't understand what the program should do and you also don't understand whatever it is currently doing until you diddle with it a bit.

And that's totally OK.

But only at first.

That's just to get us over the hump so that something works, the task is handled, and if a bus hit That One Guy tomorrow then we could continue along and at least have something running.

To avert a lifetime of nightmares, lost hair and broken marriages due to a death-march-style maintenance cycle, though, we pre-emptively attack the program again with a refactoring effort aimed specifically at unifying types and side-effect hygiene. It is common that you'll have two flavors of basically the same thing in different areas, especially if you've got more than two people working on the project. That's got to get fixed. It is also common, as noted above, that side effects are scattered about for various reasons.

Once we've shaken the easy bits out we sometimes add a list of pure functions to the top of each module as a module attribute:

-pure([foo/1, bar/2, baz/0]).

Those should not only be pure, provable, and excellent targets for the initial property testing effort to get warmed up on, but are also known to crash when they get bad inputs. And of course everything should, by this point, Dialyze cleanly. Also, it isn't impossible to write a tool that keeps track of impure calls and checks that the pure functions only ever make calls to other pure functions (vast swathes of the stdlib define abstract data types, and nearly all of these calls are pure).

What are the impure functions?

The service loop is always impure -- it deals with the mailbox. Socket handling functions (which may be the service loop as well). Anything that writes to a file. Anything that sends a message. Interface functions that wrap message passing. Anything that talks to wx or console I/O. Etc.

The outcome is that side effects traditionally get collected in:
- Interface functions
- Service loops (and mini-service loops / service states)
- File I/O wrapper
- Socket I/O wrapper
- User interfaec I/O wrapper

The last three are "wrappers" because by writing a wrapper we give ourselves a place to aggregate side effecty calls instead of making them in deeper code (out at the edges of the call graph). A message may come over a socket or into the service loop that requires some processing and then writing to a file, for example, but this doesn't mean that we write to the file out there at the bottom of the call graph. Instead, we call to get the processing done, then return the value back into the service loop (or somewhere close to it, like a message interpreter function), and then the next line will be a call to either write the file or a call to a file writer that is known to be side-effecty.

Just about everything else can be pure. Most of the time. (Of course, "processing a value" may involve network communication, or RPC, or asking some other stateful process to do the processing, and any of these can prevent a function from being pure. But it is rare that these are the majority of functions in a program.) That means almost everything can be written as a crashable function -- because the ones that return {ok, V} | {error, R} should have already known what they were dealing with before they called the pure functions.

One side effect of this overall process is that, at least in writing customer facing software, we discover errors straight away and fix them. Most of the bugs are the really simple kind:

"If I enter a number for a name, the window disappears and reappears with empty fields."
(The windows process crashed and restarted back where it was.)

or, more often

"If I enter a number as a name the name disappears after I click the 'submit' button."
(Something deeper in the system crashed and the final update to the GUI was never sent.)

We IMMEDIATELY know that we didn't type check there properly and some other part of the code died with the "bad" data once it was noticed -- and the user just saw a momentary hiccup and fixed whatever was wrong on their own. So this wasn't the end of the world or a big scary red X showing up on the screen with mysterious numbers and inscribed error messages or whatever. But it WAS bad and unexpected behavior for the most important person in the program's universe. A quick check of the crash log bears out what we thought, and that problem is from then on handled properly and never heard from again.

When this sort of problem becomes really confusing to debug is the cases where we've gotten too fancy with exception handling and played loose with types. That input value may have traveled quite far into the system before something died, and figuring it out is a bit more tricky then without a dead-obvious crash message letting you know about it.

Blah blah blah...

We are all looking at roughly the same things here. Joe likes to prefix function names. That's probably a good system, but it doesn't work well for people who use autocompletion (people still do that?). Is that a tooling conflict? Aren't Joe's function names THEMSELVES a sort of tool? How about the -pure declaration? That's great -- but what we really want, actually, is a way to declare a function pure so that Dialyzer could know about it, as part of the -spec for a function. That would be awesome. What happens for us is that functions near the top of a module tend to be side-effecty and functions at the bottom tend to be pure -- so we just sort of know what terrain we are navigating because we know the layout that results as an outcome of following our little side-effect focused refactoring. Also, in documentation we know the difference immediatly because of our own return typing convention: anything that returns naked values is a crasher, period.

It looks like none of the approaches is particularly perfect, though. I really wish Dialyzer accepted (and checked) explicit declarations of purity. I don't know what syntax would be good for this, but its something I would like to have. Also -- it would allow for people to maybe use their pure functions in guards, which is a frequent request I hear come up quite a bit.

-Craig
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions



Re: Must and May convention

zxq9-2
On Thursday, 2017-09-28 10:01:12 you wrote:
> > I really wish Dialyzer accepted (and checked) explicit declarations of
> purity.
>
> i could not agree more
> that would be useful feature and amazing time saver!
>
> I am currently working on a tool that creates a DB of function properties,
> and the motivation was exactly finding non-pure functions in any given
> project.

I've messed around with this a bit and, not liking syntactic additions
(for the most part), the idea I've come up with that allows building a
checkable graph is what I'm doing already:

-pure([f/1, g/0, h/3]).

So that works just like an -export attribute and when the compiler rolls
over it you actually get a nice list in module_info:

1> zuuid:module_info(attributes).
[{vsn,[161185231735429547750212483364357911358]},
 {author,"Craig Everett <[hidden email]>"},
 {behavior,[application]},
 {pure,[{v3,1},
        {v3,2},
        {v3_hash,2},
        {v5,1},
        {v5,2},
        {v5_hash,2},
        {read_uuid,1},
        {read_uuid_string,1},
        {read_mac,1},
        {read_mac_string,1},
        {string,1},
        {string,2},
        {binary,1},
        {binary,2},
        {strhexs_to_uuid,1},
        {strhexs_to_mac,1},
        {strhexs_to_integers,1},
        {bins_to_strhexs,1},
        {binary_to_strhex,1}]}]

Quite easy to build a graph around this sort of data. And it comes only at
the cost of actually including a -pure declaration.
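A small sketch of reading that attribute back out at runtime (this assumes the -pure([f/1, ...]) attribute form shown above; the module and helper names are made up for illustration):

```erlang
%% Check whether {F, A} is declared pure in Mod, by scanning the module
%% attributes that the compiler exposes via module_info/1.
-module(purity_attr).
-export([declared_pure/3]).

-spec declared_pure(module(), atom(), arity()) -> boolean().
declared_pure(Mod, F, A) ->
    Attrs = Mod:module_info(attributes),
    %% A module attribute may appear more than once; gather every
    %% -pure list and flatten them before the membership test.
    Pure = lists:append([L || {pure, L} <- Attrs]),
    lists:member({F, A}, Pure).
```

A graph-building tool could run this over every module it loads and seed its pureness database from the results.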

The problem, of course, is actually making a -pure declaration and keeping
it in sync with the module code over time -- and that this is invisible to
Dialyzer right now.

That said, if no unsafe calls or actions are taken in a function, Dialyzer
could infer which functions are pure and help generate such a list. Even
better, of course, would be if it knew the difference and examined each
function to build the graph of pureness automatically...

But I digress.

Setting aside the (maybe not easy) task of making Dialyzer able to infer purity
(and this is impossible anyway when Dialyzer hits a wall of ambiguity
such as a call to M:F(A) or apply(M, F, A) -- which are pretty important!),
it would be even nicer if we had a pure function spec declaration form.

-pure f() -> term().

And just leave the original

-spec f() -> {ok, Value :: term()} | {error, Reason :: term()}.

form alone.

That shouldn't break any old code, and leave a safe path to updating the
stdlib and internals of existing projects... without anything significant
changing until people are ready for it.

And no new syntax.

Where a new bit of syntax may be nice is if, for example, there were a way
to declare *what* side effects a function has or might have. I don't have
any good idea how to go about that, because I've never thought it through
or seen a system that declares categories of side effects... but it's an
interesting idea that might help make "unit testing" of modules that have
side effects actually mean something (for once) and move it closer to the
usefulness of actual user testing (which is amazing at finding the
boneheaded, easy-to-fix 90% of bugs that are concrete and repeatable, and
that unit tests are for some reason consistently blind to).

-Craig

Re: Must and May convention

Roman Galeev
> -pure([f/1, g/0, h/3]).

So, you declare f/1 is pure, but how do you really know it is pure? Meaning there should be a way to prove f/1 is pure, at least with some tool like dialyzer.




--
With best regards,
     Roman Galeev,
     +420 702 817 968


Re: Must and May convention

Karlo Kuna
In reply to this post by zxq9-2
I have a slightly different approach:

I don't want any new syntax right now, so I'm marking side-effecting constructs in Erlang. For example, put/2 and send (!) I consider side-effecting, so they go in a seed list of {M, F, Arity} entries.
Then I try to infer which functions in the OTP sources are pure or not and put that information in a DB,
and then I can consult the DB for projects that use the OTP stdlib and such, update the DB with those, and go up.

It is a bottom-up approach, but it shows promise. I am aware that BIFs and NIFs are a problem right now,
and -pure declarations could help there a lot!

Also, it would be extremely nice to have a way to declare and classify side effects; I'm looking into that too,
but at this time I have no concrete solutions.
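A toy version of that bottom-up propagation might look like this (building the call graph itself, from xref or the abstract code, is assumed to happen elsewhere; module and function names are illustrative):

```erlang
%% Start from a seed list of known side-effecting {M, F, A} entries and
%% propagate "impure" through a call graph until a fixpoint is reached.
%% CallGraph maps each MFA to the list of MFAs it calls.
-module(purity_infer).
-export([impure/2]).

-spec impure(#{mfa() => [mfa()]}, [mfa()]) -> [mfa()].
impure(CallGraph, Seed) ->
    fixpoint(CallGraph, ordsets:from_list(Seed)).

fixpoint(CallGraph, Impure) ->
    %% A function is impure if it calls anything currently marked impure.
    Next = maps:fold(
             fun(MFA, Calls, Acc) ->
                     case lists:any(fun(C) -> ordsets:is_element(C, Acc) end,
                                    Calls) of
                         true  -> ordsets:add_element(MFA, Acc);
                         false -> Acc
                     end
             end, Impure, CallGraph),
    case Next =:= Impure of
        true  -> ordsets:to_list(Next);   % fixpoint reached
        false -> fixpoint(CallGraph, Next)
    end.
```

Everything not in the resulting set, and not hiding behind a NIF or a dynamic call, is then a candidate for "pure".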


Fwd: Must and May convention

Karlo Kuna
In reply to this post by Joe Armstrong-2

> -pure([f/1, g/0, h/3]).

So, you declare f/1 is pure, but how do you really know it is pure? Meaning there should be a way to prove f/1 is pure, at least with some tool like dialyzer

If f/1 is a NIF there is little hope of proving that,
so in general I think it should be either provable (meaning it is not a NIF and uses only pure functions) or declared as such.

I think we should go with "trust" here. If a function is a NIF it should be explicitly declared as pure; otherwise the tool is to assume it is not.



Re: Fwd: Must and May convention

zxq9-2
On Thursday, 2017-09-28 10:49:21 Karlo Kuna wrote:

> > > -pure([f/1, g/0, h/3]).
> >
> > So, you declare f/1 is pure, but how do you really know it is pure?
> > Meaning it should be a way to prove f/1 is pure, at least with some tool
> > like dialyzer
> >
>
> if f/1 is a NIF there is little hope of proving that,
> so in general I think it should be either provable (meaning it is not a NIF
> and uses only pure functions) or declared as such
>
> I think we should go with "trust" here. If a function is a NIF it should be
> explicitly declared as pure; otherwise the tool is to assume it is not

Right. This is actually the core issue with Dialyzer in general, because
it is a permissive typer that assumes anything is acceptable unless it
can prove by inference that something is impossible OR the user has
written a typespec to guide it and specify the wrongness. Those
specifications are essentially arbitrary in most cases.

This is the exact opposite approach used by GHC, for example. But for
the way Erlang and OTP work at their core it is necessary to have
permissive type checks guided by annotations.

NOTHING SAYS THOSE ANNOTATIONS MIGHT NOT BE WRONG.

This is also true of -pure declarations.

What you CAN prove, however, is that no definitely IMPURE functions
are called from within a function marked as -pure.

Which means Dialyzer would emit an error on any -pure function calling
a function not also considered -pure.

As noted before, much of the stdlib is pure, so quite a lot of code
could become provably pure quite quickly.

No messaging, no external resource calls, no ets, no nifs, certain
functions in io and io_lib are out, etc.

There may be a way to sneak a side effect in despite this, but I think
you could get CLOSE to having a provable graph of pure function calls.
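The check described above could be sketched roughly like this (the call graph is assumed to have been extracted already; module and function names are made up for illustration):

```erlang
%% Given the set of declared-pure MFAs and a call graph, flag every
%% -pure function that calls something not also declared pure -- the
%% error Dialyzer would emit under this scheme.
-module(purity_check).
-export([pure_violations/2]).

-spec pure_violations([mfa()], #{mfa() => [mfa()]}) -> [{mfa(), mfa()}].
pure_violations(DeclaredPure, CallGraph) ->
    PureSet = ordsets:from_list(DeclaredPure),
    [{Caller, Callee}
     || Caller <- DeclaredPure,
        Callee <- maps:get(Caller, CallGraph, []),
        not ordsets:is_element(Callee, PureSet)].
```

An empty result means the declared-pure subgraph is at least internally consistent, even if the declarations themselves are taken on trust.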

BUT

That is true ONLY IF they are all actually fully qualified calls.
And this is also the problem with trying to make Dialyzer more strict.
(Also, dynamic upgrades are another impossible case to prove.)

Any time you do something like call apply(M, F, A) or Module:foo() you
suddenly can't know for sure what just happened. That module could have
come in dynamically. The only thing Dialyzer can know about it, outside
of the runtime, is that the module itself is checked and has a declaration.
But that's not the same thing as proving what the return type or external
effects of the ambiguous call site might be, so its return handler has
to be specced also. This is pretty much what happens throughout every
gen_server you ever write: the dispatching handle_* functions are quite
ambiguous, but the implementation functions below can be specified and
the interface functions that make the gen_*:call or gen_*:cast calls
can be specified as accepting and returning specific types -- so you
constrain the possibilities tremendously.

This happens every time you write a set of callbacks.
Part of what -callback declarations in behavior defining modules do for
you is typespec a callback -- but you can write callbacks without any
of that (and lots of people do) and that leaves Dialyzer blind. And
consider, of course, that the generic callback declarations for things
like gen_server are deliberately and arbitrarily broad. So to constrain
those you have to guard against type explosions in the interface and
implementation functions.

Nothing is going to fix that but hints like -pure and -spec and -callback.

And that means Dialyzer must continue to be permissive by default and
trust the programmer to be accurate about typespecs in their code.

So "proofing" is not attainable, no matter what, if you mean it in the
strict sense. But seriously, this is an industrial language hellbent on
getting code out the door that solves human problems. So forget about
silly notions of type completeness. That's for academia. In the real world
we have confusing problems and convoluted tools -- this one is
dramatically less convoluted than MOST tools in this domain of solutions
and it can AT LEAST give us a warning when it can prove that you've done
something that is clearly in error.

The kind of super late binding that dynamic module loading and dynamic
call sites imply prohibits proving much about them -- at least in Erlang.
Yet they still work quite well.

-Craig