Erlang Syntax and "Patterns" (Again)

Steve Davis
I do believe that Erlang’s syntax is concise and beautiful. You can’t deny the power of the bit syntax, right?

Also, how else do you express ideas in maybe 1/3 of the lines you’d need in any imperative language?

I do have an issue that “strings are just lists of integers” was maybe a wrong path that led to many criticisms. Pattern matching a decode from a binary message is simplified greatly were “strings” defined to be binaries not lists (as I have spoken on before).
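
To make the binary point concrete, here is a minimal sketch. It assumes a made-up wire format (a 16-bit length followed by that many bytes of text); the module name `frame_sketch` and both function names are hypothetical and not taken from any real protocol or library.

```erlang
-module(frame_sketch).
-export([encode/1, decode/1]).

%% Hypothetical wire format, assumed purely for illustration:
%% a 16-bit big-endian length followed by that many bytes of text.
encode(Text) when is_binary(Text), byte_size(Text) < 65536 ->
    <<(byte_size(Text)):16, Text/binary>>.

%% A single pattern extracts the text field as a sub-binary;
%% no list of integers is ever built.
decode(<<Len:16, Text:Len/binary, Rest/binary>>) ->
    {ok, Text, Rest};
decode(_Other) ->
    {error, incomplete}.
```

With the text kept as a binary, `decode(encode(Text))` hands back the same bytes that went in, and nothing has to guess whether the field "was a string".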

I do think maps are really handy (although slightly clumsy), and suspect that frames as proposed by ROK may have been more powerful.

I do believe that Garrett has a great handle on appropriate decomposition and maintainability.

Elixir is great to get Ruby-type ppl on board, but I wish they’d write their libraries in Erlang! Maybe one day they’ll “see the light”.

I wish I was able to provide real solutions to any critique above; but they are trivial compared to the power of the platform, because (as we all know) beyond the syntax wrangles there’s OTP and truly outrageous practical applicability enabled by BEAM.

…OK - got a few things off my chest.

Fair assessment?

/s

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Erlang Syntax and "Patterns" (Again)

Richard A. O'Keefe-2


On 17/03/16 1:31 pm, Steve Davis wrote:
> I do have an issue that “strings are just lists of integers” was maybe a wrong path that led to many criticisms. Pattern matching a decode from a binary message is simplified greatly were “strings” defined to be binaries not lists (as I have spoken on before).

Anything unfamiliar, whether right or wrong, is going to "lead to many
criticisms".  Many functional languages have represented strings as
lists of atoms or integers, and there are two important reasons for
this:
  (1) lists have a very large number of operations defined on them and
      are extremely easy to process, so this made strings easy to
      process
  (2) the space argument (you mean you take 8 bytes per character shock
      horror) had some force in the days of 4 MB machines; in the days
      of 21-bit characters and 16 GB machines, it has much less force.
      But if space was an issue, strings were the wrong data type
      anyway.  (In fact strings are wrong any time you have structure
      in the contents that you need to be aware of.)

I don't understand your statement about pattern matching from binary
messages.

It would be absurd to think that ANY abstract data type had to be
represented in one and ONLY one way.  The classical
pointer-to-NUL-terminated-array-of-char is not the only and not always
the best representation for strings in C, and the classical
list-of-character-codes is not the only and not always the best
representation of strings in Erlang.  Yawn.

Re: Erlang Syntax and "Patterns" (Again)

Loïc Hoguin-3
In reply to this post by Steve Davis
On 03/17/2016 01:31 AM, Steve Davis wrote:
> Fair assessment?

No.

The key point is that Erlang is unfamiliar and so people are not drawn
to it. Erlang is unfamiliar because of syntax. Erlang is unfamiliar
because it's a concurrent environment. Erlang is unfamiliar because it's
not OO. Erlang is unfamiliar because it's used for writing unfamiliar
applications (how many people are programming databases or distributed
systems? And I don't mean deploying Redis).

I suspect Erlang is unfamiliar also because it comes from Sweden, and
Swedish people are too humble for their own good. It doesn't really
matter that something is the best if you don't make any claims to that
effect. To be noticed you need to make a fool of yourself and exaggerate
how good your product is (and how bad the other products are). And to
make your product familiar you have to do this a lot. And with confidence.

MongoDB is a great example. Even if you never used it you are probably
familiar with it. You probably also know it will lose your data. Your
manager doesn't, though, and that's why you end up using it. Or you
don't believe it will affect you (humans have a hard time accepting that
the worst could happen; software error, computer crash, or even death).
Perhaps you're also just batshit crazy.

None of the details matter. Erlang can be the best, or just good enough,
it will make very little difference. Syntax details only start to matter
when you already decided to learn Erlang, and even then maybe only 5% of
people will give up because of it (that's my hope that only 5% of
developers are that superficial).

All that matters is reaching out to people and trying to appeal to them.
But even the latter is not that important. Bad publicity is also good
publicity.

The best thing that could happen to Erlang right now is for WhatsApp to
have a worldwide outage and to publicly blame it on a flaw in Erlang/OTP
itself.

--
Loïc Hoguin
http://ninenines.eu
Author of The Erlanger Playbook,
A book about software development using Erlang

Re: Erlang Syntax and "Patterns" (Again)

Richard A. O'Keefe-2
On 17/03/16 2:26 pm, Loïc Hoguin wrote:

>  Erlang is unfamiliar because it's a concurrent environment.

I sometimes think you don't appreciate Erlang until you've struggled
with concurrency yourself.  Last week I was working on some notes for a
course I'm teaching next semester, and needed a concurrent component
with a certain interface.  I struggled for two *days* trying to make it
work.  I learned more GDB commands than I ever wished to learn.
Eventually I got there.  To make sure I wasn't crazy, along the way I
implemented in Erlang: 24 line module, 10 line core, first thing I
thought of worked, and it was *obviously* right.  I keep on having this
kind of experience.

There are all sorts of things where I'd rather use some other language
than Erlang, but for concurrency, give me Erlang every time.

Re: Erlang Syntax and "Patterns" (Again)

Steve Davis
In reply to this post by Loïc Hoguin-3

> On Mar 16, 2016, at 8:26 PM, Loïc Hoguin <[hidden email]> wrote:
>
> The best thing that could happen to Erlang right now is for WhatsApp to have a worldwide outage and to publicly blame it on a flaw in Erlang/OTP itself.

Given the maturity and coherence of the platform, I suspect that outcome would never happen.

However, I have been able to use WhatsApp to succinctly "sell" Erlang/OTP.

q: but will it scale?
a: WhatsApp
q: WhatsApp?
a: same platform
q: Ah... OK

/s

Re: Erlang Syntax and "Patterns" (Again)

Steve Davis
In reply to this post by Steve Davis
> ROK said:
> Yawn.
(What am I doing trying to argue with ROK??? Am I MAD?)

1) Why is it people rant about "string handling" in Erlang?

2) Principle of least surprise:
1> [H|T] = [22,87,65,84,33].
[22,87,65,84,33]
2> H.
22
3> T.
"WAT!"

3) A codec should be perfectly reversible i.e. X = encode(decode(X)). Without tagging, merely parsing out a string as a list is not perfectly reversible.

4) What is the right way to implement the function is_string(List) correctly?

*ducks*
/s


Re: Erlang Syntax and "Patterns" (Again)

Fred Hebert-2
On 03/17, Steve Davis wrote:
>3) A codec should be perfectly reversible i.e. X = encode(decode(X)).
>Without tagging, merely parsing out a string as a list is not perfectly
>reversible.
>
>4) What is the right way to implement the function is_string(List) correctly?
>

Those two are kind of tricky in a funny way, because 'String' as a
standalone type does not convey enough information yet.

What you need to be aware of, for example, is encoding: is it ISO-8859-1
(latin1), a flavor of unicode (UTF-8, UTF-16, UTF-32, UCS-* etc.), other
variants such as CP-1252, plain ASCII, and so on. Erlang lists also let
you specify strings as raw unicode codepoint sequences rather than under
any specific encoding.

Many of these share the same basic format: "Hello, World!" shows up the
same in most of them (if you omit byte-order marks), so you cannot
*detect* that information from the data; it has to be carried from the
input or specification in most cases.  Then, once it's in place, you
can tag it appropriately or make sure you know the meaning.
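
A small sketch of why detection fails, using the standard `unicode` module. The module name `enc_sketch` and the helper `same_bytes/1` are made up for illustration.

```erlang
-module(enc_sketch).
-export([same_bytes/1]).

%% Encode a list of Unicode code points as Latin-1 and as UTF-8,
%% then report whether the two byte sequences are identical.
%% (For code points above 255 the Latin-1 conversion returns an
%% error tuple, so the comparison is simply false.)
same_bytes(Chars) ->
    Latin1 = unicode:characters_to_binary(Chars, unicode, latin1),
    Utf8 = unicode:characters_to_binary(Chars, unicode, utf8),
    Latin1 =:= Utf8.
```

`same_bytes("Hello, World!")` is true, so the bytes alone cannot say which encoding was meant; `same_bytes([233])` (é) is false, because Latin-1 uses one byte and UTF-8 uses two.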

is_string(List) may tell you true or false, but the result there does
not tell you whether you can do anything with it in your libraries,
merge them together, or make sure they have been normalized to fixed
point.

Strings are trickier than that, sadly.

Re: Erlang Syntax and "Patterns" (Again)

Garry Hodgson-2
In reply to this post by Richard A. O'Keefe-2
On 3/16/16 11:28 PM, Richard A. O'Keefe wrote:

> I sometimes think you don't appreciate Erlang until you've struggled
> with concurrency yourself.  Last week I was working on some notes for
> a course I'm teaching next semester, and needed a concurrent component
> with a certain interface.  I struggled for two *days* trying to make
> it work.  I learned more GDB commands than I ever wished to learn.
> Eventually I got there.  To make sure I wasn't crazy, along the way I
> implemented in Erlang: 24 line module, 10 line core, first thing I
> thought of worked, and it was *obviously* right.  I keep on having
> this kind of experience.

I can relate. While some of our core components are in Erlang, we use
Java (shudder) and Python in other parts. Among those other parts are
api proxies that translate between our standard api's and
vendor-specific ones for various security appliances. Building those
proxies in Python is a joy. They're easy, simple, succinct.

Until you need concurrency, and scale. The threading and multiprocessing
modules are simple, but you find yourself doing klunky things to solve
basic problems, and even when it works you worry about what isn't being
handled that will bite you when things get dodgy. I have confidence that
our Erlang code will handle Bad Things with grace. Our Python code?
Probably/maybe/hope for the best.

Re: Erlang Syntax and "Patterns" (Again)

Steve Davis
In reply to this post by Fred Hebert-2
Indeed! Which is, I think, why they are best left as the "opaque" binaries that they are when parsing… leaving the final decision about their content to the presentation layer that they are implicitly targeting.

/s

Re: Erlang Syntax and "Patterns" (Again)

Richard A. O'Keefe-2
In reply to this post by Steve Davis


On 17/03/16 11:53 pm, Steve Davis wrote:
> > ROK said:
> > Yawn.
> (What am I doing trying to argue with ROK??? Am I MAD?)
>
> 1) Why is it people rant about "string handling" in Erlang?

Because it is not the same as Java.
>
> 2) Principle of least surprise:
> 1> [H|T] = [22,87,65,84,33].
> [22,87,65,84,33]
> 2> H.
> 22
> 3> T.
> "WAT!"
This is a legitimate complaint, but it confuses two things.
There is *STRING HANDLING*, which is fine, and
there is *LIST PRINTING*, which causes the confusion.

For comparison, DEC-10 Prolog, PDP-11 Prolog, C-Prolog, and Quintus Prolog
all did STRING HANDLING as lists of character codes, but
all did LIST PRINTING without ever converting lists of numbers to strings.
The answer was that there was a library procedure to print a list of
integers as a string and you could call that whenever you wanted to,
such as in a user-defined pretty-printing procedure.  Here's a transcript
from SICStus Prolog:
| ?- write([65,66,67]).
[65,66,67]
yes
| ?- write("ABC").
[65,66,67]
yes

The heuristic used by the debugger in some Prologs was that a list of
integers between 32 and 126 inclusive was printed as a string; that
broke down with Latin 1, and broke harder with Unicode.  The simple
behaviour mandated by the standard that lists of integers print as
lists of integers confuses people once, then they learn that string
quotes are an input notation, not an output notation, and if they want
string notation in output, they have to call a special procedure to get it.

The ISO Prolog committee introduced a horrible alternative which the
DEC-10 Prolog designers had experienced in some Lisp systems and
learned to hate: flip a switch and "ABC" is read as ['A','B','C']. The
principal reason given for that was that the output was semi-readable.
One of my arguments against it was that this required every Prolog
system to be able to hold 17*2**16 atoms, and I knew for a fact that
many would struggle to do so.  The retort was "they must be changed
to make a special case for one-character atoms".  Oh well, no silver
bullet.

That does serve as a reminder, though, that using [a,b,c] instead of
[$a,$b,$c] is *possible* in Erlang.

Just to repeat the basic point: the printing of (some) integer lists as
strings is SEPARABLE from the issue of how strings are represented and
processed; that could be changed without anything else in the language
changing.
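
A minimal sketch of that separation in today's Erlang, using the standard `~w` and `~p` format directives. The module name `print_sketch` and its two helpers are made up for illustration.

```erlang
-module(print_sketch).
-export([as_list/1, as_pretty/1]).

%% ~w writes a list of integers as a list of integers, always.
as_list(Term) ->
    lists:flatten(io_lib:format("~w", [Term])).

%% ~p applies the "does this look printable?" heuristic,
%% which is what the shell uses when it echoes a value.
as_pretty(Term) ->
    lists:flatten(io_lib:format("~p", [Term])).
```

The same term `[87,65,84,33]` comes out as `[87,65,84,33]` through `as_list/1` and as `"WAT!"` through `as_pretty/1`; only the printing differs, not the data. In the shell itself, `shell:strings(false)` should switch the heuristic off for echoed results.
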
>
> 3) A codec should be perfectly reversible i.e. X = encode(decode(X)).
> Without tagging, merely parsing out a string as a list is not
> perfectly reversible.
Here you are making a demand that very few other programming languages
can support.  For example, take JavaScript.  "\u0041" is read as "A",
and you are not going to get "\u0041" back from "A".  You're not even
going to get "\x41" back from it, even though "\x41" == "A".

Or take Erlang, where
1> 'foo bar'.
'foo bar'
2> 'foobar'.
foobar
with the same kind of thing happening in Prolog.

And of COURSE reading [1 /* one */, 2 /* deux */, 4 /* kvar */]
in JavaScript preserves the comments so that re-encoding the
data structure restores the input perfectly.  </sarc>

Or for that matter consider floating point numbers, where
even the languages that produce the best possible conversions
cannot promise that encode(decode(x)) == x.

No, I'm sorry, this "perfectly reversible codec" requirement sets up
a standard that NO programming language I'm aware of satisfies.
It is, in fact, a straw man.  What you *can* ask, and what some
language designers and implementers strive to give you, is
     decode(encode(decode(x))) == decode(x).
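
A tiny sketch of the weaker property, taking decimal integer text as the "wire format". The module `codec_sketch` and the choice of format are assumptions made only for illustration.

```erlang
-module(codec_sketch).
-export([decode/1, encode/1]).

%% decode: text -> integer.  Several texts ("42", "0042", "+42")
%% map to the same integer, so information is lost here.
decode(Text) -> list_to_integer(Text).

%% encode: integer -> one canonical text.
encode(Int) -> integer_to_list(Int).
```

Here `encode(decode("0042"))` gives `"42"`, not `"0042"`, so the codec is not perfectly reversible; but decoding that result again still yields 42, which is the property that can reasonably be asked for.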

But to repeat the point made earlier, the way that lists of plausible
character codes is printed is SEPARABLE from the way strings are
represented and handled and in an ancestral language is SEPARATE.
>
> 4) What is the right way to implement the function is_string(List)
> correctly?
>
> *ducks*

That really is a "have you stopped beating your wife, answer yes or no"
sort of question.

It depends on the semantics you *want* it to have.  The Quintus
library didn't provide any such predicate, but it did provide

plausible_chars(Term)
  when Term is a sequence of integers satisfying
  is_graphic(C) or is_space(C),
  possibly ending with a tail that is a variable or
  a variable bound by numbervars/3.

Notice the careful choice of name:  not IS (certainly) a string,
but is a PLAUSIBLE list of characters.
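
A rough Erlang analogue, as a sketch only: it assumes "plausible" means printable ASCII plus tab, newline and carriage return, and it ignores the Prolog-specific variable tails. The module name `plausible_sketch` is made up, and this is not the Quintus implementation.

```erlang
-module(plausible_sketch).
-export([plausible_chars/1]).

%% true if Term is a proper list whose elements all look like
%% printable ASCII or common whitespace; false otherwise.
%% This is a heuristic: it cannot tell "WAT!" from a list of
%% small integers that merely happens to fall in that range.
plausible_chars([]) ->
    true;
plausible_chars([C | T]) when is_integer(C), C >= 32, C =< 126 ->
    plausible_chars(T);
plausible_chars([C | T]) when C =:= $\t; C =:= $\n; C =:= $\r ->
    plausible_chars(T);
plausible_chars(_) ->
    false.
```

Note that `plausible_chars([])` is true, and so is `plausible_chars([87,65,84,33])` whether the caller meant text or four numbers, which is exactly why the name promises plausibility and nothing more.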

It was good enough for paying customers to be happy with the
module it was part of (which was the one offering the
non-usual portray_chars(Term) command).

One of the representations Quintus used for strings (again, a
library feature, not a core language feature) was in Erlang
notation {external_string,FileName,Offset,Length}, an idea
that struck the customer I developed it for as a great
innovation, when I'd simply stolen it from Smalltalk!

The thing is that STRINGS ARE WRONG for most things,
however represented.  For example, when Java changed
the representation of String so that slicing became a
costly operation, I laughed, because I had my own representation
of strings that provided O(1) concatenation as well as cheap
slicing.  (Think Erlang iolists and you won't be far wrong.)
The Pop2 language developed and used at Edinburgh
represented file names as lists, e.g., [/dev/null] was in
Erlang notation ['/',dev,'/',null].  This made file name
manipulation easier than representing them as strings.
Any time there is internal structure, any time there is scope
for sharing substructure, any time you need to process
the parts of a string, strings are wrong.
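
For readers who have not met iolists, a minimal sketch of the idea using only built-in functions. The module name `iolist_sketch` and its helpers are made up for illustration.

```erlang
-module(iolist_sketch).
-export([join/2, flatten/1]).

%% "Concatenation" just wraps both parts in a new two-element
%% list: constant time, nothing is copied, and both parts are
%% shared with whoever else holds them.
join(A, B) -> [A, B].

%% The bytes are copied once, only when a flat result is needed.
flatten(IoList) -> iolist_to_binary(IoList).
```

For example, `flatten(join(join("/dev", $/), <<"null">>))` yields `<<"/dev/null">>`, even though the pieces were a list, a single character code and a binary.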

The PERL lesson is that regular expressions are a fantastic
tool for doing the wrong thing quite simply.



Re: Erlang Syntax and "Patterns" (Again)

Emil Holmstrom
I am probably repeating what someone else has already said in some other similar thread.

The confusion between strings and [integer()] would have been greatly reduced if char() existed: $a wouldn't have to be syntactic sugar for 97 but would actually be "character a". You would have to explicitly convert char() -> integer() and vice versa. This is how strings are implemented in ML and Haskell.

Regarding character encoding: inside Erlang, Unicode could always be assumed, and conversion between different character encodings could be done on I/O.

/emil

Re: Erlang Syntax and "Patterns" (Again)

Hynek Vychodil
A superlative suggestion sir, with only two minor drawbacks: one, Erlang is a dynamically typed language and two, Erlang is a dynamically typed language. I know that technically that’s only one drawback, but I thought it was such a big one it was worth mentioning twice.

Hynek

On Fri, Mar 18, 2016 at 5:30 PM, Emil Holmstrom <[hidden email]> wrote:
I am probably repeating what someone else already have said in some other similar thread.

The confusion between strings and [integer()] would have been greatly reduced if char() existed, $a wouldn't have to be syntactic sugar for 97 but would actually be "character a". You would have to explicitly convert char() -> integer() and wise versa. This is how strings are implemented in ML and Haskell.

Regarding character encoding: inside Erlang Unicode could always be assumed, converson between different character encodings could be done on I/O.

/emil

On Fri, 18 Mar 2016 at 00:51, Richard A. O'Keefe <[hidden email]> wrote:


On 17/03/16 11:53 pm, Steve Davis wrote:
> > ROK said:
> > Yawn.
> (What am I doing trying to argue with ROK??? Am I MAD?)
>
> 1) Why is it people rant about "string handling" in Erlang?

Because it is not the same as Java.
>
> 2) Principle of least surprise:
> 1> [H|T] = [22,87,65,84,33].
> [22,87,65,84,33]
> 2> H.
> 22
> 3> T.
> "WAT!”
This is a legitimate complaint, but it confuses two things.
There is *STRING HANDLING*, which is fine, and
there is *LIST PRINTING*, which causes the confusion.

For comparison, DEC-10 Prolog, PDP-11 Prolog, C-Prolog, and Quintus Prolog
all did STRING HANDLING as lists of character codes, but
all did LIST PRINTING without ever converting lists of numbers to strings.
The answer was that there was a library procedure to print a list of
integers as a string and you could call that whenever you wanted to,
such as in a user-defined pretty-printing procedure.  Here's a transcript
from SICStus Prolog:
| ?- write([65,66,67]).
[65,66,67]
yes
| ?- write("ABC").
[65,66,67]
yes

The heuristic used by the debugger in some Prologs was that a list of
integers between 32 and 126 inclusive was printed as a string; that
broke down with Latin 1, and broke harder with Unicode.  The simple
behaviour mandated by the standard that lists of integers print as
lists of integers confuses people once, then they learn that string
quotes are an input notation, not an output notation, and if they want
string notation in output, they have to call a special procedure to get it.

The ISO Prolog committee introduced a horrible alternative which the
DEC-10 Prolog designers had experienced in some Lisp systems and
learned to hate: flip a switch and "ABC" is read as ['A','B','C']. The
principal reason given for that was that the output was semi-readable.
One of my arguments against it was that this required every Prolog
system to be able to hold 17*2**16 atoms, and I new for a fact that
many would struggle to do so.  The retort was "they must be changed
to make a special case for one-character atoms".  Oh well, no silver
bullet.

That does serve as a reminder, though, that using [a,b,c] instead of
[$a,$b,$c] is *possible* in Erlang.

Just to repeat the basic point: the printing of (some) integer lists as
strings is SEPARABLE from the issue of how strings are represented and
processed; that could be changed without anything else in the language
changing.
>
> 3) A codec should be perfectly reversible i.e. X = encode(decode(X)).
> Without tagging, merely parsing out a string as a list is not
> perfectly reversible.
Here you are making a demand that very few other programming languages
can support.  For example, take JavaScript.  "\u0041" is read as "A",
and you are not going to get "\u0041" back from "A".  You're not even
going to get "\x41" back from it, even though "\x41" == "A".

Or take Erlang, where
1> 'foo bar'.
'foo bar'
2> 'foobar'.
foobar
with the same kind of thing happening in Prolog.

And of COURSE reading [1 /* one */, 2 /* deux */, 4 /* kvar */]
in JavaScript preserves the comments so that re-encoding the
data structure restores the input perfectly.  </sarc>

Or for that matter consider floating point numbers, where
even the languages that produce the best possible conversions
cannot promise that encode(decode(x)) == x.

No, I'm sorry, this "perfectly reversible codec" requirement sets up
a standard that NO programming language I'm aware of satisfies.
It is, in fact, a straw man.  What you *can* ask, and what some
language designers and implementers strive to give you, is
     decode(encode(decode(x))) == decode(x).

But to repeat the point made earlier, the way that lists of plausible
character codes is printed is SEPARABLE from the way strings are
represented and handled and in an ancestral language is SEPARATE.
>
> 4) What is the right way to implement the function is_string(List)
> correctly?
>
> *ducks*

That really is a "have you stopped beating your wife, answer yes or no"
sort of question.

It depends on the semantics you *want* it to have.  The Quintus
library didn't provide any such predicate, but it did provide

plausible_chars(Term)
  when Term is a sequence of integers satisfying
  is_graphic(C) or is_space(C),
  possibly ending with a tail that is a variable or
  a variable bound by numbervars/3.

Notice the careful choice of name:  not IS (certainly) a string,
but is a PLAUSIBLE list of characters.

It was good enough for paying customers to be happy with the
module it was part of (which was the one offering the
non-usual portray_chars(Term) command).

One of the representations Quintus used for strings (again, a
library feature, not a core language feature) was in Erlang
notation {external_string,FileName,Offset,Length}, and idea
that struck the customer I developed it for as a great
innovation, when I'd simply stolen it from Smalltalk!

The thing is that STRINGS ARE WRONG for most things,
however represented.  For example, when Java changed
the representation of String so that slicing became a
costly operation, I laughed, because I had my own representation
of strings that provided O(1) concatenation as well as cheap
slicing.  (Think Erlang iolists and you won't be far wrong.)
The Pop2 language developed and used at Edinburgh
represented file names as lists, e.g., [/dev/null] was in
Erlang notation ['/',dev,'/',null].  This made file name
manipulation easier than representing them as strings.
Any time there is internal structure, any time there is scope
for sharing substructure, any time you need to process
the parts of a string, strings are wrong.
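
To make the two ideas above concrete, here is a small Erlang sketch. The module and function names are hypothetical; it only shows that nesting gives constant-time concatenation of iodata, and that a file name kept as a list of components can be handled with ordinary list operations.

```erlang
-module(rope).
-export([concat/2, flatten/1, path_to_string/1, last_component/1]).

%% Concatenation just builds a two-element list around the parts;
%% nothing is copied, so it takes constant time.
concat(A, B) ->
    [A, B].

%% Copy everything into one binary only when it is really needed,
%% typically just before output.
flatten(IoData) ->
    iolist_to_binary(IoData).

%% A file name held as a list of atoms, in the Pop2 style,
%% e.g. ['/', dev, '/', null].
path_to_string(Components) ->
    lists:concat(Components).

%% Getting at a part is a list operation, not string surgery.
last_component(Components) ->
    lists:last(Components).
```

For example, rope:flatten(rope:concat("hello, ", <<"world">>)) yields <<"hello, world">>.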

The PERL lesson is that regular expressions are a fantastic
tool for doing the wrong thing quite simply.




Re: Erlang Syntax and "Patterns" (Again)

Emil Holmstrom
I fail to see the significance of the type system in this case; it doesn't stop Erlang from having a char() type, does it? It has float(), integer(), atom(), etc. Forcing lists to have the same element type is still possible even if Erlang is dynamically typed. Unfortunately, iolist() would have to go. Maybe I am missing something obvious?


/emil

On Fri, 18 Mar 2016 at 18:15, Hynek Vychodil <[hidden email]> wrote:
A superlative suggestion, sir, with only two minor drawbacks: one, Erlang is a dynamically typed language, and two, Erlang is a dynamically typed language. I know that technically that’s only one drawback, but I thought it was such a big one it was worth mentioning twice.

Hynek

On Fri, Mar 18, 2016 at 5:30 PM, Emil Holmstrom <[hidden email]> wrote:
I am probably repeating what someone else has already said in some other similar thread.

The confusion between strings and [integer()] would have been greatly reduced if char() existed: $a wouldn't have to be syntactic sugar for 97 but would actually be "character a". You would have to explicitly convert char() -> integer() and vice versa. This is how strings are implemented in ML and Haskell.

Regarding character encoding: inside Erlang, Unicode could always be assumed, and conversion between different character encodings could be done on I/O.

/emil

On Fri, 18 Mar 2016 at 00:51, Richard A. O'Keefe <[hidden email]> wrote:


On 17/03/16 11:53 pm, Steve Davis wrote:
> > ROK said:
> > Yawn.
> (What am I doing trying to argue with ROK??? Am I MAD?)
>
> 1) Why is it people rant about "string handling" in Erlang?

Because it is not the same as Java.
>
> 2) Principle of least surprise:
> 1> [H|T] = [22,87,65,84,33].
> [22,87,65,84,33]
> 2> H.
> 22
> 3> T.
> "WAT!"
This is a legitimate complaint, but it confuses two things.
There is *STRING HANDLING*, which is fine, and
there is *LIST PRINTING*, which causes the confusion.

For comparison, DEC-10 Prolog, PDP-11 Prolog, C-Prolog, and Quintus Prolog
all did STRING HANDLING as lists of character codes, but
all did LIST PRINTING without ever converting lists of numbers to strings.
The answer was that there was a library procedure to print a list of
integers as a string and you could call that whenever you wanted to,
such as in a user-defined pretty-printing procedure.  Here's a transcript
from SICStus Prolog:
| ?- write([65,66,67]).
[65,66,67]
yes
| ?- write("ABC").
[65,66,67]
yes
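
The same separation can be seen in Erlang itself, where the choice is made by the format directive rather than by the data. A minimal sketch (the module and function names are made up for illustration): ~w always prints a list of integers as a list, while ~p applies the "looks printable" guess.

```erlang
-module(print_demo).
-export([as_list/1, as_guess/1]).

%% ~w never guesses: a list of integers is printed as a list of integers.
as_list(Term) ->
    lists:flatten(io_lib:format("~w", [Term])).

%% ~p prints a list as a quoted string when every element looks printable.
as_guess(Term) ->
    lists:flatten(io_lib:format("~p", [Term])).
```

Here print_demo:as_list("ABC") gives "[65,66,67]", whereas print_demo:as_guess("ABC") gives "\"ABC\"". The list [22,87,65,84,33] is printed as integers by both, because 22 is not a printable character; only its tail triggers the guess.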

The heuristic used by the debugger in some Prologs was that a list of
integers between 32 and 126 inclusive was printed as a string; that
broke down with Latin 1, and broke harder with Unicode.  The simple
behaviour mandated by the standard that lists of integers print as
lists of integers confuses people once, then they learn that string
quotes are an input notation, not an output notation, and if they want
string notation in output, they have to call a special procedure to get it.

The ISO Prolog committee introduced a horrible alternative which the
DEC-10 Prolog designers had experienced in some Lisp systems and
learned to hate: flip a switch and "ABC" is read as ['A','B','C']. The
principal reason given for that was that the output was semi-readable.
One of my arguments against it was that this required every Prolog
system to be able to hold 17*2**16 atoms, and I knew for a fact that
many would struggle to do so.  The retort was "they must be changed
to make a special case for one-character atoms".  Oh well, no silver
bullet.

That does serve as a reminder, though, that using [a,b,c] instead of
[$a,$b,$c] is *possible* in Erlang.
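
A minimal sketch of what that would look like (hypothetical module and function names; keep in mind that atoms are never garbage collected, so this is an illustration rather than a recommendation):

```erlang
-module(char_atoms).
-export([to_atoms/1, from_atoms/1]).

%% "abc" -> [a,b,c]: one single-character atom per character code.
to_atoms(String) ->
    [list_to_atom([C]) || C <- String].

%% [a,b,c] -> "abc": the inverse conversion.
from_atoms(Atoms) ->
    lists:append([atom_to_list(A) || A <- Atoms]).
```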

Just to repeat the basic point: the printing of (some) integer lists as
strings is SEPARABLE from the issue of how strings are represented and
processed; that could be changed without anything else in the language
changing.
>
> 3) A codec should be perfectly reversible i.e. X = encode(decode(X)).
> Without tagging, merely parsing out a string as a list is not
> perfectly reversible.
Here you are making a demand that very few other programming languages
can support.  For example, take JavaScript.  "\u0041" is read as "A",
and you are not going to get "\u0041" back from "A".  You're not even
going to get "\x41" back from it, even though "\x41" == "A".


Re: Erlang Syntax and "Patterns" (Again)

Jesper Louis Andersen-2

On Sat, Mar 19, 2016 at 5:34 PM, Emil Holmstrom <[hidden email]> wrote:
I fail to see the significance of the type system in this case; it doesn't stop Erlang from having a char() type, does it?

Right. You would have to invent a new tag for it, and you would potentially take a performance hit due to the new tagging scheme, but adding distinct values at runtime is fairly easy to do. With different tags, $a and 97 are now different values, and conversion between them is now explicit. What makes this idea more powerful in statically typed languages has to do with erasure: once we have a compiled program with a picked representation, we can choose the same representation for multiple types. This is what makes it efficient: char and int are both implemented as integers internally, but the type system protects against misusing one as the other. Erlang terms, by contrast, must be self-describing at runtime, so the tag cannot be erased.
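
As a user-level approximation of the idea, one can emulate a distinct character value with a tagged tuple. This is only a sketch with hypothetical names; a real char() type would need a new tag inside the VM, and the tuple wrapper here is precisely the kind of runtime cost that cannot be erased.

```erlang
-module(tagged_char).
-export([from_integer/1, to_integer/1, is_char/1]).

%% Explicit conversion integer() -> "char": wrap a code point in a tag.
from_integer(N) when is_integer(N), N >= 0, N =< 16#10FFFF ->
    {char, N}.

%% Explicit conversion "char" -> integer().
to_integer({char, N}) when is_integer(N) ->
    N.

%% A tagged value is self-describing, so it can be tested at runtime.
is_char({char, N}) when is_integer(N) -> true;
is_char(_) -> false.
```

With this representation tagged_char:from_integer($a) is {char,97}, which is not equal to 97, so the two can no longer be confused by accident.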

With a more complex type zoo, however, comes the additional burden of managing the differences among types. This is not free for the programmer, and most programming languages try to strike a balance between precision and convenience. Of course, there are different opinions as to what level is needed.




--
J.


Re: Erlang Syntax and "Patterns" (Again)

Richard A. O'Keefe-2
In reply to this post by Emil Holmstrom


On 19/03/16 5:30 am, Emil Holmstrom wrote:
> The confusion between strings and [integer()] would have been greatly
> reduced if char() existed, $a wouldn't have to be syntactic sugar for
> 97 but would actually be "character a". You would have to explicitly
> convert char() -> integer() and wise versa. This is how strings are
> implemented in ML and Haskell.
Actually no.
<quote>
    The STRING signature specifies the basic operations on a string type,
    which is a vector of the underlying character type char
    as defined in the structure.

    signature STRING = ...
    structure String :> STRING
        where type string = string
        where type string = CharVector.vector
        where type char  = Char.char
    structure WideString :> STRING
        where type string = WideCharVector.vector
        where type char  = WideChar.char
</quote>

From the SML Basis Library book, page 360.

There are indeed Char.char and WideChar.char types which are just wrappers
for integer types such that chr and ord are implementation-level identities.
But characters as such are so far from being basic to strings in ML that the
very syntax, #"c", tells you that they were an afterthought.

For what it's worth, when I'm working on strings in SML, I suppose I
*should* make use of the StringCvt.reader type, but it's usually easier
to use explode to convert (*change* the representation of) strings to
lists and implode to convert back.

As for Haskell, String does indeed exist, but the preferred representation
for strings in "efficient" programs is
http://hackage.haskell.org/package/text-1.2.2.1/docs/Data-Text.html

(It's interesting to note that a Text value in Haskell is NOT an arbitrary
sequence of suitably bounded integral values; a sequence of bounded
integral values yes, arbitrary no.)

I completely agree that encodings should be handled at the edges of a
system, *for data that the system will accept/generate*.
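
In Erlang terms, a minimal sketch of converting at the edges could look like this. The module and function names are hypothetical, and Latin-1 input with UTF-8 output is just an assumed example; inside the system the text is a plain list of Unicode code points.

```erlang
-module(edges).
-export([decode_latin1/1, encode_utf8/1]).

%% Input edge: bytes known to be Latin-1 become a list of code points.
decode_latin1(Bin) when is_binary(Bin) ->
    unicode:characters_to_list(Bin, latin1).

%% Output edge: code points are encoded as UTF-8 bytes.
encode_utf8(Chars) ->
    unicode:characters_to_binary(Chars, unicode, utf8).
```

For instance, the single Latin-1 byte 16#E5 decodes to the code point list [229], and encoding that list as UTF-8 gives the two bytes <<195,165>>.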
