Changes to string module


zxq9-2
I missed it when looking through the OTP 20 release notes, but on referencing the string module I noticed a LOT of changes have happened there. String objects are now more or less based on unicode:chardata() and utf8 graphemes.
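
To make the "chardata and graphemes" point concrete, here is a minimal sketch, assuming OTP 20 or later. The module and function names (grapheme_sketch, demo/0) are made up for illustration; only length/1, byte_size/1 and string:length/1 are real calls. It compares counting list elements or bytes with counting grapheme clusters:

```erlang
-module(grapheme_sketch).
-export([demo/0]).

%% Hypothetical demo module; only the called functions are part of OTP.
demo() ->
    %% "e" followed by U+0301 (combining acute accent):
    %% two code points, but one user-perceived character.
    S = [$e, 16#301],
    %% The new string functions accept unicode:chardata(),
    %% so a UTF-8 binary works as well as a list.
    B = <<"日本語"/utf8>>,
    {length(S),          %% list elements (code points): 2
     string:length(S),   %% grapheme clusters: 1
     byte_size(B),       %% bytes: 9
     string:length(B)}.  %% grapheme clusters: 3
```

Calling grapheme_sketch:demo() should return {2,1,9,3}.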

Whoever is responsible for this -- THANK YOU.

For those of us in East Asia life just got a bit easier. Not 100% easy, because what fun would that be, but this sure is a lot nicer.

Yay!
-Craig
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Changes to string module

Lloyd R. Prentice-2
Hi all,

I can appreciate the reasons for the changes, but it has created moments of frustration for me.

I'm running an earlier version of Erlang. Plan to upgrade, but not yet. My work habit is to frequently consult on-line Erlang docs.

The other day I looked up string:span/2, which I've used in various places throughout my code. But what's this? It's marked "obsolete - use take/2."

I must have caught the docs at a point of transition since there was no take/2 in the list of string functions. What to do? What to do?

That version of the docs seems to have been updated. Both take/2 and span/2 are now in the list.

OK, I can live with momentary and passing frustration. But now, when I upgrade Erlang, do I have to worry that span/2 will be dropped at some point--- after all, it's "obsolete."

Or do I have to run two versions of Erlang?

Plus, I now find the string docs much harder to use since I have to think about whether a given function is in my version of Erlang or not.

Petty issues and, perhaps, the price of progress. But it does illustrate that library revision may have unintended consequences.

Best wishes,

LRP


Re: Changes to string module

Dan Gudmundsson-3


On Fri, Jul 21, 2017 at 4:02 PM Lloyd R. Prentice <[hidden email]> wrote:
> OK, I can live with momentary and passing frustration. But now, when I upgrade Erlang, do I have to worry that span/2 will be dropped at some point--- after all, it's "obsolete."

It might be dropped at some point, but not in the near future: not before OTP-23, and probably not in that release either.
We will deprecate the old functions (adding warnings) and remove their docs in OTP-21.

The old functions will be kept.


> Or do I have to run two versions of Erlang?
>
> Plus, I now find the string docs much harder to use since I have to think about whether a given function is in my version of Erlang or not.
>
> Petty issues and, perhaps, the price of progress. But it does illustrate that library revision may have unintended consequences.

The only difference right now (in OTP-20) is the addition of new functions and the text warning that the old ones will be deprecated in 21.
So the only problem you might hit in OTP-21 is deprecation warnings for using the old functions, which you can suppress.
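
As a rough illustration of the suppression mentioned above, the compiler accepts a nowarn_deprecated_function option, either globally or for specific functions. The module and function names below (legacy_strings, leading_digits/1) are hypothetical; the sketch assumes string:span/2 is still present in the release being used:

```erlang
-module(legacy_strings).
-export([leading_digits/1]).

%% Silence the deprecation warning for this one old function only.
%% A bare -compile(nowarn_deprecated_function). would silence all of them.
-compile({nowarn_deprecated_function, [{string, span, 2}]}).

%% Number of leading decimal digits in S, using the old API.
leading_digits(S) ->
    string:span(S, "0123456789").
```

Listing individual functions keeps warnings alive for any other deprecated calls that creep into the module.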

/Dan



Re: Changes to string module

Grzegorz Junka
In reply to this post by Lloyd R. Prentice-2


On 21/07/2017 14:02, Lloyd R. Prentice wrote:

> Plus, I now find the string docs much harder to use since I have to think about whether a given function is in my version of Erlang or not.
>

Isn't it enough to just follow the documentation for your version?

http://www.erlang.org/docs/versions/

Regards
Grzegorz

Re: Changes to string module

Lloyd R. Prentice-2
In reply to this post by Dan Gudmundsson-3

Hi Dan,

A big thumbs up for your work on Erlang, but...

At risk of being tagged culturally insensitive, is it presumptuous of me to ask if we could please split the new Unicode string functions into a separate library and preserve the ASCII string functions as we find them in older Erlang versions?

I've been working with the ASCII string functions but now find the new docs much harder to work with; to my eye there is too much visual noise, plus stuff like this:

- string:chr/2 (and rchr/2) returns an index into a string. But we're told "This function is obsolete. Use find/2."

  find/2, however, "returns the remainder of the string or nomatch...."

- Similarly, string:cspan/2 is marked "This function is obsolete. Use take/3."

  But string:take/3 returns leading and trailing data.

Plus, I dread the future necessity of rewriting string code.

I can certainly see the value of adding functions that support Unicode, and I applaud them. But stomping on existing ASCII functions seems like a stretch too far.

Thanks,

LRP


Re: Changes to string module

Richard A. O'Keefe-2
In reply to this post by Lloyd R. Prentice-2


On 22/07/17 2:02 AM, Lloyd R. Prentice wrote:
> I can appreciate the reasons for the changes, but it has created moments of frustration for me.

I really don't like to say this, but this is one of the things
the Java documentation does right.  Fields and methods may be
labelled "Since x.y" or "Deprecated since x.y".

It would be seriously painful to apply this to the existing
Erlang documentation, but perhaps future changes could do this?



Re: Changes to string module

zxq9-2
In reply to this post by Lloyd R. Prentice-2
Hi, Lloyd!

API changes are a major pain. Hopefully there will not be too many minor traumas as things move forward...

On Sunday, 23 July 2017 15:53:42 you wrote:
> At risk of being tagged culturally insensitive, is it presumptuous of me to ask if we could please split the new Unicode string functions into a separate library and preserve the ascii string functions as we find them in older Erlang versions?

Not culturally insensitive -- annoyed at a changing API. That's normal.

The original problem was that the "string" module really performed (occasionally redundant) list operations rather than string operations; what it offered was just insufficiently stringish. The maintainers went with a jarring approach to change (well, sort of -- nothing has been removed yet). This is understandable considering the alternatives. They left in place the functions many of us are using, but deprecated many of them, and eventually we will find ourselves with a for-real string module. Quite nice.

But the road there is fraught with peril...
 
> I've been working with ascii string functions but now find the new docs much harder to work with, e.g. to my eye too much visual noise, plus stuff like this:
>  
> - string:chr/2 (and rchr/2) returns an index into a string. But we're told "This function is obsolete (http://erlang.org/doc/man/string.html#oldapi). Use find/2 (http://erlang.org/doc/man/string.html#find-2)."
>   find/2, however, "returns the remainder of the string or nomatch...."

Yep. That's a big wtf. Why not just leave that one there or (at worst) move it to something more general where it really belongs, like a lists:index/2, which the language lacks (because it is usually just not called for and nobody seems to have trouble with this)?

At a minimum, the docs shouldn't ONLY reference string:find/2,3, but also include a reference to a way to get the exact same behavior -- for example, using the re module:


1> S = "All you need in this life is ignorance and confidence, and then success is sure.".
"All you need in this life is ignorance and confidence, and then success is sure."
2> string:chr(S, $i).
14
3> re:run(S, "i").
{match,[{13,1}]}
4> Chr = fun(String, Char) -> case re:run(String, [Char]) of {match, [{I, 1}]} -> I + 1; nomatch -> 0 end end.
#Fun<erl_eval.12.87737649>
8> Chr(S, $i).
14


Which indicates an internal definition can give us the same effect with


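-module(my_string).
-export([chr/2]).

-spec chr(String, Char) -> Index
    when String :: list(),
         Char   :: non_neg_integer(),
         Index  :: 0 | pos_integer().

%% Returns the 1-based index of the first Char in String, or 0 if absent.
%% Note: Char is used as a regex here, so a metacharacter such as $. or $*
%% will not be matched literally.
chr(String, Char) ->
    case re:run(String, [Char]) of
        {match, [{Index, 1}]} -> Index + 1;
        nomatch               -> 0
    end.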


(And then, of course, doing the equivalent of 's/string:chr/my_string:chr/g' over the source...)

That is a relatively easy fix that spares whatever code depends on the exact return value of the current string:chr/2 function a big rewrite.

But having to do so is a bit annoying.

Why not leave string:chr/2 in place as-is? I don't really know. Maybe because the string module is intended to conceptually do string processing, not array processing. Or something. Or whatever.
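
For comparison, the old index can also be recovered from string:find/2 itself, without going through re. This is only a sketch under assumptions: the module name chr_sketch is made up, it assumes OTP 20 or later, and it is meant for plain ASCII lists like the ones in this thread. Note that string:length/1 counts grapheme clusters, so on text with combining characters the result may differ from the old code-point index.

```erlang
-module(chr_sketch).
-export([chr/2]).

%% 1-based index of the first occurrence of Char in String, or 0 if absent,
%% derived from the "remainder of the string" that string:find/2 returns.
chr(String, Char) ->
    case string:find(String, [Char]) of
        nomatch -> 0;
        Rest    -> string:length(String) - string:length(Rest) + 1
    end.
```

With the sentence S from the session above, chr_sketch:chr(S, $i) should give 14, the same as string:chr(S, $i), and unlike the re version a character such as $. is matched literally.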


> - similarly, string:cspan/2 is marked "This function is obsolete (http://erlang.org/doc/man/string.html#oldapi). Use take/3 (http://erlang.org/doc/man/string.html#take-3)."
>   But string:take/3 returns leading and trailing data.

That is pretty odd also.

Once again, I think the idea here is that the new API is assuming what the intended use of this function typically was, and jumping straight to that intended effect instead of leaving the current intermediate step in place.

Once again, the re module can be used to get the same result:

40> string:cspan(S, "abc").
34
41> string:cspan(S, "ZXY").
80
42> string:cspan(S, "All").
0
43> Cspan = fun(S, C) -> case re:run(S, "[" ++ C ++ "]") of {match, [{I, 1}]} -> I; nomatch -> length(S) end end.
#Fun<erl_eval.12.87737649>
44> Cspan(S, "abc").
34
45> Cspan(S, "ZXY").
80
46> Cspan(S, "All").
0

Which indicates we could do:


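-module(my_string).
-export([cspan/2]).

-spec cspan(String, Chars) -> Index
    when String :: [non_neg_integer()],
         Chars  :: [non_neg_integer()],
         Index  :: non_neg_integer().

%% Returns the length of the leading run of String containing none of Chars.
%% Note: Chars is spliced into a regex character class, so characters such
%% as $], $^, $- and $\\ will not be treated literally.
cspan(String, Chars) ->
    case re:run(String, "[" ++ Chars ++ "]") of
        {match, [{Index, 1}]} -> Index;
        nomatch               -> length(String)
    end.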


(...and once again run sed for cspan on the source).
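
The same numbers can also be had from the replacement the docs point to, since the old span/cspan result is just the length of the leading part that take returns. A minimal sketch, assuming OTP 20 or later and plain ASCII input; the module name take_sketch is hypothetical, while string:take/2,3 and string:length/1 are the real calls:

```erlang
-module(take_sketch).
-export([span/2, cspan/2]).

%% Length of the leading run of String made up only of Chars.
span(String, Chars) ->
    {Leading, _Trailing} = string:take(String, Chars),
    string:length(Leading).

%% Length of the leading run of String containing none of Chars
%% (third argument 'true' asks take/3 for the complement of the set).
cspan(String, Chars) ->
    {Leading, _Trailing} = string:take(String, Chars, true),
    string:length(Leading).
```

For the sentence S used above, take_sketch:cspan(S, "abc") should give 34, cspan(S, "ZXY") 80 and cspan(S, "All") 0, matching the old function. Because no regex is involved, characters like "]" or "^" need no escaping.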


> Plus, I dread the future necessity of rewriting string code.

This is annoying -- I totally agree.

Overall I think the changes are good. They leave the natural place to do listy things in the lists module, the regexy things in the re module, and truly stringy things over true UTF8 strings -- and this is a HUGE step forward in terms of doing non-Romaji things.

BUT

WHY some direct 1-for-1 replacement functions, references, advice, etc. are not included in the docs is beyond me. It is of little help to direct someone from the docs on a deprecated index function to a string munging function when users have very likely already written their OWN string munging libs based on index return values.

Hopefully the switch won't hurt too terribly bad.

-Craig

PS: Thanks again for the awesome work on strings, Dan! I noticed I had an email sitting in my box from you months ago regarding some Kana functions you had put in. I wound up dropping out of civilization for a bit right then -- and I'll get back to you on it eventually.