How best to resolve apparent inconsistency between re:run and string:slice

Scott Finnie

Hello all

 

I discovered an apparent inconsistency between re:run/2 and string:slice/2 last night.  I'd appreciate any suggestions on the right way to resolve it.  It is best illustrated by this example:

 

1> S1="\rDogs".
"\rDogs"
2> S2="\r\nDogs".
"\r\nDogs"
3> {match, [{_, To1}]}=re:run(S1, "^\r").
{match,[{0,1}]}
4> {match, [{_, To2}]}=re:run(S2, "^\r\n").
{match,[{0,2}]}
5> string:slice(S1,To1).
"Dogs"
6> string:slice(S2,To2).
"ogs"

 

I’d guess this is due to string:slice treating “\r\n” as a single lexeme, whereas re:run treats it as two characters.
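
For what it's worth, that guess agrees with how the string module is documented: its functions work on grapheme clusters, and a CR followed by an LF forms a single cluster. re:run/2, on the other hand, reports an {Offset, Length} pair measured in bytes of the subject, which for a Latin-1 charlist like this one means one unit per list element. A minimal sketch that puts the two views side by side (the module and function names are made up for illustration):

```erlang
-module(crlf_demo).
-export([views/0]).

%% Compare the list-element view of a string with the
%% grapheme-cluster view used by the string module.
views() ->
    S = "\r\nDogs",
    #{elements  => length(S),            % 6: CR and LF are separate list elements
      graphemes => string:length(S),     % 5: CR+LF counts as one grapheme cluster
      nthtail   => lists:nthtail(2, S),  % "Dogs": drops two list elements
      slice     => string:slice(S, 2)    % "ogs": drops two clusters, CRLF and $D
     }.
```

So a length taken from re:run/2 lines up with list-based functions such as lists:nthtail/2, but not with string:slice/2 whenever the skipped text contains a multi-element cluster.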

 

What I’m trying to achieve is consistent behaviour that (a) recognises any newline (CR, LF, CRLF, …) at the start of the string and removes it, and (b) gives me a count of the contiguous newlines found.

 

As an aside, the now-obsolete string:sub_string/2 does seem to work consistently:

 

7> string:sub_string(S1,To1+1).
"Dogs"
8> string:sub_string(S2,To2+1).
"Dogs"

 

Thanks,

Scott.




--------------------
Hymans Robertson LLP is a limited liability partnership registered in
England and Wales with registered number OC310282. A list of
members of Hymans Robertson LLP is available for inspection at One
London Wall, London, EC2Y 5EA, the firm's registered office. Hymans
Robertson LLP is authorised and regulated by the Financial Conduct
Authority and licensed by the Institute and Faculty of Actuaries for
a range of investment business activities.

This e-mail and any attachments are confidential. If it is not intended for
you then please tell us and respect that confidentiality. E-mails and
attachments can be corrupted or altered after sending: if you rely on
advice or product transmitted by e-mail then you do so at your own
risk. This footnote also confirms that this email message has been swept
for the presence of computer viruses.

Visit hymans.co.uk/information/privacy-notice/ for
details of how we use your personal information.
--------------------

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: How best to resolve apparent inconsistency between re:run and string:slice

Dan Sommers
On 8/29/19 2:59 AM, Scott Finnie wrote:

 > What I’m trying to achieve is consistent behaviour that (a) recognises
 > any newline (CR, LF, CRLF, …) at the start of the string and removes
 > it.  I also need a count of the contiguous newlines found.

Regular expressions seem needlessly complicated for this task.  I'd use
something more direct.  Assuming that you want to count CRs and LFs
separately, and not recognize CRLF as a single new line:

     %% return a {String, Counter} tuple where the leading CRs and LFs
     %% have been stripped from String and counted in Counter

     count_and_strip_leading_crs_and_lfs(String) ->
       count_and_strip_leading_crs_and_lfs(String, 0).

     count_and_strip_leading_crs_and_lfs([$\r | String], Counter) ->
       count_and_strip_leading_crs_and_lfs(String, Counter + 1);
     count_and_strip_leading_crs_and_lfs([$\n | String], Counter) ->
       count_and_strip_leading_crs_and_lfs(String, Counter + 1);
     count_and_strip_leading_crs_and_lfs(String, Counter) ->
       {String, Counter}.

To count CRLFs as a single new line, add this to the beginning of
count_and_strip_leading_crs_and_lfs/2:

     count_and_strip_leading_crs_and_lfs([$\r, $\n | String], Counter) ->
       count_and_strip_leading_crs_and_lfs(String, Counter + 1);
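
If more newline forms need to be recognised, one way to avoid a clause per form might be to drive the same recursion from a list of prefixes. This is only a sketch: the module name, the shortened function name and the NEWLINES macro are illustrative, and the set of newline forms is an example to adjust.

```erlang
-module(nl_prefix).
-export([count_and_strip/1]).

%% Newline forms to recognise, with CRLF listed before a lone CR so
%% that it is counted as one newline. The set is an example only.
-define(NEWLINES, ["\r\n", "\r", "\n"]).

%% Return {Rest, Counter}: String without its leading newlines, and
%% the number of newlines that were removed.
count_and_strip(String) ->
    count_and_strip(String, 0).

count_and_strip(String, Counter) ->
    %% Collect every newline form that String starts with, in list order.
    case [NL || NL <- ?NEWLINES, lists:prefix(NL, String)] of
        [First | _] ->
            Rest = lists:nthtail(length(First), String),
            count_and_strip(Rest, Counter + 1);
        [] ->
            {String, Counter}
    end.
```

With this ordering, nl_prefix:count_and_strip("\r\n\rDogs") returns {"Dogs", 2}. Everything here works on list elements, so no grapheme-cluster handling is involved.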

HTH,
Dan

Re: How best to resolve apparent inconsistency between re:run and string:slice

Scott Finnie
Thanks for the response Dan,

>> What I'm trying to achieve is consistent behaviour that (a) recognises
>> any newline (CR, LF, CRLF, ...) at the start of the string and removes
>> it.  I also need a count of the contiguous newlines found.

> Regular expressions seem needlessly complicated for this task.

In the example I gave, I'd agree.  In reality there are quite a few lexemes that constitute a newline.  The full regex is "^(?>\r\n|\n|\x0b|\f|\r|\x85)".  Adding a function clause for each of those would mean quite a lot of code duplication.  There is perhaps a halfway house though, where it matches on a regex but doesn't rely on the {From, To} match result in taking the new string slice.
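
One possible shape for that halfway house is to let the regex capture the leading newlines and the remainder as lists, so no offset arithmetic is needed afterwards. The following is a sketch under assumptions, not a tested recommendation: the module name, strip/1, count/1 and the NL macro are invented for illustration, and the subject is assumed to be a charlist with code points below 256 because re:run/3 is called without the unicode option.

```erlang
-module(leading_newlines).
-export([strip/1]).

%% Newline alternatives taken from the regex quoted above; CRLF comes
%% first so that it is consumed as a single newline.
-define(NL, "(?>\r\n|\n|\x0b|\f|\r|\x85)").

%% Return {Rest, Count}: Subject without its leading newlines, and the
%% number of newlines removed. Assumes code points below 256.
strip(Subject) ->
    Opts = [dotall, {capture, all_but_first, list}],
    %% Group 1 is the run of leading newlines, group 2 everything after it.
    {match, [Newlines, Rest]} = re:run(Subject, "^(" ?NL "*)(.*)", Opts),
    {Rest, count(Newlines)}.

%% Count the newline tokens in a string made up only of newlines.
count(Newlines) ->
    case re:run(Newlines, ?NL, [global]) of
        {match, Matches} -> length(Matches);
        nomatch -> 0
    end.
```

For example, leading_newlines:strip("\r\n\rDogs") returns {"Dogs", 2}. The PCRE documentation describes the \R escape as equivalent to this same alternation in 8-bit mode, so "\\R" may be a shorter spelling, though that is not checked here.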

It still leaves unresolved the seeming inconsistency between re:run and string:slice - though I appreciate you focused on suggesting an approach to my 1st order problem, not the 2nd order symptom.

Thanks again for your suggestion.

-S.
