regexp:first_match


Niels Christensen
Reading the documentation for regexp, I am surprised that

2> regexp:first_match("<DATE>22-03-03</DATE>","<.+>").
{match,1,21}
3>

I should have thought (and wanted!) the result to be

{match,1,6}

Anyone know why not?

Niels Christensen





Robert Virding
Niels Christensen <christen> writes:
>Reading the documentation for regexp, I am surprised that
>
>2> regexp:first_match("<DATE>22-03-03</DATE>","<.+>").
>{match,1,21}
>3>
>
>I should have thought (and wanted!) the result to be
>
>{match,1,6}

Here again to combine and confirm the other replies to this question.

regexp:match will search the whole string to find the longest match.
If there is more than one match with the same length then it will
choose the first one.

regexp:first_match will choose the first match, but it is also greedy
and returns the longest possible match.  Which is what you discovered.
Originally it just took the first match (as you wanted) but when
someone "optimised" the code this behaviour changed.
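
For readers who want the shorter result Niels expected, here is a minimal
sketch. It assumes a later Erlang/OTP release where the re module is
available; re is not the regexp module discussed in this thread, and it
reports zero-based {Start, Length} pairs rather than 1-based positions.
Either a negated character class or a lazy quantifier stops the match at
the first ">".

```erlang
%% Assumption: this uses the re module, not the regexp module from this thread.
%% re:run/3 with {capture, first, index} returns a zero-based {Start, Length}.
Date = "<DATE>22-03-03</DATE>".

%% Greedy: .+ runs on to the last ">" in the string.
{match, [{0, 21}]} = re:run(Date, "<.+>", [{capture, first, index}]).

%% Negated class: cannot cross a ">", so the match ends at the first one.
{match, [{0, 6}]} = re:run(Date, "<[^>]+>", [{capture, first, index}]).

%% Lazy quantifier: .+? takes as few characters as possible.
{match, [{0, 6}]} = re:run(Date, "<.+?>", [{capture, first, index}]).
```

The negated-class form does not depend on lazy quantifiers, so it is the
one more likely to carry over to regexp:first_match, though that has not
been checked here.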

I don't know which is better.  At least it is now consistently greedy.

Perhaps you can say that regexp:match returns the "first longest"
while regexp:first_match returns the "longest first".  How about a
"last shortest"? :-)
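
The "first longest" versus "longest first" distinction can be sketched
roughly as follows. This again assumes the re module rather than regexp,
and FirstLongest is a hypothetical helper written for this illustration,
not a library function. It only compares the non-overlapping matches that
a global scan yields, so it approximates regexp:match rather than
reproducing it.

```erlang
%% Assumption: re module; FirstLongest is a hypothetical helper.
%% It returns the longest {Start, Length} span, earliest one on ties.
FirstLongest = fun(Subject, Pattern) ->
    case re:run(Subject, Pattern, [global, {capture, first, index}]) of
        {match, Matches} ->
            %% With global, each match is a one-element list [{Start, Len}].
            Spans = [Span || [Span] <- Matches],
            %% Replace the best span only when a strictly longer one appears.
            lists:foldl(fun({_, L} = New, {_, BestL}) when L > BestL -> New;
                           (_, Best) -> Best
                        end, hd(Spans), tl(Spans));
        nomatch ->
            nomatch
    end
end.

Tags = "<a> <bcd> <xyz>".

%% "Longest first": the leftmost match, extended greedily.
{match, [{0, 3}]} = re:run(Tags, "<[^>]+>", [{capture, first, index}]).

%% "First longest": the longest match anywhere, earliest on ties.
{4, 5} = FirstLongest(Tags, "<[^>]+>").
```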

Yes, "." is consistent with other regular expressions and matches any
character except \n.
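
A small illustration of that rule, assuming the re module (PCRE-style)
rather than regexp; the dotall option shown here belongs to re and is not
something the regexp module is claimed to offer.

```erlang
%% Assumption: re module; shown only to illustrate the "." rule.
Lines = "ab\ncd".

%% By default "." stops at the newline.
{match, ["ab"]} = re:run(Lines, ".+", [{capture, first, list}]).

%% With the dotall option "." also matches \n.
{match, ["ab\ncd"]} = re:run(Lines, ".+", [dotall, {capture, first, list}]).
```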

Robert