Parsing UUIDs with Leex

Massimo Cesaro
Hello,
I'm developing a transpiler from an IoT-oriented DSL to Erlang.
The idea is to take a domain-specific language as input and produce Erlang source code as output.
In my Erlang project, I'm using the venerable Leex and Yecc tools to build the lexer and parser.
I'm on Erlang/OTP 22 [erts-10.5.2]; so far, so good.

Today I was trying to define a UUID data type in the DSL using the following definition in the leex .xrl file:

UUID = [0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}

and the corresponding rule:
{UUID}       : {token, {uuid, TokenChars, TokenLine}}.

In the same file there is also the definition of an atom type:

ATOM = [a-zA-Z0-9_]*

with the rule:
{ATOM}       : {token, {atom, list_to_atom(TokenChars), TokenLine}}.

I found that the generated lexer does not match the UUID 3eeb8daf-4e66-4c02-bb28-68934157e36e in the input as a single token.

Instead, it produces the following token sequence:
{atom,'3eeb8daf',25},{minus,25},{atom,'4e66',25},{minus,25},{atom,'4c02',25},{minus,25},{atom,bb28,25},{minus,25},{atom,'68934157e36e',25}

The minus token is present because I'm also implementing unary and arithmetic expressions.

I tested my definition with re:
7> re:run("3eeb8daf-4e66-4c02-bb28-68934157e36e", "[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}", [{capture,all,list},unicode]).          
{match,["3eeb8daf-4e66-4c02-bb28-68934157e36e"]}

8> re:run("3eeb8daf-4e66-4c02-bb28-68934157e36e", "[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}", [{capture,first,list},unicode]).
{match,["3eeb8daf-4e66-4c02-bb28-68934157e36e"]}

So I guess that at least the definition should be right.

My workaround is to write the UUIDs between double quotes in the source code and parse them as strings, but I'm still wondering whether I'm doing something wrong.

Thanks,
Massimo
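
The string workaround described above can be paired with a check after lexing. Note that re is PCRE-based, while leex parses regular expressions with its own code, so a successful re:run does not show that leex accepts the same syntax. The following is a minimal sketch, assuming the string token's quotes have already been stripped; the module name uuid_check and the function is_uuid/1 are hypothetical and not part of the original project.

```erlang
-module(uuid_check).
-export([is_uuid/1]).

%% Hypothetical helper, not from the original post.
%% Returns true when Str has the 8-4-4-4-12 hexadecimal UUID layout.
-spec is_uuid(string()) -> boolean().
is_uuid(Str) when is_list(Str) ->
    Hex = "[0-9a-fA-F]",
    %% \A and \z anchor the pattern to the whole string.
    Pattern = "\\A" ++ Hex ++ "{8}-" ++ Hex ++ "{4}-" ++ Hex ++ "{4}-"
              ++ Hex ++ "{4}-" ++ Hex ++ "{12}\\z",
    %% {capture, none} makes re:run/3 return only match | nomatch.
    re:run(Str, Pattern, [{capture, none}]) =:= match.
```

For the UUID from the post, uuid_check:is_uuid("3eeb8daf-4e66-4c02-bb28-68934157e36e") returns true. Such a check could run in a Yecc action or in a later pass over the parse tree.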

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Parsing UUIDs with Leex

Stanislaw Klekot
On Fri, Oct 11, 2019 at 04:22:32PM +0200, Massimo Cesaro wrote:
> Today I was trying to define a UUID data type in the DSL language using the
> following definition in the leex .xlr file:
>
> UUID =
> [0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}
>
> and the corresponding rule:
> {UUID}       : {token, {uuid, TokenChars, TokenLine}}.

Are you sure that {8} is not treated as a macro name?

Also, you don't need to quote your dashes with a backslash in the
regexp.

--
Stanislaw Klekot

Re: Parsing UUIDs with Leex

PAILLEAU Eric
In reply to this post by Massimo Cesaro
Hi,

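The ATOM definition is invalid: as written, it also matches the empty string and tokens that start with a digit, such as 3eeb8daf.
It should be:

ATOM = [a-z][a-zA-Z0-9_]*

Also take care of rule precedence: the UUID rule should have higher precedence than the ATOM rule.

Regards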

On 11/10/2019 at 16:22, Massimo Cesaro wrote:
>
> ATOM = [a-zA-Z0-9_]*




Re: Parsing UUIDs with Leex

Massimo Cesaro
In reply to this post by Stanislaw Klekot

On Fri, Oct 11, 2019 at 6:30 PM Stanislaw Klekot <[hidden email]> wrote:
> On Fri, Oct 11, 2019 at 04:22:32PM +0200, Massimo Cesaro wrote:
> > Today I was trying to define a UUID data type in the DSL language using the
> > following definition in the leex .xlr file:
> >
> > UUID =
> > [0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}
> >
> > and the corresponding rule:
> > {UUID}       : {token, {uuid, TokenChars, TokenLine}}.
>
> Are you sure that {8} is not treated as a macro name?

Not really, I'm looking into leex source.

> Also, you don't need to quote your dashes with a backslash in the
> regexp.

Yup, better safe than sorry. It didn't change the outcome, though.

Best regards,
Massimo

Re: Parsing UUIDs with Leex

Led
In reply to this post by PAILLEAU Eric

> Atom's definition is invalid.
> Should be :
>
> ATOM = [a-z][a-zA-Z0-9_]*

ATOM = [a-z][a-zA-Z0-9_@]*


Re: Parsing UUIDs with Leex

Håkan Huss
In reply to this post by Massimo Cesaro
On Sat, 12 Oct 2019 at 14:39, Massimo Cesaro <[hidden email]> wrote:
> On Fri, Oct 11, 2019 at 6:30 PM Stanislaw Klekot <[hidden email]> wrote:
> > Are you sure that {8} is not treated as a macro name?
>
> Not really, I'm looking into leex source.

Looking at the Leex docs for the supported regexes, it seems that the {n} construct isn't supported.

/Håkan
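
If the {n} repetition is indeed unavailable, one option is to write each repetition out by hand through a macro. Below is a minimal sketch of a complete .xrl file under that assumption. The file name uuid_sketch.xrl, the macro name H, and the whitespace rule are hypothetical additions; the minus token shape is copied from the token sequence in the first post, and the ATOM pattern follows the correction suggested earlier in the thread. The backslash before each dash is kept from the original definition, although it should not be needed.

```erlang
%% uuid_sketch.xrl (hypothetical file name, not from the thread)

Definitions.

H = [0-9a-fA-F]

Rules.

%% 8-4-4-4-12 hex digits, each repetition spelled out instead of {8}, {4}, {12}.
{H}{H}{H}{H}{H}{H}{H}{H}\-{H}{H}{H}{H}\-{H}{H}{H}{H}\-{H}{H}{H}{H}\-{H}{H}{H}{H}{H}{H}{H}{H}{H}{H}{H}{H} : {token, {uuid, TokenChars, TokenLine}}.

%% Atoms start with a lowercase letter, as suggested in the thread.
[a-z][a-zA-Z0-9_@]* : {token, {atom, list_to_atom(TokenChars), TokenLine}}.

%% Minus sign used by unary and arithmetic expressions.
\- : {token, {minus, TokenLine}}.

%% Assumed whitespace handling: skip blanks, tabs and line breaks.
[\s\t\r\n]+ : skip_token.

Erlang code.
```

The file can be tried from the shell with leex:file("uuid_sketch.xrl"), then c(uuid_sketch), then uuid_sketch:string("3eeb8daf-4e66-4c02-bb28-68934157e36e"). As far as I know, leex prefers the longest match and falls back to rule order only for matches of equal length, so placing the UUID rule first is a precaution rather than a requirement here.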


Re: Parsing UUIDs with Leex

PAILLEAU Eric
In reply to this post by Led
To be exact,

ATOM = ([a-z][a-zA-Z0-9_@]*|'[^']*')
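
One consequence of accepting quoted atoms is that TokenChars then includes the surrounding single quotes, so calling list_to_atom(TokenChars) directly would keep them in the atom name. A possible way around this, sketched here as two fragments for an existing .xrl file, is to give quoted atoms their own rule. The helper name strip_quotes/2 is hypothetical; TokenLen is the token length that leex makes available in rule actions.

```erlang
%% Fragment for the "Rules." section: one rule per atom form.
[a-z][a-zA-Z0-9_@]* : {token, {atom, list_to_atom(TokenChars), TokenLine}}.
'[^']*' : {token, {atom, list_to_atom(strip_quotes(TokenChars, TokenLen)), TokenLine}}.

%% Fragment for the "Erlang code." section.
%% Hypothetical helper: drops the first and last character of the token.
strip_quotes(Chars, Len) ->
    lists:sublist(Chars, 2, Len - 2).
```

Whether the DSL needs quoted atoms at all is a design decision for the language; the fragments only show how the combined pattern above could be split.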



Re: Parsing UUIDs with Leex

Massimo Cesaro
In reply to this post by Håkan Huss
Right, although reading the leex.erl source code suggests that it could be supported.
Whatever. I'll stick to UUIDs as strings.

Thank you all.

Massimo
