Fix for A=<<1>>

James Fish
That the start of "A=<<1>>" is incorrectly tokenized into A, =<, <
has always bothered me, so here's a patch for erl_scan.erl that fixes
it.  Special-casing this in the scanner is a bit grotty, but it's
better than having a special case in the documentation.

I'm posting this here instead of erlang-patches to see if anyone can
come up with a reason why this is a bad idea (besides being an odd
special case).

(Apologies for the manual patch, BTW.)

James


After:

%% Punctuation characters and operators, first recognise multiples.

insert:

%% The first clause looks for "=<<" and splits it into "=","<<" so
%% matches like "=<<1>>" aren't tokenized as "=<","<".
scan1([$=,$<,$<|Cs], Toks, Pos) ->
    scan1(Cs, [{'<<',Pos},{'=',Pos}|Toks], Pos);
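A minimal sketch of why the split happens, assuming a scanner that always takes the longest operator at the current position ("maximal munch"): it commits to "=<" before it ever considers "<<". The module below is a hypothetical toy, not erl_scan; the name munch_demo and its token format are made up, and it only handles single-digit integers and single-letter variables. Passing true as the second argument enables the special case from the patch.

```erlang
-module(munch_demo).
-export([scan/2]).

%% Hypothetical toy scanner (NOT erl_scan) illustrating maximal munch.
%% When SplitEqLtLt is true, "=<<" is split into '=' and '<<' as in
%% the proposed patch; otherwise the longest operator always wins.
scan(Cs, SplitEqLtLt) ->
    scan(Cs, SplitEqLtLt, []).

scan([], _Split, Acc) ->
    lists:reverse(Acc);
scan([$=, $<, $< | Cs], true, Acc) ->
    %% The special case: emit '=' then '<<' (Acc is reversed at the end).
    scan(Cs, true, ['<<', '=' | Acc]);
scan([$=, $< | Cs], Split, Acc) ->
    scan(Cs, Split, ['=<' | Acc]);
scan([$<, $< | Cs], Split, Acc) ->
    scan(Cs, Split, ['<<' | Acc]);
scan([$>, $> | Cs], Split, Acc) ->
    scan(Cs, Split, ['>>' | Acc]);
scan([C | Cs], Split, Acc) when C >= $0, C =< $9 ->
    %% Single-digit integers only, for brevity.
    scan(Cs, Split, [{integer, C - $0} | Acc]);
scan([C | Cs], Split, Acc) when C >= $A, C =< $Z ->
    %% Single-letter variables only, for brevity.
    scan(Cs, Split, [{var, list_to_atom([C])} | Acc]);
scan([C | Cs], Split, Acc) ->
    %% Any other character becomes a one-character token.
    scan(Cs, Split, [list_to_atom([C]) | Acc]).
```

In this toy, munch_demo:scan("A=<<1>>", false) gives [{var,'A'},'=<','<',{integer,1},'>>'], while passing true gives [{var,'A'},'=','<<',{integer,1},'>>']. The toy also shows one trade-off: "A=<<<1>>" (that is, A =< <<1>> written without spaces) scans as '=<','<<' without the special case, but as '=','<<','<' with it.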



Ulf Wiger

I quickly glanced at erl_scan.erl to see if my instinctive
objection to your patch was correct, and it was... but that
led to a further question/complaint:

erl_scan.erl is designed to be reentrant, i.e. it can be fed its
input in chunks. Thus, your code may not always work, since a
chunk boundary might fall right inside "=<<".
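A minimal sketch of doing it "the hard (reentrant) way", assuming a toy operator set: when the buffer ends in something that could still grow into a longer operator (such as "=" or "=<"), the scanner hands that tail back unconsumed and waits for the next chunk instead of committing to a token. The module chunk_demo and its functions are hypothetical and unrelated to erl_scan's internals; tokens are plain atoms, and the patch's "=<<" split is included so that it survives any chunk boundary.

```erlang
-module(chunk_demo).
-export([feed/2, finish/1]).

%% Hypothetical reentrant operator scanner (NOT erl_scan).
%% Multi-character patterns, longest first, and the tokens each produces.
patterns() ->
    [{"=<<", ['=', '<<']},   % the special case from the patch
     {"=<", ['=<']},
     {"<<", ['<<']},
     {">>", ['>>']}].

%% feed(Buffered, Chunk) -> {Tokens, StillBuffered}.
%% Scans as far as it safely can; a tail that might still become a
%% longer operator is returned so the caller can prepend it next time.
feed(Buffered, Chunk) ->
    scan(Buffered ++ Chunk, false, []).

%% finish(Buffered) -> Tokens. At end of input nothing more can arrive,
%% so any held-back tail is committed to whatever it matches now.
finish(Buffered) ->
    {Toks, []} = scan(Buffered, true, []),
    Toks.

scan([], _Eof, Acc) ->
    {lists:reverse(Acc), []};
scan(Cs, Eof, Acc) ->
    case (not Eof) andalso is_partial(Cs) of
        true ->
            %% Ambiguous tail: wait for more input.
            {lists:reverse(Acc), Cs};
        false ->
            {Toks, Rest} = match(Cs, patterns()),
            scan(Rest, Eof, lists:reverse(Toks, Acc))
    end.

%% True if Cs is a proper prefix of some pattern, e.g. "=" or "=<".
is_partial(Cs) ->
    lists:any(fun({P, _}) -> lists:prefix(Cs, P) andalso Cs =/= P end,
              patterns()).

%% Try patterns longest first; otherwise emit a one-character token.
match(Cs, [{P, Toks} | Ps]) ->
    case lists:prefix(P, Cs) of
        true -> {Toks, lists:nthtail(length(P), Cs)};
        false -> match(Cs, Ps)
    end;
match([C | Cs], []) ->
    {[list_to_atom([C])], Cs}.
```

Feeding "A=<" returns {['A'], "=<"}, holding back "=<"; then chunk_demo:feed("=<", "<1>>") returns {['=', '<<', '1', '>>'], []}, the same tokens as scanning "A=<<1>>" in one piece. A scanner that matched "=<" in a function head without this check would already have emitted '=<' at the end of the first chunk.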

What I observed in erl_scan.erl is that this kind of
cheating is already done when matching "<<", ">>", ">=",
"->", etc.

pre_escape/2 does things the hard (reentrant) way, but e.g.
scan_escape/2 cheats.


Or am I overlooking some magic code snippet that guarantees
that there are always enough bytes in the scan buffer to
ensure that the right function clause matches?

(BTW, xmerl_scan.erl, which I wrote, suffers from the same
problem; matching multiples in the function head is great
for readability, but not if you want your scanner to be
reentrant.)

/Uffe

On Thu, 1 May 2003, James Hague wrote:

>That the start of "A=<<1>>" is incorrectly tokenized into
>A, =<, < has always bothered me, so here's a patch for
>erl_scan.erl that fixes it.  Special casing this in the
>scanner is a bit grotty, but it's better than having a
>special case in the documentation.

--
Ulf Wiger, Senior Specialist,
   / / /   Architecture & Design of Carrier-Class Software
  / / /    Strategic Product & System Management
 / / /     Ericsson AB, Connectivity and Control Nodes



Robert Virding
The old scanner worked in two passes: the pre_XXX functions just collect characters until the end of the form is reached, then the scan_XXX functions do the actual tokenising, which is much easier when you know you have all there is. This was done to simplify the reentrant handling.
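A minimal sketch of the two-pass idea, assuming a much-simplified end-of-form rule ("." followed by whitespace) that ignores strings, quoted atoms and comments. The names two_pass_demo, collect/2, split_form/2 and tokenise/1 are made up for illustration and do not mirror the real pre_XXX/scan_XXX code. Only the first pass has to cope with chunk boundaries; in this sketch the second pass always sees a complete form, so its clauses can match several characters at once.

```erlang
-module(two_pass_demo).
-export([collect/2]).

%% collect(Buffered, Chunk) -> {done, Tokens, Rest} | {more, Buffered1}.
%% Pass 1 buffers characters until a whole form is present; pass 2
%% then tokenises that form in one go.
collect(Buffered, Chunk) ->
    Buf = Buffered ++ Chunk,
    case split_form(Buf, []) of
        {form, Form, Rest} -> {done, tokenise(Form), Rest};
        more -> {more, Buf}
    end.

%% Pass 1 (toy rule): a form ends at '.' followed by whitespace.
%% A trailing '.' with nothing after it still needs more input.
split_form([$., C | Rest], Acc) when C =:= $\s; C =:= $\n; C =:= $\t ->
    {form, lists:reverse([$. | Acc]), Rest};
split_form([C | Rest], Acc) ->
    split_form(Rest, [C | Acc]);
split_form([], _Acc) ->
    more.

%% Pass 2: the whole form is in memory, so multi-character clause
%% heads cannot be cut off by a chunk boundary.
tokenise([]) -> [];
tokenise([$.]) -> [dot];
tokenise([$=, $<, $< | Cs]) -> ['=', '<<' | tokenise(Cs)];
tokenise([$=, $< | Cs]) -> ['=<' | tokenise(Cs)];
tokenise([$<, $< | Cs]) -> ['<<' | tokenise(Cs)];
tokenise([$>, $> | Cs]) -> ['>>' | tokenise(Cs)];
tokenise([C | Cs]) when C =:= $\s; C =:= $\n; C =:= $\t -> tokenise(Cs);
tokenise([C | Cs]) -> [list_to_atom([C]) | tokenise(Cs)].
```

two_pass_demo:collect([], "A=<") returns {more, "A=<"}; passing that buffer back with the next chunk, two_pass_demo:collect("A=<", "<1>>. X") returns {done, ['A', '=', '<<', '1', '>>', dot], "X"}.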

As an aside, leex generates a one-pass scanner that handles the reentrant collecting and tokenising together, which is easy to do in generated code.
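To see the reentrant interface from the caller's side, here is a small driver for erl_scan:tokens/3, which takes a continuation (initially []), a chunk of characters or eof, and a start location, and returns either {more, Cont} or {done, Result, LeftOver}. The module name chunk_driver and its helpers are hypothetical. As far as I know, leex-generated scanners export a tokens/3 following the same protocol, and current erl_scan still tokenizes "A=<<1>>." as var, '=<', '<', ... regardless of how the input is chunked; treat both as assumptions to check against your OTP release.

```erlang
-module(chunk_driver).
-export([categories/1]).

%% categories(Chunks) -> [atom()] | term().
%% Feeds Chunks (a list of strings) to erl_scan:tokens/3 one at a time,
%% threading the continuation, and returns the category (first tuple
%% element) of each token in the first complete form.
categories(Chunks) ->
    feed([], Chunks).

feed(Cont, [Chunk | Rest]) ->
    case erl_scan:tokens(Cont, Chunk, 1) of
        {more, Cont1} ->
            %% The chunk ended mid-form, possibly mid-operator.
            feed(Cont1, Rest);
        {done, Result, _LeftOver} ->
            summarise(Result)
    end;
feed(Cont, []) ->
    %% No more chunks: tell the scanner the input has ended.
    {done, Result, _} = erl_scan:tokens(Cont, eof, 1),
    summarise(Result).

summarise({ok, Toks, _EndLoc}) -> [element(1, T) || T <- Toks];
summarise(Other) -> Other.
```

Comparing chunk_driver:categories(["A=<<1>>. "]) with chunk_driver:categories(["A=", "<", "<1>>. "]) is a quick way to confirm that a split inside "=<<" does not change the result.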

Robert

----- Original Message -----
From: "Ulf Wiger" <etxuwig>
To: "James Hague" <james>
Cc: <erlang-questions>
Sent: Friday, May 02, 2003 1:47 PM
Subject: Re: Fix for A=<<1>>


>
> I quickly glanced at erl_scan.erl to see if my instinctive
> objection to your patch was correct, and it was... but
> leading to a further question/complaint:
>
> erl_scan.erl is designed to be reentrant. Thus, your code
> may not always work, since it might happen that the split
> into chunks will occur right inside "=<<".