xmerl_scan for XMPP -- too much lookahead?

Tony Garnock-Jones-2
Hi all,

I'm trying to use xmerl_scan to handle XMPP stanzas, and I think I've
found a problem with it. I don't think it's intended for streaming XML
in XMPP-like interleaved-request-and-response-document situations.

The attached program feeds the string "<opentag>" to xmerl_scan. I would
have expected to receive the open-tag event before blocking for more
data, but instead it requires at least one more character of input data
before it will emit the open-tag event! If I instead pass "<opentag> ",
with a space after the close-bracket, it emits the open-tag event as
expected, plus the start of an xmlText, before blocking for more data.

My question, then, is:

  Is this a bug? Should xmerl_scan supply the open-tag event before
  blocking, when it is fed "<opentag>"?

(The specific context of this problem is dealing with the
<stream:stream> sent by the server at the handshake stage of XEP-114.)

Regards,
  Tony

P.S.: to run the attached program,
  $ erlc lookaheadbug.erl && erl -run lookaheadbug go
--
 [][][] Tony Garnock-Jones     | Mob: +44 (0)7905 974 211
   [][] LShift Ltd             | Tel: +44 (0)20 7729 7060
 []  [] http://www.lshift.net/ | Email: [hidden email]

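%% Reproduces the lookahead behaviour: feeds "<opentag>" to xmerl_scan and
%% checks whether the open-tag event arrives before the scanner asks for
%% more input.
-module(lookaheadbug).
-include_lib("xmerl/include/xmerl.hrl").
-export([go/0]).

go() ->
    xmerl_scan:string("<opentag>",
                      [{event_fun, fun handle_event/2},
                       {continuation_fun, fun get_more_data/3}]).

%% Called by the scanner when it runs out of input. _Continue would resume
%% scanning with more bytes; this test never supplies any.
get_more_data(_Continue, Exception, ScannerState) ->
    io:format("blocking for more data~n"),
    case get(tag_received) of
        true ->
            %% The open tag was seen first: stop via the end-of-input handler.
            io:format("hooray!~n"),
            Exception(ScannerState);
        _ ->
            io:format("oh dear!~n"),
            throw(we_should_have_evented_a_tag_open_by_now)
    end.

%% Records in the process dictionary that an element-start event was seen.
handle_event(Event, ScannerState) ->
    io:format("parser got event:~n~p~n", [Event]),
    case {Event#xmerl_event.event, Event#xmerl_event.data} of
        {started, #xmlElement{}} ->
            put(tag_received, true);
        _ ->
            ok
    end,
    ScannerState.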



________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Re: xmerl_scan for XMPP -- too much lookahead?

Gleb Peregud
Hi

You may have seen it already :) Nevertheless, there is the exmpp [1]
library from Process One, which supports this kind of streaming:

1: http://support.process-one.net/doc/display/EXMPP/

HTH.
Gleb Peregud

On Wed, Sep 23, 2009 at 15:34, Tony Garnock-Jones <[hidden email]> wrote:

> [...]


Re: xmerl_scan for XMPP -- too much lookahead?

Tony Garnock-Jones-2
Hi Gleb,

Yes, thanks, I've seen that already. We're already using it, in fact. If
exmpp didn't exist, though, the bug (?) in xmerl_scan would make it
difficult (impossible?) to write a pure-Erlang XEP-114 implementation
using xmerl.

Regards,
  Tony



Re: xmerl_scan for XMPP -- too much lookahead?

Mickaël Rémond
In reply to this post by Gleb Peregud
Hello,

On 23 Sept. 2009, at 16:07, Gleb Peregud wrote:

> Hi
>
> You could have seen it already :) But nevertheless there is exmpp [1]
> library from Process One. It supports such things:
>
> 1: http://support.process-one.net/doc/display/EXMPP/

Yes, come on Tony, use a real tool dedicated to the task ;)

Our tests and all the reports we have show that it is very efficient.
Let us know if you have questions, reports or improvements.

--
Mickaël Rémond
  http://www.process-one.net/






Re: xmerl_scan for XMPP -- too much lookahead?

Ulf Wiger-3
In reply to this post by Tony Garnock-Jones-2

Tony,

The support for streaming in xmerl_scan is quite hackish,
and is bound to be broken one way or another. I doubt that
it is feasible to correctly handle streams given how
xmerl_scan works.

As it is, you could reasonably ask it to be a little
more discerning: right now, xmerl_eventp only breaks
at whitespace, which is very conservative. The main
problem is that xmerl_scan is undisciplined about
ensuring that it has enough characters to
pattern-match in the current function head.

E.g.

%% [75] ExternalID ::= 'SYSTEM' S SystemLiteral
%%                   | 'PUBLIC' S PubidLiteral S SystemLiteral
scan_doctype1([], S=#xmerl_scanner{continuation_fun = F}) ->
     F(fun(MoreBytes, S1) -> scan_doctype1(MoreBytes, S1) end,
       fun(S1) -> ?fatal(unexpected_end, S1) end,
       S);
scan_doctype1("PUBLIC" ++ T, S0) ->
     ...

If given a stream fragment, like "PUBL", the matching
above will fail, and xmerl_scan will derail. THIS is
a serious bug - and I'm originally at fault. ;-)
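
To make the failure mode concrete, here is a minimal stand-alone sketch. It is not xmerl code: the module name prefix_match_demo and the functions naive/1 and careful/1 are made up for illustration. naive/1 copies the clause shapes of the excerpt above, while careful/1 shows one possible way a scanner could recognise a partial keyword and ask for more input instead of failing.

```erlang
-module(prefix_match_demo).
-export([naive/1, careful/1]).

%% Same clause shapes as the scan_doctype1 excerpt: only the empty input
%% asks for more data, so a partial keyword such as "PUBL" matches no clause.
naive([]) -> need_more;
naive("PUBLIC" ++ _T) -> public.

%% Hypothetical alternative: treat any prefix of the keyword (including
%% the empty string) as "need more input" rather than as a failure.
careful("PUBLIC" ++ _T) ->
    public;
careful(Input) ->
    case lists:prefix(Input, "PUBLIC") of
        true -> need_more;
        false -> no_match
    end.
```

Calling prefix_match_demo:naive("PUBL") raises a function_clause error, whereas prefix_match_demo:careful("PUBL") returns need_more.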

Have you tried using xmerl_sax_parser instead?
The plan is to replace xmerl_scan completely for stream
parsing. And given this, xmerl_eventp is unlikely to see
any major improvements.
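
A rough sketch of what trying xmerl_sax_parser on chunked input might look like, using xmerl_sax_parser:stream/2 with its event_fun and continuation_fun options. The module name sax_stream_sketch and the chunk contents are assumptions chosen for illustration. The sketch makes no claim about how much lookahead the SAX parser needs; it only prints events and refill requests so that their relative order can be observed.

```erlang
-module(sax_stream_sketch).
-export([go/0]).

%% Feed a small document in two chunks. Events and refill requests are
%% printed as they happen, so their order is visible on the console.
go() ->
    First = <<"<opentag>">>,
    Remaining = [<<"text</opentag>">>],
    xmerl_sax_parser:stream(First,
                            [{event_fun, fun handle_event/3},
                             {event_state, []},
                             {continuation_fun, fun next_chunk/1},
                             {continuation_state, Remaining}]).

%% Called when the parser runs out of input. Returning an empty binary
%% signals that there is no more data.
next_chunk([Chunk | More]) ->
    io:format("parser asked for more data~n"),
    {Chunk, More};
next_chunk([]) ->
    io:format("parser asked for more data, none left~n"),
    {<<>>, []}.

%% Print each SAX event and collect it (most recent first) in the event state.
handle_event(Event, _Location, Seen) ->
    io:format("parser got event: ~p~n", [Event]),
    [Event | Seen].
```

Compile with erlc and run sax_stream_sketch:go() from the shell; if the start-element event for opentag is printed before the first "parser asked for more data" line, the parser did not need extra lookahead for that tag.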

BR,
Ulf W





--
Ulf Wiger
CTO, Erlang Training & Consulting Ltd
http://www.erlang-consulting.com


Re: xmerl_scan for XMPP -- too much lookahead?

Tony Garnock-Jones-2
Ulf Wiger wrote:
> Have you tried using xmerl_sax_parser instead?
> The plan is to replace xmerl_scan completely for stream
> parsing. And given this, xmerl_eventp is unlikely to see
> any major improvements.

Aha! Thank you, Ulf, I will try that.

Regards,
  Tony