SV: xmerl usage question

SV: xmerl usage question

Fredrik Linder-2
Hi Chandrashekhar
 
First: It is an error in an external DTD spec to specify an element or its attlist twice. I think it is allowed to re-define elements in the <!DOCTYPE tag however (not sure about this though).
 
So back to xmerl:
 
When xmerl parses the given string it will insert all <!ELEMENT and <!ATTLIST information it finds into the rules (except the #FIXED information), and if such information is already there it will generate the error you've seen. Perfectly as it should be.
 
Normally xmerl creates a *new* table for each call to xmerl_scan:string/X, and hence will not generate that error.
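
One way to keep that per-call behaviour while still choosing the table yourself is to create a fresh ETS table for every parse and delete it afterwards. The following is only a sketch: the module name fresh_rules and the function parse/2 are made up for illustration, and it assumes the DTD text is available as a string, as in the original question. Note that the DTD is re-read on every call.

```erlang
-module(fresh_rules).
-export([parse/2]).

%% Hypothetical helper: parse Xml with a brand-new rules table each time.
%% Dtd is assumed to be the DTD text as a string.
parse(Xml, Dtd) ->
    %% Same fetch fun as in the question: always hand back the DTD string.
    FetchFun = fun(_ExtSpec, State) -> {ok, {string, Dtd}, State} end,
    %% A fresh, unnamed table, so no declarations survive from earlier parses.
    Rules = ets:new(rules, [set, public]),
    try
        xmerl_scan:string(Xml, [{fetch_fun, FetchFun}, {rules, Rules}])
    after
        %% xmerl keeps a user-supplied rules table, so delete it ourselves.
        ets:delete(Rules)
    end.
```

Calling fresh_rules:parse(Xml, Dtd) repeatedly should then never hit already_defined, at the cost of parsing the DTD every time.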
 
Am I correct in guessing that you would like to read the DTD files only once? If so, what you probably need to do on the second and later runs is *not* read into the rules table, and later swap to the one you read earlier. You would probably also need to play with the fetch options when playing with the rules options in this way.
 
And now a little about how we utilize this to read the DTD files only once:
 
What we do is first parse a fake XML with all DTD information in it, using the {prolog, stop} and rules options, to initialize the rules.
 
Later we set the fetch options to choose the correct rules table (instead of reading the DTD files) that matches the incoming DTD spec, and switch to that rule set when the <!DOCTYPE element ends.
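
A much simplified variant of this idea, for a single DTD and a single rules table, might look like the sketch below. It is not the exact scheme described above: it does not use {prolog, stop} and it does not switch tables when <!DOCTYPE ends. Instead, the DTD is read on the first parse only, and on later parses the fetch fun answers {ok, not_fetched, State} so nothing is inserted twice. The module name dtd_cache and its functions are hypothetical, and the sketch assumes that the process which created the table stays alive and that the first parse is not raced by other processes.

```erlang
-module(dtd_cache).
-export([init/0, parse/3]).

%% Create the shared rules table once. The calling process owns it,
%% so it must outlive all parses (an assumption of this sketch).
init() ->
    ets:new(dtd_rules, [set, public]).

%% Parse Xml using Rules; read the DTD text only while Rules is empty.
parse(Rules, Xml, Dtd) ->
    %% Treat a non-empty table as "DTD already loaded".
    Loaded = ets:info(Rules, size) > 0,
    FetchFun =
        fun(_ExtSpec, State) when Loaded ->
                %% Declarations are already in the table: skip the DTD.
                {ok, not_fetched, State};
           (_ExtSpec, State) ->
                %% First parse: let xmerl read the DTD into the table.
                {ok, {string, Dtd}, State}
        end,
    xmerl_scan:string(Xml, [{fetch_fun, FetchFun}, {rules, Rules}]).
```

Usage would be roughly Rules = dtd_cache:init() once, followed by dtd_cache:parse(Rules, Xml, Dtd) for each incoming document.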
 
Good luck
/Fredrik

________________________________

From: owner-erlang-questions on behalf of Chandrashekhar Mullaparthi
Sent: Wed 2004-04-14 11:16
To: Question Erlang
Subject: xmerl usage question



Hi all,

I'm trying to specify my own ETS table when using xmerl for parsing as
follows. But the request fails the second time as below.

7> FetchFun = fun(_, State) -> {ok, {string, Dtd}, State} end.
#Fun<erl_eval.11.1870983>

19> xmerl_scan:string(Xml, [{fetch_fun, FetchFun}, {rules, myrules}]).
{{xmlElement,'MyRequest',
              [],
              1,
              [],
              [{xmlText,[{'MyRequest',1}],1,[],"\n"},
               {xmlElement,'Elem1',
                           [{'MyRequest',1}],
                           2,
                           [],
                           [{xmlText,[{'Elem1',2},{'MyRequest',1}],1,[],"abcd"}],
                           [],
                           'Elem1',
                           [],
                           {xmlNamespace,[],[]}},
               {xmlText,[{'MyRequest',1}],3,[],"\n"},
               {xmlElement,'Elem2',
                           [{'MyRequest',1}],
                           4,
                           [],
                           [{xmlText,[{'Elem2',4},{'MyRequest',1}],1,[],"abcd"}],
                           [],
                           'Elem2',
                           [],
                           {xmlNamespace,[],[]}},
               {xmlText,[{'MyRequest',1}],5,[],"\n"}],
              [],
              'MyRequest',
              [],
              {xmlNamespace,[],[]}},
  "\n"}
20> xmerl_scan:string(Xml, [{fetch_fun, FetchFun}, {rules, myrules}]).
871- fatal: {already_defined,'MyRequest'}
** exited: {fatal,{{already_defined,'MyRequest'},2,20}} **

Is there any way around this? I see where this is happening, but I
don't understand why xmerl flags this as an error. Any help is much
appreciated.

thanks,
Chandru



SV: xmerl usage question

chandru
Hi Fredrik,

On 14 Apr 2004, at 12:33, Fredrik Linder wrote:

> Hi Chandrashekhar
>
> First: It is an error in an external DTD spec to specify an element
> or its attlist twice. I think it is allowed to re-define elements in
> the <!DOCTYPE tag however (not sure about this though).
>
> So back to xmerl:
>
> When xmerl parses the given string it will insert all <!ELEMENT and
> <!ATTLIST information it finds into the rules (except the #FIXED
> information), and if such information is already there it will
> generate the error you've seen. Perfectly as it should be.

The problem here is not that I have duplicate elements in my DTD. The
problem is that when the same DTD is parsed again when parsing the next
chunk of XML data, the parser complains about duplicate elements,
because that element already exists in the rules table from the
previous parse.

>
> Normally xmerl creates a *new* table for each call to
> xmerl_scan:string/X, and hence will not generate that error.

There seems to be a bug where the table is not deleted after the parse,
so the number of ETS tables keeps growing until no more ETS tables can
be created (the default limit seems to be 1400) and the node then
crashes. I haven't tracked down this bug yet.
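
A quick way to check for this kind of leak is to count the ETS tables before and after a batch of parses. This is just a diagnostic sketch; the module name ets_leak_check and the function leaked_tables/2 are made up, and the count can be disturbed by other processes creating or deleting tables at the same time.

```erlang
-module(ets_leak_check).
-export([leaked_tables/2]).

%% Parse Xml N times with default options and return how many more
%% ETS tables exist afterwards than before.
leaked_tables(Xml, N) ->
    Before = length(ets:all()),
    lists:foreach(fun(_) -> xmerl_scan:string(Xml) end, lists:seq(1, N)),
    length(ets:all()) - Before.
```

On an otherwise idle node, a result that grows with N would suggest that the rules tables are not being cleaned up.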

>
> Am I correct in guessing that you would like to read the DTD files
> only once? If so, what you probably need to do on the second and
> later runs is *not* read into the rules table, and later swap to the
> one you read earlier. You would probably also need to play with the
> fetch options when playing with the rules options in this way.
>
> And now a little about how we utilize this to read the DTD files only
> once:
>
> What we do is first parse a fake XML with all DTD information in it,
> using the {prolog, stop} and rules options, to initialize the
> rules.
>
> Later we set the fetch options to choose the correct rules table
> (instead of reading the DTD files) that matches the incoming DTD spec,
> and switch to that rule set when the <!DOCTYPE element ends.

This sounds like a good idea. I will try it out.

thanks
Chandru

PS: I'm using xmerl-0.18