Patch to xmerl_scan to fix character-reference normalization in attribute values

Tom Moertel-2
The following short patch fixes a bug in xmerl that causes character references in attribute values to be normalized incorrectly:

    git fetch https://github.com/tmoertel/otp.git xmerl_attr_charref_fix

Explanation:

Section 3.3.3 of the XML Recommendation gives the rules for
attribute-value normalization.  One of those rules requires
that character references not be re-normalized after being
replaced with the referenced characters:

    For a character reference, append the referenced
    character to the normalized value.

And, in particular:

    Note that if the unnormalized attribute value contains
    a character reference to a white space character other
    than space (#x20), the normalized value contains the
    referenced character itself (#xD, #xA or #x9).
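
As a rough illustration of the two rules quoted above, here is a minimal
sketch of attribute-value normalization for a CDATA attribute.  It is not
the xmerl_scan code and not the patch itself: the module name
attr_norm_sketch and its functions are hypothetical, entity references
and the line-end handling of section 2.11 are left out, and a malformed
character reference simply crashes.

```erlang
-module(attr_norm_sketch).
-export([normalize/1]).

%% Hypothetical helper, not part of xmerl: normalize a raw CDATA
%% attribute value following the XML 1.0 section 3.3.3 rules for
%% character references and literal white space.
normalize(Raw) ->
    normalize(Raw, []).

normalize([], Acc) ->
    lists:reverse(Acc);
%% Character reference (hex): append the referenced character as-is.
normalize("&#x" ++ Rest, Acc) ->
    {Char, Rest1} = char_ref(Rest, 16),
    normalize(Rest1, [Char | Acc]);
%% Character reference (decimal): append the referenced character as-is.
normalize("&#" ++ Rest, Acc) ->
    {Char, Rest1} = char_ref(Rest, 10),
    normalize(Rest1, [Char | Acc]);
%% Literal tab, newline or carriage return: append a space instead.
normalize([C | Rest], Acc) when C =:= $\t; C =:= $\n; C =:= $\r ->
    normalize(Rest, [$\s | Acc]);
%% Any other character: append it unchanged.
normalize([C | Rest], Acc) ->
    normalize(Rest, [C | Acc]).

%% Read the digits up to the terminating ';' and convert them.
char_ref(Str, Base) ->
    {Digits, [$; | Rest]} = lists:splitwith(fun(C) -> C =/= $; end, Str),
    {list_to_integer(Digits, Base), Rest}.
```

With this sketch, attr_norm_sketch:normalize("a&#xA;b") keeps the
newline, while a literal newline in the input becomes a space.  The key
point is that the character produced by a reference goes straight into
the accumulator and is never fed back through the white-space clause.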


In xmerl_scan, however, character references in attributes are
normalized again after replacement.  For example, the
character reference "&#xA;" in the following XML document gets
normalized (incorrectly) into a space when parsed:

    2> xmerl_scan:string("<root x='&#xA;'/>").
    {... [{xmlAttribute,x,[],[],[],[],1,[]," ",false}] ...}

This short patch restores the correct behavior:

    2> xmerl_scan:string("<root x='&#xA;'/>").
    {... [{xmlAttribute,x,[],[],[],[],1,[],"\n",false}] ...}
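
To compare the two results without reading the whole record dump, a small
helper along these lines can pull out just the attribute value.  This is
only a sketch: the module name xmerl_attr_check is hypothetical, and it
assumes the root element carries exactly one attribute.

```erlang
-module(xmerl_attr_check).
-export([attr_value/1]).

%% Record definitions for #xmlElement{} and #xmlAttribute{}.
-include_lib("xmerl/include/xmerl.hrl").

%% Hypothetical helper: parse Xml and return the value of the single
%% attribute on the root element.  Crashes with badmatch if the root
%% element does not have exactly one attribute.
attr_value(Xml) ->
    {#xmlElement{attributes = [#xmlAttribute{value = Value}]}, _Rest} =
        xmerl_scan:string(Xml),
    Value.
```

Calling xmerl_attr_check:attr_value("<root x='&#xA;'/>") should then
return " " before the patch and "\n" after it, matching the two shell
sessions above.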

NOTE:  This change does not include tests because I could not
find a test suite for xmerl.



Cheers,
Tom


_______________________________________________
erlang-patches mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-patches

Re: Patch to xmerl_scan to fix character-reference normalization in attribute values

Henrik Nord-2
On 04/28/2011 11:24 PM, Tom Moertel wrote:
> The following short patch fixes a bug in xmerl that causes character
> references in attribute values to be normalized incorrectly:
> [...]

Your branch is included in 'opu'.
If nothing major breaks, it will be merged into 'dev' shortly.

Thank you for the contribution!

-- 
/Henrik Nord Erlang/OTP
