Sunday, July 05, 2009 2:47 PM
codingsanity
By Design bugs
Next time you wonder to yourself why a bug exists in Microsoft software, consider the possibility that Microsoft simply want it that way. Some time ago, I wanted to compare two XML documents. Growing despondent about the idea of writing such a system myself, I cast around for options, and encountered the XNodeEqualityComparer. I was thrilled, and made use of it throughout my code.
Some time later I started encountering problems. It seemed that the comparer was marking documents that were identical as being different. When we investigated, we found that this comparer was failing on two main issues. The first was the closing style of tags. It was picking up these two fragments as different:
<setting></setting>
<setting/>
I must admit I was a bit surprised. Virtually no software that I am aware of sees these two as different, although they are very slightly different according to the W3C specification. This was annoying, but not a complete show stopper. The next error was a little more of a problem. It seems that the XNodeEqualityComparer also picks up attribute ordering as making the documents different.
Thus it would see these two fragments as different:
<setting name="DefaultFileAcquisitionFolderPath" serializeAs="String">
<setting serializeAs="String" name="DefaultFileAcquisitionFolderPath">
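Both cases are easy to reproduce. The snippet below is a minimal sketch: ComparerRepro and FrameworkEquals are made-up names for illustration, the start tags above have been turned into empty-element tags so that they parse on their own, and only XElement.Parse and XNodeEqualityComparer.Equals come from the framework.

```csharp
using System;
using System.Xml.Linq;

static class ComparerRepro
{
    // Hypothetical helper: parses two fragments and asks the framework's
    // XNodeEqualityComparer whether it considers them equal.
    public static bool FrameworkEquals(string leftXml, string rightXml)
    {
        var comparer = new XNodeEqualityComparer();
        return comparer.Equals(XElement.Parse(leftXml), XElement.Parse(rightXml));
    }

    static void Main()
    {
        // Closing style: explicit end tag versus empty-element tag.
        Console.WriteLine("closing style equal:   " +
            FrameworkEquals("<setting></setting>", "<setting/>"));

        // Attribute order: the same two attributes, swapped.
        Console.WriteLine("attribute order equal: " +
            FrameworkEquals(
                "<setting name='DefaultFileAcquisitionFolderPath' serializeAs='String'/>",
                "<setting serializeAs='String' name='DefaultFileAcquisitionFolderPath'/>"));
    }
}
```

If the comparer behaves as described here, both lines report False.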
Now, this one was a killer for me. Our XML was coming from various systems and they had slight differences in their attribute ordering. We could do nothing about these differences whatsoever. I logged the issue with Microsoft, wrote a workaround and forgot about it. After a short while the reply came back that they wouldn't fix it; they pretty much said that their implementation was correct. This startled me, since I was pretty sure that XML attribute ordering means absolutely nothing. I did some investigation and found this part of section 3.1 of the W3C specification:
[Definition: The beginning of every non-empty XML element is marked by a start-tag.]
Start-tag
The Name in the start- and end-tags gives the element's type. [Definition: The Name-AttValue pairs are referred to as the attribute specifications of the element], [Definition: with the Name in each pair referred to as the attribute name ] and [Definition: the content of the AttValue (the text between the ' or " delimiters) as the attribute value.] Note that the order of attribute specifications in a start-tag or empty-element tag is not significant.
Please re-read that last line: "Note that the order of attribute specifications in a start-tag or empty-element tag is not significant."
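Taking that sentence at face value, a comparison that ignores attribute order is not much code. What follows is only a minimal sketch, not the class mentioned at the end of this post and not anything in the framework: LooseXmlComparer, Normalize and AreEquivalent are hypothetical names. The idea is to rebuild each element with its attributes sorted by name, which also happens to erase the difference in closing style, and then hand the result to XNode.DeepEquals.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class LooseXmlComparer
{
    // Hypothetical helper: returns a copy of the element whose attributes
    // are sorted by namespace and local name. An element with no child
    // nodes is rebuilt without content, so <a></a> and <a/> end up the same.
    public static XElement Normalize(XElement source)
    {
        var copy = new XElement(source.Name,
            source.Attributes()
                  .OrderBy(a => a.Name.NamespaceName, StringComparer.Ordinal)
                  .ThenBy(a => a.Name.LocalName, StringComparer.Ordinal)
                  .Select(a => new XAttribute(a)));

        foreach (XNode node in source.Nodes())
        {
            // Child elements are normalized recursively; text, comments and
            // other nodes are copied as they are (Add clones parented nodes).
            copy.Add(node is XElement child ? Normalize(child) : node);
        }
        return copy;
    }

    // Hypothetical helper: compares two elements after normalization.
    public static bool AreEquivalent(XElement left, XElement right)
    {
        return XNode.DeepEquals(Normalize(left), Normalize(right));
    }

    static void Main()
    {
        XElement a = XElement.Parse(
            "<setting name='DefaultFileAcquisitionFolderPath' serializeAs='String'></setting>");
        XElement b = XElement.Parse(
            "<setting serializeAs='String' name='DefaultFileAcquisitionFolderPath'/>");

        Console.WriteLine(AreEquivalent(a, b));
    }
}
```

The sketch still treats element order, text, comments and attribute values as significant; it only relaxes the two points discussed above. Copying every element is wasteful for large documents, so treat it as an illustration of the idea rather than something tuned for production.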
Accordingly, I recreated the bug report (since there is no way to request one to be reopened), and included the above information. In the arguments that followed I pointed out that despite all the code that XNodeEqualityComparer calls (specifically the abstract DeepEquals on XElement), it to all intents and purposes does the following:
string value1 = node1.ToString();
string value2 = node2.ToString();
return value1 == value2;
Which makes me wonder what point XNodeEqualityComparer has. It ignores the XML specification, ignores how XML itself works and provides no value over a simple ToString. In order to do this it has a great deal of code that is completely and utterly pointless.
My last communication from Microsoft before they closed the bug as By Design was the following:
Hi Sean,
This is by design. XNodeEqualityComparer was not designed to stricly adhere to the xml spec. Most people expect attribute ordering to be significant and hence XNodeEqualityComparer was designed that way.
thanks
Nithya Sampathkumar
Program Manager
So, there you have it. If you're using XML and are wondering why the results you're getting are not the same as what the specification says you should be getting, the answer is simple. Microsoft write their code to fit people's expectations of what the specification says rather than what it actually says. I was also a little taken aback by their assertion that most people consider attribute ordering to be significant. When I asked around, no-one seemed to.
So, a question to you all: do any of you consider XML attribute ordering significant when comparing documents for equality?
Update: Well, the answer seems to be an overwhelming no, both here and in the reddit thread, so I'm confused about where Nithya gets her "Most people".
Anyway, I have created a little class that implements an XML comparison more, ahem, correctly than Microsoft's. I have also created a byte comparison which shows that Microsoft's implementation is virtually the same as a text compare, but twice as slow. You can read about it here.