XPath bug in old versions of ElementTree

I figured out why my XML parsing code works fine using the [pure-Python ElementTree XML parsing module][elementtree] but fails when using [the speedy and memory-optimized cElementTree XML parsing module][celementtree].

[The XPath 1.0 specification][xpath] says `.` is shorthand for `self::node()`, which selects the context node itself.
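The block below shows the behavior the specification asks for. It was checked against the ElementTree in a current Python 3, not the Python 2.5 used in the rest of this post. The `<Root>` document is a made-up stand-in with one `Example` child.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in document: a root with one <Example> child.
root = ET.fromstring("<Root><Example>BUG</Example></Root>")
node = root.find("./Example")

# '.' selects the context node itself, so each selection API returns node.
assert node.find(".") is node
assert node.findall(".") == [node]
assert list(node.iterfind(".")) == [node]
print(node.find(".").text)  # BUG
```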

Parsing an XML document and selecting the context node with ElementTree in Python 2.5:

```python
>>> from xml.etree import ElementTree
>>> ElementTree.VERSION
'1.2.6'
>>> doc = "<Root><Example>BUG</Example></Root>"
>>> node1 = ElementTree.fromstring(doc).find('./Example')
>>> node1
<Element Example at ...>
>>> node1.find('.')
<Element Example at ...>
>>> node1.find('.') == node1
True
```

See how the result of `node1.find('.')` is the node itself? [As it should be][selfnode].

Parsing an XML document and selecting the context node with cElementTree in Python 2.5:

```python
>>> from xml.etree import cElementTree
>>> doc = "<Root><Example>BUG</Example></Root>"
>>> node2 = cElementTree.fromstring(doc).find('./Example')
>>> node2
<Element 'Example' at ...>
>>> node2.find('.')
>>> node2.find('.') == node2
False
```

Balls. The result of `node2.find('.')` is `None`.

However! I have a kludgey workaround that works whether you use ElementTree or cElementTree. Use `'./'` instead of `'.'`:

```python
>>> node1.find('./')
<Element Example at ...>
>>> node1.find('./') == node1
True
>>> node2.find('./')
<Element 'Example' at ...>
>>> node2.find('./') == node2
True
```

*Kludgey because `'./'` is not a valid XPath expression.*

So we are back on track. This also works in Python 2.6, which ships the same version of ElementTree.

Fortunately, Python 2.7 ships a new version of ElementTree (1.3.0), and the bug is fixed:

```python
>>> from xml.etree import ElementTree
>>> ElementTree.VERSION
'1.3.0'
>>> doc = "<Root><Example>BUG</Example></Root>"
>>> node3 = ElementTree.fromstring(doc).find('./Example')
>>> node3
<Element 'Example' at ...>
>>> node3.find('.')
<Element 'Example' at ...>
>>> node3.find('.') == node3
True
```

However! They also fixed my kludgey work-around:

```python
>>> node3.find('./')
>>> node3.find('./') == node3
False
```
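Why did the workaround break? The `ElementPath` module in current Python 3 releases appears to explain it. Its `iterfind()`, and therefore `find()`, appends `*` to any path that ends in `/`. So `'./'` is read as `'./*'`, which selects the node's child elements rather than the node. I am assuming 1.3.0 behaves the same way, which is consistent with the `None` above. The quick check below uses a made-up `<Root>` document.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<Root><Example>BUG</Example></Root>")
node = root.find("./Example")

# A path ending in '/' gets an implicit '*' appended, so './' means
# './*': the child elements, not the context node.
assert root.find("./") is node
assert root.find("./") is root.find("./*")

# node has no child elements, so './' finds nothing; compare the None
# returned for node3 in the Python 2.7 session.
assert node.find("./") is None
```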

So I can't write a single path that works with all three: ElementTree 1.2.6, cElementTree, and ElementTree 1.3.0. This is annoying. I was hoping to simply swap ElementTree for the C version, which makes my code run in a third of the time (the XML parts run in a tenth of the time). And I cannot install any compiled modules: the code can only rely on Python 2.5's standard library.
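No single path string covers all three, but a small wrapper can sidestep the question by never passing a bare `'.'` to `find()`. The sketch below assumes the only troublesome path is exactly `'.'`; the sessions above show `'./Example'` behaving the same everywhere. `find_compat` is a hypothetical name. The import fallback only matters on Python 3.9 and later, where `xml.etree.cElementTree` no longer exists. The code sticks to syntax that Python 2.5 also accepts.

```python
# Use the C accelerator when it exists; xml.etree.cElementTree was removed
# in Python 3.9, so fall back to the pure-Python module there.
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET


def find_compat(elem, path):
    """Hypothetical helper: like elem.find(path), but resolves a bare '.'
    to elem itself instead of trusting the installed ElementTree."""
    if path == ".":
        # XPath 1.0: '.' abbreviates self::node(), the context node.
        return elem
    return elem.find(path)


root = ET.fromstring("<Root><Example>BUG</Example></Root>")
node = find_compat(root, "./Example")
print(find_compat(node, ".") is node)  # True
```

Only the exact string `'.'` is intercepted. Every other path still goes straight to whichever `find()` is installed.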

[celementtree]: http://effbot.org/zone/celementtree.htm
[elementtree]: http://effbot.org/zone/element-index.htm
[xpath]: http://www.w3.org/TR/xpath/
[selfnode]: http://www.w3.org/TR/xpath/#path-abbrev
