Python: Parsing XML and Retaining the Comments

By default, Python’s built-in ElementTree module strips comments as it reads them. The solution is just obscure enough to be hard to find.

import xml.etree.ElementTree as ET

class _CommentedTreeBuilder(ET.TreeBuilder):
    def comment(self, data):
        self.start('!comment', {})
        self.data(data)
        self.end('!comment')

def parse(filepath):
    ctb = _CommentedTreeBuilder()
    xp = ET.XMLParser(target=ctb)
    tree = ET.parse(filepath, parser=xp)

    root = tree.getroot()
    # ...

When enumerating the parsed nodes, the comments will have a tag-name of “!comment”.

Advertisements