Python: Parsing XML and Retaining the Comments

By default, Python’s built-in ElementTree module strips comments as it reads them. The solution is just obscure enough to be hard to find.

import xml.etree.ElementTree as ET

class _CommentedTreeBuilder(ET.TreeBuilder):
    def comment(self, data):
        self.start('!comment', {})
        self.data(data)
        self.end('!comment')

def parse(filepath):
    ctb = _CommentedTreeBuilder()
    xp = ET.XMLParser(target=ctb)
    tree = ET.parse(filepath, parser=xp)

    root = tree.getroot()
    # ...

When enumerating the parsed nodes, the comments will have a tag-name of “!comment”.

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s