PriorityQueue versus heapq

Python’s queue.PriorityQueue queue is actually based on the heapq module, but provides a traditional Python queue interface. The difference appears to largely be the interface: OO vs. passing a list (which heapq can act on directly).

The documentation for PriorityQueue is a little misleading, at least when you didn’t take a moment to think about how the sorting works. This is what it says:

A typical pattern for entries is a tuple in the form: (priority_number, data)

I ran into an issue where I was getting an error when the second parameter (the actual item) couldn’t be used to sort. Whereas the documentation implies that there’s a convention the expects the priority to be in the first spot, it looks like the sort is just evaluating the entire tuple. This means that, when I was trying to insert with a priority that was already in the queue, the second item of both was being compared (this is how tuples are sorted). Curiously, I guess most of my previous use-cases involved priorities (such as timestamps) that were either sparse enough or the data happened to be sortable. Crap.

Now, looking back at the documentation for heapq, I’ve noticed one of the examples:

>>> h = []
>>> heappush(h, (5, 'write code'))
>>> heappush(h, (7, 'release product'))
>>> heappush(h, (1, 'write spec'))
>>> heappush(h, (3, 'create tests'))
>>> heappop(h)
(1, 'write spec')

So, it turns out that heapq also [hastily] recommends using tuples, but we now know that this comes with a lazy assumption: It only works if you’re willing to allow it to sort by the item itself if two or more items share a priority.

So, in conclusion, the nicest strategy is to use an object that has the “rich-comparison methods” defined on it (e.g. __lt__ and __eq__) rather than tuples. This will allow you to constrain the comparison operations.

Advertisements