How to use Python’s “diff” functionality

Python natively provides the ability to compare two documents, and to produce a set of patch instructions to get from one to the other. This is often useful to 1) provide quick insight into the differences between the two (if any), or 2) provide a list of changes to derive a later version of a document, which is typically much lighter than sending the whole, updated document if the original document is already available on the receiving end.

This is an example of the few lines of code required to generate a list of those instructions, and how to apply them to derive the updated document.

Our documents, for the purpose of this example:

original = """
line1
line2
line3
line4
line5
"""

updated = """
line3
line4
line5
line6
line7
"""

If all you want is to get a list of adds and removes, use ndiff:

from difflib import ndiff

def get_updates(original, updated):
    """Return a 2-tuple of (adds, removes) describing the changes to get from
    ORIGINAL to UPDATED.
    """

    diff = ndiff(original.split("\n"), updated.split("\n"))

    adds = set()
    deletes = set()
    for row in diff:
        diff_type = row[0]
        if diff_type == ' ':
            # Unchanged line.
            continue

        entry = row[2:]

        if diff_type == '+':
            adds.add(entry)
        elif diff_type == '-':
            deletes.add(entry)

    return (list(adds), list(deletes))


updates = get_updates(original, updated)

updates contains a 2-tuple of adds and removes, respectively (the order within each list is arbitrary, since both are built from sets):

(['line7', 'line6'], ['line2', 'line1'])
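
The function above leans on ndiff's row format: every row starts with a two-character code ("+ ", "- ", two spaces for an unchanged line, or "? " for an intraline hint row), followed by the line itself. Here is a minimal sketch of that format. The sample lines are made up for illustration and are not the documents above:

```python
from difflib import ndiff

# Illustrative inputs only.
original_lines = ["line1", "line2", "line3"]
updated_lines = ["line2", "line3", "line4"]

rows = list(ndiff(original_lines, updated_lines))
for row in rows:
    # row[0] is the change type; row[2:] is the line content.
    print(repr(row[0]), repr(row[2:]))
```

Only the first character is needed to classify a row, which is why the function checks row[0] and slices from row[2:].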

If, on the other hand, you do actually need a full set of patch instructions, use SequenceMatcher:

from difflib import SequenceMatcher

def get_transforms(original, updated):
    """Get a list of patch instructions to get from ORIGINAL to UPDATED."""

    s = SequenceMatcher(None, original, updated)

    tag_mapping = { 'delete': '-',
                    'insert': '+',
                    'replace': '>' }

    transforms = []
    for tag, i1, i2, j1, j2 in s.get_opcodes():
        if tag == 'delete':
            transform = ('-', (i1, i2))
        elif tag == 'insert':
            transform = ('+', (i1, i2), updated[j1:j2])
        elif tag == 'replace':
            transform = ('>', (i1, i2), updated[j1:j2])
        else:  # 'equal'
            transform = ('=', (i1, i2), (j1, j2))

        transforms.append(transform)

    return transforms

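def apply_transforms(original, transforms):
    """Execute the transform instructions returned from get_transforms() to
    derive UPDATED from ORIGINAL.
    """

    updated = []
    for transform in transforms:
        if transform[0] == '-':
            # Deleted span: contributes nothing to the result.
            continue
        elif transform[0] == '+':
            # Inserted text travels with the instruction.
            updated.append(transform[2])
        elif transform[0] == '>':
            # Replacement text also travels with the instruction.
            updated.append(transform[2])
        else: # Equals.
            # Unchanged span: copy it from the original.
            i1, i2 = transform[1]
            updated.append(original[i1:i2])

    return ''.join(updated)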


transforms = get_transforms(original, updated)

transforms contains:

[('-', (0, 12)), ('=', (12, 31), (0, 19)), ('+', (31, 31), 'line6\nline7\n')]
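
Each of these tuples comes from one SequenceMatcher opcode of the form (tag, i1, i2, j1, j2), where a[i1:i2] is the affected span of the first sequence and b[j1:j2] is the matching span of the second. Here is a small sketch on two made-up strings (a and b are illustrative only), which rebuilds the second string straight from the opcodes:

```python
from difflib import SequenceMatcher

# Illustrative inputs only.
a = "abcd"
b = "bcde"

opcodes = SequenceMatcher(None, a, b).get_opcodes()
for opcode in opcodes:
    print(opcode)

# 'equal' spans are copied from a; everything else is taken from b.
# For 'delete', b[j1:j2] is empty, so nothing is emitted.
rebuilt = "".join(
    a[i1:i2] if tag == "equal" else b[j1:j2]
    for tag, i1, i2, j1, j2 in opcodes
)
print(rebuilt == b)
```

The receiving end only needs the original plus the inserted or replaced text, which is what makes the instruction list lighter than the whole updated document.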

To derive updated from original:

updated_derived = apply_transforms(original, transforms)
print(updated == updated_derived)

Which displays:

True

Writing Your Own Timezone Implementation for Python

Python has the concept of “naive” and “aware” times. The former refers to a timezone-capable date/time object that hasn’t been assigned a timezone, and the latter refers to one that has.

However, Python 2 only provides an interface for “tzinfo” implementations: classes that define a particular timezone. It does not provide the implementations themselves (Python 3 later added datetime.timezone for fixed offsets and, as of 3.9, the zoneinfo module). So, you either have to write your own implementations, or use something like the widely used “pytz” or “pytzpure” (a pure-Python version).

This is a quick example of how to write your own, courtesy of Google:

from datetime import tzinfo, timedelta, datetime

class _TzBase(tzinfo):
    def utcoffset(self, dt):
        return timedelta(hours=self.get_offset()) + self.dst(dt)

    def _FirstSunday(self, dt):
        """First Sunday on or after dt."""
        return dt + timedelta(days=(6 - dt.weekday()))

    def dst(self, dt):
        # 2 am on the second Sunday in March
        dst_start = self._FirstSunday(datetime(dt.year, 3, 8, 2))
        # 1 am on the first Sunday in November
        dst_end = self._FirstSunday(datetime(dt.year, 11, 1, 1))

        if dst_start <= dt.replace(tzinfo=None) < dst_end:
            return timedelta(hours=1)
        else:
            return timedelta(hours=0)

    def tzname(self, dt):
        if self.dst(dt) == timedelta(hours=0):
            return self.get_tz_name()
        else:
            return self.get_tz_with_dst_name()

    def get_offset(self):
        """Returns the offset in hours (-5)."""
        raise NotImplementedError()

    def get_tz_name(self):
        """Returns the standard acronym (EST)."""
        raise NotImplementedError()

    def get_tz_with_dst_name(self):
        """Returns the DST version of the acronym ('EDT')."""
        raise NotImplementedError()

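class TzGmt(_TzBase):
    """Implementation of the GMT timezone."""

    def get_offset(self):
        return 0

    def dst(self, dt):
        # GMT never observes DST, so bypass the US rules in _TzBase.
        return timedelta(hours=0)

    def get_tz_name(self):
        return 'GMT'

    def get_tz_with_dst_name(self):
        return 'GMT'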

class TzEst(_TzBase):
    """Implementation of the EST timezone."""

    def get_offset(self):
        return -5

    def get_tz_name(self):
        return 'EST'

    def get_tz_with_dst_name(self):
        return 'EDT'

Use it, like so:

from datetime import datetime

# Current time as an aware datetime in EST (one way to build it).
now_est = datetime.now(TzEst())
now_gmt = now_est.astimezone(TzGmt())

This produces a datetime object with an EST timezone, and then uses it to produce a GMT time.
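
To show the tzinfo contract on its own, here is a stripped-down sketch with no DST logic at all. FixedOffset is a hypothetical class of mine, not part of the code above. It only implements the three methods that datetime calls (utcoffset, dst, and tzname), and the date is arbitrary:

```python
from datetime import datetime, timedelta, tzinfo


class FixedOffset(tzinfo):
    """Hypothetical timezone with a constant UTC offset and no DST."""

    def __init__(self, hours, name):
        self._offset = timedelta(hours=hours)
        self._name = name

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        # No daylight-saving adjustment in this sketch.
        return timedelta(0)

    def tzname(self, dt):
        return self._name


est = FixedOffset(-5, "EST")
gmt = FixedOffset(0, "GMT")

# An aware datetime: noon in EST on an arbitrary winter day.
noon_est = datetime(2013, 1, 15, 12, 0, tzinfo=est)
noon_gmt = noon_est.astimezone(gmt)

print(noon_gmt.isoformat())  # 2013-01-15T17:00:00+00:00
print(noon_gmt.tzname())     # GMT
```

The conversion itself is done by astimezone(); the class only has to answer "what is the offset at this moment?"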

AppEngine Development Environment Module Restrictions

AppEngine has some very tight but obvious restrictions on what types of Python modules can be invoked from application code. The general rule of thumb is that modules that need filesystem access or C code can’t be used. So, which modules are allowed or disallowed? Which modules are partially implemented, or defined and completely empty (yes, there are/were some)?

Unfortunately, the only official list of such modules is very dated.

Until not long ago, the prevailing perception was that the AppEngine development environment imposed no such restrictions, leaving a dangerous and scary gap between what will definitely run on your system and what you can be sure will run in production.

It turns out that there is some protection in the development environment. Maybe even complete protection.

The google/appengine/tools/devappserver2/python/ module appears to be wholly responsible for the loading of modules. At the top, there’s a sys.meta_path assignment. This is what appears as of version 1.8.4:

  sys.meta_path = [

This defines a series of module “finders” responsible for resolving imported modules. This is where restrictions are imposed. The following are descriptions/insights about each one.

StubModuleImportHook: Replaces complete modules with different ones.
ModuleOverrideImportHook: Adjusts partially white-listed modules (symbols may be added, removed, or updated).
BuiltinImportHook: Imposes a white-list on builtin modules. This raises an ImportError on everything else.
CModuleImportHook: Imposes a white-list on C modules.
path_override_hook: Has an instance of PathOverrideImportHook. It looks like this module looks for modules in special paths (the kind scattered in the.
PyCryptoRandomImportHook: Fixes the loading of .
PathRestrictingImportHook: Makes sure any remaining imports come out of an accessible path.

If you want to know which specific modules are involved, look in the module mentioned above. The first four finders are relatively concrete: most of their modules are expressed in lists.
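
For a feel of how a finder on sys.meta_path can impose a restriction, here is a minimal sketch. It is not the SDK's code: BlockedModuleImportHook is a hypothetical class, it uses Python 3's find_spec protocol rather than whatever the 1.8.4 SDK does internally, and it is a simple block-list rather than a white-list. The stdlib module colorsys is picked only because it is harmless to block:

```python
import importlib
import sys
from importlib.abc import MetaPathFinder


class BlockedModuleImportHook(MetaPathFinder):
    """Hypothetical finder: refuses to import any module named in `blocked`."""

    def __init__(self, blocked):
        self._blocked = frozenset(blocked)

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self._blocked:
            raise ImportError("%s is not allowed in this sandbox" % fullname)
        # Returning None defers to the next finder on sys.meta_path.
        return None


def import_error_message(name):
    """Return the ImportError message for `name`, or None if it imports."""
    try:
        importlib.import_module(name)
    except ImportError as exc:
        return str(exc)
    return None


# Drop any cached copy so the import machinery consults sys.meta_path.
sys.modules.pop("colorsys", None)

hook = BlockedModuleImportHook({"colorsys"})
sys.meta_path.insert(0, hook)
try:
    blocked_message = import_error_message("colorsys")
finally:
    sys.meta_path.remove(hook)

# With the hook removed, the same import goes through.
unblocked_message = import_error_message("colorsys")

print(blocked_message)
print(unblocked_message)
```

Because finders are consulted in order, a restricting finder has to sit ahead of the regular ones, which matches the ordering of the list above.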

A Pure-Python Implementation of “pytz”

There is a problem with the standard “pytz” package: It’s awesome, but it can’t be used on systems that don’t allow direct file access. I created “pytzpure” to account for this. It allows you to build out the data files as Python modules. As long as these modules are put into the path, the “pytzpure” module will provide the same exports as the original “pytz” package.

For export:

PYTHONPATH=. python pytzpure/tools/ /tmp/tzppdata


Verifying export path exists: /tmp/tzppdata
Verifying .
Writing zone tree.
(578) timezones written.
Writing country timezones.
Writing country names.

To use:

from datetime import datetime
from pytzpure import timezone
utc = timezone('UTC')
detroit = timezone('America/Detroit')
datetime.now(utc).astimezone(detroit).strftime('%H:%M:%S %z')
'16:34:37 -0400'

Dumping Raw Python from Dictionary

I wrote a simple tool to generate a Python string-representation of the given data. Note that this renders data very similar to JSON, with the exception of the handling of NULLs.

Example usage:

get_as_python({ 'data1': { 'data22': { 'data33': 44 }},
                'data2': ['aa','bb','cc'],
                'data3': ('dd','ee','ff',None) })

Output (note that key order is not preserved, since dicts were unordered before Python 3.7):

data1 = {"data22":{"data33":44}}
data3 = ["dd","ee","ff",None]
data2 = ["aa","bb","cc"]
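
The tool's source isn't shown here, so the following is only a rough sketch of how such a renderer could work, not the actual implementation. The names render_value and render_as_python are hypothetical. It assumes that each top-level key becomes an assignment, that tuples are rendered as lists, and that None is emitted where JSON would say null:

```python
import json


def render_value(value):
    """Render a value as a compact Python literal (JSON-like, but with None)."""
    if value is None:
        return "None"
    if isinstance(value, bool):
        return "True" if value else "False"
    if isinstance(value, dict):
        items = ("%s:%s" % (render_value(k), render_value(v))
                 for k, v in value.items())
        return "{" + ",".join(items) + "}"
    if isinstance(value, (list, tuple)):
        return "[" + ",".join(render_value(v) for v in value) + "]"
    # Strings and finite numbers: JSON notation is also valid Python here.
    return json.dumps(value)


def render_as_python(data):
    """Emit one `name = literal` line per top-level key."""
    return "\n".join("%s = %s" % (name, render_value(value))
                     for name, value in data.items())


text = render_as_python({'data1': {'data22': {'data33': 44}},
                         'data2': ['aa', 'bb', 'cc'],
                         'data3': ('dd', 'ee', 'ff', None)})
print(text)
```

On Python 3.7 and later this prints the keys in insertion order (data1, data2, data3), and the result can be fed straight to exec() or written out as a module.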

PySecure is now Python 3 Compatible

Changes to PySecure for Python 3 compatibility have now been checked in and pushed to PyPI.

A large amount of the labor went into refactoring nearly every occurrence of strings for string/bytes correctness. I also did an internal refactor of all of the tests (which largely just invoke a bunch of the functionalities and rely on the right exceptions to fail out when they should).

Unfortunately, I discovered that libssh’s reverse port-forwarding appears to be broken in 0.6.0 (whose authentication calls are incompatible with 0.5.5). This has been registered as bug #126 in their tracker.

Using “dialog” for Nice, Easy, C-Based Console Dialogs

dialog is a great command-line-based dialog tool that lets you construct twenty-three types of dialog screens, comparable to the best of the available dialog utilities.

It’s as simple as running the following from the command-line:

dialog --yesno "Yes or no, please." 6 30

Very few of the users of dialog probably know that it can be statically linked to provide the same functionality in a C application. It doesn’t help that there is almost no documentation on the subject.

This is an example of how to create a “yesno” dialog:

#include <curses.h>
#include <dialog.h>

int main()
{
    int rc;

    init_dialog(stdin, stderr);
    rc = dialog_yesno("title", "message", 0, 0);
    end_dialog();

    return rc;
}

I explicitly pre-include curses.h so dialog.h won’t go looking in the wrong place. It might be different in your situation.

To build:

gcc -o example example.c -L dialogpath -I dialogpath -ldialog -lncurses -lm

Just configure and build your dialog sources, and then use that path in place of dialogpath in the build line above.

This program will return an integer representing which button was pressed (0 for Yes, 1 for No), or whether the dialog was cancelled with ESC (255).

Progress of GDriveFS (Google Drive FUSE Adapter)

The GDriveFS project has picked up a lot of momentum in the last couple of months. I originally wrote it because there weren’t any other traditional FUSE implementations of a Google Drive client. Due to the massive amount of complexity involved in keeping track of an account’s filesystem organization and integrating a useful amount of GD’s feature set, only a handful of projects were created, and they were mostly very limited.

Needless to say, primary development dragged on for a while, but one year and two hundred commits later, it has a following.

Thanks to all of those who have been involved. I have been getting regular community contributions, inquiries, and bug fixes. All are welcome to get their feet wet in one way or another.

Automotive Trouble Code Lookup

A little while ago, I was doing some work on an automotive project, and looked for a web-service that could resolve trouble codes (DTCs) into messages. This is the same process that you might go through when you run to AutoZone to have them find the code(s) that explain your warning light, and the little OBDII reader dimly states “P0442”.

It turns out that there were more than a few clumsily-designed websites that would allow a human to do this, but there were few/none that provided a clean, machine-readable interface.

So, I spent a couple of hours and deployed one using the data from ScanTool. Easy for humans. Easy for machines.

Status of PySecure

A couple of months ago, I was looking for a Python SSH/SFTP solution. The only one that turned up and had some credibility was Paramiko. It’s pure Python, and reliable. It works great, but it hasn’t moved beyond RSA and DSA keys. This proved a problem with OpenSSH’s default now being ECDSA.

I spent some time getting into ECDSA so that I could extend Paramiko to include it. I was either going to integrate python-ecdsa or resign myself to compromising the pure-Python nature of Paramiko and calling OpenSSL. However, right as I got to this point, I thought of libssh, and, sure enough, it [allegedly] supports ECDSA as of recently. I immediately began to write a Python library to make the whole process elegant and clean.

We’re nearing completion (see PySecure). It’s easy to connect to a host via password or key. Some of the available and tested features:

  • Local and reverse port forwarding
  • Open a remote shell
  • Enumerate remote files with SFTP.
  • Manipulate a file. This object has all of the standard filesystem functions, and is also a full “file-like” object. It can be read and written like any other file.
  • Remote filesystem recursion and mirroring.

I love the last feature.

I’m currently working on the X11-forwarding, but it requires some back-and-forth with the libssh developers. It also turns out that their EC support might need some debugging. It looks like they’re actively working on it. They’ve been very responsive.