Writing and Reading 7-Zip Archives From Python

I don’t often need to read or write archives from code. When I do, and I don’t want to call a tool via shell-commands, I’ll use zip-files. Obviously there are better formats out there, but when it comes to library compatibility, tar and zip are the easiest possible formats to manipulate. If you’re desperate, you can even write a quick tar archiver with relative simplicity (the headers are mostly ASCII).

Obviously, the emphasis here has been on availability. My preferred format is 7-Zip (which uses LZMA compression). Though you don’t often see 7-Zip archives for download, I’ve been using this format for eight-years and haven’t looked back. The compression is good and the tool is every bit as easy as zip.

Unfortunately, there’s limited support for 7-Zip in Python. To the best of my knowledge, only the libarchive Python package can read and write 7-Zip archives. The libarchive Python package is developed and supported separately from the C library that it implements.

Though the library is structured to support any format that the libarchive library can (all major formats, and probably all of the minor ones), the Python project is outrightly labeled as a work-in-progress. 7-Zip is the only format explicitly supported for both reading and writing. Fortunately, it also supports libarchive‘s autodetection functionality. So, you can read/expand any archive, as long as you can afford the extra couple of milliseconds that the detection will cost you.

The focus of this project is to provide elegant archiving routines. Most of the API functions are implemented as generators.

Example

To enumerate the entries in an archive:

import libarchive

with libarchive.reader('test.7z') as reader:
    for e in reader:
        # (The entry evaluates to a filename.)
        print("> %s" % (e))

To extract the entries from an archive to the current directory (like a normal, Unix-based extraction):

import libarchive

for state in libarchive.pour('test.7z'):
    if state.pathname == 'dont/write/me':
        state.set_selected(False)
        continue

    # (The state evaluates to a filename.)
    print("Writing: %s" % (state))

To build an archive from a collection of files (omit the target for stdout):

import libarchive

for entry in libarchive.create(
                '7z', 
                ['/aa/bb', '/cc/dd'], 
                'create.7z'):
    print("Adding: %s" % (entry))
Advertisements