Creating TAR Archives in Go

A short program to show how to write TAR-GZ and TAR-XZ (LZMA) archives. Note that I have not included an example for TAR-BZ2 because there is no easily-findable public library for doing so.

package main

import (
    "archive/tar"
    "compress/gzip"

    "fmt"
    "os"
    "io"
    "time"

    "github.com/ulikunitz/xz"
)

func addFile(tw *tar.Writer, filepath string) {
    data := fmt.Sprintf("I am data: %s\n", filepath)

    h := new(tar.Header)
    h.Name = filepath
    h.Size = int64(len(data))
    h.Mode =  int64(0666)
    h.ModTime = time.Now()

    // write the header to the tarball archive
    if err := tw.WriteHeader(h); err != nil {
        panic(err)
    }

    // copy the file data to the tarball 
    if _, err := io.WriteString(tw, data); err != nil {
        panic(err)
    }
}

func createTarGz() {
    f, err := os.Create("output.tar.gz")
    if err != nil {
        panic(err)
    }

    defer f.Close()

    gw := gzip.NewWriter(f)
    defer gw.Close()

    tw := tar.NewWriter(gw)
    defer tw.Close()

    addFile(tw, "aa")
    addFile(tw, "bb/cc")
}

func createTarXz() {
    f, err := os.Create("output.tar.xz")
    if err != nil {
        panic(err)
    }

    defer f.Close()

    xw, err := xz.NewWriter(f)
    if err != nil {
        panic(err)
    }

    defer xw.Close()

    tw := tar.NewWriter(xw)
    defer tw.Close()

    addFile(tw, "dd")
    addFile(tw, "ee/ff")
}

func main() {
    createTarGz()
    createTarXz()
}

Examine the outputs:

$ tar tzf output.tar.gz 
aa
bb/cc
$ tar xz -O - -f output.tar.gz aa
I am data: aa
$ tar xz -O - -f output.tar.gz bb/cc
I am data: bb/cc

$ tar tJf output.tar.xz
dd
ee/ff
I am data: bb/cc
$ tar xJ -O - -f output.tar.xz dd
I am data: dd
$ tar xJ -O - -f output.tar.xz ee/ff
I am data: ee/ff

RAR Archive Embedded VM

It turns out that the RAR specification allows for the support of a virtual machine called “RarVM”. This allows you to actually embed custom filters into your RARs using a simple instruction set. Though WinRAR’s implementation of RarVM is considered to be buggy (read: “insecure”.. see Known Bugs), it seems as if the unpopularity of this feature is relatively limited to WinRAR (and any other proprietary implementations) and not necessarily the specification itself.

It’s a cool feature, in principle.

See:

Writing and Reading 7-Zip Archives From Python

I don’t often need to read or write archives from code. When I do, and I don’t want to call a tool via shell-commands, I’ll use zip-files. Obviously there are better formats out there, but when it comes to library compatibility, tar and zip are the easiest possible formats to manipulate. If you’re desperate, you can even write a quick tar archiver with relative simplicity (the headers are mostly ASCII).

Obviously, the emphasis here has been on availability. My preferred format is 7-Zip (which uses LZMA compression). Though you don’t often see 7-Zip archives for download, I’ve been using this format for eight-years and haven’t looked back. The compression is good and the tool is every bit as easy as zip.

Unfortunately, there’s limited support for 7-Zip in Python. To the best of my knowledge, only the libarchive Python package can read and write 7-Zip archives. The libarchive Python package is developed and supported separately from the C library that it implements.

Though the library is structured to support any format that the libarchive library can (all major formats, and probably all of the minor ones), the Python project is outrightly labeled as a work-in-progress. 7-Zip is the only format explicitly supported for both reading and writing. Fortunately, it also supports libarchive‘s autodetection functionality. So, you can read/expand any archive, as long as you can afford the extra couple of milliseconds that the detection will cost you.

The focus of this project is to provide elegant archiving routines. Most of the API functions are implemented as generators.

Example

To enumerate the entries in an archive:

import libarchive

with libarchive.reader('test.7z') as reader:
    for e in reader:
        # (The entry evaluates to a filename.)
        print("> %s" % (e))

To extract the entries from an archive to the current directory (like a normal, Unix-based extraction):

import libarchive

for state in libarchive.pour('test.7z'):
    if state.pathname == 'dont/write/me':
        state.set_selected(False)
        continue

    # (The state evaluates to a filename.)
    print("Writing: %s" % (state))

To build an archive from a collection of files (omit the target for stdout):

import libarchive

for entry in libarchive.create(
                '7z', 
                ['/aa/bb', '/cc/dd'], 
                'create.7z'):
    print("Adding: %s" % (entry))