SQLite as an Archive (or an Easy, Compressed Filesystem)

The sqlar project was an experiment to test the practicality of a SQLite archive format by the lead SQLite developer. I say “was” because, like sqlite, it seldom needs modification and appears to be stable.

What makes it fun is that the result is not internalized and inaccessible. Rather, it’s an intact SQLite database that you can readily inspect from the SQLite client. Therefore, the compression is per-blob and the purpose of the tool is merely to make it convenient to add records corresponding to files. The reverse should be true as well: if you create a sqlite DB and populate it with zlib-compressed data, you should simply be able to dump it using the sqlar tool.

To test sqlar out, either checkout or download a copy of the source (which embeds SQLite within it). You may download it here: SQLite Archiver.

Extract it and build:

$ mkdir sqlar
$ cd sqlar/
$ tar xzf ../sqlar-src-15adeb2f9a.tar.gz 
$ cd sqlar-src-15adeb2f9a/

$ make
gcc -g -I. -D_FILE_OFFSET_BITS=64 -Wall -Werror -c sqlite3.c -DSQLITE_THREADSAFE=0 -DSQLITE_OMIT_LOAD_EXTENSION sqlite3.c
gcc -g -I. -D_FILE_OFFSET_BITS=64 -Wall -Werror -o sqlar sqlar.c sqlite3.o -lz

Run a test:

$ mkdir test_root
$ echo "test1" > test_root/test1
$ echo "test2" > test_root/test2
$ echo "test3" > test_root/test3

$ ./sqlar -v test_root.sqlar test_root
  added: test_root
  added: test_root/test1
  added: test_root/test3
  added: test_root/test2

$ mkdir output
$ cd output/

$ ../sqlar -x -v ../test_root.sqlar 
test_root
test_root/test1
test_root/test3
test_root/test2

Visually inspect the database file:

$ cd ..

$ sqlite3 test_root.sqlar 
SQLite version 3.8.2 2013-12-06 14:53:30
Enter ".help" for instructions
Enter SQL statements terminated with a ";"

sqlite> .schema
CREATE TABLE sqlar(
  name TEXT PRIMARY KEY,
  mode INT,
  mtime INT,
  sz INT,
  data BLOB
);

sqlite> select * from sqlar;
test_root|16893|1455064403|0|
test_root/test1|33204|1455064393|6|test1

test_root/test3|33204|1455064403|6|test3

test_root/test2|33204|1455064399|6|test2

sqlite> 

Notice that no compression was performed because the files are so trivial. zlib is used for compression every time unless you explicitly turn it off using “-n”.

You can also mount the archive using FUSE:

$ # This requires libfuse-dev to be installed.
$ make sqlarfs
gcc -g -I. -D_FILE_OFFSET_BITS=64 -Wall -Werror -o sqlarfs sqlarfs.c sqlite3.o -lz -lfuse
$ mkdir mount
$ ./sqlarfs test_root.sqlar `pwd`/mount

In another terminal, list the contents:

$ ls -l
total 0
dr-xr-xr-x 1 dustin dustin 0 Feb  9 19:51 test_root

$ cd test_root/

$ ls -l
total 0
-r--r--r-- 1 dustin dustin   6 Feb  9 19:33 test1
-r--r--r-- 1 dustin dustin   6 Feb  9 19:33 test2
-r--r--r-- 1 dustin dustin   6 Feb  9 19:33 test3
-r--r--r-- 1 dustin dustin 576 Feb  9 19:51 test4

$ cat test4 
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

This was a later version of our earlier archive where I added a highly compressible text-file (test4).

It’s a fun little tool.

Advertisements

DEFLATE Socket Compression in Python

Properly-working DEFLATE compression is elusive in Python. Thanks to wtolson, I’ve found such a solution.

This is an example framework for establishing the compression and decompression objects:

def activate_zlib():
    import zlib

    wbits = -zlib.MAX_WBITS

    compress = zlib.compressobj(level, zlib.DEFLATED, wbits)

    compressor = lambda x: compress.compress(x) + \
                            compress.flush(zlib.Z_SYNC_FLUSH)

    decompress = zlib.decompressobj(wbits)
    
    decompressor = lambda x: decompress.decompress(x) + \
                                decompress.flush()
    
    return (compressor, decompressor)

With this example, you’ll pass all of your outgoing data through the compressor, and all of your incoming data to the decompressor. As always, you’ll do a read-loop until you’ve decompressed the expected number of bytes.