A Complete Huffman Encoder Implementation

I’ve written a Huffman implementation that shows every step end to end: building the frequency table, the Huffman tree, and the encoding table; serializing the tree; storing the tree and the data in a file and restoring both from it; and decoding the data using the tree. It is all done in Python, which makes it more fun.
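To make the first three steps concrete, here is a minimal standard-library sketch of a frequency table, a Huffman tree, and an encoding table. It is not the code behind `Encoding.get_encoding`: the names `build_tree` and `build_table` are made up for illustration, and representing the tree as nested tuples is an assumption. Equal weights may be merged in a different order than in the implementation described here, so individual codes can differ from the ones printed in the output, although the total encoded length is the same for any Huffman code.

```python
import heapq
from collections import Counter
from itertools import count


def build_tree(weights):
    """Build a Huffman tree from {byte_value: weight}.

    A leaf is an int (the byte value); a branch is a (left, right) tuple.
    """
    tiebreak = count()  # unique sequence number, so the heap never compares nodes
    heap = [(w, next(tiebreak), value) for value, w in sorted(weights.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees into one branch.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    return heap[0][2]


def build_table(tree, prefix=""):
    """Map each byte value to its code, written as a string of '0' and '1'."""
    if isinstance(tree, int):
        # A lone symbol would otherwise get an empty code.
        return {tree: prefix or "0"}
    left, right = tree
    table = build_table(left, prefix + "0")
    table.update(build_table(right, prefix + "1"))
    return table


clear_bytes = b"This is a test. Thank you for listening.\n"
weights = Counter(clear_bytes)  # iterating over bytes yields ints
tree = build_tree(weights)
table = build_table(tree)
total_bits = sum(len(table[value]) for value in clear_bytes)
print(len(weights), "distinct bytes,", total_bits, "bits")
```

For the sample sentence this reports 19 distinct byte values and 165 bits, which rounds up to 21 bytes.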

This is the test code (test_steps):

import pprint

clear_bytes = test_get_data()
_dump_hex("Original data:", clear_bytes)

tu = TreeUtility()

# Build encoding table and tree.

he = Encoding()
encoding = he.get_encoding(clear_bytes)

print("Weights:\n{0}".format(pprint.pformat(encoding.weights)))
print('')

print("Tree:")
print('')

tu.print_tree(encoding.tree)
print('')

flat_encoding_table = { 
    (hex(c)[2:] + ' ' + chr(c).strip()): b
    for (c, b) 
    in encoding.table.items() }

print("Encoding:\n{0}".format(pprint.pformat(flat_encoding_table)))
print('')

# Encode the data.

print("Encoded characters:\n\n{0}\n".
      format(encode_to_debug_string(encoding.table, clear_bytes)))

encoded_bytes = encode(encoding.table, clear_bytes)
_dump_hex("Encoded:", encoded_bytes)

# Decode the data.

decoded_bytes_list = decode(encoding.tree, encoded_bytes)
decoded_bytes = bytes(decoded_bytes_list)

assert clear_bytes == decoded_bytes, \
    "Decoded does not equal the original."

_dump_hex("Decoded:", decoded_bytes)

print("Decoded text:")
print('')
print(decoded_bytes)
print('')

# Serialize and unserialize tree.

serialized_tree = tu.serialize(encoding.tree)
unserialized_tree = tu.unserialize(serialized_tree)

decoded_bytes_list2 = decode(unserialized_tree, encoded_bytes)
decoded_bytes2 = bytes(decoded_bytes_list2)

assert clear_bytes == decoded_bytes2, \
    "Decoded does not equal the original after serializing/" \
    "unserializing the tree."

This is its output:

(Dump) Original data:

54 68 69 73 20 69 73 20 61 20 74 65 73 74 2e 20
54 68 61 6e 6b 20 79 6f 75 20 66 6f 72 20 6c 69
73 74 65 6e 69 6e 67 2e 0a

Weights:
{10: 1,
 32: 7,
 46: 2,
 84: 2,
 97: 2,
 101: 2,
 102: 1,
 103: 1,
 104: 2,
 105: 4,
 107: 1,
 108: 1,
 110: 3,
 111: 2,
 114: 1,
 115: 4,
 116: 3,
 117: 1,
 121: 1}

Tree:

LEFT>
. LEFT>
. . LEFT>
. . . VALUE=(69) [i]
. . RIGHT>
. . . VALUE=(73) [s]
. RIGHT>
. . LEFT>
. . . LEFT>
. . . . VALUE=(54) [T]
. . . RIGHT>
. . . . VALUE=(65) [e]
. . RIGHT>
. . . LEFT>
. . . . LEFT>
. . . . . VALUE=(66) [f]
. . . . RIGHT>
. . . . . VALUE=(72) [r]
. . . RIGHT>
. . . . LEFT>
. . . . . VALUE=(6c) [l]
. . . . RIGHT>
. . . . . VALUE=(a) []
RIGHT>
. LEFT>
. . LEFT>
. . . LEFT>
. . . . VALUE=(6f) [o]
. . . RIGHT>
. . . . VALUE=(61) [a]
. . RIGHT>
. . . LEFT>
. . . . VALUE=(74) [t]
. . . RIGHT>
. . . . VALUE=(6e) [n]
. RIGHT>
. . LEFT>
. . . VALUE=(20) []
. . RIGHT>
. . . LEFT>
. . . . LEFT>
. . . . . LEFT>
. . . . . . VALUE=(6b) [k]
. . . . . RIGHT>
. . . . . . VALUE=(79) [y]
. . . . RIGHT>
. . . . . VALUE=(68) [h]
. . . RIGHT>
. . . . LEFT>
. . . . . VALUE=(2e) [.]
. . . . RIGHT>
. . . . . LEFT>
. . . . . . VALUE=(75) [u]
. . . . . RIGHT>
. . . . . . VALUE=(67) [g]

Encoding:
{'20 ': bitarray('110'),
 '2e .': bitarray('11110'),
 '54 T': bitarray('0100'),
 '61 a': bitarray('1001'),
 '65 e': bitarray('0101'),
 '66 f': bitarray('01100'),
 '67 g': bitarray('111111'),
 '68 h': bitarray('11101'),
 '69 i': bitarray('000'),
 '6b k': bitarray('111000'),
 '6c l': bitarray('01110'),
 '6e n': bitarray('1011'),
 '6f o': bitarray('1000'),
 '72 r': bitarray('01101'),
 '73 s': bitarray('001'),
 '74 t': bitarray('1010'),
 '75 u': bitarray('111110'),
 '79 y': bitarray('111001'),
 'a ': bitarray('01111')}

Encoded characters:

0100 11101 000 001 110 000 001 110 1001 110 1010 0101 001 1010 11110 110 0100 11101 1001 1011 111000 110 111001 1000 111110 110 01100 1000 01101 110 01110 000 001 1010 0101 1011 000 1011 111111 11110 01111

(Dump) Encoded:

4e 83 81 d3 a9 4d 7b 27 66 f8 dc c7 d9 90 dc e0
69 6c 5f fe 7c

(Dump) Decoded:

54 68 69 73 20 69 73 20 61 20 74 65 73 74 2e 20
54 68 61 6e 6b 20 79 6f 75 20 66 6f 72 20 6c 69
73 74 65 6e 69 6e 67 2e 0a

Decoded text:

b'This is a test. Thank you for listening.\n'