SQLAlchemy and MySQL Encoding

I recently ran into an issue with the encoding of data coming back from MySQL through SQLAlchemy. This is the first time I’ve encountered such an issue since this project first came online, months ago.

I am using utf8 encoding on my database, tables, and columns. I just added a new column, and suddenly my pages and/or AJAX calls started failing with one of the following two messages, respectively:

  • UnicodeDecodeError: 'ascii' codec can't decode byte 0x96 in position 5: ordinal not in range(128)
  • UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 5: invalid start byte

When I tell the stored procedure to return an empty string for the new column instead of its data, it works. The other text columns have an identical encoding.

It turns out that SQLAlchemy defaults to the latin1 encoding. If you need something different, then you’re in for a surprise. The official solution is to pass the “encoding” parameter to create_engine. This is the example from the documentation:

engine = create_engine("mysql://scott:tiger@hostname/dbname", encoding='latin1', echo=True)

In my case, I tried utf8, but it still didn’t work; I don’t know if that ever works. It wasn’t until I uncovered a Stack Overflow entry that I found the answer. I had to append “?charset=utf8” to the DSN string:

mysql+mysqldb://username:password@hostname:port/database_name?charset=utf8
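
For reference, the full connection setup ends up looking something like this (the credentials, host, and port here are placeholders):

from sqlalchemy import create_engine

engine = create_engine(
    "mysql+mysqldb://username:password@hostname:3306/database_name?charset=utf8",
    echo=True)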

There are a couple of potential explanations for why this only surfaced now:

  • Since I copied and pasted the values that were set into these columns, I may have accidentally introduced a character that was out of range.
  • The two encodings have an overlapping set of codes, and I finally introduced a character that is supported by one but not the other.

Whatever the case, it’s fixed and I’m a few hours older.

Using ctypes to Read Binary Data from a Double-Pointer

This is a sticky and exotic use-case of ctypes. In the example below, we call a library function that treats ptr like a double-pointer: it points ptr at a buffer and sets count to the number of bytes available there. The data at that pointer may contain one or more NULL bytes that should not be interpreted as terminators.

import ctypes

ptr = ctypes.c_char_p()
count = ctypes.c_size_t()

# The library writes a buffer address into ptr and the byte count into count.
r = library.some_call(
        ctypes.cast(ctypes.byref(ptr),
                    ctypes.POINTER(ctypes.c_void_p)),
        ctypes.byref(count))

if r != 0:
    raise ValueError("Library call failed.")

# string_at() reads exactly count.value bytes, so embedded NULs are preserved.
data = ctypes.string_at(ptr, count.value)
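
For completeness, this is roughly how such a function might be loaded and prototyped beforehand. The library and function names here are hypothetical; the actual signature depends on your library:

import ctypes

# Hypothetical shared library exposing:
#
#   int some_call(void **buffer_out, size_t *count_out);
#
library = ctypes.CDLL('libexample.so')

# Declaring the prototype up front lets ctypes validate the arguments.
library.some_call.restype = ctypes.c_int
library.some_call.argtypes = [
    ctypes.POINTER(ctypes.c_void_p),
    ctypes.POINTER(ctypes.c_size_t),
]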

Snappy for Very Easy Compression

Snappy is a fast compression algorithm by Google. When I’ve used it, it’s been for socket compression, though it can be used for file compression, too.

For socket compression in Python, the examples are embarrassingly simple. First, import the module:

import snappy

When you establish the socket (which we’ll refer to as s), create the compressor and decompressor:

c = snappy.StreamCompressor()
d = snappy.StreamDecompressor()

From that point on, just pass all of your outgoing data through c.add_chunk(), and all of your incoming data through d.decompress(). Note that add_chunk() and decompress() may return an empty string from time to time (data may be buffered internally until a complete frame is available).
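
A minimal sketch of the whole exchange (the endpoint and payload here are placeholders, and real code would read in a loop):

import socket
import snappy

s = socket.create_connection(('localhost', 8080))

c = snappy.StreamCompressor()
d = snappy.StreamDecompressor()

# Outgoing data is framed and compressed before hitting the wire.
s.sendall(c.add_chunk(b'some outgoing payload'))

# Incoming data may come back as an empty string until a complete frame
# has been accumulated.
incoming = d.decompress(s.recv(8192))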

Use ca_kit to Rapidly Establish a CA

I began using ca_kit so often that it became inconvenient not to have it formally packaged and uploaded to PyPI. So, I’ve built it into a formal package. The following scripts are installed onto the system path:

  • ck_create_ca: Create CA certificates
  • ck_create: Create regular certificates
  • ck_sign: Sign a regular certificate against the CA certificate
  • ck_verify_ca: Verify that a signed certificate matches the CA certificate

I hope it is as indispensable to you as it is to me.

DEFLATE Socket Compression in Python

Properly working DEFLATE compression is elusive in Python. Thanks to wtolson, I’ve found a solution that works.

This is an example framework for establishing the compression and decompression objects:

def activate_zlib(level=None):
    import zlib

    if level is None:
        level = zlib.Z_DEFAULT_COMPRESSION

    # A negative window size produces/consumes a raw DEFLATE stream
    # (no zlib header or trailing checksum).
    wbits = -zlib.MAX_WBITS

    compress = zlib.compressobj(level, zlib.DEFLATED, wbits)

    # Z_SYNC_FLUSH pushes each chunk out immediately so the peer can
    # decompress it without waiting for more data.
    compressor = lambda x: compress.compress(x) + \
                            compress.flush(zlib.Z_SYNC_FLUSH)

    decompress = zlib.decompressobj(wbits)

    decompressor = lambda x: decompress.decompress(x) + \
                                decompress.flush()

    return (compressor, decompressor)

With this example, you’ll pass all of your outgoing data through the compressor, and all of your incoming data to the decompressor. As always, you’ll do a read-loop until you’ve decompressed the expected number of bytes.
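
A minimal usage sketch, assuming s is an already-connected socket and the payload is a placeholder:

compressor, decompressor = activate_zlib()

# Outgoing data is compressed before being written to the socket.
s.sendall(compressor(b'some outgoing payload'))

# Incoming data is decompressed as it's read.
incoming = decompressor(s.recv(8192))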

Method Overloads in Python 3.4

Python 3.4 added a “singledispatch” decorator to functools, which provides method overloads. This enables you to perform different operations based on the type of the first argument.

By default, it lends itself to static methods, since dispatch keys on the first positional argument. The following is mostly adapted from the example in the functools documentation:

import functools


class TestClass(object):
    @functools.singledispatch
    def test_method(arg):
        print("Let me just say,", end=" ")
        print(arg)

    @test_method.register(int)
    def _(arg):
        print("Strength in numbers, eh?", end=" ")
        print(arg)

    @test_method.register(list)
    def _(arg):
        print("Enumerate this:")

        for i, elem in enumerate(arg):
            print(i, elem)

if __name__ == '__main__':
    TestClass.test_method(55555)
    TestClass.test_method([33, 22, 11])

However, there is a low-impact way to get overloading on instance-methods, too. We’ll just place our own wrapper around the standard singledispatch wrapper, and hijack the bulk of the functionality:

import functools

def instancemethod_dispatch(func):
    # Build a standard singledispatch dispatcher, but key the dispatch on the
    # second positional argument (the first one after "self").
    dispatcher = functools.singledispatch(func)

    def wrapper(*args, **kw):
        return dispatcher.dispatch(args[1].__class__)(*args, **kw)

    # Expose register() so overloads can still be attached in the usual way.
    wrapper.register = dispatcher.register
    functools.update_wrapper(wrapper, func)
    return wrapper


class TestClass2(object):
    @instancemethod_dispatch
    def test_method(self, arg):
        print("2: Let me just say,", end=" ")
        print(arg)

    @test_method.register(int)
    def _(self, arg):
        print("2: Strength in numbers, eh?", end=" ")
        print(arg)

    @test_method.register(list)
    def _(self, arg):
        print("2: Enumerate this:")

        for i, elem in enumerate(arg):
            print(i, elem)

if __name__ == '__main__':
    t = TestClass2()
    t.test_method(55555)
    t.test_method([33, 22, 11])

Aside from superficial changes to the original example, we just added the instancemethod_dispatch function and updated the methods to take a “self” argument.
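
Note that an argument of a type with no registered overload simply falls through to the default implementation:

t.test_method('just a string')

# 2: Let me just say, just a string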

A special thanks to Zero Piraeus for penning the instancemethod_dispatch method (under the original name of “methdispatch”).

Hidden in Plain Sight: The Python print() Function

The use of print() is so commonplace and reflexive that it’s easy to forget that it’s a full-fledged function with parameters that are often neglected. You may even find yourself writing to sys.stdout directly just to avoid the automatic newline, which is unnecessary.

This is the signature, as of 3.4:

print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)

None of the parameters needs much explanation. The flush parameter was added in 3.3.
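
For example, end and flush together make a simple in-place progress line, with no direct sys.stdout manipulation required:

import time

for i in range(5):
    # A carriage return plus end='' rewrites the same line; flush=True makes
    # each update visible immediately even though no newline is printed.
    print('\rProcessed:', i + 1, 'items', end='', flush=True)
    time.sleep(0.1)

print()  # Finish with a newline.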

A Compatible Way of Getting Package-Paths in Python

The Python community is always looking to improve things in the name of consistency. That’s both its best and worst feature, because from time to time one approach isn’t finished before another is begun.

Packaging is a good example. You may have to account for the same non-Python files in several different ways, such as naming them in the “package data” clauses of your setup attributes and listing them again in a MANIFEST.in (MANIFEST.in is only consulted when building source distributions, while package_data is what carries the files into built, binary distributions). On top of that, the non-Python files have to be embedded within one of your actual source directories (the “package” directories), because package-data files are made to belong to particular packages. We won’t even talk about the additional complexities of source and binary packages when packaging into a wheel; such divergences are the topic of entire series of articles.

There are similar compatibility problems with the module loaders: to use one for reflection, the recommended package varies depending on whether you’re running 2.x, 3.2/3.3, or 3.4.

It’s a pain. For your convenience, this is one such flow, used to determine the path of a package on disk:

import os

_MODULE_NAME = 'module name'
_APP_PATH = None

# Works in 3.4

try:
    import importlib.util
    _ORIGIN = importlib.util.find_spec(_MODULE_NAME).origin
    _APP_PATH = os.path.abspath(os.path.dirname(_ORIGIN))
except Exception:
    pass

# Works in 3.2

if _APP_PATH is None:
    try:
        import importlib
        _INITFILEPATH = importlib.find_loader(_MODULE_NAME).path
        _APP_PATH = os.path.abspath(os.path.dirname(_INITFILEPATH))
    except Exception:
        pass

# Works in 2.x

if _APP_PATH is None:
    import imp
    _APP_PATH = imp.find_module(_MODULE_NAME)[1]
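
From there, locating a file bundled with the package is just a join (the subdirectory and filename here are hypothetical):

_RESOURCE_PATH = os.path.join(_APP_PATH, 'resources', 'data.json')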