SQLAlchemy and MySQL Encoding

I recently ran into an issue with the encoding of data coming back from MySQL through sqlalchemy. This is the first time that I’ve encountered such issues since this project first came online, months ago.

I am using utf8 encoding on my database, tables, and columns. I just added a new column, and suddenly my pages and/or AJAX calls started failing with one of the following two messages, respectively:

  • UnicodeDecodeError: ‘ascii’ codec can’t decode byte 0x96 in position 5: ordinal not in range(128)
  • UnicodeDecodeError: ‘utf8’ codec can’t decode byte 0x96 in position 5: invalid start byte

When I tell the stored procedure to return an empty string for the new column instead of its data, it works. The other text columns have an identical encoding.

It turns out that SQLAlchemy defaults to the latin1 encoding. If you need something different, than you’re in for a surprise. The official solution is to pass the “encoding” parameter to create_engine. This is the example from the documentation:

engine = create_engine("mysql://scott:tiger@hostname/dbname", encoding='latin1', echo=True)

In my case, I tried utf8. However, it still didn’t work. I don’t know if that ever works. It wasn’t until I uncovered a StackOverflow entry that I found the answer. I had to append “?charset=utf8” to the DSN string:

mysql+mysqldb://username:password@hostname:port/database_name?charset=utf8

The following are the potential explanations:

  • Since I copy and pasted values that were set into these columns, I accidentally introduced a character that was out of range.
  • The two encodings have an overlapping set of codes, and I finally introduced a character that was supported by one but not the other.

Whatever the case, it’s fixed and I’m a few hours older.

Advertisement