Installing the Nginx Long-Polling/Comet Module on a Mac Using Homebrew

If you’re developing software on a Mac that’s targeted for a Linux environment, you’re not alone. You might be lucky enough to be working on a scripting-language-based project, so the difference between environments isn’t nearly as brutal as it would be if you actually had to perform builds. Still, there is the occasional environmental difference.

One such difference is which baked-in Nginx modules you get on an Ubuntu host versus your Mavericks host. What if you need the same push/Comet module on your Mac that you get when you install the nginx-extras package? This is the nginx-push-stream-module (there’s at least one other module with a similar name, which is a frequent source of additional confusion).

You might think that you need to download the source to that module, potentially have to deal with build-dependencies, clone the standard nginx Homebrew formula, modify it, and build. You’d be wrong.

It’s very simple. After you’ve uninstalled your previous nginx formula, run:

$ brew install nginx-full --with-push-stream-module
==> Installing nginx-full from homebrew/homebrew-nginx
==> Installing nginx-full dependency: push-stream-nginx-module
==> Downloading https://github.com/wandenberg/nginx-push-stream-module/archive/0.4.1.tar.gz
Already downloaded: /Library/Caches/Homebrew/push-stream-nginx-module-0.4.1.tar.gz
🍺  /usr/local/Cellar/push-stream-nginx-module/0.4.1: 75 files, 1.1M, built in 2 seconds
==> Installing nginx-full
==> Downloading http://nginx.org/download/nginx-1.6.2.tar.gz
######################################################################## 100.0%
==> ./configure --prefix=/usr/local/Cellar/nginx-full/1.6.2 --with-http_ssl_module --with-pcre --with-ipv6 --sbin-path=/
==> make
==> make install
...

This requires the nginx-full formula because, besides push-stream-nginx-module, it has a massive number of definitions for third-party modules, whereas the normal nginx formula has very few. Your configuration will be largely preserved from the old Nginx to the new (only the launch configs should change).

For reference, this is the list of modules that nginx-full is configured for:

  def self.third_party_modules
    {
      "lua" => "Compile with support for LUA module",
      "echo" => "Compile with support for Echo Module",
      "auth-digest" => "Compile with support for Auth Digest Module",
      "set-misc" => "Compile with support for Set Misc Module",
      "redis2" => "Compile with support for Redis2 Module",
      "array-var" => "Compile with support for Array Var Module",
      "accept-language" => "Compile with support for Accept Language Module",
      "accesskey" => "Compile with support for HTTP Access Key Module",
      "auth-ldap" => "Compile with support for Auth LDAP Module",
      "auth-pam" => "Compile with support for Auth PAM Module",
      "cache-purge" => "Compile with support for Cache Purge Module",
      "ctpp2" => "Compile with support for CT++ Module",
      "headers-more" => "Compile with support for Headers More Module",
      "tcp-proxy" => "Compile with support for TCP proxy",
      "dav-ext" => "Compile with support for HTTP WebDav Extended Module",
      "eval" => "Compile with support for Eval Module",
      "fancyindex" => "Compile with support for Fancy Index Module",
      "mogilefs" => "Compile with support for HTTP MogileFS Module",
      "mp4-h264" => "Compile with support for HTTP MP4/H264 Module",
      "notice" => "Compile with support for HTTP Notice Module",
      "subs-filter" => "Compile with support for Substitutions Filter Module",
      "upload" => "Compile with support for Upload module",
      "upload-progress" => "Compile with support for Upload Progress module",
      "php-session" => "Compile with support for Parse PHP Sessions module",
      "anti-ddos" => "Compile with support for Anti-DDoS module",
      "captcha" => "Compile with support for Captcha module",
      "autols" => "Compile with support for Flexible Auto Index module",
      "auto-keepalive" => "Compile with support for Auto Disable KeepAlive module",
      "ustats" => "Compile with support for Upstream Statistics (HAProxy style) module",
      "extended-status" => "Compile with support for Extended Status module",
      "upstream-hash" => "Compile with support for Upstream Hash Module",
      "consistent-hash" => "Compile with support for Consistent Hash Upstream module",
      "healthcheck" => "Compile with support for Healthcheck Module",
      "log-if" => "Compile with support for Log-if Module",
      "txid" => "Compile with support for Sortable Unique ID",
      "upstream-order" => "Compile with support for Order Upstream module",
      "unzip" => "Compile with support for UnZip module",
      "var-req-speed" => "Compile with support for Var Request-Speed module",
      "http-flood-detector" => "Compile with support for Var Flood-Threshold module",
      "http-remote-passwd" => "Compile with support for Remote Basic Auth password module",
      "realtime-req" => "Compile with support for Realtime Request module",
      "counter-zone" => "Compile with support for Realtime Counter Zone module",
      "mod-zip" => "Compile with support for HTTP Zip Module",
      "rtmp" => "Compile with support for RTMP Module",
      "dosdetector" => "Compile with support for detecting DoS attacks",
      "push-stream" => "Compile with support for http push stream module",
    }
  end

As a result, you can use a similar command to install any of these modules (presumably --with-lua-module, --with-rtmp-module, and so on, following the same naming pattern as the push-stream option above).

The guy who’s responsible for this is easily worth his weight in donations.

Retrieving Multiple Result-Sets from SQLAlchemy

SQLAlchemy is a great Python-based database client but, traditionally, it leaves you stuck when it comes to stored procedures that return more than one dataset. This means you’d have to either issue separate queries or merge multiple datasets into one large, unnatural one. However, there is a way to read multiple datasets; it just requires dropping down to the raw MySQL layer (which isn’t too bad).

This is the test-routine:

delimiter //

CREATE PROCEDURE `get_sets`()
BEGIN
    SELECT
        'value1' `series1_col1`,
        'value2' `series1_col2`;

    SELECT
        'value3' `series2_col1`,
        'value4' `series2_col2`;

    SELECT
        'value5' `series3_col1`,
        'value6' `series3_col2`;
END//

delimiter ;

The code:

import json

import sqlalchemy.pool

def _run_query(connection, query, parameters={}):
    sets = []

    try:
        cursor = connection.cursor()

        cursor.execute(query, parameters)

        while 1:
            #(column_name, type_, ignore_, ignore_, ignore_, null_ok, column_flags)
            names = [c[0] for c in cursor.description]

            set_ = []
            while 1:
                row_raw = cursor.fetchone()
                if row_raw is None:
                    break

                row = dict(zip(names, row_raw))
                set_.append(row)

            sets.append(list(set_))

            if cursor.nextset() is None:
                break

            # nextset() doesn't seem to be sufficient to tell the end.
            if cursor.description is None:
                break
    finally:
        # Return the connection to the pool (won't actually close).
        connection.close()

    return sets

def _pretty_json_dumps(data):
    return json.dumps(
            data,
            sort_keys=True,
            indent=4, 
            separators=(',', ': ')) + "\n"

def _main():
    dsn = 'mysql+mysqldb://root:root@localhost:3306/test_database'

    engine = sqlalchemy.create_engine(
                dsn, 
                pool_recycle=7200,
                poolclass=sqlalchemy.pool.NullPool)

    # Grab a raw connection from the connection-pool.
    connection = engine.raw_connection()

    query = 'CALL get_sets()'
    sets = _run_query(connection, query)

    print(_pretty_json_dumps(sets))

if __name__ == '__main__':
    _main()

The output:

[
    [
        {
            "series1_col1": "value1",
            "series1_col2": "value2"
        }
    ],
    [
        {
            "series2_col1": "value3",
            "series2_col2": "value4"
        }
    ],
    [
        {
            "series3_col1": "value5",
            "series3_col2": "value6"
        }
    ]
]

Things to observe in the example:

  • Query parameters are still escaped by the driver, even though we have to use classic %(name)s string-substitution placeholders with the raw connection objects (see the sketch after this list).
  • It’s up to us to extract the column-names from the cursor for each dataset.
  • The resulting datasets can’t be exposed as generators, since each one has to be read entirely before jumping to the next. Technically, you could yield each completed dataset, but this has almost no usefulness since you’d still be reading them sequentially, and you’d only benefit if there were a large number of datasets.
  • The raw_connection() method claims a connection from the pool, and its close() method will return it to the pool without actually closing it.
  • I added pool_recycle for good measure. Stale connections are an enormous pain to deal with if you’re new to SA and your connections keep “going away” because MySQL closes them before SA can recycle them.
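
For example, passing a parameter through the same helper looks something like this (a sketch; get_sets_filtered is a hypothetical procedure, and the MySQLdb cursor performs the escaping):

# Hypothetical stored procedure that takes one argument. The value is
# escaped by the driver even though we're below SQLAlchemy's expression
# layer.
sets = _run_query(
        connection,
        'CALL get_sets_filtered(%(keyword)s)',
        {'keyword': "some value with spaces"})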

REFERENCE: Multiple Result Sets

Configuring Your Private Docker Registry for SSL

This post provides some redundancy, since the example that Docker provides doesn’t have a lot of surface area.

It’s not entirely straightforward to configure Nginx to forward requests to your Registry instance, as several options are required for Registry compatibility.

Starting the Registry (for your reference). In this case, we’re storing our images in S3, and forwarding from port 5001 on the host system to 5000 on the Docker container:

sudo /usr/local/bin/docker run \
    -d \
    -e SETTINGS_FLAVOR=s3 \
    -e AWS_BUCKET=deploy-docker_images \
    -e STORAGE_PATH=/registry \
    -e AWS_KEY=<your AWS access-key> \
    -e AWS_SECRET=<your AWS secret-key> \
    -e SEARCH_BACKEND=sqlalchemy \
    -p 5001:5000 \
    registry

This is the Nginx config, with help from the Docker example:

server {
        listen 5000;
        server_name localhost;

        ssl on;
        ssl_certificate /etc/ssl/certs/your.certificate.pem;
        ssl_certificate_key /etc/ssl/private/your.private_key.pem;

        client_max_body_size 0; # disable any limits to avoid HTTP 413 for large image uploads

        # required to avoid HTTP 411: see Issue #1486 (https://github.com/docker/docker/issues/1486)
        chunked_transfer_encoding on;

        location / {
                proxy_pass http://127.0.0.1:5001;
                proxy_set_header  Host           $http_host;   # required for docker client's sake
                proxy_set_header  X-Real-IP      $remote_addr; # pass on real client's IP
                proxy_set_header  Authorization  ""; # see https://github.com/dotcloud/docker-registry/issues/170
                proxy_read_timeout               900;
        }
}

Convert Syslog Events to a JSON Stream

The syslog-ng project serves as a general replacement for rsyslog (the default syslog daemon incarnation on Ubuntu and other distributions). It allows you to define syslog sources, filters, and destinations, and to map them together. It also provides the ability to apply a pattern tree to messages for classification (see “Processing message content with a pattern database” in the “Administrator Guide” PDF, below), as well as to translate log messages into different formats.

It’s the latter that we’re concerned with: we can take syslog output and use the JSON template plugin to send JSON into a pipe, a network destination, etc.

For this example, we’ll simply translate the system syslog events into JSON.

Installing/Configuring

  1. Install packages:
    $ sudo apt-get install syslog-ng-core
    $ sudo apt-get install syslog-ng-mod-json
    
  2. Modify /etc/syslog-ng/syslog-ng.conf:
    destination d_json {
       file("/var/log/messages.json" template("$(format-json --scope selected_macros --scope nv_pairs)\n"));
    };
    
    log {
       source(s_src); destination(d_json);
    };
    
  3. Restart the service:
    $ sudo service syslog-ng restart
    

Now, you’ll see a /var/log/messages.json. Mine shows the following:

{"TAGS":".source.s_src","SOURCEIP":"127.0.0.1","PROGRAM":"sudo","PRIORITY":"notice","MESSAGE":"  dustin : TTY=pts/23 ; PWD=/home/dustin ; USER=root ; COMMAND=/usr/sbin/service syslog-ng restart","LEGACY_MSGHDR":"sudo: ","HOST_FROM":"dustinhub","HOST":"dustinhub","FACILITY":"authpriv","DATE":"Sep 16 04:51:41"}
{"TAGS":".source.s_src","SOURCEIP":"127.0.0.1","PROGRAM":"sudo","PRIORITY":"info","MESSAGE":"pam_unix(sudo:session): session opened for user root by dustin(uid=0)","LEGACY_MSGHDR":"sudo: ","HOST_FROM":"dustinhub","HOST":"dustinhub","FACILITY":"authpriv","DATE":"Sep 16 04:51:41"}
{"TAGS":".source.s_src","SOURCEIP":"127.0.0.1","SOURCE":"s_src","PROGRAM":"syslog-ng","PRIORITY":"notice","PID":"15800","MESSAGE":"syslog-ng shutting down; version='3.5.3'","HOST_FROM":"dustinhub","HOST":"dustinhub","FACILITY":"syslog","DATE":"Sep 16 04:51:41"}
{"TAGS":".source.s_src","SOURCEIP":"127.0.0.1","SOURCE":"s_src","PROGRAM":"syslog-ng","PRIORITY":"notice","PID":"15889","MESSAGE":"syslog-ng starting up; version='3.5.3'","HOST_FROM":"dustinhub","HOST":"dustinhub","FACILITY":"syslog","DATE":"Sep 16 04:51:41"}
{"TAGS":".source.s_src","SOURCEIP":"127.0.0.1","SOURCE":"s_src","PROGRAM":"syslog-ng","PRIORITY":"notice","PID":"15889","MESSAGE":"EOF on control channel, closing connection;","HOST_FROM":"dustinhub","HOST":"dustinhub","FACILITY":"syslog","DATE":"Sep 16 04:51:41"}
{"TAGS":".source.s_src","SOURCEIP":"127.0.0.1","PROGRAM":"sudo","PRIORITY":"info","MESSAGE":"pam_unix(sudo:session): session closed for user root","LEGACY_MSGHDR":"sudo: ","HOST_FROM":"dustinhub","HOST":"dustinhub","FACILITY":"authpriv","DATE":"Sep 16 04:51:41"}

Conclusion

This all enables you to build your own filter in your favorite programming language by using a socket-server and a set of rules. You don’t have to be concerned with parsing the Syslog protocol or the semantics of file-parsing and message formats, and you can avoid the age-old paradigm of parsing log-files, after the fact, by chunks of time, and start to process them in real-time.
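
Here’s a minimal sketch of that idea in Python, just reading the file destination from the example above (a socket destination would be consumed the same way, one JSON document per line); the filter on sudo events is purely illustrative:

import json

# Consume the JSON log produced by the d_json destination and react to
# matching events. Each line is one JSON document.
with open('/var/log/messages.json') as f:
    for line in f:
        event = json.loads(line)
        if event.get('PROGRAM') == 'sudo':
            print('%(DATE)s %(MESSAGE)s' % event)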

Documentation

The syslog-ng website has some “Administrator Guide” PDFs, but the site has little other usefulness, and, though everyone loves syslog-ng, there is little more than configuration snippets in forum posts. However, those PDFs are thorough, and the configuration file is easy to understand (essentially different incarnations of the commands above).

Dynamically Compiling and Implementing a Function

Python allows you to dynamically compile code at the module level: the compile() built-in turns a string of source code into a code object, and exec() runs that code object against dictionaries of globals and locals. What it doesn’t provide is a direct way to call a fragment of dynamic (read: textual) source code with arguments (“xyz(arg1, arg2)”). You also can’t directly invoke compiled code as a generator (where a function that uses “yield” is interpreted as a generator rather than just a function).

However, there’s a loophole, and it’s elegant and consistent with the rest of Python: simply wrap your code in a function definition, exec the compiled result, and then pull the function out of the local scope. You can then call it as desired:

import random

def _compile(arg_names, code):
    name = "(lambda compile)"

    # The generated name needs to start with a letter and must be a valid
    # identifier, so strip the decimal point from the random number.
    id_ = 'a' + str(random.random()).replace('.', '')

    # Wrap the fragment in a function definition, indenting the body.
    code = "def " + id_ + "(" + ', '.join(arg_names) + "):\n" + \
           '\n'.join(('  ' + line)
                     for line
                     in code.replace('\r', '').split('\n')) + '\n'

    c = compile(code, name, 'exec')
    locals_ = {}
    exec(c, globals(), locals_)
    return locals_[id_]

code = """
return a * b * c
"""

c = _compile(['a', 'b', 'c'], code)
print(c(1, 2, 3))
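
The same trick covers the generator case mentioned above: if the wrapped body contains a yield, the compiled function is a generator. A quick sketch using the same _compile() helper:

gen_code = """
for i in range(n):
    yield i * i
"""

g = _compile(['n'], gen_code)
print(list(g(4)))  # [0, 1, 4, 9]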

Serialize a Generator in Python

Simply inherit from the list class, and override the __iter__ method. A great example, from SO:

import json

def gen():
    yield 20
    yield 30
    yield 40

class StreamArray(list):
    def __iter__(self):
        return gen()

a = [1,2,3]
b = StreamArray()

print(json.dumps([1,a,b]))
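
One hedged caveat: the example above works with the default dumps() call, but when json falls back to its pure-Python encoder (for instance, when you pass indent), that path appears to skip sequences whose len() is zero, so the empty StreamArray would serialize as []. The usual workaround is to report a nonzero length, sketched here:

class StreamArray(list):
    def __iter__(self):
        return gen()

    # The pure-Python encoder checks truthiness before iterating, so
    # pretend to have at least one element.
    def __len__(self):
        return 1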

SQLAlchemy and MySQL Encoding

I recently ran into an issue with the encoding of data coming back from MySQL through sqlalchemy. This is the first time that I’ve encountered such issues since this project first came online, months ago.

I am using utf8 encoding on my database, tables, and columns. I just added a new column, and suddenly my pages and/or AJAX calls started failing with one of the following two messages, respectively:

  • UnicodeDecodeError: ‘ascii’ codec can’t decode byte 0x96 in position 5: ordinal not in range(128)
  • UnicodeDecodeError: ‘utf8’ codec can’t decode byte 0x96 in position 5: invalid start byte

When I tell the stored procedure to return an empty string for the new column instead of its data, it works. The other text columns have an identical encoding.

It turns out that the underlying MySQL driver defaults to the latin1 connection encoding. If you need something different, then you’re in for a surprise. The official solution is to pass the “encoding” parameter to create_engine. This is the example from the documentation:

engine = create_engine("mysql://scott:tiger@hostname/dbname", encoding='latin1', echo=True)

In my case, I tried utf8. However, it still didn’t work. I don’t know if that ever works. It wasn’t until I uncovered a StackOverflow entry that I found the answer. I had to append “?charset=utf8” to the DSN string:

mysql+mysqldb://username:password@hostname:port/database_name?charset=utf8
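
For reference, the whole create_engine() call then looks something like this (a sketch; the credentials, host, and database name are placeholders):

import sqlalchemy

engine = sqlalchemy.create_engine(
            'mysql+mysqldb://username:password@hostname:3306'
            '/database_name?charset=utf8',
            pool_recycle=7200)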

The following are the potential explanations:

  • Since I copied and pasted the values that were set into these columns, I accidentally introduced a character that was out of range.
  • The two encodings have an overlapping set of codes, and I finally introduced a character that was supported by one but not the other.

Whatever the case, it’s fixed and I’m a few hours older.

Using ctypes to Read Binary Data from a Double-Pointer

This is a sticky and exotic use-case of ctypes. In the example below, we make a call to some library function that takes a double pointer: it points ptr at a buffer and sets count to the number of bytes available there. The data at the pointer may contain one or more NULL bytes that should not be interpreted as terminators.

import ctypes

# "library" stands in for a shared library you've already loaded, e.g.
# library = ctypes.CDLL('libwhatever.so').

ptr = ctypes.c_char_p()
count = ctypes.c_size_t()

# The call expects a (void **) and a (size_t *): it points ptr at an
# internal buffer and writes the number of available bytes into count.
r = library.some_call(
        ctypes.cast(ctypes.byref(ptr),
                    ctypes.POINTER(ctypes.c_void_p)),
        ctypes.byref(count))

if r != 0:
    raise ValueError("Library call failed.")

# string_at() copies exactly count.value bytes, so embedded NULs survive.
data = ctypes.string_at(ptr, count.value)
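
If you’d like ctypes to validate the arguments for you, you can also declare the call’s prototype up front (a sketch; library and some_call are the stand-ins from the example above):

# Declare the prototype so ctypes can type-check the arguments.
library.some_call.argtypes = [
    ctypes.POINTER(ctypes.c_void_p),
    ctypes.POINTER(ctypes.c_size_t),
]
library.some_call.restype = ctypes.c_int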