Programmatically-Driven Websites in Python (with HTTPHandler and SO_LINGER)

We’re going to write a website whose requests are handled by subroutines, and use Python’s logging.handlers.HTTPHandler class to send requests to it. Documentation and/or examples for the former are sparse, and I thought that an example of the latter connecting to the former would be useful.

Understanding the Webserver

Using the built-in BaseHTTPServer.BaseHTTPRequestHandler webserver, you can wire methods for individual verbs (GET, POST, PUT, etc..). Requests on verbs that aren’t handled will return a 501. Aside from having to write the headers at the top of the methods yourself and needing to read a specific quantity of data-bytes (or you’ll block forever), this is similar to every other web-framework that you’ve used.

The only things that you really need to know are the following instance variables:

  • headers: A dictionary-like collection of headers.
  • rfile: A file-like object that will contain your data (if you receive any).
  • wfile: A file-like object that will receive your response data (if you send any).

You’ll also need to deal with how to handle unsent data when you terminate. Even if you shutdown a socket, it may not be closed by the system immediately if data has already moved across it. This relates to why we inherit from SocketServer.TCPServer and change the one class variable. We’ll discuss this more, below.

import pprint
import urlparse

import BaseHTTPServer
import SocketServer

_PORT = 8000


class TCPServerReusableSocket(SocketServer.TCPServer):
    allow_reuse_address = True


class HookedHTTPRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def __send_headers(self):
        self.send_response(200)
        self.send_header("Content-type", 'text/plain')
        self.end_headers()

    def do_GET(self):
        self.__send_headers()

        print("Received GET request for: %s" % (self.path,))

        self.wfile.write("Test from GET!\n")

    def do_POST(self):
        self.__send_headers()

        print("Received POST request for: %s" % (self.path,))

        print('')
        print('Headers')
        print('=======')
        pprint.pprint(self.headers.items())
        print('=======')

        length = int(self.headers['content-length'])
        data_raw = self.rfile.read(length)
        data = urlparse.parse_qs(data_raw)

        print('')
        print('Received')
        print('========')
        pprint.pprint(data)
        print('========')
        print('')

        self.wfile.write("Test from POST!\n")

httpd = TCPServerReusableSocket(
            ('localhost', _PORT), 
            HookedHTTPRequestHandler)

httpd.serve_forever()

We expect that what we’ve done above is fairly obvious and does not need an explanation. You can implement your own log_request(code=None, size=None) method in HookedHTTPRequestHandler to change how the requests are printed, or to remove them.

To continue our remarks about buffered-data above, we add special handling so that we don’t encounter the “socket.error: [Errno 48] Address already in use” error if you kill the server and restart it a moment later. You may choose one of the following two strategies:

  1. Force the socket to close immediately.
  2. Allow the socket to already be open.

(1) should be fine for logging/etc. However, this might not be a great option if you’re handling actual data. (2) should probably be the preferred strategy, but you’ll also have to be sure to implement a PID file in your application so that you can be sure that only one instance is running (assuming that’s desired).

To implement (2), use SocketServer.TCPServer instead of our custom TCPServerReusableSocket. and, add the following imports:

import socket
import struct

Then, add the following after we define httpd but before we start the server, to tell the SO_LINGER socket option to kill all buffered data immediately:

l_onoff = 1                                                                                                                                                           
l_linger = 0                                                                                                                                                          

httpd.socket.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', l_onoff, l_linger))

You can test this using cURL, if you can’t wait to setup HTTPHandler:

$ curl -X POST -d abc=def http://localhost:8000
Test from POST!

The webserver process will show:

$ python http_server.py 
127.0.0.1 - - [19/Oct/2014 15:28:47] "POST / HTTP/1.1" 200 -
Received POST request for: /

Headers
=======
[('host', 'localhost:8000'),
 ('content-type', 'application/x-www-form-urlencoded'),
 ('content-length', '7'),
 ('accept', '*/*'),
 ('user-agent', 'curl/7.30.0')]
=======

Received
========
{'abc': ['def']}
========

Understanding logging.handlers.HTTPHandler

My own use-case for this was from a new MapReduce platform (JobX), and I wanted to potentially emit messages to another system if certain tasks were accomplished. I used the built-in webserver that we invoked, above, to see these messages from the development system.

import logging
import logging.handlers

logger = logging.getLogger(__name__)

_TARGET = 'localhost:8000'
_PATH = '/'
_VERB = 'post'

sh = logging.handlers.HTTPHandler(_TARGET, _PATH, method=_VERB)

logger.addHandler(sh)
logger.setLevel(logging.DEBUG)

logger.debug("Test message.")

This will be shown by the webserver:

127.0.0.1 - - [19/Oct/2014 15:45:02] "POST / HTTP/1.0" 200 -
Received POST request for: /

Headers
=======
[('host', 'localhost'),
 ('content-type', 'application/x-www-form-urlencoded'),
 ('content-length', '368')]
=======

Received
========
{'args': ['()'],
 'created': ['1413747902.18'],
 'exc_info': ['None'],
 'exc_text': ['None'],
 'filename': ['push_socket_log.py'],
 'funcName': ['<module>'],
 'levelname': ['DEBUG'],
 'levelno': ['10'],
 'lineno': ['17'],
 'module': ['push_socket_log'],
 'msecs': ['181.387901306'],
 'msg': ['Test message.'],
 'name': ['__main__'],
 'pathname': ['./push_socket_log.py'],
 'process': ['65486'],
 'processName': ['MainProcess'],
 'relativeCreated': ['12.6709938049'],
 'thread': ['140735262810896'],
 'threadName': ['MainThread']}
========

Note that each field is a list with one item. If you want the output to look a little nicer, alter the above to add the following to the top of the module:

import datetime

_FMT_DATETIME_STD = '%Y-%m-%d %H:%M:%S'

Then, add the __print_entry method:

    def __print_entry(self, entry):
        created_epoch = float(entry['created'][0])
        when_dt = datetime.datetime.fromtimestamp(created_epoch)
        timestamp_phrase = when_dt.strftime(_FMT_DATETIME_STD)
        where_name = entry['name'][0][:40]
        level_name = entry['levelname'][0]

        message = entry['msg'][0]

        print('%s  %40s  %9s  %s' % 
              (timestamp_phrase, where_name, level_name, message))

Then, change the last part of do_POST:

    def do_POST(self):
        self.__send_headers()

        length = int(self.headers['content-length'])
        data_raw = self.rfile.read(length)
        data = urlparse.parse_qs(data_raw)

        self.__print_entry(data)

The output will now look like:

2014-10-19 16:16:00       MR_HANDLER.HTTP.map_obfuscation_one       INFO  Socket message!
2014-10-19 16:16:00                           MR_HANDLER.HTTP      ERROR  Mapper invocation [789b7ca7fcb6cede9ae5557b2121d392469dfc26] under request [85394d5bdb34a09ffa045776cc69d1d4cd17d657] failed. HANDLER=[map_obfuscation_one]

There is one weird thing about HTTPHandler, and it’s this: Many/all of the fields will be stringified in order to serialized them. If you call the logger like logging.debug('Received arguments: [%s] [%s]', arg1, arg2), then we’ll receive Received argument: [%s] in the msg field (or rather the msg list), and the arguments as a stringified tuple like (u'abc', u'def'). To avoid dealing with this, I’ll send messages into a function that’s in charge of the notifications, and produce the final string before I send it to the logger.

The same thing applies to tracebacks. If you log an exception, you’ll only get this:

 'exc_info': ['(<type 'exceptions.NameError'>, NameError("global name 'client_id' is not defined",), <traceback object at 0x110c92878>)'],
 'exc_text': ['None'],

Again, you’ll have to concatenate this into the log-message by some intermediate function (so that the primary application logic doesn’t have to know about it, but so that you’ll still get this information).

Advertisements