Improving your code with modern idioms

A good writeup of “modern” Python techniques:

http://python3porting.com/improving.html

Advertisements

Tool to Identify Supported Ciphersuites for a Given Server

cipherscan

$ ./cipherscan www.google.com:443
...................
prio  ciphersuite                  protocols                    pfs_keysize
1     ECDHE-RSA-CHACHA20-POLY1305  TLSv1.2                      ECDH,P-256,256bits
2     ECDHE-RSA-AES128-GCM-SHA256  TLSv1.2                      ECDH,P-256,256bits
3     ECDHE-RSA-AES128-SHA         TLSv1.1,TLSv1.2              ECDH,P-256,256bits
4     ECDHE-RSA-RC4-SHA            SSLv3,TLSv1,TLSv1.1,TLSv1.2  ECDH,P-256,256bits
5     AES128-GCM-SHA256            TLSv1.2
6     AES128-SHA256                TLSv1.2
7     AES128-SHA                   TLSv1.1,TLSv1.2
8     RC4-SHA                      SSLv3,TLSv1,TLSv1.1,TLSv1.2
9     RC4-MD5                      SSLv3,TLSv1,TLSv1.1,TLSv1.2
10    ECDHE-RSA-AES256-GCM-SHA384  TLSv1.2                      ECDH,P-256,256bits
11    ECDHE-RSA-AES256-SHA384      TLSv1.2                      ECDH,P-256,256bits
12    ECDHE-RSA-AES256-SHA         SSLv3,TLSv1,TLSv1.1,TLSv1.2  ECDH,P-256,256bits
13    AES256-GCM-SHA384            TLSv1.2
14    AES256-SHA256                TLSv1.2
15    AES256-SHA                   SSLv3,TLSv1,TLSv1.1,TLSv1.2
16    ECDHE-RSA-DES-CBC3-SHA       SSLv3,TLSv1,TLSv1.1,TLSv1.2  ECDH,P-256,256bits
17    DES-CBC3-SHA                 SSLv3,TLSv1,TLSv1.1,TLSv1.2
18    ECDHE-RSA-AES128-SHA256      TLSv1.2                      ECDH,P-256,256bits

Certificate: trusted, 2048 bit, sha1WithRSAEncryption signature

You can also get a JSON result:

$ /cipherscan -j -starttls xmpp jabber.ccc.de:5222
{
    "target": "jabber.ccc.de:5222",
    "date": "Sat, 19 Apr 2014 11:40:40 -0400",
    "ciphersuite": [
        {
            "cipher": "DHE-RSA-AES256-SHA",
            "protocols": [
                "SSLv3",
                "TLSv1"
            ],
            "pubkey": [
                "2048"
            ],
            "sigalg": [
                "sha1WithRSAEncryption"
            ],
            "trusted": "False",
            "pfs": "DH,1024bits"
        },
        {
            "cipher": "AES256-SHA",
            "protocols": [
                "SSLv3",
                "TLSv1"
            ],
            "pubkey": [
                "2048"
            ],
            "sigalg": [
                "sha1WithRSAEncryption"
            ],
            "trusted": "False",
            "pfs": "None"
        },
        {
            "cipher": "EDH-RSA-DES-CBC3-SHA",
            "protocols": [
                "SSLv3",
                "TLSv1"
            ],
            "pubkey": [
                "2048"
            ],
            "sigalg": [
                "sha1WithRSAEncryption"
            ],
            "trusted": "False",
            "pfs": "DH,1024bits"
        },
        {
            "cipher": "DES-CBC3-SHA",
            "protocols": [
                "SSLv3",
                "TLSv1"
            ],
            "pubkey": [
                "2048"
            ],
            "sigalg": [
                "sha1WithRSAEncryption"
            ],
            "trusted": "False",
            "pfs": "None"
        },
        {
            "cipher": "DHE-RSA-AES128-SHA",
            "protocols": [
                "SSLv3",
                "TLSv1"
            ],
            "pubkey": [
                "2048"
            ],
            "sigalg": [
                "sha1WithRSAEncryption"
            ],
            "trusted": "False",
            "pfs": "DH,1024bits"
        },
        {
            "cipher": "AES128-SHA",
            "protocols": [
                "SSLv3",
                "TLSv1"
            ],
            "pubkey": [
                "2048"
            ],
            "sigalg": [
                "sha1WithRSAEncryption"
            ],
            "trusted": "False",
            "pfs": "None"
        },
        {
            "cipher": "RC4-SHA",
            "protocols": [
                "SSLv3",
                "TLSv1"
            ],
            "pubkey": [
                "2048"
            ],
            "sigalg": [
                "sha1WithRSAEncryption"
            ],
            "trusted": "False",
            "pfs": "None"
        },
        {
            "cipher": "RC4-MD5",
            "protocols": [
                "SSLv3",
                "TLSv1"
            ],
            "pubkey": [
                "2048"
            ],
            "sigalg": [
                "sha1WithRSAEncryption"
            ],
            "trusted": "False",
            "pfs": "None"
        }
    ]
}

Programmatically-Driven Websites in Python (with HTTPHandler and SO_LINGER)

We’re going to write a website whose requests are handled by subroutines, and use Python’s logging.handlers.HTTPHandler class to send requests to it. Documentation and/or examples for the former are sparse, and I thought that an example of the latter connecting to the former would be useful.

Understanding the Webserver

Using the built-in BaseHTTPServer.BaseHTTPRequestHandler webserver, you can wire methods for individual verbs (GET, POST, PUT, etc..). Requests on verbs that aren’t handled will return a 501. Aside from having to write the headers at the top of the methods yourself and needing to read a specific quantity of data-bytes (or you’ll block forever), this is similar to every other web-framework that you’ve used.

The only things that you really need to know are the following instance variables:

  • headers: A dictionary-like collection of headers.
  • rfile: A file-like object that will contain your data (if you receive any).
  • wfile: A file-like object that will receive your response data (if you send any).

You’ll also need to deal with how to handle unsent data when you terminate. Even if you shutdown a socket, it may not be closed by the system immediately if data has already moved across it. This relates to why we inherit from SocketServer.TCPServer and change the one class variable. We’ll discuss this more, below.

import pprint
import urlparse

import BaseHTTPServer
import SocketServer

_PORT = 8000


class TCPServerReusableSocket(SocketServer.TCPServer):
    allow_reuse_address = True


class HookedHTTPRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def __send_headers(self):
        self.send_response(200)
        self.send_header("Content-type", 'text/plain')
        self.end_headers()

    def do_GET(self):
        self.__send_headers()

        print("Received GET request for: %s" % (self.path,))

        self.wfile.write("Test from GET!\n")

    def do_POST(self):
        self.__send_headers()

        print("Received POST request for: %s" % (self.path,))

        print('')
        print('Headers')
        print('=======')
        pprint.pprint(self.headers.items())
        print('=======')

        length = int(self.headers['content-length'])
        data_raw = self.rfile.read(length)
        data = urlparse.parse_qs(data_raw)

        print('')
        print('Received')
        print('========')
        pprint.pprint(data)
        print('========')
        print('')

        self.wfile.write("Test from POST!\n")

httpd = TCPServerReusableSocket(
            ('localhost', _PORT), 
            HookedHTTPRequestHandler)

httpd.serve_forever()

We expect that what we’ve done above is fairly obvious and does not need an explanation. You can implement your own log_request(code=None, size=None) method in HookedHTTPRequestHandler to change how the requests are printed, or to remove them.

To continue our remarks about buffered-data above, we add special handling so that we don’t encounter the “socket.error: [Errno 48] Address already in use” error if you kill the server and restart it a moment later. You may choose one of the following two strategies:

  1. Force the socket to close immediately.
  2. Allow the socket to already be open.

(1) should be fine for logging/etc. However, this might not be a great option if you’re handling actual data. (2) should probably be the preferred strategy, but you’ll also have to be sure to implement a PID file in your application so that you can be sure that only one instance is running (assuming that’s desired).

To implement (2), use SocketServer.TCPServer instead of our custom TCPServerReusableSocket. and, add the following imports:

import socket
import struct

Then, add the following after we define httpd but before we start the server, to tell the SO_LINGER socket option to kill all buffered data immediately:

l_onoff = 1                                                                                                                                                           
l_linger = 0                                                                                                                                                          

httpd.socket.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', l_onoff, l_linger))

You can test this using cURL, if you can’t wait to setup HTTPHandler:

$ curl -X POST -d abc=def http://localhost:8000
Test from POST!

The webserver process will show:

$ python http_server.py 
127.0.0.1 - - [19/Oct/2014 15:28:47] "POST / HTTP/1.1" 200 -
Received POST request for: /

Headers
=======
[('host', 'localhost:8000'),
 ('content-type', 'application/x-www-form-urlencoded'),
 ('content-length', '7'),
 ('accept', '*/*'),
 ('user-agent', 'curl/7.30.0')]
=======

Received
========
{'abc': ['def']}
========

Understanding logging.handlers.HTTPHandler

My own use-case for this was from a new MapReduce platform (JobX), and I wanted to potentially emit messages to another system if certain tasks were accomplished. I used the built-in webserver that we invoked, above, to see these messages from the development system.

import logging
import logging.handlers

logger = logging.getLogger(__name__)

_TARGET = 'localhost:8000'
_PATH = '/'
_VERB = 'post'

sh = logging.handlers.HTTPHandler(_TARGET, _PATH, method=_VERB)

logger.addHandler(sh)
logger.setLevel(logging.DEBUG)

logger.debug("Test message.")

This will be shown by the webserver:

127.0.0.1 - - [19/Oct/2014 15:45:02] "POST / HTTP/1.0" 200 -
Received POST request for: /

Headers
=======
[('host', 'localhost'),
 ('content-type', 'application/x-www-form-urlencoded'),
 ('content-length', '368')]
=======

Received
========
{'args': ['()'],
 'created': ['1413747902.18'],
 'exc_info': ['None'],
 'exc_text': ['None'],
 'filename': ['push_socket_log.py'],
 'funcName': ['<module>'],
 'levelname': ['DEBUG'],
 'levelno': ['10'],
 'lineno': ['17'],
 'module': ['push_socket_log'],
 'msecs': ['181.387901306'],
 'msg': ['Test message.'],
 'name': ['__main__'],
 'pathname': ['./push_socket_log.py'],
 'process': ['65486'],
 'processName': ['MainProcess'],
 'relativeCreated': ['12.6709938049'],
 'thread': ['140735262810896'],
 'threadName': ['MainThread']}
========

Note that each field is a list with one item. If you want the output to look a little nicer, alter the above to add the following to the top of the module:

import datetime

_FMT_DATETIME_STD = '%Y-%m-%d %H:%M:%S'

Then, add the __print_entry method:

    def __print_entry(self, entry):
        created_epoch = float(entry['created'][0])
        when_dt = datetime.datetime.fromtimestamp(created_epoch)
        timestamp_phrase = when_dt.strftime(_FMT_DATETIME_STD)
        where_name = entry['name'][0][:40]
        level_name = entry['levelname'][0]

        message = entry['msg'][0]

        print('%s  %40s  %9s  %s' % 
              (timestamp_phrase, where_name, level_name, message))

Then, change the last part of do_POST:

    def do_POST(self):
        self.__send_headers()

        length = int(self.headers['content-length'])
        data_raw = self.rfile.read(length)
        data = urlparse.parse_qs(data_raw)

        self.__print_entry(data)

The output will now look like:

2014-10-19 16:16:00       MR_HANDLER.HTTP.map_obfuscation_one       INFO  Socket message!
2014-10-19 16:16:00                           MR_HANDLER.HTTP      ERROR  Mapper invocation [789b7ca7fcb6cede9ae5557b2121d392469dfc26] under request [85394d5bdb34a09ffa045776cc69d1d4cd17d657] failed. HANDLER=[map_obfuscation_one]

There is one weird thing about HTTPHandler, and it’s this: Many/all of the fields will be stringified in order to serialized them. If you call the logger like logging.debug('Received arguments: [%s] [%s]', arg1, arg2), then we’ll receive Received argument: [%s] in the msg field (or rather the msg list), and the arguments as a stringified tuple like (u'abc', u'def'). To avoid dealing with this, I’ll send messages into a function that’s in charge of the notifications, and produce the final string before I send it to the logger.

The same thing applies to tracebacks. If you log an exception, you’ll only get this:

 'exc_info': ['(<type 'exceptions.NameError'>, NameError("global name 'client_id' is not defined",), <traceback object at 0x110c92878>)'],
 'exc_text': ['None'],

Again, you’ll have to concatenate this into the log-message by some intermediate function (so that the primary application logic doesn’t have to know about it, but so that you’ll still get this information).

Infinite, Secure, and Distributed Backups Using Tahoe

Thanks to zooko for this one: A secure, distributed storage-service built on top of S3, which uses Tahoe (see previous article) as its client. Your data is 100% encrypted locally before being pumped into S3. It’s called S4.

It’s $25/month for infinite storage. For those of us with mountains of data to backup, it’s a deal (S3 costs about $30/T, currently, and even Glacier is $10/T).

Once you setup your Tahoe client with the right introducer and share configuration (which is trivial), all you do is call the “backup” subcommand with the path that you want to backup.

Done (well, after potentially several weeks of backing-up it will be).