Implementing Sessions Under AppEngine With Go

The go-appengine-sessioncascade project provides cascadestore, a simple and intuitive package that lets you use Memcache, Datastore, the request context, or any combination of them as session backends under AppEngine.

Example:

package handlers

import (
    "net/http"

    "google.golang.org/appengine"
    "google.golang.org/appengine/log"

    "github.com/dsoprea/goappenginesessioncascade"
)

const (
    sessionName = "MainSession"
)

var (
    sessionSecret = []byte("SessionSecret")
    sessionStore  = cascadestore.NewCascadeStore(cascadestore.DistributedBackends, sessionSecret)
)

func HandleRequest(w http.ResponseWriter, r *http.Request) {
    ctx := appengine.NewContext(r)

    session, err := sessionStore.Get(r, sessionName)
    if err != nil {
        panic(err)
    }

    if vRaw, found := session.Values["ExistingKey"]; !found {
        log.Debugf(ctx, "Existing value not found.")
    } else {
        v := vRaw.(string)
        log.Debugf(ctx, "Existing value: [%s]", v)
    }

    session.Values["NewKey"] = "NewValue"
    if err := session.Save(r, w); err != nil {
        panic(err)
    }
}
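The post doesn't say how cascadestore layers its backends internally. To make the general idea of combining a fast, volatile store with a slower, durable one concrete, here is a generic read-through/write-through cascade sketched in Python, with plain dicts standing in for the request context, Memcache and Datastore. The class name and its behavior are assumptions for illustration only, not the package's implementation or API.

```python
class CascadeStore(object):
    """Illustrative read-through/write-through cascade over ordered backends.

    Backends are dict-like and ordered fastest first (for example: request
    context, then Memcache, then Datastore). This is NOT cascadestore's code.
    """

    def __init__(self, *backends):
        self.backends = backends

    def get(self, key):
        for i, backend in enumerate(self.backends):
            if key in backend:
                value = backend[key]
                # Back-fill the faster tiers that missed, so the next read is cheaper.
                for faster in self.backends[:i]:
                    faster[key] = value
                return value
        return None

    def set(self, key, value):
        # Write to every tier so the slowest (most durable) one always has the data.
        for backend in self.backends:
            backend[key] = value


# Plain dicts stand in for the real request-context, Memcache and Datastore tiers.
request_ctx, memcache, datastore = {}, {}, {}
store = CascadeStore(request_ctx, memcache, datastore)

datastore['session:abc'] = {'ExistingKey': 'value'}
print(store.get('session:abc'))   # {'ExistingKey': 'value'}
print('session:abc' in memcache)  # True (back-filled by the read above)
```

In this sketch, reading from the fastest tier first keeps most lookups cheap, while writing to every tier means an eviction from a volatile tier does not lose the session.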
Atom UI

Efficiently Processing GPX Files in Go

Use gpxreader to process a GPX file of any size without reading the whole thing into memory. This also sidesteps a limitation of Go’s XML Decoder: it can decode one node at a time, but when you do that, it implicitly skips all of that node’s children, because it seeks to the matching close tag for validation and offers no way to disable this behavior.

An excerpt of the test-script from the project:

//...

func (gv *gpxVisitor) GpxOpen(gpx *gpxreader.Gpx) error {
    fmt.Printf("GPX: %s\n", gpx)

    return nil
}

func (gv *gpxVisitor) GpxClose(gpx *gpxreader.Gpx) error {
    return nil
}

func (gv *gpxVisitor) TrackOpen(track *gpxreader.Track) error {
    fmt.Printf("Track: %s\n", track)

    return nil
}

func (gv *gpxVisitor) TrackClose(track *gpxreader.Track) error {
    return nil
}

func (gv *gpxVisitor) TrackSegmentOpen(trackSegment *gpxreader.TrackSegment) error {
    fmt.Printf("Track segment: %s\n", trackSegment)

    return nil
}

func (gv *gpxVisitor) TrackSegmentClose(trackSegment *gpxreader.TrackSegment) error {
    return nil
}

func (gv *gpxVisitor) TrackPointOpen(trackPoint *gpxreader.TrackPoint) error {
    return nil
}

func (gv *gpxVisitor) TrackPointClose(trackPoint *gpxreader.TrackPoint) error {
    fmt.Printf("Point: %s\n", trackPoint)

    return nil
}

//...

func main() {
    var gpxFilepath string

    o := readOptions()

    gpxFilepath = o.GpxFilepath

    f, err := os.Open(gpxFilepath)
    if err != nil {
        panic(err)
    }

    defer f.Close()

    gv := newGpxVisitor()
    gp := gpxreader.NewGpxParser(f, gv)

    err = gp.Parse()
    if err != nil {
        fmt.Fprintf(os.Stderr, "Error: %s\n", err)
        os.Exit(1)
    }
}

Output:

$ gpxreadertest -f 20140909.gpx 
GPX: GPX<C=[GPSLogger - http://gpslogger.mendhak.com/]>
Track: Track<>
Track segment: TrackSegment<>
Point: TrackPoint<LAT=(26.47886514) LON=(-80.08643986) ELV=(-12.000000) CRS=(197.899994) SPD=(35.250000) HDOP=(0.900000) SRC=[gps] SAT=(21) TIME=[2014-09-09 19:07:27 +0000 UTC]>
Point: TrackPoint<LAT=(26.40728154) LON=(-80.11801469) ELV=(9.000000) CRS=(0.000000) SPD=(0.000000) HDOP=(1.200000) SRC=[gps] SAT=(16) TIME=[2014-09-09 22:07:52 +0000 UTC]>
Point: TrackPoint<LAT=(26.54074478) LON=(-80.07230151) ELV=(-31.000000) CRS=(12.800000) SPD=(31.503967) HDOP=(1.000000) SRC=[gps] SAT=(17) TIME=[2014-09-09 22:53:27 +0000 UTC]>
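The same streaming idea is available outside Go. As a rough Python analogue (not gpxreader’s API), xml.etree.ElementTree.iterparse() reports each element as it is parsed, and detaching every finished <trkpt> from its segment keeps memory flat regardless of file size. This sketch assumes the GPX 1.1 namespace, only extracts latitude, longitude and time, and iter_track_points() is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET

# Assumes GPX 1.1; GPX 1.0 files use http://www.topografix.com/GPX/1/0 instead.
NS = '{http://www.topografix.com/GPX/1/1}'
TRKSEG = NS + 'trkseg'
TRKPT = NS + 'trkpt'
TIME = NS + 'time'


def iter_track_points(source):
    """Yield (lat, lon, time) per <trkpt> while keeping memory bounded."""
    segment = None
    for event, elem in ET.iterparse(source, events=('start', 'end')):
        if event == 'start' and elem.tag == TRKSEG:
            # Remember the parent so finished points can be detached from it.
            segment = elem
        elif event == 'end' and elem.tag == TRKPT:
            yield (float(elem.get('lat')), float(elem.get('lon')),
                   elem.findtext(TIME))
            # Detach the finished point so the tree never accumulates points.
            if segment is not None:
                segment.remove(elem)


sample = io.BytesIO(
    b'<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1">'
    b'<trk><trkseg>'
    b'<trkpt lat="26.47886514" lon="-80.08643986">'
    b'<time>2014-09-09T19:07:27Z</time></trkpt>'
    b'</trkseg></trk></gpx>')

for point in iter_track_points(sample):
    print(point)  # (26.47886514, -80.08643986, '2014-09-09T19:07:27Z')
```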

Processing Text for Sentiment and Other Good Stuff

textblob integrates nltk and pattern. It allows you to easily extract and derive information from a passage of text.

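To install (the second command downloads the NLTK corpora that TextBlob needs for tagging and noun-phrase extraction):

$ sudo pip install textblob
$ python -m textblob.download_corpora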

Based on the project’s own example:

import textblob
import pprint

text = '''
The titular threat of The Blob has always struck me as the ultimate movie monster: an insatiably hungry, amoeba-like mass able to penetrate virtually any safeguard, capable of--as a doomed doctor chillingly describes it--"assimilating flesh on contact. Snide comparisons to gelatin be darned, it's a concept with the most devastating of potential consequences, not unlike the grey goo scenario proposed by technological theorists fearful of artificial intelligence run rampant.
'''

blob = textblob.TextBlob(text)

# Get parts of speech.
blob.tags

# Get list of individual noun-phrases.
blob.noun_phrases

# Print sentence and sentiment polarity:
for sentence in blob.sentences:
    print(sentence)
    print('')
    print(sentence.sentiment.polarity)
    print('')
    print('--')
    print('')

# Convert to Spanish (calls Google's translation service; deprecated in newer TextBlob releases).
blob.translate(to="es")

Output:

>>> import textblob
>>> import pprint
>>> 
>>> text = '''
... The titular threat of The Blob has always struck me as the ultimate movie monster: an insatiably hungry, amoeba-like mass able to penetrate virtually any safeguard, capable of--as a doomed doctor chillingly describes it--"assimilating flesh on contact. Snide comparisons to gelatin be darned, it's a concept with the most devastating of potential consequences, not unlike the grey goo scenario proposed by technological theorists fearful of artificial intelligence run rampant.
... '''
>>> 
>>> blob = textblob.TextBlob(text)
>>>
>>> # Get parts of speech.
>>> blob.tags
[(u'The', u'DT'), (u'titular', u'JJ'), (u'threat', u'NN'), (u'of', u'IN'), (u'The', u'DT'), (u'Blob', u'NNP'), (u'has', u'VBZ'), (u'always', u'RB'), (u'struck', u'VBD'), (u'me', u'PRP'), (u'as', u'IN'), (u'the', u'DT'), (u'ultimate', u'JJ'), (u'movie', u'NN'), (u'monster', u'NN'), (u'an', u'DT'), (u'insatiably', u'RB'), (u'hungry', u'JJ'), (u'amoeba-like', u'JJ'), (u'mass', u'NN'), (u'able', u'JJ'), (u'to', u'TO'), (u'penetrate', u'VB'), (u'virtually', u'RB'), (u'any', u'DT'), (u'safeguard', u'VB'), (u'capable', u'JJ'), (u'of--as', u'JJ'), (u'a', u'DT'), (u'doomed', u'VBN'), (u'doctor', u'NN'), (u'chillingly', u'RB'), (u'describes', u'VBZ'), (u'it', u'PRP'), (u'assimilating', u'VBG'), (u'flesh', u'NN'), (u'on', u'IN'), (u'contact', u'NN'), (u'Snide', u'NNP'), (u'comparisons', u'NNS'), (u'to', u'TO'), (u'gelatin', u'NN'), (u'be', u'VB'), (u'darned', u'JJ'), (u'it', u'PRP'), (u"'", u'POS'), (u's', u'PRP'), (u'a', u'DT'), (u'concept', u'NN'), (u'with', u'IN'), (u'the', u'DT'), (u'most', u'RBS'), (u'devastating', u'JJ'), (u'of', u'IN'), (u'potential', u'JJ'), (u'consequences', u'NNS'), (u'not', u'RB'), (u'unlike', u'IN'), (u'the', u'DT'), (u'grey', u'JJ'), (u'goo', u'NN'), (u'scenario', u'NN'), (u'proposed', u'VBN'), (u'by', u'IN'), (u'technological', u'JJ'), (u'theorists', u'NNS'), (u'fearful', u'JJ'), (u'of', u'IN'), (u'artificial', u'JJ'), (u'intelligence', u'NN'), (u'run', u'VB'), (u'rampant', u'JJ')]
>>>
>>> # Get list of individual noun-phrases.
>>> blob.noun_phrases
WordList([u'titular threat', 'blob', u'ultimate movie monster', u'amoeba-like mass', 'snide', u'potential consequences', u'grey goo scenario', u'technological theorists fearful', u'artificial intelligence run rampant'])
>>>
>>> # Print sentence and sentiment polarity:
>>> for sentence in blob.sentences:
...     print(sentence)
...     print('')
...     print(sentence.sentiment.polarity)
...     print('')
...     print('--')
...     print('')
... 

The titular threat of The Blob has always struck me as the ultimate movie monster: an insatiably hungry, amoeba-like mass able to penetrate virtually any safeguard, capable of--as a doomed doctor chillingly describes it--"assimilating flesh on contact.

0.06

--

Snide comparisons to gelatin be darned, it's a concept with the most devastating of potential consequences, not unlike the grey goo scenario proposed by technological theorists fearful of artificial intelligence run rampant.


-0.341666666667

--

>>>
>>> # Convert to Spanish.
>>> blob.translate(to="es")
TextBlob("La amenaza titular de The Blob siempre me ha parecido como el último monstruo de la película: una, la masa insaciablemente hambriento ameba capaz de penetrar prácticamente cualquier salvaguardia, capaz de - como médico condenado escalofriantemente lo describe - "asimilar carne en contacto. comparaciones Snide a la gelatina ser condenados, es un concepto con el más devastador de las posibles consecuencias, no muy diferente del escenario plaga gris propuesta por los teóricos tecnológicos temerosos de la inteligencia artificial ejecutar rampante.")

Awesome, right?

Subversion from Python

Generally, it’s preferable to bind to libraries rather than executables when given the option. In my case, I needed SVN access from Python and couldn’t, at that time, find a confidence-inspiring library to work with. So, I wrote svn.

It turns out that there is an existing Python binding for the Subversion client libraries: pysvn.

Under Ubuntu, it comes from the python-svn Apt package.

The Programmer’s Guide has the following examples, among others:

cat:

import pysvn
client = pysvn.Client()
file_content = client.cat('file.txt')

ls:

import pysvn
client = pysvn.Client()
entry_list = client.ls('.')

info:

import pysvn
client = pysvn.Client()
entry = client.info('.')
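For contrast with the pysvn calls above, binding to the executable boils down to running the svn command-line client and parsing what it prints. A minimal sketch, assuming svn is on the PATH and asking it for XML output; svn_info() and parse_svn_info_xml() are hypothetical names, not the API of the svn package mentioned above.

```python
import subprocess
import xml.etree.ElementTree as ET


def parse_svn_info_xml(xml_bytes):
    """Pull a few fields out of `svn info --xml` output."""
    entry = ET.fromstring(xml_bytes).find('entry')
    return {
        'kind': entry.get('kind'),
        'path': entry.get('path'),
        'revision': int(entry.get('revision')),
        'url': entry.findtext('url'),
    }


def svn_info(path='.'):
    """Run the svn executable and parse its XML output (requires svn on PATH)."""
    output = subprocess.check_output(['svn', 'info', '--xml', path])
    return parse_svn_info_xml(output)
```

Keeping the parsing separate from the subprocess call makes it testable without a working copy, which is one of the few comforts of wrapping an executable.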

Using inotify to watch for directory changes from Python

An inotify project is now available on PyPI. More documentation is available at the project homepage: PyInotify

Though inotify is uncomplicated to use from C, it’s stupidly simple to use from Python with this library.

To install:

$ sudo pip install inotify

This is the principal logic of the example provided in the project documentation:

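import logging

import inotify.adapters

_LOGGER = logging.getLogger(__name__)

# Matches the timestamp/name/level layout of the output shown below.
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

i = inotify.adapters.Inotify()

i.add_watch('/tmp')

# event_gen() can yield None when no event is pending.
for event in i.event_gen():
    if event is not None:
        (header, type_names, watch_path, filename) = event

        _LOGGER.info("WD=(%d) MASK=(%d) COOKIE=(%d) LEN=(%d) MASK->NAMES=%s "
                     "WATCH-PATH=[%s] FILENAME=[%s]",
                     header.wd, header.mask, header.cookie, header.len, type_names,
                     watch_path, filename)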

We ran the following operations on /tmp:

$ touch /tmp/aa
$ rm /tmp/aa
$ mkdir /tmp/dir1
$ rmdir /tmp/dir1

This was the corresponding output of the inotify process:

2015-04-24 05:02:06,667 - __main__ - INFO - WD=(1) MASK=(256) COOKIE=(0) LEN=(16) MASK->NAMES=['IN_CREATE'] FILENAME=[aa]
2015-04-24 05:02:06,667 - __main__ - INFO - WD=(1) MASK=(32) COOKIE=(0) LEN=(16) MASK->NAMES=['IN_OPEN'] FILENAME=[aa]
2015-04-24 05:02:06,667 - __main__ - INFO - WD=(1) MASK=(4) COOKIE=(0) LEN=(16) MASK->NAMES=['IN_ATTRIB'] FILENAME=[aa]
2015-04-24 05:02:06,667 - __main__ - INFO - WD=(1) MASK=(8) COOKIE=(0) LEN=(16) MASK->NAMES=['IN_CLOSE_WRITE'] FILENAME=[aa]
2015-04-24 05:02:17,412 - __main__ - INFO - WD=(1) MASK=(512) COOKIE=(0) LEN=(16) MASK->NAMES=['IN_DELETE'] FILENAME=[aa]
2015-04-24 05:02:22,884 - __main__ - INFO - WD=(1) MASK=(1073742080) COOKIE=(0) LEN=(16) MASK->NAMES=['IN_ISDIR', 'IN_CREATE'] FILENAME=[dir1]
2015-04-24 05:02:25,948 - __main__ - INFO - WD=(1) MASK=(1073742336) COOKIE=(0) LEN=(16) MASK->NAMES=['IN_ISDIR', 'IN_DELETE'] FILENAME=[dir1]
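Each MASK value is a bitwise OR of the event flags defined in <sys/inotify.h>. For example, 1073742080 is IN_ISDIR (0x40000000) | IN_CREATE (0x100). The library already decodes these for you in type_names; the sketch below only shows the arithmetic. mask_to_names() is a hypothetical helper, and the table lists the common flags rather than every one.

```python
# Common inotify event bits, as defined in <sys/inotify.h>.
INOTIFY_FLAGS = [
    ('IN_ACCESS', 0x00000001),
    ('IN_MODIFY', 0x00000002),
    ('IN_ATTRIB', 0x00000004),
    ('IN_CLOSE_WRITE', 0x00000008),
    ('IN_CLOSE_NOWRITE', 0x00000010),
    ('IN_OPEN', 0x00000020),
    ('IN_MOVED_FROM', 0x00000040),
    ('IN_MOVED_TO', 0x00000080),
    ('IN_CREATE', 0x00000100),
    ('IN_DELETE', 0x00000200),
    ('IN_DELETE_SELF', 0x00000400),
    ('IN_MOVE_SELF', 0x00000800),
    ('IN_UNMOUNT', 0x00002000),
    ('IN_Q_OVERFLOW', 0x00004000),
    ('IN_IGNORED', 0x00008000),
    ('IN_ISDIR', 0x40000000),
]


def mask_to_names(mask):
    """Return the names of every flag bit that is set in mask."""
    return [name for name, bit in INOTIFY_FLAGS if mask & bit]


# The MASK values from the log above.
for mask in (256, 32, 4, 8, 512, 1073742080, 1073742336):
    print(mask, mask_to_names(mask))
```

This lists flags in the table’s order, so IN_ISDIR comes last here rather than first as in the log.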

Lastly, this library also provides the ability to recursively watch a given directory. Just use the inotify.adapters.InotifyTree class instead of inotify.adapters.Inotify, and pass a path.

Programmatically-Driven Websites in Python (with HTTPHandler and SO_LINGER)

We’re going to write a website whose requests are handled by subroutines, and use Python’s logging.handlers.HTTPHandler class to send requests to it. Documentation and/or examples for the former are sparse, and I thought that an example of the latter connecting to the former would be useful.

Understanding the Webserver

Using the BaseHTTPRequestHandler class from the built-in BaseHTTPServer module, you can wire up methods for individual verbs (do_GET, do_POST, do_PUT, etc.). Requests for verbs that aren’t handled will return a 501. Aside from having to write the response status and headers at the top of each method yourself, and needing to read exactly as many body bytes as the Content-Length header specifies (or you’ll block forever), this is similar to every other web framework that you’ve used.

The only things that you really need to know are the following instance variables:

  • headers: A dictionary-like collection of headers.
  • rfile: A file-like object that will contain your data (if you receive any).
  • wfile: A file-like object that will receive your response data (if you send any).

You’ll also need to deal with unsent data when you terminate. Even after you shut down a socket, the system may not release its address immediately if connections were made on it (the old connection can sit in the TIME_WAIT state for a while). That’s why we inherit from SocketServer.TCPServer and change one class variable. We’ll discuss this more below.

import pprint
import urlparse

import BaseHTTPServer
import SocketServer

_PORT = 8000


class TCPServerReusableSocket(SocketServer.TCPServer):
    allow_reuse_address = True


class HookedHTTPRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def __send_headers(self):
        self.send_response(200)
        self.send_header("Content-type", 'text/plain')
        self.end_headers()

    def do_GET(self):
        self.__send_headers()

        print("Received GET request for: %s" % (self.path,))

        self.wfile.write("Test from GET!\n")

    def do_POST(self):
        self.__send_headers()

        print("Received POST request for: %s" % (self.path,))

        print('')
        print('Headers')
        print('=======')
        pprint.pprint(self.headers.items())
        print('=======')

        length = int(self.headers['content-length'])
        data_raw = self.rfile.read(length)
        data = urlparse.parse_qs(data_raw)

        print('')
        print('Received')
        print('========')
        pprint.pprint(data)
        print('========')
        print('')

        self.wfile.write("Test from POST!\n")

httpd = TCPServerReusableSocket(
            ('localhost', _PORT), 
            HookedHTTPRequestHandler)

httpd.serve_forever()

The code above should be fairly self-explanatory. You can implement your own log_request(code='-', size='-') method in HookedHTTPRequestHandler to change how requests are printed, or to suppress them.

To continue our remarks about buffered data above: we add special handling so that we don’t get the “socket.error: [Errno 48] Address already in use” error if you kill the server and restart it a moment later. You may choose one of the following two strategies:

  1. Force the socket to close immediately (SO_LINGER with a zero timeout).
  2. Allow the address to be reused while the old socket is still open (SO_REUSEADDR, which is what allow_reuse_address enables).

(1) should be fine for logging and similar uses. However, it might not be a great option if you’re handling actual data, since any unsent data is discarded. (2) should probably be the preferred strategy, but you’ll also have to be sure to implement a PID file in your application so that you can be sure that only one instance is running (assuming that’s desired).
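Strategy (2) is what the TCPServerReusableSocket class in the server above does: when allow_reuse_address is true, TCPServer.server_bind() sets SO_REUSEADDR on the listening socket before binding. A small check, written for Python 3 (where SocketServer is named socketserver) and binding to an ephemeral port so it won’t collide with a running server; reuse_addr_enabled() is a hypothetical helper.

```python
import socket
import socketserver  # Python 3 name of the SocketServer module


class TCPServerReusableSocket(socketserver.TCPServer):
    # TCPServer.server_bind() sets SO_REUSEADDR when this is true.
    allow_reuse_address = True


def reuse_addr_enabled(server_class):
    """Bind a throwaway server to an ephemeral port and report SO_REUSEADDR."""
    server = server_class(('localhost', 0), socketserver.BaseRequestHandler)
    try:
        value = server.socket.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
        return value != 0
    finally:
        server.server_close()


print('TCPServer:', reuse_addr_enabled(socketserver.TCPServer))
print('TCPServerReusableSocket:', reuse_addr_enabled(TCPServerReusableSocket))
```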

To implement (1), use SocketServer.TCPServer instead of our custom TCPServerReusableSocket, and add the following imports:

import socket
import struct

Then, add the following after httpd is defined but before the server is started. It enables the SO_LINGER socket option with a timeout of zero, so that closing the socket discards all buffered data immediately:

l_onoff = 1
l_linger = 0

httpd.socket.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', l_onoff, l_linger))
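The 'ii' format mirrors the C struct linger { int l_onoff; int l_linger; } that SO_LINGER expects. This sketch, assuming a POSIX system (Windows declares both fields as u_short), sets the option on a fresh socket and reads it back with getsockopt() to confirm that it took effect. set_abortive_close() and get_linger() are hypothetical helpers.

```python
import socket
import struct

# Mirrors POSIX "struct linger { int l_onoff; int l_linger; }".
LINGER_FORMAT = 'ii'


def set_abortive_close(sock):
    """Enable SO_LINGER with a zero timeout, as in the snippet above."""
    l_onoff = 1   # turn lingering on...
    l_linger = 0  # ...with a 0-second timeout, so close() drops unsent data
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(LINGER_FORMAT, l_onoff, l_linger))


def get_linger(sock):
    """Read SO_LINGER back as an (l_onoff, l_linger) tuple."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FORMAT))
    return struct.unpack(LINGER_FORMAT, raw)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    set_abortive_close(sock)
    l_onoff, l_linger = get_linger(sock)
    # Treat any non-zero l_onoff as "enabled"; the exact value may vary by OS.
    print('linger enabled:', l_onoff != 0, 'timeout:', l_linger)
finally:
    sock.close()
```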

You can test this using cURL, if you can’t wait to set up HTTPHandler:

$ curl -X POST -d abc=def http://localhost:8000
Test from POST!

The webserver process will show:

$ python http_server.py 
127.0.0.1 - - [19/Oct/2014 15:28:47] "POST / HTTP/1.1" 200 -
Received POST request for: /

Headers
=======
[('host', 'localhost:8000'),
 ('content-type', 'application/x-www-form-urlencoded'),
 ('content-length', '7'),
 ('accept', '*/*'),
 ('user-agent', 'curl/7.30.0')]
=======

Received
========
{'abc': ['def']}
========

Understanding logging.handlers.HTTPHandler

My own use case for this came from a new MapReduce platform (JobX): I wanted to be able to emit messages to another system when certain tasks were accomplished. I used the webserver that we wrote above to see these messages on the development system.

import logging
import logging.handlers

logger = logging.getLogger(__name__)

_TARGET = 'localhost:8000'
_PATH = '/'
_VERB = 'post'

sh = logging.handlers.HTTPHandler(_TARGET, _PATH, method=_VERB)

logger.addHandler(sh)
logger.setLevel(logging.DEBUG)

logger.debug("Test message.")

This will be shown by the webserver:

127.0.0.1 - - [19/Oct/2014 15:45:02] "POST / HTTP/1.0" 200 -
Received POST request for: /

Headers
=======
[('host', 'localhost'),
 ('content-type', 'application/x-www-form-urlencoded'),
 ('content-length', '368')]
=======

Received
========
{'args': ['()'],
 'created': ['1413747902.18'],
 'exc_info': ['None'],
 'exc_text': ['None'],
 'filename': ['push_socket_log.py'],
 'funcName': ['<module>'],
 'levelname': ['DEBUG'],
 'levelno': ['10'],
 'lineno': ['17'],
 'module': ['push_socket_log'],
 'msecs': ['181.387901306'],
 'msg': ['Test message.'],
 'name': ['__main__'],
 'pathname': ['./push_socket_log.py'],
 'process': ['65486'],
 'processName': ['MainProcess'],
 'relativeCreated': ['12.6709938049'],
 'thread': ['140735262810896'],
 'threadName': ['MainThread']}
========
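Where do these fields come from? HTTPHandler.emit() url-encodes the dictionary returned by its mapLogRecord() method (by default, the LogRecord’s __dict__), and do_POST decodes that body with parse_qs. This Python 3 sketch rebuilds the body for a record shaped like the "Test message." call above, without needing the server to be running; http_handler_body() is a hypothetical helper.

```python
import logging
import logging.handlers
from urllib.parse import parse_qs, urlencode  # Python 3 (Python 2: urllib/urlparse)


def http_handler_body(record, host='localhost:8000', path='/'):
    """Rebuild the form body that HTTPHandler.emit() POSTs for a record."""
    handler = logging.handlers.HTTPHandler(host, path, method='POST')
    # emit() url-encodes the dict returned by mapLogRecord(), which is
    # record.__dict__ by default, so every value goes through str().
    return urlencode(handler.mapLogRecord(record))


# A record shaped like the logger.debug("Test message.") call above.
record = logging.LogRecord('__main__', logging.DEBUG, './push_socket_log.py',
                           17, 'Test message.', (), None)
fields = parse_qs(http_handler_body(record))  # what do_POST does with the body
print(fields['msg'], fields['levelname'], fields['args'])
# ['Test message.'] ['DEBUG'] ['()']
```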

Note that each field is a list with one item, because parse_qs returns a list of values for every key (a key may appear more than once in a query string). If you want the output to look a little nicer, add the following to the top of the server module:

import datetime

_FMT_DATETIME_STD = '%Y-%m-%d %H:%M:%S'

Then, add the __print_entry method to HookedHTTPRequestHandler:

    def __print_entry(self, entry):
        created_epoch = float(entry['created'][0])
        when_dt = datetime.datetime.fromtimestamp(created_epoch)
        timestamp_phrase = when_dt.strftime(_FMT_DATETIME_STD)
        where_name = entry['name'][0][:40]
        level_name = entry['levelname'][0]

        message = entry['msg'][0]

        print('%s  %40s  %9s  %s' % 
              (timestamp_phrase, where_name, level_name, message))

Then, replace do_POST with a version that hands the parsed data to __print_entry:

    def do_POST(self):
        self.__send_headers()

        length = int(self.headers['content-length'])
        data_raw = self.rfile.read(length)
        data = urlparse.parse_qs(data_raw)

        self.__print_entry(data)

The output will now look like:

2014-10-19 16:16:00       MR_HANDLER.HTTP.map_obfuscation_one       INFO  Socket message!
2014-10-19 16:16:00                           MR_HANDLER.HTTP      ERROR  Mapper invocation [789b7ca7fcb6cede9ae5557b2121d392469dfc26] under request [85394d5bdb34a09ffa045776cc69d1d4cd17d657] failed. HANDLER=[map_obfuscation_one]

There is one weird thing about HTTPHandler, and it’s this: many or all of the fields will be stringified in order to serialize them. If you call the logger like logging.debug('Received arguments: [%s] [%s]', arg1, arg2), then you’ll receive Received arguments: [%s] [%s] in the msg field (or rather, the msg list), and the arguments as a stringified tuple like (u'abc', u'def'). To avoid dealing with this, I send messages into a function that’s in charge of the notifications, and produce the final string before I send it to the logger.
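A minimal sketch of that kind of intermediate function, using a hypothetical notify() helper (not from the original code): it applies the arguments itself, so record.msg already holds the final text and record.args is empty by the time HTTPHandler serializes the record. A BufferingHandler stands in for HTTPHandler here so that the record can be inspected locally.

```python
import logging
import logging.handlers


def notify(logger, level, fmt, *args):
    """Hypothetical helper: build the final string before it reaches the logger."""
    message = fmt % args if args else fmt
    # No args are passed on, so logging will not try to re-format the message.
    logger.log(level, message)


# Usage: BufferingHandler keeps records in memory so we can inspect them.
logger = logging.getLogger('notify_example')
buffered = logging.handlers.BufferingHandler(capacity=100)
logger.addHandler(buffered)
logger.setLevel(logging.DEBUG)

notify(logger, logging.DEBUG, 'Received arguments: [%s] [%s]', 'abc', 'def')
record = buffered.buffer[-1]
print(record.msg)   # Received arguments: [abc] [def]
print(record.args)  # ()
```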

The same thing applies to tracebacks. If you log an exception, you’ll only get this:

 'exc_info': ['(<type 'exceptions.NameError'>, NameError("global name 'client_id' is not defined",), <traceback object at 0x110c92878>)'],
 'exc_text': ['None'],

Again, you’ll have to concatenate this into the log message with some intermediate function, so that the primary application logic doesn’t have to know about it but you still get this information.
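A different technique from the intermediate function described above (an alternative, not the author’s approach) is to subclass HTTPHandler and override mapLogRecord(), the hook HTTPHandler calls to build the fields it sends. This sketch sends the formatted message and the rendered traceback text instead of msg/args and the repr of exc_info; FormattedHTTPHandler is a hypothetical name.

```python
import logging
import logging.handlers
import sys


class FormattedHTTPHandler(logging.handlers.HTTPHandler):
    """Send the fully formatted message and traceback instead of raw fields."""

    def mapLogRecord(self, record):
        fields = dict(record.__dict__)
        fields['msg'] = record.getMessage()  # apply args to msg
        fields['args'] = ''
        if record.exc_info:
            # Render the traceback text instead of the exc_info tuple's repr.
            fields['exc_text'] = logging.Formatter().formatException(record.exc_info)
            fields['exc_info'] = ''
        return fields


# Usage: build a record with an exception attached and inspect the fields.
handler = FormattedHTTPHandler('localhost:8000', '/', method='POST')
try:
    raise ValueError('bad client id')
except ValueError:
    exc_info = sys.exc_info()

record = logging.LogRecord('__main__', logging.ERROR, 'example.py', 1,
                           'Lookup failed for [%s]', ('abc',), exc_info)
fields = handler.mapLogRecord(record)
print(fields['msg'])                        # Lookup failed for [abc]
print(fields['exc_text'].splitlines()[-1])  # ValueError: bad client id
```

Because the override lives in the handler, callers can keep using ordinary %-style arguments and logger.exception(). The emptied args and exc_info values are dropped by parse_qs on the server side, and __print_entry only reads created, name, levelname and msg, so it keeps working.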