Extracting Tokens from a String Template (Python)

Python ships with built-in, regular-expression-based string templating (string.Template). It’s highly configurable and lets you easily substitute values into templates such as the following:

Token 1: $Token1
Token 2: $Token2
Token 3: ${Token3}

You can change the delimiter (to another symbol or symbols) and even override the regular expression that it uses underneath.

The documentation doesn’t spell out how to extract the tokens from a template, however. A simple search with the template’s own regular expression does it:

import string
import re

text = """
Token 1: $Token1
Token 2: $Token2
Token 3: $Token3
"""

t = string.Template(text)
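# Each match is a tuple of the pattern's groups: (escaped, named, braced, invalid).
result = re.findall(t.pattern, t.template)
# Index 1 is the "named" group, i.e. plain $Token references.
tokens = [r[1] for r in result]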
print(tokens)
#['Token1', 'Token2', 'Token3']
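
Note that index 1 only covers the plain $Token form. A minimal sketch that also picks up the braced ${Token} form, assuming the default Template pattern with its "named" and "braced" groups (the extract_tokens name is just for illustration, and duplicates are dropped here):

```python
import string


def extract_tokens(template_text):
    """Return the token names used in a template, in order of first appearance."""
    template = string.Template(template_text)
    tokens = []
    for match in template.pattern.finditer(template.template):
        # "named" matches $Token and "braced" matches ${Token}. Both are None
        # for an escaped "$$" or an invalid placeholder, which we skip.
        name = match.group('named') or match.group('braced')
        if name is not None and name not in tokens:
            tokens.append(name)
    return tokens


if __name__ == '__main__':
    print(extract_tokens("Token 1: $Token1\nToken 3: ${Token3}\n"))
```

On Python 3.11 and later, string.Template also has a get_identifiers() method that serves the same purpose.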

Serving a Beautiful Website with R and Bootstrap

R Spline Screenshot

There is very limited coverage on how to build a website with R. It was a fight to answer the questions that I had. Obviously, this is because R was not meant to serve websites. In fact, if you want to serve a website that has any sort of volume, you’re probably better off using Shiny Server or hosted Shinyapps.io: http://shiny.rstudio.com/deploy

However, you still may want to write your own R-based web application for one or more of the following reasons:

  • You just want to (I fit into this category)
  • You want more control over the web code, as Shiny will dynamically generate the code for you. Shiny may look spectacular, but at the cost of losing a lot of control (for good reason).
  • You want to run more than one application and don’t want to pay the $10,000/year price for each Shiny server instance.
  • You don’t want to pay for a hosted solution or subject yourself to the limits of a free account (your application will be deactivated after the first twenty-five hours of active usage each month).

So, assuming that you just want to run your own server, I’ve created a test-project to help you out. Make sure to install the requirements before running it. The project depends on the DAAG package. This is a companion package to a book that we use for the dataset in a spline example.

This project was created on top of Rook, a fairly low-level R package that hides most of the mechanics of serving web requests while still leaving you close to the request flow. It was written by Jeffrey Horner, who had previously introduced both rApache and Brew. He was also involved in Shiny Server’s implementation.

We’ll include a couple of excerpts from the project, here. For more information on running the example project, go to the project website.

The main routing code:

#!/usr/bin/env Rscript

library(Rook)

source('ajax.r')

main.app <- Builder$new(
    # Static assets (images, Javascript, CSS)
    Static$new(
        urls = c('/static'),
        root = '.'
    ),

    # Webpage serving.
    Static$new(urls='/html',root='.'), 

    Rook::URLMap$new(
        '/ajax/lambda/result' = lambda.ajax.handler,
        '/ajax/lambda/image' = lambda.image.ajax.handler,
        '/' = Redirect$new('/html/index.html')
    )
)

s <- Rhttpd$new()
s$add(name='test_project',app=main.app)

s$start(port=5000)

while (TRUE) {
    Sys.sleep(0.5);
}

We loop at the bottom because, if you’re calling this as a script as intended, we want to keep it running in order to process requests.

The dynamic-request handlers:

library(jsonlite)
library(base64enc)

source('utility.r')

eval.code <- function(code, result_name=NULL) {
    message <- NULL
    cb_error <- function(e) {
        message <<- list(type='error', message=e$message)
    }
    cb_warning <- function(w) {
        message <<- list(type='warning', message=w$message)
    }

    tryCatch(eval(parse(text=code)), error=cb_error, warning=cb_warning)

    if(is.null(message)) {
        result <- list(success=TRUE)

        if(is.null(result_name) == FALSE) {
            if(exists(result_name) == FALSE) {
                result$found <- FALSE
            } else {
                result$found <- TRUE
                result$value <- mget(result_name)[[result_name]]
            }
        }

        return(result)
    } else {
        return(list(success=FALSE, message=message))
    }
}

lambda.ajax.handler <- function(env) {
    # Execute code and return the value for the variable of the given name.

    req <- Request$new(env)

    if(is.null(req$GET()$tab_name)) {
        # Parameters missing.
        res <- Response$new(status=500)
        write.text(res, "No 'tab_name' parameter provided.")
    } else if(is.null(req$GET()$result_name)) {
        # Parameters missing.
        res <- Response$new(status=500)
        write.text(res, "No 'result_name' parameter provided.")
    } else if(is.null(req$POST())) {
        # Body missing.
        res <- Response$new(status=500)
        write.text(res, "POST-data missing. Please provide code.")
    } else {
        # Execute code and return the result.

        res <- Response$new()

        result_name <- req$GET()$result_name
        code <- req$POST()[['code']]

        execution_result <- eval.code(code, result_name=result_name)
        execution_result$value = paste(capture.output(print(execution_result$value)), collapse='\n')

        write.json(res, execution_result)
    }

    res$finish()
}

lambda.image.ajax.handler <- function(env) {
    # Execute code and return a base64-encoded image.

    req <- Request$new(env)

    if(is.null(req$GET()$tab_name)) {
        # Parameters missing.
        res <- Response$new(status=500)
        write.text(res, "No 'tab_name' parameter provided.")
    } else if(is.null(req$POST())) {
        # Body missing.
        res <- Response$new(status=500)
        write.text(res, "POST-data missing. Please provide code.")
    } else {
        # Execute code and return the result.

        # If we're returning an image, set the content-type and redirect 
        # the graphics device to a file.

        t <- tempfile()
        # Open a single PNG device on the temp file.
        png(t, type="cairo", width=500, height=500)

        result_name <- req$GET()$result_name
        code <- req$POST()[['code']]

        execution_result <- eval.code(code, result_name=result_name)

        # If we're returning an image, stop the graphics device and return 
        # the data.

        dev.off()
        length <- file.info(t)$size

        if(length == 0) {
            res <- Response$new(status=500)
            res$header('Content-Type', 'text/plain')

            res$write("No image was generated. Your code is not complete.")
        } else {
            res <- Response$new()
            res$header('Content-Type', 'text/plain')

            data_uri <- dataURI(file=t, mime="image/png")
            res$write(data_uri)
        }
    }

    res$finish()
}
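
To make the request shape concrete: the handlers above read tab_name and result_name from the query string and the code from a form-encoded POST body. Here is a minimal client-side sketch in Python that builds such a request without sending it. The base URL is an assumption (it depends on where Rhttpd mounts the application), and build_result_request is a hypothetical helper name.

```python
import urllib.parse
import urllib.request


def build_result_request(base_url, tab_name, result_name, code):
    """Build (but do not send) a POST request for the /ajax/lambda/result handler."""
    # The handler looks for these two names in the query string.
    query = urllib.parse.urlencode({
        'tab_name': tab_name,
        'result_name': result_name,
    })

    # The R code to evaluate travels in the POST body under the "code" field.
    body = urllib.parse.urlencode({'code': code}).encode('utf-8')

    url = base_url.rstrip('/') + '/ajax/lambda/result?' + query
    return urllib.request.Request(url, data=body, method='POST')


if __name__ == '__main__':
    # Assumed base URL; adjust it to wherever your server exposes the app.
    request = build_result_request(
        'http://localhost:5000/custom/test_project', 'tab1', 'x', 'x <- 1 + 1')
    print(request.get_method(), request.full_url)
```

Passing the result to urllib.request.urlopen() would send it once the server is running.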

For reference, there is also another project called rapport that lets you produce HTML though not whole websites.

Beautiful Python Charts Using Seaborn

Of the five or six most well-known charting packages, none really impressed me (being a devoted user of Highcharts, in JavaScript). The exception to this is plot.ly, but it’s a remote service and I’d rather not couple myself to a service.

In the end, I went with Seaborn (Stanford). This seemed to look the best of all of the options, even if it was tough to get some of the features working right. They recently added a chart for the exclusive purpose of plotting categorical/factor-based data: factorplot.

An example of Seaborn’s factorplot:

Seaborn - factorplot

I put together an example of a bar-chart using data from data.gov. I used pandas to read the CSV data. Since Seaborn is built on top of matplotlib and matplotlib appears to have issues displaying graphics in my local environment, I had to rely on running the example via ipython using the matplotlib magic function (which loads everything necessary and worked as expected).

To run the example, you’ll need the following packages (in addition to the ipython environment):

  • pandas
  • numpy
  • seaborn
  • matplotlib

The code:

# Tell ipython to load the matplotlib environment.
%matplotlib

import itertools

import pandas
import numpy
import seaborn
import matplotlib.pyplot

_DATA_FILEPATH = 'datagovdatasetsviewmetrics.csv'
_ROTATION_DEGREES = 90
_BOTTOM_MARGIN = 0.35
_COLOR_THEME = 'coolwarm'
_LABEL_X = 'Organizations'
_LABEL_Y = 'Views'
_TITLE = 'Organizations with Most Views'
_ORGANIZATION_COUNT = 10
_MAX_LABEL_LENGTH = 20

def get_data():
    # Read the dataset.

    d = pandas.read_csv(_DATA_FILEPATH)

    # Group by organization.

    def sum_views(df):
        return sum(df['Views per Month'])

    g = d.groupby('Organization Name').apply(sum_views)

    # Sort by views (descending). sort_values() returns a new Series.

    g = g.sort_values(ascending=False)

    # Grab the first N to plot.

    items = g.items()
    s = itertools.islice(items, 0, _ORGANIZATION_COUNT)

    s = list(s)

    # Sort them in ascending order, this time, so that the larger ones are on 
    # the right (in red) in the chart.
    s = sorted(s, key=lambda item: item[1])

    # Truncate the names (otherwise they're unwieldy).

    distilled = []
    for (name, views) in s:
        if len(name) > _MAX_LABEL_LENGTH:
            name = name[:_MAX_LABEL_LENGTH - 3] + '...'

        distilled.append((name, views))

    return distilled

def plot_chart(distilled):
    # Split the series into separate vectors of labels and values.

    labels_raw = []
    values_raw = []
    for (name, views) in distilled:
        labels_raw.append(name)
        values_raw.append(views)

    labels = numpy.array(labels_raw)
    values = numpy.array(values_raw)

    # Create one plot.

    seaborn.set(style="white", context="talk")

    (f, ax) = matplotlib.pyplot.subplots(1)

    # Uses the keyword names of current seaborn releases (order rather than 
    # x_order).
    b = seaborn.barplot(
        x=labels, 
        y=values,
        palette=_COLOR_THEME, 
        ax=ax,
        order=labels)

    # Set labels.

    ax.set_title(_TITLE)
    ax.set_xlabel(_LABEL_X)
    ax.set_ylabel(_LABEL_Y)

    # Rotate the x-labels (otherwise they'll overlap). Seaborn also doesn't do 
    # very well with diagonal labels so we'll go vertical.
    b.set_xticklabels(labels, rotation=_ROTATION_DEGREES)

    # Add some margin to the bottom so the labels aren't cut-off.
    matplotlib.pyplot.subplots_adjust(bottom=_BOTTOM_MARGIN)

distilled = get_data()
plot_chart(distilled)

To run the example, save it to a file and load it into the ipython environment. If you were to save it as “barchart.ipy” (using the “ipy” extension so it processes the %matplotlib directive properly) and then start ipython using the “ipython” executable, you’d load it like this:

%run barchart.ipy

The graphic will be displayed in another window:

Seaborn barchart
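
The grouping, sorting, and slicing steps in get_data() above can also be expressed in a few chained pandas calls. This is only a sketch on a tiny in-memory table; the column names match the CSV used above, while the top_organizations name and the sample values are made up for illustration.

```python
import pandas


def top_organizations(frame, count=10):
    """Return (name, total views) pairs for the top organizations, ascending."""
    # Sum the monthly views for each organization.
    totals = frame.groupby('Organization Name')['Views per Month'].sum()

    # Keep the N largest totals, then order them smallest-first so that the
    # biggest bar ends up on the right of the chart.
    top = totals.nlargest(count).sort_values()

    return list(top.items())


if __name__ == '__main__':
    # Made-up sample rows, purely for illustration.
    sample = pandas.DataFrame({
        'Organization Name': ['A', 'A', 'B', 'C'],
        'Views per Month': [5, 7, 20, 1],
    })
    print(top_organizations(sample, count=2))
```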

I should also mention that I really like pygal but didn’t consider it an option because I wanted a traditional, flat image, not an SVG. Even so, here’s an example from their website:

import pygal                                                       # First import pygal
bar_chart = pygal.Bar()                                            # Then create a bar graph object
bar_chart.add('Fibonacci', [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55])  # Add some values
bar_chart.render_to_file('bar_chart.svg')                          # Save the svg to a file

Output:

pygal SVG Chart

Notice that the resulting SVG even has hover effects.

Copying All (Including Hidden) Files From One Directory Into Another

I just noticed a Superuser question with ~80,000 views where people were still generally clueless about a command-line trick with file-copying. As only a slight variation of it was mentioned there, I’m going to share it here.

So, the following command can be interpreted two different ways:

$ cp -r dir1 dir2
  • If dir2 exists, copy dir1 into dir2 as basename(dir1).
  • If dir2 doesn’t exist, copy dir1 into dirname(dir2) and name it basename(dir2).

What if you want to always copy the contents of dir1 into dir2? Well, you’d do this:

$ cp -r dir1/* dir2

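However, this will ignore any hidden files in dir1, because the shell’s * doesn’t match names that start with a period. Instead, with the BSD cp that ships with OS X, you can add a trailing slash to dir1: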

$ cp -r dir1/ dir2

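This will deterministically pour the contents of dir1 into dir2, whether or not dir2 already exists. (GNU cp ignores the trailing slash, so on Linux an existing dir2 ends up containing dir2/dir1 instead.)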

Example:

/tmp$ mkdir test_dir1
/tmp$ cd test_dir1/
/tmp/test_dir1$ touch aa
/tmp/test_dir1$ touch .bb
/tmp/test_dir1$ cd ..
/tmp$ mkdir test_dir2

/tmp$ cp -r test_dir1/* test_dir2
/tmp$ ls -1a test_dir2
.
..
aa

/tmp$ cp -r test_dir1/ test_dir2
/tmp$ ls -1a test_dir2
.
..
.bb
aa

The Superuser question insisted that you needed a period at the end of the source argument (dir1/.). That isn’t necessary with BSD cp, but it will still work there, and it is the form that also works with GNU cp.
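
If you need the same "pour the contents in" behavior from a script, regardless of which cp is installed, Python's shutil can do it. A minimal sketch, assuming Python 3.8 or later (for the dirs_exist_ok argument); copy_contents is just an illustrative name:

```python
import shutil


def copy_contents(source_dir, target_dir):
    """Copy everything inside source_dir (hidden files included) into target_dir."""
    # dirs_exist_ok=True merges into target_dir when it already exists instead
    # of raising, so the result is the same either way.
    return shutil.copytree(source_dir, target_dir, dirs_exist_ok=True)


if __name__ == '__main__':
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as base:
        source = os.path.join(base, 'test_dir1')
        target = os.path.join(base, 'test_dir2')
        os.mkdir(source)
        os.mkdir(target)
        for name in ('aa', '.bb'):
            open(os.path.join(source, name), 'w').close()

        copy_contents(source, target)
        print(sorted(os.listdir(target)))
```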

Recursively Scanning a Path with Filters under Python

Python has a great function to walk a tree called os.walk(). It’s a simple generator (meaning that you just enumerate it), and, at each node (a specific child path) it gives you 1) the current path, 2) a list of child directories, and 3) a list of child files. You can even use it in such a way that you can adjust what child directories it will walk on-the-fly. However, it doesn’t take any filters. What if you just want to give it inclusion/exclusion rules and then see the matching results?
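
For comparison, here is roughly what filtering looks like when done by hand on top of os.walk(). It is a minimal sketch: walk_filtered and its parameters are hypothetical names, and the patterns are shell-style globs matched with fnmatch.

```python
import fnmatch
import os


def walk_filtered(root, include_files=('*',), exclude_dirs=()):
    """Yield file paths under root whose names match one of include_files."""
    for current_path, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk() which children to skip.
        dirnames[:] = [
            d for d in dirnames
            if not any(fnmatch.fnmatch(d, pattern) for pattern in exclude_dirs)
        ]

        for filename in filenames:
            if any(fnmatch.fnmatch(filename, pattern) for pattern in include_files):
                yield os.path.join(current_path, filename)


if __name__ == '__main__':
    for filepath in walk_filtered('/etc', include_files=['net*']):
        print(filepath)
```

This works, but every project ends up rewriting it, and the walk itself runs in the foreground.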

Enter pathscan. This library will silently start a background-worker (as a process) to scan the directory structure in parallel while forwarding results to the foreground. To use it, install the pathscan package. It requires Python 3.4.

The library runs as a generator:

import fss.constants
import fss.config.log
import fss.orchestrator

root_path = '/etc'

filter_rules = [
    (fss.constants.FT_DIR, fss.constants.FILTER_INCLUDE, 'init'),
    (fss.constants.FT_FILE, fss.constants.FILTER_INCLUDE, 'net*'),
    (fss.constants.FT_FILE, fss.constants.FILTER_EXCLUDE, 'networking.conf'),
]

o = fss.orchestrator.Orchestrator(root_path, filter_rules)
for (entry_type, entry_filepath) in o.recurse():
    if entry_type == fss.constants.FT_DIR:
        print("Directory: [%s]" % (entry_filepath,))
    else: # entry_type == fss.constants.FT_FILE:
        print("File: [%s]" % (entry_filepath,))

# Directory: [/etc/init]
# File: [/etc/networks]
# File: [/etc/netconfig]
# File: [/etc/init/network-interface-container.conf]
# File: [/etc/init/networking.conf]
# File: [/etc/init/network-interface-security.conf]
# File: [/etc/init/network-interface.conf]

A command-line tool is also included:

$ pathscan -i "i*.h" -id php /usr/include 
F /usr/include/iconv.h
F /usr/include/ifaddrs.h
F /usr/include/inttypes.h
F /usr/include/iso646.h
D /usr/include/php

Unix Signals and Their Integers

I always find that this information is buried too deep in the include files or requires too many Google searches. So, they’re now printed here both for your convenience and mine. Many of these numbers are not the same on all Unixes (on x86 Linux, for example, SIGUSR1 and SIGUSR2 are 10 and 12), but the first nine are generally the only ones that are relevant.

Name Integer
SIGHUP 1
SIGINT 2
SIGQUIT 3
SIGILL 4
SIGTRAP 5
SIGABRT 6
SIGFPE 8
SIGKILL 9
SIGBUS 10
SIGSEGV 11
SIGSYS 12
SIGPIPE 13
SIGALRM 14
SIGTERM 15
SIGUSR1 16
SIGUSR2 17
SIGCHLD 18
SIGTSTP 20
SIGURG 21
SIGPOLL 22
SIGSTOP 23
SIGCONT 25
SIGTTIN 26
SIGTTOU 27
SIGVTALRM 28
SIGPROF 29
SIGXCPU 30
SIGXFSZ 31
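
Because the numbering differs between platforms, it can be handy to print the table for the machine you're actually on. A minimal sketch using Python's standard signal module (the signal_table name is just for illustration):

```python
import signal


def signal_table():
    """Return (number, name) pairs for the signals defined on this platform."""
    # signal.Signals is an enum of the signals that this system defines, so
    # the result differs from one operating system to the next.
    return sorted((int(sig), sig.name) for sig in signal.Signals)


if __name__ == '__main__':
    for number, name in signal_table():
        print(name, number)
```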

Creating a Case-Sensitive Partition in OSX

I work inside of Vagrant on my Mac system. I only just ran into a case-sensitivity problem that led to the wrong libraries being included: even though I’m running in an Ubuntu instance, the shared folders are still subject to the rules of the host system’s filesystem. So, the time has come to fix this annoying little trait of my Mac environment.

Go to Disk Utility and create an image. Make sure you select a case-sensitive format (e.g. “Mac OS Extended (Case-sensitive, Journaled)”):

Creating a Case-Sensitive Image in Disk Utility

Notice that I chose “sparse disk image” for “Image Format”. This starts with a minimally-sized container that’ll grow as I populate it with data, rather than starting off at the requested size.

Since you’re probably going to want to mount this image on a particular folder, unmount it using Disk Utility or Finder (since it would’ve automatically been mounted after you created it). Then, go to the command-line and mount it wherever you’d like:

$ hdiutil attach -mountpoint ~/development DevelopmentData.sparseimage

After that, the sky is the limit. Naturally, consider using “rsync -a” if you have to copy existing files there.
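
To confirm that the mounted volume really is case-sensitive, you can probe it with a temporary file. A minimal sketch in Python; is_case_sensitive is a hypothetical helper, and ~/development is the mountpoint from the command above:

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Return True if the filesystem holding directory distinguishes case."""
    # Create a temporary file with capitals in its name, then look for the
    # same name in all-lowercase. Only a case-insensitive filesystem finds it.
    with tempfile.NamedTemporaryFile(prefix='CaseProbe', dir=directory) as probe:
        lowered = os.path.basename(probe.name).lower()
        return not os.path.exists(os.path.join(directory, lowered))


if __name__ == '__main__':
    mountpoint = os.path.expanduser('~/development')
    if os.path.isdir(mountpoint):
        print(is_case_sensitive(mountpoint))
```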