Serving a Beautiful Website with R and Bootstrap

R Spline Screenshot

There is very limited coverage on how to build a website with R. It was a fight to answer the questions that I had. Obviously, this is because R was not meant to serve websites. In fact, if you want to serve a website that has any sort of volume, you’re probably better-off using Shiny Server or hosted Shinyapps.io: http://shiny.rstudio.com/deploy

However, you still may want to write your own R-based web application for one or more of the following reasons:

  • You just want to (I fit into this category)
  • You want more control over the web code, as Shiny will dynamically generate the code for you. Shiny may look spectacular, but at the cost of losing a lot of control (for good reason).
  • You want to run more than one application and don’t want to pay the $10,000/year price for each Shiny server instance.
  • You don’t want to pay for a hosted solution or subject yourself to the limits of a free account (your application will be deactivated after the first twenty-five hours of active usage each month).

So, assuming that you just want to run your own server, I’ve created a test-project to help you out. Make sure to install the requirements before running it. The project depends on the DAAG package. This is a companion package to a book that we use for the dataset in a spline example.

This project was created on top of Rook, a fairly low-level R package that removes most of the semantics of serving web-requests while still leaving you buried in the flow. This was brought about by Jeffrey Horner who had previously introduced both rApache and Brew. He was also involved in Shiny Server’s implementation.

We’ll include a couple of excerpts from the project, here. For more information on running the example project, go to the project website.

The main routing code:

#!/usr/bin/env Rscript

library(Rook)

source('ajax.r')

main.app <- Builder$new(
    # Static assets (images, Javascript, CSS)
    Static$new(
        urls = c('/static'),
        root = '.'
    ),

    # Webpage serving.
    Static$new(urls='/html',root='.'), 

    Rook::URLMap$new(
        '/ajax/lambda/result' = lambda.ajax.handler,
        '/ajax/lambda/image' = lambda.image.ajax.handler,
        '/' = Redirect$new('/html/index.html')
    )
)

s <- Rhttpd$new()
s$add(name='test_project',app=main.app)

s$start(port=5000)

while (TRUE) {
    Sys.sleep(0.5);
}

We loop at the bottom because, if you’re calling this as a script as intended, we want to keep it running in order to process requests.

The dynamic-request handlers:

library(jsonlite)
library(base64enc)

source('utility.r')

eval.code <- function(code, result_name=NULL) {
    message <- NULL
    cb_error <- function(e) {
        message <<- list(type='error', message=e$message)
    }
    cb_warning <- function(w) {
        message <<- list(type='warning', message=w$message)
    }

    tryCatch(eval(parse(text=code)), error=cb_error, warning=cb_warning)

    if(is.null(message)) {
        result <- list(success=TRUE)

        if(is.null(result_name) == FALSE) {
            if(exists(result_name) == FALSE) {
                result$found <- FALSE
            } else {
                result$found <- TRUE
                result$value <- mget(result_name)[[result_name]]
            }
        }

        return(result)
    } else {
        return(list(success=FALSE, message=message))
    }
}

lambda.ajax.handler <- function(env) {
    # Execute code and return the value for the variable of the given name.

    req <- Request$new(env)

    if(is.null(req$GET()$tab_name)) {
        # Parameters missing.
        res <- Response$new(status=500)
        write.text(res, "No 'tab_name' parameter provided.")
    } else if(is.null(req$GET()$result_name)) {
        # Parameters missing.
        res <- Response$new(status=500)
        write.text(res, "No 'result_name' parameter provided.")
    } else if(is.null(req$POST())) {
        # Body missing.
        res <- Response$new(status=500)
        write.text(res, "POST-data missing. Please provide code.")
    } else {
        # Execute code and return the result.

        res <- Response$new()

        result_name <- req$GET()$result_name
        code <- req$POST()[['code']]

        execution_result <- eval.code(code, result_name=result_name)
        execution_result$value = paste(capture.output(print(execution_result$value)), collapse='n')

        write.json(res, execution_result)
    }

    res$finish()
}

lambda.image.ajax.handler <- function(env) {
    # Execute code and return a base64-encoded image.

    req <- Request$new(env)

    if(is.null(req$GET()$tab_name)) {
        # Parameters missing.
        res <- Response$new(status=500)
        write.text(res, "No 'tab_name' parameter provided.")
    } else if(is.null(req$POST())) {
        # Body missing.
        res <- Response$new(status=500)
        write.text(res, "POST-data missing. Please provide code.")
    } else {
        # Execute code and return the result.

        # If we're returning an image, set the content-type and redirect 
        # the graphics device to a file.

        t <- tempfile()
        png(file=t)
        png(t, type="cairo", width=500, height=500)

        result_name <- req$GET()$result_name
        code <- req$POST()[['code']]

        execution_result <- eval.code(code, result_name=result_name)

        # If we're returning an image, stop the graphics device and return 
        # the data.

        dev.off()
        length <- file.info(t)$size

        if(length == 0) {
            res <- Response$new(status=500)
            res$header('Content-Type', 'text/plain')

            res$write("No image was generated. Your code is not complete.")
        } else {
            res <- Response$new()
            res$header('Content-Type', 'text/plain')

            data_uri <- dataURI(file=t, mime="image/png")
            res$write(data_uri)
        }
    }

    res$finish()
}

For reference, there is also another project called rapport that lets you produce HTML though not whole websites.

Programmatically-Driven Websites in Python (with HTTPHandler and SO_LINGER)

We’re going to write a website whose requests are handled by subroutines, and use Python’s logging.handlers.HTTPHandler class to send requests to it. Documentation and/or examples for the former are sparse, and I thought that an example of the latter connecting to the former would be useful.

Understanding the Webserver

Using the built-in BaseHTTPServer.BaseHTTPRequestHandler webserver, you can wire methods for individual verbs (GET, POST, PUT, etc..). Requests on verbs that aren’t handled will return a 501. Aside from having to write the headers at the top of the methods yourself and needing to read a specific quantity of data-bytes (or you’ll block forever), this is similar to every other web-framework that you’ve used.

The only things that you really need to know are the following instance variables:

  • headers: A dictionary-like collection of headers.
  • rfile: A file-like object that will contain your data (if you receive any).
  • wfile: A file-like object that will receive your response data (if you send any).

You’ll also need to deal with how to handle unsent data when you terminate. Even if you shutdown a socket, it may not be closed by the system immediately if data has already moved across it. This relates to why we inherit from SocketServer.TCPServer and change the one class variable. We’ll discuss this more, below.

import pprint
import urlparse

import BaseHTTPServer
import SocketServer

_PORT = 8000


class TCPServerReusableSocket(SocketServer.TCPServer):
    allow_reuse_address = True


class HookedHTTPRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def __send_headers(self):
        self.send_response(200)
        self.send_header("Content-type", 'text/plain')
        self.end_headers()

    def do_GET(self):
        self.__send_headers()

        print("Received GET request for: %s" % (self.path,))

        self.wfile.write("Test from GET!\n")

    def do_POST(self):
        self.__send_headers()

        print("Received POST request for: %s" % (self.path,))

        print('')
        print('Headers')
        print('=======')
        pprint.pprint(self.headers.items())
        print('=======')

        length = int(self.headers['content-length'])
        data_raw = self.rfile.read(length)
        data = urlparse.parse_qs(data_raw)

        print('')
        print('Received')
        print('========')
        pprint.pprint(data)
        print('========')
        print('')

        self.wfile.write("Test from POST!\n")

httpd = TCPServerReusableSocket(
            ('localhost', _PORT), 
            HookedHTTPRequestHandler)

httpd.serve_forever()

We expect that what we’ve done above is fairly obvious and does not need an explanation. You can implement your own log_request(code=None, size=None) method in HookedHTTPRequestHandler to change how the requests are printed, or to remove them.

To continue our remarks about buffered-data above, we add special handling so that we don’t encounter the “socket.error: [Errno 48] Address already in use” error if you kill the server and restart it a moment later. You may choose one of the following two strategies:

  1. Force the socket to close immediately.
  2. Allow the socket to already be open.

(1) should be fine for logging/etc. However, this might not be a great option if you’re handling actual data. (2) should probably be the preferred strategy, but you’ll also have to be sure to implement a PID file in your application so that you can be sure that only one instance is running (assuming that’s desired).

To implement (2), use SocketServer.TCPServer instead of our custom TCPServerReusableSocket. and, add the following imports:

import socket
import struct

Then, add the following after we define httpd but before we start the server, to tell the SO_LINGER socket option to kill all buffered data immediately:

l_onoff = 1                                                                                                                                                           
l_linger = 0                                                                                                                                                          

httpd.socket.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', l_onoff, l_linger))

You can test this using cURL, if you can’t wait to setup HTTPHandler:

$ curl -X POST -d abc=def http://localhost:8000
Test from POST!

The webserver process will show:

$ python http_server.py 
127.0.0.1 - - [19/Oct/2014 15:28:47] "POST / HTTP/1.1" 200 -
Received POST request for: /

Headers
=======
[('host', 'localhost:8000'),
 ('content-type', 'application/x-www-form-urlencoded'),
 ('content-length', '7'),
 ('accept', '*/*'),
 ('user-agent', 'curl/7.30.0')]
=======

Received
========
{'abc': ['def']}
========

Understanding logging.handlers.HTTPHandler

My own use-case for this was from a new MapReduce platform (JobX), and I wanted to potentially emit messages to another system if certain tasks were accomplished. I used the built-in webserver that we invoked, above, to see these messages from the development system.

import logging
import logging.handlers

logger = logging.getLogger(__name__)

_TARGET = 'localhost:8000'
_PATH = '/'
_VERB = 'post'

sh = logging.handlers.HTTPHandler(_TARGET, _PATH, method=_VERB)

logger.addHandler(sh)
logger.setLevel(logging.DEBUG)

logger.debug("Test message.")

This will be shown by the webserver:

127.0.0.1 - - [19/Oct/2014 15:45:02] "POST / HTTP/1.0" 200 -
Received POST request for: /

Headers
=======
[('host', 'localhost'),
 ('content-type', 'application/x-www-form-urlencoded'),
 ('content-length', '368')]
=======

Received
========
{'args': ['()'],
 'created': ['1413747902.18'],
 'exc_info': ['None'],
 'exc_text': ['None'],
 'filename': ['push_socket_log.py'],
 'funcName': ['<module>'],
 'levelname': ['DEBUG'],
 'levelno': ['10'],
 'lineno': ['17'],
 'module': ['push_socket_log'],
 'msecs': ['181.387901306'],
 'msg': ['Test message.'],
 'name': ['__main__'],
 'pathname': ['./push_socket_log.py'],
 'process': ['65486'],
 'processName': ['MainProcess'],
 'relativeCreated': ['12.6709938049'],
 'thread': ['140735262810896'],
 'threadName': ['MainThread']}
========

Note that each field is a list with one item. If you want the output to look a little nicer, alter the above to add the following to the top of the module:

import datetime

_FMT_DATETIME_STD = '%Y-%m-%d %H:%M:%S'

Then, add the __print_entry method:

    def __print_entry(self, entry):
        created_epoch = float(entry['created'][0])
        when_dt = datetime.datetime.fromtimestamp(created_epoch)
        timestamp_phrase = when_dt.strftime(_FMT_DATETIME_STD)
        where_name = entry['name'][0][:40]
        level_name = entry['levelname'][0]

        message = entry['msg'][0]

        print('%s  %40s  %9s  %s' % 
              (timestamp_phrase, where_name, level_name, message))

Then, change the last part of do_POST:

    def do_POST(self):
        self.__send_headers()

        length = int(self.headers['content-length'])
        data_raw = self.rfile.read(length)
        data = urlparse.parse_qs(data_raw)

        self.__print_entry(data)

The output will now look like:

2014-10-19 16:16:00       MR_HANDLER.HTTP.map_obfuscation_one       INFO  Socket message!
2014-10-19 16:16:00                           MR_HANDLER.HTTP      ERROR  Mapper invocation [789b7ca7fcb6cede9ae5557b2121d392469dfc26] under request [85394d5bdb34a09ffa045776cc69d1d4cd17d657] failed. HANDLER=[map_obfuscation_one]

There is one weird thing about HTTPHandler, and it’s this: Many/all of the fields will be stringified in order to serialized them. If you call the logger like logging.debug('Received arguments: [%s] [%s]', arg1, arg2), then we’ll receive Received argument: [%s] in the msg field (or rather the msg list), and the arguments as a stringified tuple like (u'abc', u'def'). To avoid dealing with this, I’ll send messages into a function that’s in charge of the notifications, and produce the final string before I send it to the logger.

The same thing applies to tracebacks. If you log an exception, you’ll only get this:

 'exc_info': ['(<type 'exceptions.NameError'>, NameError("global name 'client_id' is not defined",), <traceback object at 0x110c92878>)'],
 'exc_text': ['None'],

Again, you’ll have to concatenate this into the log-message by some intermediate function (so that the primary application logic doesn’t have to know about it, but so that you’ll still get this information).