Compiling your C/C++ and Obj C/C++ Code with “clang” (LLVM)

clang is a relatively recent addition to the landscape of C-family development. Though GCC is a household name (well, in my household), clang is built on LLVM, a modular and versatile compiler platform. In fact, because it’s built on LLVM, clang can emit a readable form of LLVM IR (its intermediate representation):

Source:

#include <stdio.h>

int main()
{
    printf("Testing.\n");

    return 0;
}

Command:

clang -emit-llvm -S main.c -o -

Output:

; ModuleID = 'main.c'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32-S128"
target triple = "i386-pc-linux-gnu"

@.str = private unnamed_addr constant [10 x i8] c"Testing.\0A\00", align 1

define i32 @main() nounwind {
  %1 = alloca i32, align 4
  store i32 0, i32* %1
  %2 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([10 x i8]* @.str, i32 0, i32 0))
  ret i32 0
}

declare i32 @printf(i8*, ...)

Benefits of using clang versus gcc include the following:

  • Considerably better error messages (a very popular feature).
  • Considerable improvements in compile speed and resource usage, across the board, according to clang’s own numbers (http://clang.llvm.org/features.html#performance). This might not hold in practice, though, judging by the average discussion on Stack Overflow.
  • Its ASTs and generated code are allegedly simpler and more straightforward for those who would like to study them.
  • It’s a single parser for the C family of languages (including Objective-C/C++, but not C#), and it was designed to be further extended.
  • It’s built as a set of libraries with an API, so it can be embedded by other tools.

clang is not nearly as mature as GCC, though I haven’t seen (as a casual observer) much negative feedback due to this.

To do a two-part build like you would with GCC, the basic parameters are similar, though there are six pages of other parameters available:

clang -emit-llvm -o foo.bc -c foo.c
clang -o foo foo.bc
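
As an aside: assuming the LLVM tools are installed alongside clang, the intermediate bitcode file can also be disassembled back into readable IR with llvm-dis, or executed directly under LLVM’s interpreter/JIT with lli:

llvm-dis foo.bc
lli foo.bc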

It’s important to mention that clang comes bundled with a static analyzer. This makes checking your code for bugs, at a deeper level than the compiler itself is concerned with, that much more accessible. For example, if we adjust the code above to do an allocation, but neglect to free it:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    printf("Testing.\n");

    void *new = malloc((size_t)2000);

    return 0;
}

We can build, while also telling clang to invoke the static analyzer:

clang --analyze main.c -o main
main.c:8:11: warning: Value stored to 'new' during its initialization is never read
    void *new = malloc((size_t)2000);
          ^~~   ~~~~~~~~~~~~~~~~~~~~
main.c:10:5: warning: Memory is never released; potential leak of memory pointed to by 'new'
    return 0;
    ^
2 warnings generated.

In truth, I don’t know how clang’s static analyzer compares with Valgrind, the standard, heavyweight, open-source analysis tool. Note, though, that Valgrind is a dynamic analyzer: it actually runs your program and watches to make sure that your allocations are managed properly, whereas clang’s analyzer works purely at compile time, without executing anything.

The Highway Guide to Writing gedit Plugins

This is going to be a very quick run-through of writing a “gedit” plugin using Python. This method allows you to rapidly produce plugins that require a minimum of code; depending on what you want to do, only a few lines might be required. Look at a simple plugin such as FocusAutoSave as an example.

Overview

A gedit plugin is composed of extensions, where each extension represents functionality that you’re adding at the application level, the window level, or the view level (where a “view” generally corresponds to a particular document).

A plugin has access to the full might of GTK, via PyGObject. Regardless of which type of extension(s) you need to implement, each base class requires that a do_activate() and do_deactivate() method be implemented. It is from these methods that you either a) connect the signals to be handled, or b) schedule a timer to invoke a callback.

To make gedit see your plugin, you have to store two files in ~/.local/share/gedit/plugins: abc.py and abc.plugin. The latter is an INI-type file that tells gedit about your plugin and how to import it. Note that the plugins/ directory must have an __init__.py file in it (as all Python package directories must). Though the “Module” value in the plugin-info file must agree with the name of your Python module, the classes within it can have arbitrary names (they automatically wire themselves into GTK). To make an installer, just use “invoke”, make, etc.
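
So, for a hypothetical plugin named “abc”, the layout would be:

~/.local/share/gedit/plugins/
    __init__.py
    abc.plugin
    abc.py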

Example

Plugin file:

[Plugin]
Loader=python
Module=dustin
IAge=3
Name=Dustin's Plugin
Description=A Python plugin example
Authors=Dustin Oprea 
Copyright=Copyright © 2013 Dustin Oprea 
Website=http://www.myplugin.com

Module file:

The practical purpose of this code is questionable; it’s really just provided as an example of a few different things. Note that the print() calls write to the console, which means that you should start gedit from a console if you wish to see them.

from gi.repository import GObject, Gedit, Gio
 
SETTINGS_KEY = "org.gnome.gedit.preferences.editor"
gedit_settings = Gio.Settings(SETTINGS_KEY)
 
class DustinPluginWindowExtension(GObject.Object, Gedit.WindowActivatable):
    "Our extension to the window's behavior."
 
    __gtype_name__ = "DustinPluginWindowExtension"
    window = GObject.property(type=Gedit.Window)
 
    def __init__(self):
        GObject.Object.__init__(self)
  
    def do_activate(self):
        "Called when the when is loaded."
 
        # To get a list of all unsaved documents:
        # self.window.get_unsaved_documents()
 
        # To list all available config options configured in the "preferences" window:
        # gedit_settings.keys()
 
        # The get_boolean() call seems like it can be used to either get a boolean 
        # value, or determine if a configurable is even present (for backwards-
        # compatibility).
        if gedit_settings.get_boolean("auto-save") is True:
            print(gedit_settings.get_uint("auto-save-interval"))
 
        # Schedule a callback to trigger in five seconds.
        self.timer_id = GObject.timeout_add_seconds(5, self.__window_callback)
 
    def __window_callback(self):
 
        print("Trigger.")
         
        # Return True to automatically schedule again.
        return True
 
    def do_deactivate(self):
        "We're being unloaded. Clean-up."
 
        # Clean-up our timer.
        GObject.source_remove(self.timer_id)
        self.timer_id = None
 
class DustinPluginViewExtension(GObject.Object, Gedit.ViewActivatable):
    "Our extension to the document's behavior."
 
    __gtype_name__ = "DustinPluginViewExtension"
    view = GObject.property(type=Gedit.View)
 
    def __init__(self):
        GObject.Object.__init__(self)
 
    def do_activate(self):
        # Get the document.
        self.__doc = self.view.get_buffer()
 
        # To get the name of the document as shown in the tab:
        # self.__doc.get_short_name_for_display()
 
        # To insert something at the current cursor position.
        self.__doc.insert_at_cursor("Hello World.\n")
 
        # Get the text of the document. This works using start/stop iterators 
        # (pointers to the left and right sides of the content to grab).
        # text = self.__doc.get_text(self.__doc.get_start_iter(), 
        #                            self.__doc.get_end_iter(), True)
 
        # Wire a handler to the "saved" signal.
        self.__sig_saved = self.__doc.connect("saved", self.__on_saved)
 
    def do_deactivate(self):
        self.__doc.disconnect(self.__sig_saved)
        del self.__sig_saved
 
    def __on_saved(self, widget, *args, **kwargs):
        print("Saved.")

To enable debug logging, just set the log-level at the top of your Python module. Logging should be printed out to the console.
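
For example, a minimal sketch using the standard logging module, placed near the top of the plugin module:

import logging

# Emit everything at DEBUG and above to the console.
logging.basicConfig(level=logging.DEBUG)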

When I was first looking at writing a gedit plugin, I had no direction on 1) how to retrieve the text, 2) how to properly schedule timers (which is a general GTK task), and 3) how to read values from gedit’s configuration. Hopefully this helps.

Additional resources that might be of some help:

Writing Plugins for gedit 3 with Python

Python Plugin How To for gedit 3

gedit Reference Manual (great reference for signals)

Vectors in C

I’ve implemented a vector type called “list” in C. It uses contiguous blocks of memory and grows geometrically when full (amortizing the cost of reallocation), the same way C++’s STL vectors do.

This is the example that’s bundled with it:

#include <stdio.h>
#include <stdbool.h>
#include <inttypes.h>

#include "list.h"

static bool enumerate_cb(list_t *list, 
                         uint32_t index, 
                         void *value, 
                         void *context)
{
    char *text = (char *)value;
    printf("Item (%" PRIu8 "): [%s]\n", index, text);

    // Return false to stop enumeration (enumeration will return successful).
    return true;
}

int main()
{
    list_t list;
    const uint32_t entry_width = 20;
    
    if(list_init(&list, entry_width) != 0)
    {
        printf("Could not initialize list.\n");
        return 1;
    }

    char text[20];
    const uint8_t count = 10;
    uint8_t i = 0;
    while(i < count)
    {
        snprintf(text, 20, "Test: %" PRIu8, i);
        printf("Pushing: %s\n", text);

        if(list_push(&list, text) != 0)
        {
            printf("Could not push item.\n");
            return 2;
        }
    
        i++;
    }

    printf("\n");

    // NOTE: For efficiency, this is a reference to within the list. If you
    //       want a copy, make a copy. If you want to make sure this is thread-
    //       safe, use a lock.
    void *retrieved;
    if((retrieved = list_get(&list, 5)) == NULL)
    {
        printf("Could not retrieve item.\n");
        return 3;
    }

    printf("Retrieved: %s\n", (char *)retrieved);
    printf("Removing.\n");

    if(list_remove(&list, 5) != 0)
    {
        printf("Could not remove item.\n");
        return 4;
    }

    printf("\n");
    printf("Enumerating:\n");

    if(list_enumerate(&list, enumerate_cb, NULL) != 0)
    {
        printf("Could not enumerate list.\n");
        return 5;
    }

    if(list_destroy(&list) != 0)
    {
        printf("Could not destroy list.\n");
        return 6;
    }

    return 0;
}

Output:

$ ./example 
Pushing: Test: 0
Pushing: Test: 1
Pushing: Test: 2
Pushing: Test: 3
Pushing: Test: 4
Pushing: Test: 5
Pushing: Test: 6
Pushing: Test: 7
Pushing: Test: 8
Pushing: Test: 9

Retrieved: Test: 5
Removing.

Enumerating:
Item (0): [Test: 0]
Item (1): [Test: 1]
Item (2): [Test: 2]
Item (3): [Test: 3]
Item (4): [Test: 4]
Item (5): [Test: 6]
Item (6): [Test: 7]
Item (7): [Test: 8]
Item (8): [Test: 9]

CMake “Hello World” Tutorial

The make utility is a slightly dated approach to building your projects. Now, I wouldn’t have a problem with it if it didn’t require me to use tabs (all of my editors are configured to expand tabs). However, these days, there are alternatives (qmake, cmake, ant, etc.).

Personally, I like CMake’s output.

Given a source file named “source.c” and a target executable named “final_app”, consider the following CMakeLists.txt file, which will be deposited in the root of your source path. We’ll also check for and link a library; we’ll use pthreads in this example:

cmake_minimum_required(VERSION 2.6.0)

project(app_name C)

set(CMAKE_THREAD_PREFER_PTHREAD ON)
find_package(Threads)

add_executable(final_app source.c)
# Link against whatever thread library find_package(Threads) located.
target_link_libraries(final_app ${CMAKE_THREAD_LIBS_INIT})

Create a subdirectory named “build”, change into it, and then run:

cmake ..

The output:

-- The C compiler identification is GNU 4.7.3
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Looking for include file pthread.h
-- Looking for include file pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Configuring done
-- Generating done
-- Build files have been written to: /home/xx/yy/app_name/build

Once this is done, run:

make

The output:

Scanning dependencies of target final_app
[100%] Building C object CMakeFiles/final_app.dir/source.c.o
Linking C executable final_app
[100%] Built target final_app

Obama and Github

Obama has found a new channel of controversy. As of the last week or two, the big news was that the government had been shut down, an inevitable consequence of multiple trillions of dollars of expenses and of pissing everyone off by underhandedly getting His healthcare bill to go through (hence forcing the Republicans to reciprocate in the only way they can). As of this week, the new news is that the http://healthcare.gov website has gone live, and not only have the lower-quality experience and the colossal number of glitches pissed everyone off, but so has the thematically-consistent price tag.

In 2011, CGI Federal was contracted to create the website for $93 million. Apparently, the final price tag is upwards of $634 million. I don’t know whether people are angrier about what it cost, or about the fact that they could have hired a couple of college kids to do it for a few grand, and at least contributed to the educational sector.

If all of that wasn’t enough, they made some code available via Github that seems to be only three commits old. I don’t know what to make of that, but it makes me very uncomfortable.

CGI Federal on Healthcare.gov: No comment

Healthcare.gov is a Technological Disaster

Glitchy Healthcare.gov cost taxpayers more than $634 million to build

$634 million for Healthcare.gov website

We paid over $500 million for the Obamacare sites and all we got was this lousy 404

What went wrong with healthcare.gov: The front end and back end never talked

A Practitioner’s Overview to SSL, and Viewing the Certificate Chain from Python

The fundamental principle of SSL is this: a client connects to an SSL-enabled server, and the server returns enough information to a) encrypt the communication channel, and b) authenticate itself well enough that you can prove it’s the intended system. (b) is accomplished by the server providing both its own certificate and information about the CA (certificate authority) that signed it. This latter item is the area where you might occasionally encounter problems, when a client complains about not being able to verify a hostname. This is where some people recommend just passing a flag to skip the check, thus completely compromising the integrity of SSL.

Depending on how reputable your CA is, they might provide additional CA certificates (referred to as IAs, or “intermediate authorities”), such that these authorities form a “certificate chain” proving that all of the authorities that lend their credibility to your certificate can be traced back to one very well-known authority (the “root CA”) that any client (browser, tool, or OS) would already know about.

In special or proprietary situations, you might have to physically go into the configuration for your browser/tool/OS and add a new root CA that the client did not previously know about. Otherwise, the client might forbid access to your website on that system. Unless you’re dealing with some heightened-security situation on the intranet at your place of business, this is rarely necessary.

Sometimes it’s necessary to physically inspect which CAs are being reported by the server, for as simple a reason as verifying that you’ve configured it correctly. Until recently (and even, arguably, at the present time), Python has been unable to provide this information: it comes prepackaged with a natively-compiled SSL module, and the underlying mechanics simply don’t expose these calls. If you want this information, you’re forced to invoke OpenSSL’s “s_client” subcommand on the command-line.
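
For example, something like this (assuming a standard OpenSSL install) will dump every certificate the server presents:

openssl s_client -connect www.google.com:443 -showcerts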

Just recently, a patch was released that exposes this functionality. As a warning: since this hasn’t yet been merged into the source tree, its implementation might change by the time it is.

This is some cut-up and reassembled code, to show it in action:

import socket

from ssl import SSLContext, SSLError, CERT_REQUIRED, PROTOCOL_SSLv23
from ssl import HAS_SNI  # Is SNI (Server Name Indication) available?

from pprint import pprint

def ssl_wrap_socket(sock, keyfile=None, certfile=None, cert_reqs=None,
                    ca_certs=None, server_hostname=None,
                    ssl_version=None):
    context = SSLContext(ssl_version)
    context.verify_mode = cert_reqs

    if ca_certs:
        try:
            context.load_verify_locations(ca_certs)
        # Py32 raises IOError
        # Py33 raises FileNotFoundError
        except Exception as e:  # Reraise as SSLError
            raise SSLError(e)

    if certfile:
        # FIXME: This block needs a test.
        context.load_cert_chain(certfile, keyfile)

    if HAS_SNI:  # Platform-specific: OpenSSL with enabled SNI
        return (context, context.wrap_socket(sock, server_hostname=server_hostname))

    return (context, context.wrap_socket(sock))

hostname = 'www.google.com'

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((hostname, 443))

(context, ssl_socket) = ssl_wrap_socket(s,
                                        ssl_version=PROTOCOL_SSLv23,
                                        cert_reqs=CERT_REQUIRED,
                                        ca_certs='/usr/local/lib/python3.3/dist-packages/requests/cacert.pem',
                                        server_hostname=hostname)

pprint(ssl_socket.getpeercertchain())

s.close()

The output is a tuple of dictionaries:

({'issuer': ((('countryName', 'US'),),
             (('organizationName', 'Google Inc'),),
             (('commonName', 'Google Internet Authority G2'),)),
  'notAfter': 'Sep 11 11:04:38 2014 GMT',
  'notBefore': 'Sep 11 11:04:38 2013 GMT',
  'serialNumber': '50C71E48BCC50676',
  'subject': ((('countryName', 'US'),),
              (('stateOrProvinceName', 'California'),),
              (('localityName', 'Mountain View'),),
              (('organizationName', 'Google Inc'),),
              (('commonName', 'www.google.com'),)),
  'subjectAltName': (('DNS', 'www.google.com'),),
  'version': 3},
 {'issuer': ((('countryName', 'US'),),
             (('organizationName', 'GeoTrust Inc.'),),
             (('commonName', 'GeoTrust Global CA'),)),
  'notAfter': 'Apr  4 15:15:55 2015 GMT',
  'notBefore': 'Apr  5 15:15:55 2013 GMT',
  'serialNumber': '023A69',
  'subject': ((('countryName', 'US'),),
              (('organizationName', 'Google Inc'),),
              (('commonName', 'Google Internet Authority G2'),)),
  'version': 3},
 {'issuer': ((('countryName', 'US'),),
             (('organizationName', 'Equifax'),),
             (('organizationalUnitName',
               'Equifax Secure Certificate Authority'),)),
  'notAfter': 'Aug 21 04:00:00 2018 GMT',
  'notBefore': 'May 21 04:00:00 2002 GMT',
  'serialNumber': '12BBE6',
  'subject': ((('countryName', 'US'),),
              (('organizationName', 'GeoTrust Inc.'),),
              (('commonName', 'GeoTrust Global CA'),)),
  'version': 3},
 {'issuer': ((('countryName', 'US'),),
             (('organizationName', 'Equifax'),),
             (('organizationalUnitName',
               'Equifax Secure Certificate Authority'),)),
  'notAfter': 'Aug 22 16:41:51 2018 GMT',
  'notBefore': 'Aug 22 16:41:51 1998 GMT',
  'serialNumber': '35DEF4CF',
  'subject': ((('countryName', 'US'),),
              (('organizationName', 'Equifax'),),
              (('organizationalUnitName',
                'Equifax Secure Certificate Authority'),)),
  'version': 3})

The topmost item is the most specific, and describes the certificate for the domain itself, whereas the bottommost one is the least specific, and describes the highest, most well-known authority involved (in this case, Equifax).

“Package Backup” for Free Linux Backups

This post is both an announcement of a new service called Package Backup, and a brief introduction to it.

Backing-Up a Modern Linux System

When it comes to backing up a Linux system, backups are a piece of cake, as long as you follow a couple of standard rules. As a general rule of thumb, there are three main areas of concern on a modern Linux system:

  1. Packages installed via dpkg, pacman, etc.
  2. Configuration in /etc
  3. User home directories

I’m omitting the web-server directories (usually in /var/www), custom-built applications (usually a trivial concern), and the root home directory (which shouldn’t have anything substantial in it).

Theoretically, if you back up /etc, the home directories, and a list of the installed packages, you can recover immediately after a catastrophic event. This can be further simplified by moving your home directories to some central location, such as a RAID volume on another system, and mounting them locally via NFS.

How do you restore all previously-installed applications? Using Debian/Ubuntu:

dpkg --set-selections < selections.txt
dselect update
apt-get dselect-upgrade

It’s even simpler on Arch:

pacman -S $(< selections.txt) 

It’s awesome, and the data is very light.

As for the practical aspects of backing up a personal system: /etc will only occasionally change, so an occasional manual backup is sufficient. As for personal files, all of my code is stored in remote source-control, and all of my data is stored on a central server, so my home directory is [arguably] disposable.

Usually, the only important thing to me is that, when my system crashes or I need to build another machine, I have a recent list of packages to install from after I install the OS. This means that I forego tons of frustration having to encounter and install missing applications and tools over the next month or two. For companies, this means that you can install the OS and have a system with identical applications/tools installed in minutes.

To get the package-list backups going, you’ll write a dump script, schedule it in cron, and schedule a job to sync those lists to some central, safe backup location.
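
A minimal dump script is little more than one line (the output path is just an example). On Debian/Ubuntu:

dpkg --get-selections > /var/backups/selections.txt

And on Arch (-Qqe prints the names of the explicitly-installed packages, which is the format the restore command above expects):

pacman -Qqe > /var/backups/selections.txt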

Though this is the ideal solution, there are problems you might encounter:

  • You’ll have to migrate scripts and schedule jobs on every machine you want backed up.
  • You’ll have to kill old package lists every couple of months.
  • Since the need for these lists rarely comes up, it’s fairly easy to forget they’re there, or to forget to configure the backup on new systems that you suddenly have to rebuild.
  • When you want to retrieve a list, you’ll have to browse through the lists on the remote system, which may not even be reachable just after you’ve built a new system.

Package Backup

This is the purpose of Package Backup. It’s a centralized service that puts all of your package lists in the same place, and hides them behind a nice little DatePicker calendar. You can monitor all of the systems that are pushing lists, see the last time that they pushed, and recover your lists either by downloading one from the webpage or by using the console tool to pull one. At its simplest, you can run a command like “pb_getlist_dpkg -” from the console, and the most recent list is printed to standard output. It plugs right in to the recovery commands, above.
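
In other words, a restore on Debian/Ubuntu ought to be as simple as piping one into the other (this particular one-liner is an extrapolation, not something from the tool’s documentation):

pb_getlist_dpkg - | dpkg --set-selections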

An additional bonus of using this service is that you can see when a list has changed since the previous list that was pushed. You can also monitor the OS versions running on each of your systems.

Future Features

Package Backup will soon be adding SFTP functionality, for convenience. You’ll be able to use pbclient to automatically sync a set of physical files to a remote system. The functionality will be similar to scp, rsync, etc., except that the same solution manages everything, while still allowing you to specify paths, certificates, hostnames, etc.

For more information, see the Package Backup homepage.

System info:

[screenshot: System Listing Example]

Pushes for system:

[screenshot: Lists Example]

Using HMACs instead of Plain Hashes for Security

We all know that everyone should be storing passwords as cryptographic hashes (SHA256, for example), these days. The Internet has now been around too long for people not to know this. In addition, engineers who write communications code will often employ “message authentication codes” (“MACs”): hashes calculated over the data that is passed back and forth with the remote system, used to verify that the data hasn’t been modified in transit (preventing “man in the middle” attacks).

However, it is much less well known that hashes, used in the manner that they often are, are widely vulnerable to attack. For example (in Python):

from hashlib import sha256

salt = b"3qk4yfbgql"
message = b"some data"
salted_data = salt + message
digest_ = sha256(salted_data).hexdigest()

This is no good. Our common hash functions are “iterative hash functions”: they calculate the hash by iterating, one chunk at a time, through the cleartext. As a result, they suffer from what Bruce Schneier refers to as the “length-extension bug”: it’s trivial for a third party to append data to the message, even though it’s salted, and calculate another, completely valid MAC, without ever knowing the salt. If you move the salt to the end of the message, there are different problems.

Enter the HMAC (“keyed-hash message authentication code”). An HMAC implementation takes a key (the “salt” below), the data, and a hash function, and does roughly this: hash((key ⊕ opad) + hash((key ⊕ ipad) + message)), where ipad and opad are fixed padding constants. The nested hashing is what defeats length extension.

import hmac
from hashlib import sha256

salt = b"3qk4yfbgql"
message = b"some data"
hmac_ = hmac.new(salt, message, sha256)
digest_ = hmac_.hexdigest()

Problem solved (for now).
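
A related note: when you later verify a MAC received from a remote party, compare it in constant time, rather than with ==, so as not to leak timing information. A minimal sketch (received_digest is a hypothetical value read off the wire):

import hmac
from hashlib import sha256

salt = b"3qk4yfbgql"
message = b"some data"
received_digest = "..."  # Hypothetical: the MAC the remote party sent.

expected = hmac.new(salt, message, sha256).hexdigest()

# hmac.compare_digest() takes the same amount of time whether or not
# the digests match, unlike an ordinary string comparison.
if hmac.compare_digest(expected, received_digest):
    print("Message is authentic.")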

Scheduling Cron Jobs From Python

Inevitably, you’ll have to write a post-install routine in Python that has to schedule a Cron job. For this, python-crontab is a great solution.

Create the job using the following. This will not commit the changes yet:

from crontab import CronTab    
cron = CronTab()
job = cron.new(command='/usr/bin/tool')

There are at least three ways to configure the schedule:

Using method calls:

job.minute.during(5,50).every(5)
job.hour.every(4)

Using an existing crontab row:

job.parse("* * * * * /usr/bin/tool")

Obviously, in the last case, you can omit the command when creating the job.

Using an alias:

job.special = '@daily'

You can commit your changes by then calling write() on the CronTab object:

cron.write()

This is a very limited introduction that elaborates on how to use aliases and parsing to schedule (since the documentation doesn’t go into those), but omits other features, such as:

  • indicating the user to schedule for (the default is “root”)
  • finding existing jobs by command or comment
  • enumerating jobs
  • enumerating the schedule that a particular job will follow
  • enumerating the logs for a particular job
  • removing jobs
  • editing a crontab file directly, or building one in-memory
  • enabling/disabling jobs

The official documentation (follow the link, above) describes all of the features on this list.
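
As a rough sketch of a few of those (method names are per the python-crontab documentation; check them against the version you have):

from crontab import CronTab

cron = CronTab()

# Find existing jobs by the command they run.
for job in cron.find_command('/usr/bin/tool'):
    print(job)

    # Disable a job without deleting it.
    job.enable(False)

# Or, remove every job running that command outright.
# cron.remove_all(command='/usr/bin/tool')

# Commit the changes back to the crontab.
cron.write()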