Compiling your C/C++ and Objective-C/C++ Code with “clang” (LLVM)

clang is a relatively recent addition to the C-family development landscape. Though GCC is a household name (well, in my household), clang is built on LLVM, a modular and versatile compiler platform. In fact, because it’s built on LLVM, clang can emit LLVM’s intermediate representation (IR) in a human-readable form:

Source:

#include <stdio.h>

int main()
{
    printf("Testing.\n");

    return 0;
}

Command:

clang -emit-llvm -S main.c -o -

Output:

; ModuleID = 'main.c'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32-S128"
target triple = "i386-pc-linux-gnu"

@.str = private unnamed_addr constant [10 x i8] c"Testing.\0A\00", align 1

define i32 @main() nounwind {
  %1 = alloca i32, align 4
  store i32 0, i32* %1
  %2 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([10 x i8]* @.str, i32 0, i32 0))
  ret i32 0
}

declare i32 @printf(i8*, ...)

Benefits of using clang versus gcc include the following:

  • Considerably better error messages (a very popular feature).
  • Considerably faster compilation and lower memory usage across the board, according to clang itself (http://clang.llvm.org/features.html#performance). Going by the average discussion on Stack Overflow, though, this might not always be the case.
  • Its ASTs and code are allegedly simpler and more straightforward for anyone who would like to study them.
  • It’s a single parser for the whole C family of languages (C, C++, Objective-C, and Objective-C++; C# is not included), and it’s designed to be extended further.
  • It’s built as a library with an API, so other tools can build on it.

clang is not nearly as mature as GCC, though I haven’t seen (as a casual observer) much negative feedback due to this.

To compile and link in two separate steps, as you would with GCC, the basic parameters are similar (though there are six pages of parameters available):

clang -emit-llvm -o foo.bc -c foo.c
clang -o foo foo.bc

It’s important to mention that clang comes bundled with a static analyzer. This makes it much easier to check your code for bugs at a deeper level than the compiler itself is concerned with. For example, suppose we adjust the code above to make an allocation, but neglect to free it:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    printf("Testing.\n");

    void *new = malloc((size_t)2000);

    return 0;
}

We can then tell clang to run its static analyzer over the file (with --analyze, clang analyzes the code rather than building an executable):

clang --analyze main.c
main.c:8:11: warning: Value stored to 'new' during its initialization is never read
    void *new = malloc((size_t)2000);
          ^~~   ~~~~~~~~~~~~~~~~~~~~
main.c:10:5: warning: Memory is never released; potential leak of memory pointed to by 'new'
    return 0;
    ^
2 warnings generated.

In truth, I don’t know how clang’s static analyzer compares with Valgrind, the standard heavyweight open-source tool for finding memory problems. Valgrind is a dynamic analyzer: it actually runs your program and watches to make sure that your allocations are managed properly. clang’s static analyzer, by contrast, only examines the source code, so it can’t observe a running program the way Valgrind does.

The Highway Guide to Writing gedit Plugins

This is going to be a very quick run-through of writing a “gedit” plugin in Python. This approach lets you rapidly produce plugins with a minimum of code; depending on what you want to do, only a few lines might be required. For an example, look at a simple plugin such as FocusAutoSave.

Overview

A gedit plugin is made up of extensions, where each extension represents functionality that you’re adding at the application level, window level, or view level (where “view” usually refers to a particular document).

A plugin has access to the full might of GTK through PyGObject (the GObject-introspection bindings, imported from gi.repository, that replaced PyGTK). Regardless of which type of extension(s) you need to implement, each base class requires do_activate() and do_deactivate() methods. It’s from these methods that you either a) connect handlers to signals, or b) schedule a timer to invoke a callback.

To make gedit see your plugin, store two files in ~/.local/share/gedit/plugins: abc.py and abc.plugin. The latter is an INI-style file that tells gedit about your plugin and how to import it. Note that the plugins/ directory must have an __init__.py file in it (as regular Python package directories must). Though the “Module” value in the plugin-info file must agree with the name of your Python module, the classes within it can have arbitrary names (gedit finds them by the Activatable interfaces they implement). To make an installer, just use “invoke”, make, etc.

Example

Plugin file:

[Plugin]
Loader=python
Module=dustin
IAge=3
Name=Dustin's Plugin
Description=A Python plugin example
Authors=Dustin Oprea 
Copyright=Copyright © 2013 Dustin Oprea 
Website=http://www.myplugin.com
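Because the plugin file is plain INI, a mismatch between its Module= value and your module’s file name is easy to catch before gedit ever loads it. The following is a minimal sanity-check sketch using Python’s standard configparser; the function name find_plugin_module is hypothetical, and it assumes the module lives next to the .plugin file as either <Module>.py or a <Module>/__init__.py package.

```python
import configparser
import os


def find_plugin_module(plugin_path):
    """Hypothetical helper: read a gedit .plugin file and return the path of
    the Python module named by its Module= key, looked up next to the file.

    Assumes the module is either <Module>.py or a package <Module>/__init__.py.
    """
    # interpolation=None so a literal '%' (e.g. in Description) isn't treated
    # as a substitution.
    parser = configparser.ConfigParser(interpolation=None)
    with open(plugin_path, encoding="utf-8") as handle:
        parser.read_file(handle)

    module = parser["Plugin"]["Module"]
    plugin_dir = os.path.dirname(os.path.abspath(plugin_path))
    for candidate in (os.path.join(plugin_dir, module + ".py"),
                      os.path.join(plugin_dir, module, "__init__.py")):
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(
        "Module=%s in %s has no matching %s.py or %s/ package"
        % (module, plugin_path, module, module))
```

For example, find_plugin_module(os.path.expanduser("~/.local/share/gedit/plugins/dustin.plugin")) should return the path of dustin.py, and raises FileNotFoundError if the names don’t agree.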

Module file:

The practical purpose of this code is questionable; it’s really just an example of a few different things. Note that print() output is displayed in the console, so you should start gedit from a console if you want to see it.

from gi.repository import GLib, GObject, Gedit, Gio

SETTINGS_KEY = "org.gnome.gedit.preferences.editor"
gedit_settings = Gio.Settings.new(SETTINGS_KEY)

class DustinPluginWindowExtension(GObject.Object, Gedit.WindowActivatable):
    "Our extension to the window's behavior."

    __gtype_name__ = "DustinPluginWindowExtension"
    window = GObject.Property(type=Gedit.Window)

    def __init__(self):
        GObject.Object.__init__(self)

    def do_activate(self):
        "Called when the window is loaded."

        # To get a list of all unsaved documents:
        # self.window.get_unsaved_documents()

        # To list all available config options configured in the "preferences" window:
        # gedit_settings.keys()

        # get_boolean() reads a boolean key. GSettings treats a request for a
        # key that the schema doesn't define as a programming error, so check
        # that the key exists first if you need backwards-compatibility.
        if "auto-save" in gedit_settings.keys() and gedit_settings.get_boolean("auto-save"):
            print(gedit_settings.get_uint("auto-save-interval"))

        # Schedule a callback to trigger every five seconds.
        self.timer_id = GLib.timeout_add_seconds(5, self.__window_callback)

    def __window_callback(self):
        print("Trigger.")

        # Return True to keep the timer running (False would cancel it).
        return True

    def do_deactivate(self):
        "We're being unloaded. Clean up."

        # Clean up our timer.
        GLib.source_remove(self.timer_id)
        self.timer_id = None

class DustinPluginViewExtension(GObject.Object, Gedit.ViewActivatable):
    "Our extension to the document's behavior."

    __gtype_name__ = "DustinPluginViewExtension"
    view = GObject.Property(type=Gedit.View)

    def __init__(self):
        GObject.Object.__init__(self)

    def do_activate(self):
        # Get the document.
        self.__doc = self.view.get_buffer()

        # To get the name of the document as shown in the tab:
        # self.__doc.get_short_name_for_display()

        # To insert something at the current cursor position.
        self.__doc.insert_at_cursor("Hello World.\n")

        # Get the text of the document. This works using start/stop iterators
        # (pointers to the left and right sides of the content to grab).
        # text = self.__doc.get_text(self.__doc.get_start_iter(),
        #                            self.__doc.get_end_iter(), True)

        # Wire a handler to the "saved" signal.
        self.__sig_saved = self.__doc.connect("saved", self.__on_saved)

    def do_deactivate(self):
        # Disconnect our handler so it doesn't outlive the plugin.
        self.__doc.disconnect(self.__sig_saved)
        del self.__sig_saved

    def __on_saved(self, widget, *args, **kwargs):
        print("Saved.")

If you want debug logging, use Python’s logging module and set the log level at the top of your Python module. Like print() output, log messages are printed to the console.

When I was first looking at writing a gedit plugin, I had no direction for 1) how to retrieve the text, 2) how to properly schedule timers (which is a general GTK task), and 3) how to get values from gedit’s configuration. Hopefully this helps.

Additional resources that might be of some help:

Writing Plugins for gedit 3 with Python

Python Plugin How To for gedit 3

gedit Reference Manual (great reference for signals)

Vectors in C

I’ve implemented a vector type called “list” in C. It stores its entries in a contiguous block of memory and grows the same way C++’s STL vector does: when the block fills up, it is reallocated with a larger capacity.

This is the example that’s bundled with it:

#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#include "list.h"

static bool enumerate_cb(list_t *list, 
                         uint32_t index, 
                         void *value, 
                         void *context)
{
    char *text = (char *)value;
    printf("Item (%" PRIu32 "): [%s]\n", index, text);

    // Return false to stop enumeration (enumeration will return successful).
    return true;
}

int main()
{
    list_t list;
    const uint32_t entry_width = 20;
    
    if(list_init(&list, entry_width) != 0)
    {
        printf("Could not initialize list.\n");
        return 1;
    }

    char text[20];
    const uint8_t count = 10;
    uint8_t i = 0;
    while(i < count)
    {
        snprintf(text, sizeof(text), "Test: %" PRIu8, i);
        printf("Pushing: %s\n", text);

        if(list_push(&list, text) != 0)
        {
            printf("Could not push item.\n");
            return 2;
        }
    
        i++;
    }

    printf("\n");

    // NOTE: For efficiency, this is a reference to within the list. If you
    //       want a copy, make a copy. If you want to make sure this is thread-
    //       safe, use a lock.
    void *retrieved;
    if((retrieved = list_get(&list, 5)) == NULL)
    {
        printf("Could not retrieve item.\n");
        return 3;
    }

    printf("Retrieved: %s\n", (char *)retrieved);
    printf("Removing.\n");

    if(list_remove(&list, 5) != 0)
    {
        printf("Could not remove item.\n");
        return 4;
    }

    printf("\n");
    printf("Enumerating:\n");

    if(list_enumerate(&list, enumerate_cb, NULL) != 0)
    {
        printf("Could not enumerate list.\n");
        return 5;
    }

    if(list_destroy(&list) != 0)
    {
        printf("Could not destroy list.\n");
        return 6;
    }

    return 0;
}

Output:

$ ./example 
Pushing: Test: 0
Pushing: Test: 1
Pushing: Test: 2
Pushing: Test: 3
Pushing: Test: 4
Pushing: Test: 5
Pushing: Test: 6
Pushing: Test: 7
Pushing: Test: 8
Pushing: Test: 9

Retrieved: Test: 5
Removing.

Enumerating:
Item (0): [Test: 0]
Item (1): [Test: 1]
Item (2): [Test: 2]
Item (3): [Test: 3]
Item (4): [Test: 4]
Item (5): [Test: 6]
Item (6): [Test: 7]
Item (7): [Test: 8]
Item (8): [Test: 9]
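The list’s own source isn’t reproduced here, so the following is only a Python model of the behavior described above, not the author’s C implementation: fixed-width entries in one contiguous buffer, capacity doubling when the buffer is full, and removal shifting later entries down (which is why item 5 becomes “Test: 6” in the output). FixedWidthList, its methods, and as_text are hypothetical names, and unlike the C list_get(), get() here returns a copy rather than a reference.

```python
class FixedWidthList:
    """Hypothetical Python model of a vector of fixed-width entries kept in
    one contiguous bytearray."""

    def __init__(self, entry_width, initial_capacity=1):
        if entry_width <= 0 or initial_capacity <= 0:
            raise ValueError("entry_width and initial_capacity must be positive")
        self.entry_width = entry_width
        self.capacity = initial_capacity
        self.count = 0
        self.buffer = bytearray(entry_width * initial_capacity)

    def _check(self, index):
        if not 0 <= index < self.count:
            raise IndexError(index)

    def push(self, data):
        if len(data) > self.entry_width:
            raise ValueError("entry is wider than entry_width")
        if self.count == self.capacity:
            # Full: double the capacity (like realloc() to twice the size),
            # so each push costs amortized O(1).
            self.capacity *= 2
            self.buffer.extend(bytes(len(self.buffer)))
        start = self.count * self.entry_width
        # Pad with NUL bytes so every entry occupies exactly entry_width bytes.
        self.buffer[start:start + self.entry_width] = data.ljust(self.entry_width, b"\0")
        self.count += 1

    def get(self, index):
        self._check(index)
        start = index * self.entry_width
        # Returns a copy; the C version hands back a pointer into the list.
        return bytes(self.buffer[start:start + self.entry_width])

    def remove(self, index):
        self._check(index)
        start = index * self.entry_width
        end = self.count * self.entry_width
        # Shift every later entry down one slot (memmove() in C).
        self.buffer[start:end - self.entry_width] = self.buffer[start + self.entry_width:end]
        self.count -= 1

    def enumerate_entries(self, callback):
        # Stop early if the callback returns False, as in the C example.
        for index in range(self.count):
            if not callback(self, index, self.get(index)):
                break


def as_text(entry):
    # Treat the entry like a C string: everything up to the first NUL.
    return entry.split(b"\0", 1)[0].decode()


items = FixedWidthList(entry_width=20)
for i in range(10):
    items.push(("Test: %d" % i).encode())

print("Retrieved:", as_text(items.get(5)))
items.remove(5)


def show(lst, index, value):
    print("Item (%d): [%s]" % (index, as_text(value)))
    return True


items.enumerate_entries(show)
```

Doubling (rather than growing by a fixed amount) keeps the number of reallocations logarithmic in the number of pushes; here, ten pushes starting from a capacity of one end with a capacity of 16.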

CMake “Hello World” Tutorial

The make utility is a somewhat dated approach to building your projects. Now, I wouldn’t have a problem with it if it didn’t require me to use tabs (all of my editors are configured to expand tabs). However, these days there are alternatives (qmake, CMake, Ant, etc.), and tools like qmake and CMake generate the Makefiles for you.

Personally, I like CMake’s output.

Given a source file named “source.c” and a target executable named “final_app”, put the following CMakeLists.txt file in the root of your source tree. We’ll also check for and link a library; we’ll use Pthreads in this example:

cmake_minimum_required(VERSION 2.6.0)

project(app_name C)

set(CMAKE_THREAD_PREFER_PTHREAD ON)
find_package(Threads REQUIRED)

add_executable(final_app source.c)
target_link_libraries(final_app ${CMAKE_THREAD_LIBS_INIT})

Create subdirectory “build”, change into it, and then run:

cmake ..

The output:

-- The C compiler identification is GNU 4.7.3
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Looking for include file pthread.h
-- Looking for include file pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Configuring done
-- Generating done
-- Build files have been written to: /home/xx/yy/app_name/build

Once this is done, run:

make

The output:

Scanning dependencies of target final_app
[100%] Building C object CMakeFiles/final_app.dir/source.c.o
Linking C executable final_app
[100%] Built target final_app

Obama and Github

Obama has found a new channel of controversy. As of the last week or two, the big news was that the government had been shut down as an inevitable consequence of multiple trillions of dollars of expenses and pissing everyone off by underhandedly getting His healthcare bill to go through (hence forcing the Republicans to reciprocate in the only way they can). As of this week, the new news is that the http://healthcare.gov website has gone live, and not only has the lower-quality experience and colossal number of glitches pissed everyone off, but so has the thematically-consistent price tag.

In 2011, CGI Federal was contracted to create the website for $93 million. Apparently, the final price tag is upwards of $634 million. I don’t know whether people are angrier about what it cost, or that they could have hired a couple of college kids to do it for a few grand, and at least contributed to the educational sector.

If all of that wasn’t enough, they made some code available via Github that seems to only be three commits old. I don’t know what to make of that, but it makes me very uncomfortable.

CGI Federal on Healthcare.gov: No comment

Healthcare.gov is a Technological Disaster

Glitchy Healthcare.gov cost taxpayers more than $634 million to build

$634 million for Healthcare.gov website

We paid over $500 million for the Obamacare sites and all we got was this lousy 404

What went wrong with healthcare.gov: The front end and back end never talked

A Practitioner’s Overview to SSL, and Viewing the Certificate Chain from Python

The fundamental principle of SSL is this: a client connects to an SSL-enabled server, and the server returns enough information to a) encrypt the communication channel, and b) authenticate itself well enough that the client can be sure it’s talking to the intended system. (b) is accomplished by the server providing its own certificate along with the certificate of the CA (certificate authority) that issued it. This latter item is where you might occasionally run into problems, such as a client complaining that it can’t verify the server’s certificate. This is where some people recommend just passing a flag to skip the check, thus completely compromising the integrity of SSL.

Depending on your CA, the server might also have to provide the certificates of one or more additional authorities (referred to as IAs, or “intermediate authorities”), so that together they form a “certificate chain” that traces all of the authorities that lend their credibility to your certificate back to one very well-known authority (the “root CA”) that any client (browser, tool, or OS) already knows about.

In special or proprietary situations, you might have to go into the configuration for your browser/tool/OS and add a new root CA that the client didn’t previously know about; otherwise, the client might refuse access to your website on that system. Unless you’re dealing with a heightened-security situation, such as the intranet at your place of business, this is rarely necessary.

Sometimes it’s necessary to inspect which CAs the server is reporting, for a reason as simple as verifying that you’ve configured it correctly. Until recently (and, arguably, even now), Python has been unable to provide this information: its SSL module is a natively compiled extension, and the underlying calls simply aren’t exposed. If you wanted this information, you were forced to invoke OpenSSL’s “s_client” subcommand on the command line (e.g. openssl s_client -connect www.google.com:443 -showcerts).

Just recently, a patch was released that exposes this functionality. Be warned: since it hasn’t been merged into the source tree yet, its implementation might change by the time it is.

Here is some cut-up and reassembled code to show it in action (it requires a Python built with that patch; stock Python has no getpeercertchain()):

import socket

from ssl import SSLContext, SSLError, CERT_REQUIRED, PROTOCOL_SSLv23
from ssl import HAS_SNI  # Does this OpenSSL build support SNI?

from pprint import pprint

def ssl_wrap_socket(sock, keyfile=None, certfile=None, cert_reqs=None,
                    ca_certs=None, server_hostname=None,
                    ssl_version=None):
    context = SSLContext(ssl_version)
    context.verify_mode = cert_reqs

    if ca_certs:
        try:
            context.load_verify_locations(ca_certs)
        # Py32 raises IOError
        # Py33 raises FileNotFoundError
        except Exception as e:  # Reraise as SSLError
            raise SSLError(e)

    if certfile:
        # FIXME: This block needs a test.
        context.load_cert_chain(certfile, keyfile)

    if HAS_SNI:  # Platform-specific: OpenSSL with enabled SNI
        return (context, context.wrap_socket(sock, server_hostname=server_hostname))

    return (context, context.wrap_socket(sock))

hostname = 'www.google.com'

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((hostname, 443))

(context, ssl_socket) = ssl_wrap_socket(s,
                                       ssl_version=PROTOCOL_SSLv23,
                                       cert_reqs=CERT_REQUIRED,
                                       ca_certs='/usr/local/lib/python3.3/dist-packages/requests/cacert.pem',
                                       server_hostname=hostname)

# getpeercertchain() is provided by the patch; stock Python doesn't have it.
pprint(ssl_socket.getpeercertchain())

# Close the SSL socket (not just the raw socket underneath it).
ssl_socket.close()

The output is a tuple of dictionaries:

({'issuer': ((('countryName', 'US'),),
             (('organizationName', 'Google Inc'),),
             (('commonName', 'Google Internet Authority G2'),)),
  'notAfter': 'Sep 11 11:04:38 2014 GMT',
  'notBefore': 'Sep 11 11:04:38 2013 GMT',
  'serialNumber': '50C71E48BCC50676',
  'subject': ((('countryName', 'US'),),
              (('stateOrProvinceName', 'California'),),
              (('localityName', 'Mountain View'),),
              (('organizationName', 'Google Inc'),),
              (('commonName', 'www.google.com'),)),
  'subjectAltName': (('DNS', 'www.google.com'),),
  'version': 3},
 {'issuer': ((('countryName', 'US'),),
             (('organizationName', 'GeoTrust Inc.'),),
             (('commonName', 'GeoTrust Global CA'),)),
  'notAfter': 'Apr  4 15:15:55 2015 GMT',
  'notBefore': 'Apr  5 15:15:55 2013 GMT',
  'serialNumber': '023A69',
  'subject': ((('countryName', 'US'),),
              (('organizationName', 'Google Inc'),),
              (('commonName', 'Google Internet Authority G2'),)),
  'version': 3},
 {'issuer': ((('countryName', 'US'),),
             (('organizationName', 'Equifax'),),
             (('organizationalUnitName',
               'Equifax Secure Certificate Authority'),)),
  'notAfter': 'Aug 21 04:00:00 2018 GMT',
  'notBefore': 'May 21 04:00:00 2002 GMT',
  'serialNumber': '12BBE6',
  'subject': ((('countryName', 'US'),),
              (('organizationName', 'GeoTrust Inc.'),),
              (('commonName', 'GeoTrust Global CA'),)),
  'version': 3},
 {'issuer': ((('countryName', 'US'),),
             (('organizationName', 'Equifax'),),
             (('organizationalUnitName',
               'Equifax Secure Certificate Authority'),)),
  'notAfter': 'Aug 22 16:41:51 2018 GMT',
  'notBefore': 'Aug 22 16:41:51 1998 GMT',
  'serialNumber': '35DEF4CF',
  'subject': ((('countryName', 'US'),),
              (('organizationName', 'Equifax'),),
              (('organizationalUnitName',
                'Equifax Secure Certificate Authority'),)),
  'version': 3})

The topmost item is the most specific and describes the certificate for the domain itself, whereas the bottommost one is the least specific and describes the highest, most well-known authority involved in the operation (in this case, Equifax).
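In the output, each certificate’s issuer is the subject of the next one down, and the Equifax root is self-signed (its issuer equals its subject). The following Python sketch walks a chain in that tuple-of-dicts format and checks those links. describe_chain, name_to_dict, and label are hypothetical helpers, and they compare names only; they do not verify signatures, which OpenSSL does during the handshake. With the patched Python, you could pass ssl_socket.getpeercertchain() straight to describe_chain().

```python
def name_to_dict(name):
    """Flatten a getpeercert()-style name (a tuple of RDNs, each a tuple of
    (key, value) pairs) into a plain dict."""
    return {key: value for rdn in name for (key, value) in rdn}


def label(name):
    fields = name_to_dict(name)
    # The Equifax root has no commonName, so fall back to its OU.
    return fields.get("commonName") or fields.get("organizationalUnitName", "?")


def describe_chain(chain):
    """Summarize a chain ordered most specific first, checking that each
    certificate's issuer is the subject of the certificate after it.
    Only names are compared; signatures are not verified here."""
    lines = []
    for i, cert in enumerate(chain):
        lines.append("%d: %s (issued by %s, expires %s)"
                     % (i, label(cert["subject"]), label(cert["issuer"]), cert["notAfter"]))
        if i + 1 < len(chain) and cert["issuer"] != chain[i + 1]["subject"]:
            raise ValueError("certificate %d was not issued by certificate %d" % (i, i + 1))
    # A root CA signs its own certificate, so its issuer equals its subject.
    root_is_self_signed = chain[-1]["issuer"] == chain[-1]["subject"]
    return lines, root_is_self_signed


EQUIFAX = ((("countryName", "US"),), (("organizationName", "Equifax"),),
           (("organizationalUnitName", "Equifax Secure Certificate Authority"),))
GEOTRUST = ((("countryName", "US"),), (("organizationName", "GeoTrust Inc."),),
            (("commonName", "GeoTrust Global CA"),))

# The last two entries of the output above, trimmed to the fields used here.
sample = (
    {"subject": GEOTRUST, "issuer": EQUIFAX, "notAfter": "Aug 21 04:00:00 2018 GMT"},
    {"subject": EQUIFAX, "issuer": EQUIFAX, "notAfter": "Aug 22 16:41:51 2018 GMT"},
)

lines, is_root = describe_chain(sample)
for line in lines:
    print(line)
print("Root is self-signed:", is_root)
```

If the last certificate isn’t self-signed, the server probably isn’t sending the full chain, and the client has to find the remaining authorities in its own store.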