Discussion of Trying to Make GCC Work With LLVM in the Pre-Clang Days

From back in 2005:

On the positive side, LLVM has a lot more experience in managing whole program compilation, and it is a much cleaner software base. It would be a good to find some mechanism to take advantage of that experience. Trying to make the hippo dance is not really a lot of fun.

Re: LLVM/GCC Integration Proposal

Displaying C++ vtables

A vtable is a mapping that allows your C++ application properly reconcile the function pointers for the base classes that have virtual methods and the child classes that override those methods (or do not override them). A class that does not have virtual methods will not have a vtable.

A vtable is pointed to by a pointer (“vpointer”) at the top of each object, usually, where the vtable is the same for all objects of a particular class.

Though you can derive the pointer yourself, you can use gdb, ddd, etc.. to display it:

Source code:

class BaseClass
{
    public:

    virtual int call_me1()
    {
        return 5;
    }

    virtual int call_me2()
    {
        return 10;
    }

    int call_me3()
    {
        return 15;
    }
};

class ChildClass : public BaseClass
{
    public:

    int call_me1()
    {
        return 20;
    }

    int call_me2()
    {
        return 25;
    }
};

Compile this with:

g++ -fdump-class-hierarchy -o vtable_example vtable_example.cpp

This emits a “.class” file that has the following (I’ve skipped some irrelevant information at the top, about other types:

Vtable for BaseClass
BaseClass::_ZTV9BaseClass: 4u entries
0     (int (*)(...))0
4     (int (*)(...))(& _ZTI9BaseClass)
8     (int (*)(...))BaseClass::call_me1
12    (int (*)(...))BaseClass::call_me2

Class BaseClass
   size=4 align=4
   base size=4 base align=4
BaseClass (0x0xb6a09230) 0 nearly-empty
    vptr=((& BaseClass::_ZTV9BaseClass) + 8u)

Vtable for ChildClass
ChildClass::_ZTV10ChildClass: 4u entries
0     (int (*)(...))0
4     (int (*)(...))(& _ZTI10ChildClass)
8     (int (*)(...))ChildClass::call_me1
12    (int (*)(...))ChildClass::call_me2

Class ChildClass
   size=4 align=4
   base size=4 base align=4
ChildClass (0x0xb76fdc30) 0 nearly-empty
    vptr=((& ChildClass::_ZTV10ChildClass) + 8u)
  BaseClass (0x0xb6a092a0) 0 nearly-empty
      primary-for ChildClass (0x0xb76fdc30)

Compiling your C/C++ and Obj C/C++ Code with “clang” (LLVM)

clang is a recent addition to the landscape of development within the C family. Though GCC is a household name (well, in my household), clang is built on LLVM, a modular and versatile compiler platform. In fact, because it’s built on LLVM, clang can emit a readable form of LLVM byte-code:

Source:

#include <stdio.h>

int main()
{
    printf("Testing.\n");

    return 0;
}

Command:

clang -emit-llvm -S main.c -o -

Output:

; ModuleID = 'main.c'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32-S128"
target triple = "i386-pc-linux-gnu"

@.str = private unnamed_addr constant [10 x i8] c"Testing.\0A\00", align 1

define i32 @main() nounwind {
  %1 = alloca i32, align 4
  store i32 0, i32* %1
  %2 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([10 x i8]* @.str, i32 0, i32 0))
  ret i32 0
}

declare i32 @printf(i8*, ...)

Benefits of using clang versus gcc include the following:

  • Considerably better error messages (a very popular feature).
  • Considerable speed improvements and resource usage, across the board, per clang (http://clang.llvm.org/features.html#performance). This might not be the case, though, per the average discussion on Stack Overflow.
  • Its ASTs and code are allegedly simpler and more straight-forward for those individuals that would like to study them.
  • It’s a single parser for the C family of languages (including Objective C/C++, but does not include C#), while also promoting the ability to be further extended.
  • It’s built as an API, so it can be bound by other tools.

clang is not nearly as mature as GCC, though I haven’t seen (as a casual observer) much negative feedback due to this.

To do a two-part build like you would with GCC, the basic parameters are similar, though there are six-pages of parameters available:

clang -emit-llvm -o foo.bc -c foo.c
clang -o foo foo.bc

It’s important to mention that clang comes bundled with a static analyzer. This means that checking your code for bugs at a deeper level than the compiler is concerned with is that much more accessible. For example, if we adjust the code above to do an allocation, but neglect to free it:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    printf("Testing.\n");

    void *new = malloc((size_t)2000);

    return 0;
}

We can build, while also telling clang to invoke the static-analyzer:

clang --analyze main.c -o main
main.c:8:11: warning: Value stored to 'new' during its initialization is never read
    void *new = malloc((size_t)2000);
          ^~~   ~~~~~~~~~~~~~~~~~~~~
main.c:10:5: warning: Memory is never released; potential leak of memory pointed to by 'new'
    return 0;
    ^
2 warnings generated.

In truth, I don’t know how clang’s static-analyzer compares with Valgrind, the standard, heavyweight, open-source static-analyzer. Though Valgrind can actually run your program and watch to make sure that your allocations are managed properly, I’m not yet sure if clang’s static-analyzer can do the same.