C++: Embedding the V8 JavaScript Engine

V8 is Chromium’s JavaScript interpreter. Not only can you use the included D8 utility to open a console and execute JavaScript code directly, but it is not as tough as you would think to integrate your native functions from your JavaScript and your JavaScript functions from your native functions. Of course, you have to get your head around all of the layers of isolates, context, and scoping.

For a walkthrough of embedding V8, start here.

In order to build a quick example of how to implement a native function for consumption from your JavaScript, go ahead and build V8. You will have to choose how to build the project. You’ll provide the name of a particular build configuration, which will write a few build parameters (called “args”) to the “out.gn//args.gn” file. This is a GN thing. GN is a build tool from the Chromium depot-tools project that generates Ninja scripts for doing the actual build.

For a list of build configurations, run:

$ tools/dev/v8gen.py list
...
s390.optdebug
s390.optdebug.sim
s390.release
s390.release.sim
s390x.debug
s390x.debug.sim
s390x.optdebug
s390x.optdebug.sim
s390x.release
s390x.release.sim
x64.debug
x64.optdebug
x64.release
x64.release.sample

(list shortened for simplicity)

We’ve used the “x64.release.sample” configuration for our example. It is straightforward for the purpose of this example. Once you have V8 downloaded and the dependencies installed, the actual project builds in about fifteen-minutes (with sixteen threads).

Drop this example into a file named “native_function.cpp” (or update the build command below to whatever you go with). This is an amalgam of excerpts from the samples in the project, examples in the documentation, and some customization on our part.

#include <sstream>

#include "libplatform/libplatform.h"
#include "v8.h"

// This is the function that we'll register into JS as a global function.
static void TestCallback(const v8::FunctionCallbackInfo<v8::Value>& args) {
  v8::Isolate* isolate = args.GetIsolate();
  v8::HandleScope scope(isolate);

  // Return [the arbitrary integer] 55 as our result.
  auto result = v8::Integer::New(isolate, 55);

  // Set the result.
  args.GetReturnValue().Set(result);
}

int main(int argc, char* argv[]) {

  // Initialize V8.
  v8::V8::InitializeICUDefaultLocation(argv[0]);
  v8::V8::InitializeExternalStartupData(argv[0]);
  std::unique_ptr<v8::Platform> platform = v8::platform::NewDefaultPlatform();
  v8::V8::InitializePlatform(platform.get());
  v8::V8::Initialize();

  // Create a new Isolate and make it the current one.
  v8::Isolate::CreateParams create_params;
  create_params.array_buffer_allocator =
      v8::ArrayBuffer::Allocator::NewDefaultAllocator();

  v8::Isolate* isolate = v8::Isolate::New(create_params);

  // Scoping in the C++ is tightly related to scoping in the JS. So, we'll 
  // retain the organizational blocks from the samples.
  {
    v8::Isolate::Scope isolate_scope(isolate);

    // Create a stack-allocated handle scope.
    v8::HandleScope handle_scope(isolate);

    // Create a global object to install our function into.
    v8::Local<v8::ObjectTemplate> global = v8::ObjectTemplate::New(isolate);

    // Register our global function.
    global->Set(
      v8::String::NewFromUtf8(isolate, "testCall", v8::NewStringType::kNormal).ToLocalChecked(),
      v8::FunctionTemplate::New(isolate, TestCallback)
    );

    // Create a new context and apply the global object.
    v8::Local<v8::Context> context = v8::Context::New(isolate, NULL, global);

    // Enter the context for compiling and running the hello world script.
    v8::Context::Scope context_scope(context);

    // Load the script.

    std::stringstream sourceStream;

    sourceStream
      << "testCall();" << std::endl;

    v8::Local<v8::String> source =
      v8::String::NewFromUtf8(
          isolate, 
          sourceStream.str().c_str(), 
          v8::NewStringType::kNormal)
        .ToLocalChecked();

    // Compile the source code.
    v8::Local<v8::Script> script =
      v8::Script::Compile(context, source)
        .ToLocalChecked();

    // Run the script.
    v8::Local<v8::Value> result = script->Run(context).ToLocalChecked();

    // Print the screen output.
    v8::String::Utf8Value output(isolate, result);
    printf("%s\n", *output);
  }

  // Dispose the isolate and tear down V8.
  isolate->Dispose();
  v8::V8::Dispose();
  v8::V8::ShutdownPlatform();
  delete create_params.array_buffer_allocator;

  return 0;
}

To build, set V8PATH to the absolute path of your “v8/v8” directory (the main V8 path established in the getting-started document above), and run the following:

$ g++ "-I${V8PATH}/include" -o native_function native_function.cpp -lv8_monolith "-L${V8PATH}/out.gn/x64.release.sample/obj" -pthread -std=c++0x

The example implements a simple, native function that just returns (55), registers it as a global JS function, creates a simple JS script that calls it, and then runs the script and prints the screen output. The screen output is just the result of that function printed to the screen since it was not otherwise assigned to a variable:

$ ./native_function 
55

For a couple of more examples, see the project repository.

Advertisements

MSBuild/C#: How to Manage the Application Version Using a Text-File

Overview

C# applications have an “AssemblyInfo.cs” file that describe the assembly and executable versions of a project. Unfortunately, sometimes it is not possible to access this from the code. Other times, you need to drive this version from external sources (like a build system) and then use it for the build.

The approach this by keeping the version in a text-file:

  1. Manually set/update the version in a text-file.
  2. Install a package that helps us with string-replacements.
  3. Inject this to AssemblyInfo.cs during the build.
  4. Embed this file into the executing assembly.
  5. Extract this file file the executing assembly when you need to know it during execution.

The title of this post is a simplification for lack of an easy way to succinctly describe five steps in a couple of words.

Do It

Feel free to modify/customize these steps as suits your needs.

1. Create the Version File

Create a file called “executable.version” in the “Properties\” folder of your executable project. Make sure to include this in your project. In the “Properties” window, set “Build Action” to “Embedded Resource”.

2. Install the “MSBuild Community Tasks” NuGet Package

This is the “MSBuildTasks” package. This provides us a regular-expression string-replacement MSBuild task.

3. Create a template “AssemblyVersion.cs” File

Copy “Properties\AssemblyInfo.cs” to “Properties\AssemblyInfo.cs.use_this” and update the two version attributes as the bottom to be the following:

[assembly: AssemblyVersion("__EXECUTABLE_VERSION__")]
[assembly: AssemblyFileVersion("__EXECUTABLE_VERSION__")]

Make sure to include this new file in the project. Note that we name this so as to not have the “.cs” extension because, otherwise, Visual Studio will try to parse it and complain about the attributes being duplicated from the “AssemblyInfo.cs” file.

4. Add the Build Step

We are going to add a custom build target to inject the version. We personally chose to put this into a separate rules file in order to make it clear which of the build-logic was ours, but this is up to you. It would just as easily work if it were included at the bottom of your project-file. Create “Properties\build.targets” with the following:

(build.targets)

<?xml version="1.0" encoding="utf-8" ?>
<Project ToolsVersion="12.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

<!-- Inject a version from a text-file into AssemblyVersion.cs . We do this 
 so that it's easier for the application to know its own version [by 
 reading the text file].
 -->
 <Import Project="$(ProjectDir)..\packages\MSBuildTasks.1.5.0.196\tools\MSBuild.Community.Tasks.Targets" /> 
 <Target Name="InjectVersion" BeforeTargets="BeforeBuild">
 <!-- Read the version from our text file. This appears to automatically 
 trim (probably per line). This is located in the project root so 
 that we copy the file to the output-path rather than establishing 
 a whole Properties/ directory in the output path.
 -->
 <ReadLinesFromFile File="$(ProjectDir)Properties\executable.version">
 <Output TaskParameter="Lines" PropertyName="ExecutableVersion" />
 </ReadLinesFromFile>

<!-- Print it to the build output whether we're in debug-mode or not. -->
 <Message Importance="High" Text="Executable version is [$(ExecutableVersion)]"/>

<!-- Copy our template file to the output file. -->
 <Copy SourceFiles="$(ProjectDir)Properties/AssemblyInfo.cs.use_this" DestinationFiles="$(ProjectDir)Properties/AssemblyInfo.cs"/>

<!-- Do an RX replace of the version on to the token. -->

<ItemGroup>
 <WriteFiles Include='$(ProjectDir)Properties/AssemblyInfo.cs' />
 </ItemGroup>

<FileUpdate 
 Files="@(WriteFiles)"
 Regex="__EXECUTABLE_VERSION__"
 ReplacementText="$(ExecutableVersion)"
 />

<!-- Replace the cautionary note about how to use the file with one 
 saying that any changes will be lost (if made to the output file). 
 -->
 <FileUpdate 
 Files="@(WriteFiles)"
 Regex="// TEMPLATE:.+"
 ReplacementText="// THIS FILE IS GENERATED! Apply any changes to 'AssemblyInfo.cs.use_this', instead."
 />
 </Target>
</Project>

IMPORTANT: Notice that we have to import the build targets provided by the “MSBuildTasks” package:

$(ProjectDir)..\packages\MSBuildTasks.1.5.0.196\tools\MSBuild.Community.Tasks.Targets

For us, NuGet packages go into the “packages” directory that is in the parent directory of our project directory. Also notice that we have to embed the version for this NuGet package. If your package is a different version or is located in a different place, you will have to update the example to be accurate.

NOTE: One way to get around having to embed the version is to bypass putting this package in your “packages.config file” and, instead, do a manual NuGet install of this package from a build-task to your packages directory (whereever it is) while also passing the “-ExcludeVersion” argument so as to not put the version in the package’s directory name.

Now, import the “build.targets” file from your project file. Put it somewhere near the bottom. Since it will run before the “BeforeBuild” target, we put it before that (which will be commented-out unless you use it):

 <Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />
 <Import Project="Properties\build.targets" />

5. Reading the Version From the Application

At this point, you should be able to build your project. The only thing that might be considered a disadvantage to this method is that, every time you build your project from inside Visual Studio, you will be prompted to reload the “AssemblyInfo.cs” file because it has been updated from outside of VS even if it has not changed (which is no stupider than the amount of work that we are required to do in order to find our own version). It would be easiest to check the box in this popup that says to only tell you if you happen to have unsaved changes to a file that has been changed from outside VS.

In our case, we are using the CLAP command-line parser. So, we added a private “ExecutableVersion” getter on the class that we are using to handle our subcommands. Then, we added a “version” subcommand that reads and prints the new property. Code for the property:

(Program.cs)

private string executableVersion = null;

private string ExecutableVersion
{
    get
    {
        if (executableVersion == null)
        {
            Assembly assembly = Assembly.GetExecutingAssembly();
            string assemblyName = assembly.GetName().Name;

            // "Properties" is required since it is located in the 
            // Properties folder of the project and was thusly embedded 
            // as such.
            string filepath = assemblyName + @".Properties.executable.version";

            string[] names = assembly.GetManifestResourceNames();
            var stream = assembly.GetManifestResourceStream(filepath);
            if (stream == null)
            {
                throw new Exception(String.Format("Could not get resource-stream with name [{0}] for version content from assembly [{1}]. Available: {2}", filepath, assembly.FullName, String.Join(",", names)));
            }

            TextReader tr = new StreamReader(stream);
            executableVersion = tr.ReadToEnd().Trim();
        }

        return executableVersion;
    }
}

 

C#: Parsing a CSPROJ (Project) File Using XPath

Using XPath in C# can be done several different ways through a several built-in libraries, and none of them work unless you are a lot more familiar with the file than would be required in many other languages. However, to make matters worse, you might be further required to do some unintuitive shenanigans. In the way of an example, this is how to retrieve the assembly-name:

XNamespace xmlns = "http://schemas.microsoft.com/developer/msbuild/2003";
XDocument projDefinition = XDocument.Load(projectFilepath);

IEnumerable<XNode> assemblyResultsEnumerable = projDefinition
	.Element(xmlns + "Project")
	.Elements(xmlns + "PropertyGroup")
	.Elements(xmlns + "AssemblyName").Nodes<XContainer>();

IList<XNode> assemblyResults = new List<XNode>(assemblyResultsEnumerable);
if(assemblyResults.Count == 0)
{
	throw new Exception(String.Format("The project file isn't correctly structured: [{0}]", projectFilepath));
}

string assemblyName = assemblyResults[0].ToString();

Notice that we have to mash the namespace URL with the node-name in order to find the node.

Best Argument-Processing for .NET/C#

After trying NDesk.Options and Fluent, I am nothing but impressed with CLAP (“Command-Line Auto-Parser”). It completely relies on reflection and parameter attributes (usually just one or two) to automatically marshal your values, assign defaults, enforce requiredness, and provide command-line documentation. It’s beautiful and, so far, flawless. Well done.

using CLAP;

namespace MyNamespace
{
    class Program
    {
        [Verb(IsDefault = true, Description = &quot;Print the current version of the given package and, optionally, increment it.&quot;)]
        public void Version(
            [Description(&quot;Project path&quot;)]
            [Required]
            string projectPath,

            [Description(&quot;Package name&quot;)]
            [Required]
            string packageName,

            [Description(&quot;Base version to increment from (if lower than current, else use current)&quot;)]
            string baseVersion = null,

            [Description(&quot;Increment the version before returning&quot;)]
            bool increment = false
        )
        {
            // ...
        }
    }
}

If you don’t decorate with the “Required” attribute and don’t provide a default value the parameter will default to null. I explicitly set baseVersion to a default of null because I prefer being explicit.

Embedded SQL

A nostalgic visit from the past: Embedded SQL, where you can inject live SQL directly into your C code.

EMBEDDED SQL IN C
Introduction to Pro*C

The second refers to such development using Oracle. Example from the second:

for (;;) {
    printf("Give student id number : ");
    scanf("%d", &id);
    EXEC SQL WHENEVER NOT FOUND GOTO notfound;
    EXEC SQL SELECT studentname INTO :st_name
             FROM   student
             WHERE  studentid = :id;
    printf("Name of student is %s.\n", st_name);
    continue;
notfound:
    printf("No record exists for id %d!\n", id);
}

It’s worth mentioning just to have some central place to search for it later.

Scriptable C++

What if C++ were a scripting language that you could eval from your native C++?

ChaiScript: http://chaiscript.com

Example (from the homepage):

#include <chaiscript/chaiscript.hpp>

std::string helloWorld(const std::string &t_name)
{
  return "Hello " + t_name + "!";
}

int main()
{
  chaiscript::ChaiScript chai;
  chai.add(chaiscript::fun(&helloWorld), 
           "helloWorld");

  chai.eval("puts(helloWorld("Bob"));");
}

Use TightOCR for Easy OCR from Python

When it comes to recognizing documents from images in Python, there are precious few options, and a couple of good reasons why.

Tesseract is the world’s best OCR solution, and is currently maintained by Google. Unlike other solutions, it comes prepackaged with knowledge for a bunch of languages, so the machine-learning aspects of OCR don’t necessarily have to be a concern of yours, unless you want to recognize for an unknown language, font, potential set of distortions, etc…

However, Tesseract comes as a C++ library, which basically takes it out of the running for use with Python’s ctypes. This isn’t a fault of ctypes, but rather of a lack of standardization in symbol-naming among the C++ compilers (there’s no way to know how to determine the naming for a symbol in the library from Python).

There is an existing Python solution, which comes in the form of a very heavy Python wrapper called python-tesseract, which is built on SWIG. It also requires a couple of extra libraries, like OpenCV and numpy, even if you don’t seem to be using them.

Even if you decide to go the python-tesseract route, you will only have the ability to return the complete document as text, as their support for iteration through the parts of the document is broken (see the bug).

So, with all of that said, we accomplished lightweight access to Tesseract from Python by first building CTesseract (which produces a C wrapper for Tesseract.. see here), and then writing TightOCR (for Python) around that.

This is the result:

from tightocr.adapters.api_adapter import TessApi
from tightocr.adapters.lept_adapter import pix_read
from tightocr.constants import RIL_PARA

t = TessApi(None, 'eng');
p = pix_read('receipt.png')
t.set_image_pix(p)
t.recognize()

if t.mean_text_confidence() < 85:
    raise Exception("Too much error.")

for block in t.iterate(RIL_PARA):
    print(block)

Of course, you can still recognize the document in one pass, too:

from tightocr.adapters.api_adapter import TessApi
from tightocr.adapters.lept_adapter import pix_read
from tightocr.constants import RIL_PARA

t = TessApi(None, 'eng');
p = pix_read('receipt.png')
t.set_image_pix(p)
t.recognize()

if t.mean_text_confidence() < 85:
    raise Exception("Too much error.")

print(t.get_utf8_text())

With the exception of renaming “mean_text_conf” to “mean_text_confidence”, the library keeps most of the names from the original Tesseract API. So, if you’re comfortable with that, you should have no problem with this (if you even have to do more than the above).

I should mention that the original Tesseract library, though a universal and popular OCR solution, is very dismally documented. Therefore, there are many functions that I’ve left scaffolding for in the project, without being entirely sure how to use/test them nor having any need for them myself. So, I could use help in that area. Just submit issues or pull-requests if you want to contribute.