MSBuild/C#: How to Manage the Application Version Using a Text-File

Overview

C# applications have an “AssemblyInfo.cs” file that describe the assembly and executable versions of a project. Unfortunately, sometimes it is not possible to access this from the code. Other times, you need to drive this version from external sources (like a build system) and then use it for the build.

The approach this by keeping the version in a text-file:

  1. Manually set/update the version in a text-file.
  2. Install a package that helps us with string-replacements.
  3. Inject this to AssemblyInfo.cs during the build.
  4. Embed this file into the executing assembly.
  5. Extract this file file the executing assembly when you need to know it during execution.

The title of this post is a simplification for lack of an easy way to succinctly describe five steps in a couple of words.

Do It

Feel free to modify/customize these steps as suits your needs.

1. Create the Version File

Create a file called “executable.version” in the “Properties\” folder of your executable project. Make sure to include this in your project. In the “Properties” window, set “Build Action” to “Embedded Resource”.

2. Install the “MSBuild Community Tasks” NuGet Package

This is the “MSBuildTasks” package. This provides us a regular-expression string-replacement MSBuild task.

3. Create a template “AssemblyVersion.cs” File

Copy “Properties\AssemblyInfo.cs” to “Properties\AssemblyInfo.cs.use_this” and update the two version attributes as the bottom to be the following:

[assembly: AssemblyVersion("__EXECUTABLE_VERSION__")]
[assembly: AssemblyFileVersion("__EXECUTABLE_VERSION__")]

Make sure to include this new file in the project. Note that we name this so as to not have the “.cs” extension because, otherwise, Visual Studio will try to parse it and complain about the attributes being duplicated from the “AssemblyInfo.cs” file.

4. Add the Build Step

We are going to add a custom build target to inject the version. We personally chose to put this into a separate rules file in order to make it clear which of the build-logic was ours, but this is up to you. It would just as easily work if it were included at the bottom of your project-file. Create “Properties\build.targets” with the following:

(build.targets)

<?xml version="1.0" encoding="utf-8" ?>
<Project ToolsVersion="12.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

<!-- Inject a version from a text-file into AssemblyVersion.cs . We do this 
 so that it's easier for the application to know its own version [by 
 reading the text file].
 -->
 <Import Project="$(ProjectDir)..\packages\MSBuildTasks.1.5.0.196\tools\MSBuild.Community.Tasks.Targets" /> 
 <Target Name="InjectVersion" BeforeTargets="BeforeBuild">
 <!-- Read the version from our text file. This appears to automatically 
 trim (probably per line). This is located in the project root so 
 that we copy the file to the output-path rather than establishing 
 a whole Properties/ directory in the output path.
 -->
 <ReadLinesFromFile File="$(ProjectDir)Properties\executable.version">
 <Output TaskParameter="Lines" PropertyName="ExecutableVersion" />
 </ReadLinesFromFile>

<!-- Print it to the build output whether we're in debug-mode or not. -->
 <Message Importance="High" Text="Executable version is [$(ExecutableVersion)]"/>

<!-- Copy our template file to the output file. -->
 <Copy SourceFiles="$(ProjectDir)Properties/AssemblyInfo.cs.use_this" DestinationFiles="$(ProjectDir)Properties/AssemblyInfo.cs"/>

<!-- Do an RX replace of the version on to the token. -->

<ItemGroup>
 <WriteFiles Include='$(ProjectDir)Properties/AssemblyInfo.cs' />
 </ItemGroup>

<FileUpdate 
 Files="@(WriteFiles)"
 Regex="__EXECUTABLE_VERSION__"
 ReplacementText="$(ExecutableVersion)"
 />

<!-- Replace the cautionary note about how to use the file with one 
 saying that any changes will be lost (if made to the output file). 
 -->
 <FileUpdate 
 Files="@(WriteFiles)"
 Regex="// TEMPLATE:.+"
 ReplacementText="// THIS FILE IS GENERATED! Apply any changes to 'AssemblyInfo.cs.use_this', instead."
 />
 </Target>
</Project>

IMPORTANT: Notice that we have to import the build targets provided by the “MSBuildTasks” package:

$(ProjectDir)..\packages\MSBuildTasks.1.5.0.196\tools\MSBuild.Community.Tasks.Targets

For us, NuGet packages go into the “packages” directory that is in the parent directory of our project directory. Also notice that we have to embed the version for this NuGet package. If your package is a different version or is located in a different place, you will have to update the example to be accurate.

NOTE: One way to get around having to embed the version is to bypass putting this package in your “packages.config file” and, instead, do a manual NuGet install of this package from a build-task to your packages directory (whereever it is) while also passing the “-ExcludeVersion” argument so as to not put the version in the package’s directory name.

Now, import the “build.targets” file from your project file. Put it somewhere near the bottom. Since it will run before the “BeforeBuild” target, we put it before that (which will be commented-out unless you use it):

 <Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />
 <Import Project="Properties\build.targets" />

5. Reading the Version From the Application

At this point, you should be able to build your project. The only thing that might be considered a disadvantage to this method is that, every time you build your project from inside Visual Studio, you will be prompted to reload the “AssemblyInfo.cs” file because it has been updated from outside of VS even if it has not changed (which is no stupider than the amount of work that we are required to do in order to find our own version). It would be easiest to check the box in this popup that says to only tell you if you happen to have unsaved changes to a file that has been changed from outside VS.

In our case, we are using the CLAP command-line parser. So, we added a private “ExecutableVersion” getter on the class that we are using to handle our subcommands. Then, we added a “version” subcommand that reads and prints the new property. Code for the property:

(Program.cs)

private string executableVersion = null;

private string ExecutableVersion
{
    get
    {
        if (executableVersion == null)
        {
            Assembly assembly = Assembly.GetExecutingAssembly();
            string assemblyName = assembly.GetName().Name;

            // "Properties" is required since it is located in the 
            // Properties folder of the project and was thusly embedded 
            // as such.
            string filepath = assemblyName + @".Properties.executable.version";

            string[] names = assembly.GetManifestResourceNames();
            var stream = assembly.GetManifestResourceStream(filepath);
            if (stream == null)
            {
                throw new Exception(String.Format("Could not get resource-stream with name [{0}] for version content from assembly [{1}]. Available: {2}", filepath, assembly.FullName, String.Join(",", names)));
            }

            TextReader tr = new StreamReader(stream);
            executableVersion = tr.ReadToEnd().Trim();
        }

        return executableVersion;
    }
}

 

Advertisements

C#: Parsing a CSPROJ (Project) File Using XPath

Using XPath in C# can be done several different ways through a several built-in libraries, and none of them work unless you are a lot more familiar with the file than would be required in many other languages. However, to make matters worse, you might be further required to do some unintuitive shenanigans. In the way of an example, this is how to retrieve the assembly-name:

XNamespace xmlns = "http://schemas.microsoft.com/developer/msbuild/2003";
XDocument projDefinition = XDocument.Load(projectFilepath);

IEnumerable<XNode> assemblyResultsEnumerable = projDefinition
	.Element(xmlns + "Project")
	.Elements(xmlns + "PropertyGroup")
	.Elements(xmlns + "AssemblyName").Nodes<XContainer>();

IList<XNode> assemblyResults = new List<XNode>(assemblyResultsEnumerable);
if(assemblyResults.Count == 0)
{
	throw new Exception(String.Format("The project file isn't correctly structured: [{0}]", projectFilepath));
}

string assemblyName = assemblyResults[0].ToString();

Notice that we have to mash the namespace URL with the node-name in order to find the node.

Best Argument-Processing for .NET/C#

After trying NDesk.Options and Fluent, I am nothing but impressed with CLAP (“Command-Line Auto-Parser”). It completely relies on reflection and parameter attributes (usually just one or two) to automatically marshal your values, assign defaults, enforce requiredness, and provide command-line documentation. It’s beautiful and, so far, flawless. Well done.

using CLAP;

namespace MyNamespace
{
    class Program
    {
        [Verb(IsDefault = true, Description = &quot;Print the current version of the given package and, optionally, increment it.&quot;)]
        public void Version(
            [Description(&quot;Project path&quot;)]
            [Required]
            string projectPath,

            [Description(&quot;Package name&quot;)]
            [Required]
            string packageName,

            [Description(&quot;Base version to increment from (if lower than current, else use current)&quot;)]
            string baseVersion = null,

            [Description(&quot;Increment the version before returning&quot;)]
            bool increment = false
        )
        {
            // ...
        }
    }
}

If you don’t decorate with the “Required” attribute and don’t provide a default value the parameter will default to null. I explicitly set baseVersion to a default of null because I prefer being explicit.

Embedded SQL

A nostalgic visit from the past: Embedded SQL, where you can inject live SQL directly into your C code.

EMBEDDED SQL IN C
Introduction to Pro*C

The second refers to such development using Oracle. Example from the second:

for (;;) {
    printf("Give student id number : ");
    scanf("%d", &id);
    EXEC SQL WHENEVER NOT FOUND GOTO notfound;
    EXEC SQL SELECT studentname INTO :st_name
             FROM   student
             WHERE  studentid = :id;
    printf("Name of student is %s.\n", st_name);
    continue;
notfound:
    printf("No record exists for id %d!\n", id);
}

It’s worth mentioning just to have some central place to search for it later.

Scriptable C++

What if C++ were a scripting language that you could eval from your native C++?

ChaiScript: http://chaiscript.com

Example (from the homepage):

#include <chaiscript/chaiscript.hpp>

std::string helloWorld(const std::string &t_name)
{
  return "Hello " + t_name + "!";
}

int main()
{
  chaiscript::ChaiScript chai;
  chai.add(chaiscript::fun(&helloWorld), 
           "helloWorld");

  chai.eval("puts(helloWorld("Bob"));");
}

Use TightOCR for Easy OCR from Python

When it comes to recognizing documents from images in Python, there are precious few options, and a couple of good reasons why.

Tesseract is the world’s best OCR solution, and is currently maintained by Google. Unlike other solutions, it comes prepackaged with knowledge for a bunch of languages, so the machine-learning aspects of OCR don’t necessarily have to be a concern of yours, unless you want to recognize for an unknown language, font, potential set of distortions, etc…

However, Tesseract comes as a C++ library, which basically takes it out of the running for use with Python’s ctypes. This isn’t a fault of ctypes, but rather of a lack of standardization in symbol-naming among the C++ compilers (there’s no way to know how to determine the naming for a symbol in the library from Python).

There is an existing Python solution, which comes in the form of a very heavy Python wrapper called python-tesseract, which is built on SWIG. It also requires a couple of extra libraries, like OpenCV and numpy, even if you don’t seem to be using them.

Even if you decide to go the python-tesseract route, you will only have the ability to return the complete document as text, as their support for iteration through the parts of the document is broken (see the bug).

So, with all of that said, we accomplished lightweight access to Tesseract from Python by first building CTesseract (which produces a C wrapper for Tesseract.. see here), and then writing TightOCR (for Python) around that.

This is the result:

from tightocr.adapters.api_adapter import TessApi
from tightocr.adapters.lept_adapter import pix_read
from tightocr.constants import RIL_PARA

t = TessApi(None, 'eng');
p = pix_read('receipt.png')
t.set_image_pix(p)
t.recognize()

if t.mean_text_confidence() < 85:
    raise Exception("Too much error.")

for block in t.iterate(RIL_PARA):
    print(block)

Of course, you can still recognize the document in one pass, too:

from tightocr.adapters.api_adapter import TessApi
from tightocr.adapters.lept_adapter import pix_read
from tightocr.constants import RIL_PARA

t = TessApi(None, 'eng');
p = pix_read('receipt.png')
t.set_image_pix(p)
t.recognize()

if t.mean_text_confidence() < 85:
    raise Exception("Too much error.")

print(t.get_utf8_text())

With the exception of renaming “mean_text_conf” to “mean_text_confidence”, the library keeps most of the names from the original Tesseract API. So, if you’re comfortable with that, you should have no problem with this (if you even have to do more than the above).

I should mention that the original Tesseract library, though a universal and popular OCR solution, is very dismally documented. Therefore, there are many functions that I’ve left scaffolding for in the project, without being entirely sure how to use/test them nor having any need for them myself. So, I could use help in that area. Just submit issues or pull-requests if you want to contribute.

OCR a Document by Section

A previous post described how to use Tesseract to OCR a document to a single string. This post will describe how to take advantage of Tesseract’s internal layout processing to iterate through the documents sections (as determined by Tesseract).

This is the core logic to iterate through the document. Each section is referred to as a “paragraph”:

if(api->Recognize(NULL) < 0)
{
    api->End();
    pixDestroy(&image);

    return 3;
}

tesseract::ResultIterator *it = api->GetIterator();

do 
{
    if(it->Empty(tesseract::RIL_PARA))
        continue;

    char *para_text = it->GetUTF8Text(tesseract::RIL_PARA);

// para_text is the recognized text. It [usually] has a 
// newline on the end.

    delete[] para_text;
} while (it->Next(tesseract::RIL_PARA));

delete it;

To add some validation to the recognized content, we’ll also check the recognition confidence. In my experience, it seems like any recognition scoring less than 80% turned out as gibberish. In that case, you’ll have to do some preprocessing to remove some of the risk.

int confidence = api->MeanTextConf();
printf("Confidence: %d\n", confidence);
if(confidence < 80)
    printf("Confidence is low!\n");

The whole program:

#include <stdio.h>

#include <tesseract/baseapi.h>
#include <tesseract/resultiterator.h>

#include <leptonica/allheaders.h>

int main()
{
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();

    // Initialize tesseract-ocr with English, without specifying tessdata path.
    if (api->Init(NULL, "eng")) 
    {
        api->End();

        return 1;
    }

    // Open input image with leptonica library
    Pix *image;
    if((image = pixRead("receipt.png")) == NULL)
    {
        api->End();

        return 2;
    }
    
    api->SetImage(image);
    
    if(api->Recognize(NULL) < 0)
    {
        api->End();
        pixDestroy(&image);

        return 3;
    }

    int confidence = api->MeanTextConf();
    printf("Confidence: %d\n", confidence);
    if(confidence < 80)
        printf("Confidence is low!\n");

    tesseract::ResultIterator *it = api->GetIterator();

    do 
    {
        printf("=================\n");
        if(it->Empty(tesseract::RIL_PARA))
            continue;

        char *para_text = it->GetUTF8Text(tesseract::RIL_PARA);
        printf("%s", para_text);
        delete[] para_text;
    } while (it->Next(tesseract::RIL_PARA));
  
    delete it;

    api->End();
    pixDestroy(&image);

    return 0;
}

Applying this routine to an invoice (randomly found with Google), it is far easier to identify the address, tax, total, etc.. then with the previous method (which was naive about layout):

Confidence: 89
=================
Invoice |NV0010 '
To
Jackie Kensington
18 High St
Wickham
 Electrical
Sevices Limited

=================
Certificate Number CER/123-34 From
  1”” E'e°‘''°‘'’‘'Se”'°‘*’
17 Harold Grove, Woodhouse, Leeds,

=================
West Yorkshire, LS6 2EZ

=================
Email: info@ mj-e|ectrcia|.co.uk

=================
Tel: 441132816781

=================
Due Date : 17th Mar 2012

=================
Invoice Date : 16th Feb 2012

=================
Item Quantity Unit Price Line Price
Electrical Labour 4 £33.00 £132.00

=================
Installation carried out on flat 18. Installed 3 new
brass effect plug fittings. Checked fuses.
Reconnected light switch and fitted dimmer in living
room. 4 hours on site at £33 per hour.

=================
Volex 13A 2G DP Sw Skt Wht Ins BB Round Edge Brass Effect 3 £15.57 £46.71
Volex 4G 1W 250W Dimmer Brushed Brass Round Edge 1 £32.00 £32.00
Subtotal £210.71

=================
VAT £42.14

=================
Total £252.85

=================
Thank you for your business — Please make all cheques payable to ‘Company Name’. For bank transfers ‘HSBC’, sort code
00-00-00, Account no. 01010101.
MJ Electrical Services, Registered in England & Wales, VAT number 9584 158 35
 Q '|'.~..a::