OCR a Document with C++

There is an OCR library developed by HP and maintained by Google called Tesseract. It works immediately, and does not require training.

Building it is trivial. What’s more trivial is just installing it from packages:

$ sudo apt-get install libtesseract3 libtesseract-dev
$ sudo apt-get install liblept3 libleptonica-dev
$ sudo apt-get install tesseract-ocr-eng

Note that this installs the data for recognizing English.

Now, go and get the example code from the Google Code wiki for the project, and paste it into a file called ocr-test.cpp . Also, right-click and save the example document image (a random image I found with Google). You don’t have to use this particular document, as long as what is used is sufficiently clear at a high-enough resolution (the example is about 1500×2000).

Now, change the location of the file referred-to by the example code:

Pix *image = pixRead("letter.jpg");

Compile/link it:

$ g++ -o ocr-test ocr-test.cpp -ltesseract -llept

Run the example:


You’re done. The following will be displayed:

OCR output:
fie’  1?/2440
BARROSO (2012) 1300171
BARROSO (2012)

Dear Lord Tugendhat.

Thank you for your letter of 29 October and for inviting the European Commission to

contribute in the context of the Economic Aflairs Committee's inquiry into "The

Economic lmplicationsfirr the United Kingdom of Scottish Independence ".

The Committee will understand that it is not the role of the European Commission to

express a position on questions of internal organisation related to the constitutional

arrangements of a particular Member State.

Whilst refraining from comment on possible fitture scenarios. the European Commission

has expressed its views in general in response to several parliamentary questions from

Members of the European Parliament. In these replies the European Commission has 
noted that scenarios such as the separation of one part of a Member State or the creation 
of a new state would not be neutral as regards the EU Treaties. The European 
Commission would express its opinion on the legal consequences under EU law upon ;
requestfiom a Member State detailing a precise scenario. :
The EU is founded on the Treaties which apply only to the Member States who have 3
agreed and ratified them. if part of the territory of a Member State would cease to be ,
part of that state because it were to become a new independent state, the Treaties would

no longer apply to that territory. In other words, a new independent state would, by the
fact of its independence, become a third country with respect to the E U and the Treaties

would no longer apply on its territory. ‘


Acting Chairman

House of Lords q
Committee Oflice
E-mail: economicaflairs@par1igment.ttk

One thought on “OCR a Document with C++

Comments are closed.