Step by Step Guide to Using Different UiPath OCR Engines
This post describes how to use the different OCR engines available in UiPath.
In this detailed guide I’ll cover:
- Different Types of OCR used in UiPath
- How to choose the right OCR for your next automation
- A working example with a problem statement and different approaches
- Pros and cons of the different types of OCR
- Lots more
- Working workflow example
So if you want to learn how to use different OCR engines with UiPath, you will love this case study and guide.
Let’s get started.
Different Types of OCR Engines in UiPath
- Tesseract / Google OCR – This uses the open-source Tesseract OCR engine, so it is free to use. The processing is done on the local machine where UiPath is running.
- Google Cloud OCR – This requires a Google Cloud API Key, which has a free trial.
- Microsoft OCR – This uses the MODI OCR Engine, which is also free to use, and the processing is done locally like Google OCR.
- Microsoft Cloud OCR – This uses the Microsoft Computer Vision API, which is also free to sign up for. Also known as Microsoft Azure ComputerVision OCR.
- Abbyy OCR – This requires you to install Abbyy FineReader on your local machine and purchase a license.
- Abbyy Cloud OCR – This requires a subscription. We will use Abbyy Cloud OCR for our use case.
A few other options are available, but based on the questions asked on forums, I have selected the top six for this experiment.
All of these OCR engines follow similar steps, so it won’t be a challenge if you would like to experiment with other OCR tools.
If you love working with Python for OCR, you can read our detailed article on how to use OCR with Python in UiPath.
Please feel free to reach out if you would like any other OCR engine included in the blog post. I will be happy to include it in the next version.
With so many OCR engines available on the market, it is natural to ask which one to use and which one will solve your problem.
The problem statement is generic in nature: we want to see what works best with which OCR engine.
By the end of the exercise we will build a matrix that gives an idea of when to use which OCR engine.
So, at a high level, by the end of the post you should have gained insight into –
- Which OCR engine works best with UiPath
- Differences in processing between paid and free OCR
- Which one is recommended for handwritten material (such as meal receipts, hotel invoices, taxi fares, parking receipts)
- Which engines cope best with low-quality scans
- What to consider before starting the OCR Project
- Best practices and guidelines
As discussed in the problem statement, we need to run the task on different types of PDF so that we can account for multiple factors.
For this example we will work with the following data set –
- Sample PDF file (structured PDF, call it 01-Invoice.pdf)
- Short story (full-page text, call it 02-Short-Stories.pdf)
- Invoice with sample text (tabular invoice, call it 03-Invoice-sample.pdf)
- Scanned invoice with handwritten text (call it 04-Scanned-Invoice-Handwriiten.pdf)
- Bill with handwritten text (call it 05-good-hand-written-bill.pdf)
Running hundreds of test cases is beyond the scope of this blog post. If you wish to use any OCR engine in your production environment, I suggest running regression tests on varied inputs, because none of the engines gives 100% accurate results and confidence scores matter.
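One way to run such a regression is to keep a hand-typed reference transcript for each document and score every engine's output against it. The sketch below uses Python's standard difflib module; the function name is illustrative only, and the similarity ratio is a rough proxy for accuracy rather than a formal character error rate.

```python
from difflib import SequenceMatcher


def ocr_similarity(expected: str, actual: str) -> float:
    """Return a 0..1 similarity ratio between reference text and OCR output."""
    # Collapse runs of whitespace so layout differences do not dominate the score.
    norm_expected = " ".join(expected.split())
    norm_actual = " ".join(actual.split())
    return SequenceMatcher(None, norm_expected, norm_actual).ratio()
```

Tracking this ratio per document and per engine over time makes it easier to notice when a settings change helps one document type but hurts another.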
For the test cases, the input PDFs listed above will be processed with every OCR engine.
In terms of output we will focus on –
- Extracted data as (key, value) pairs or an Excel sheet
- Confidence Scores
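To compare confidence scores across engines, it helps to reduce each result to a couple of numbers. The following is a minimal sketch that assumes each engine's output has already been converted into a list of (word, confidence) pairs with confidence between 0 and 1. That representation, the function name, and the 0.8 threshold are assumptions made for illustration, not something the UiPath activities return in this exact shape.

```python
from statistics import mean
from typing import List, Tuple


def summarize_confidence(words: List[Tuple[str, float]],
                         threshold: float = 0.8) -> dict:
    """Summarize per-word confidences given (word, confidence) pairs in 0..1."""
    if not words:
        return {"mean_confidence": 0.0, "low_confidence_words": []}
    return {
        # Average confidence over all recognized words.
        "mean_confidence": mean(conf for _, conf in words),
        # Words that probably need a manual review.
        "low_confidence_words": [w for w, conf in words if conf < threshold],
    }
```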
First things first: which dependencies need to be added to the project?
Because we are working with six OCR engines at the same time, several things are required, such as installing UiPath packages, installing some external packages, signing up on external websites, and creating keys for integration.
So we have broken the procedure down step by step, and the required details are covered in the relevant section of the blog post (for example, what is needed for Abbyy Cloud OCR is covered in the Abbyy Cloud OCR section).
Before starting on specific OCR tools, let me explain a few common input/output properties of these activities in UiPath. The activities work in a similar way, so I will cover these details before trying the different OCR engines.
For our case study, we have taken five PDFs of different types and kept them in the same folder, numbered. These PDFs are included in the workflow data folder and can be downloaded with a complete working example from the link at the end of the article.
For the sake of simplicity we have not used the RE Framework or a complex example. The project consists of –
- Data directory – for the input PDFs and the results in Excel
- Six separate workflows, one for each OCR type
- Main workflow that loops over the input PDFs, performs OCR, and saves the results in the output directory
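The structure of the main workflow can be sketched outside UiPath as well. The Python below is not the UiPath project itself; it only mirrors the same idea of looping over the input PDFs, calling each engine, and saving the results. The engines are passed in as plain callables, which is a hypothetical stand-in for the six workflows, and the output is written as CSV rather than Excel to keep the sketch dependency-free.

```python
import csv
from pathlib import Path
from typing import Callable, Dict


def run_all_engines(data_dir: Path, output_csv: Path,
                    engines: Dict[str, Callable[[Path], str]]) -> int:
    """Run every engine on every PDF in data_dir; write one CSV row per pair."""
    rows = 0
    with output_csv.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "engine", "text"])
        # Sorting keeps the numbered files (01-..., 02-...) in order.
        for pdf in sorted(data_dir.glob("*.pdf")):
            for name, extract in engines.items():
                try:
                    text = extract(pdf)
                except Exception as error:
                    # Record the failure and keep going with the other engines.
                    text = f"ERROR: {error}"
                writer.writerow([pdf.name, name, text])
                rows += 1
    return rows
```

Catching errors per engine means one misconfigured API key does not stop the comparison for the remaining engines.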
The code is pretty simple and contains only a few activities, which you can easily understand if you know how OCR works.
So what next? How can we use the different types of UiPath OCR? Which properties do you need to modify, and how useful are they?
In the next section I cover them in detail.
How to Use Different OCR Engines in UiPath
- The Tesseract OCR engine used in UiPath has been updated to version 4.0. It contains an OCR engine (libtesseract) and a command-line program (tesseract).
- Tesseract 4 adds a new neural net (LSTM) based OCR engine which is focused on line recognition, but it also still supports the legacy Tesseract 3 OCR engine, which works by recognizing character patterns.
- UiPath.Core.Activities.GoogleOCR is the activity that extracts text from UI elements or images using the Tesseract OCR engine. It can be used with other OCR activities, such as Click OCR Text, Hover OCR Text, Double Click OCR Text, Get OCR Text, and Find OCR Text Position.
- For our example, we will use UiPath.PDF.Activities.ReadPDFWithOCR to read the input files and result will be stored in
Our workflow will look like this –
We will discuss the workflow in detail at a later stage; for now, you need to focus on the details of
- It takes input of type Image only
- You can specify AllowedCharacters and DeniedCharacters
- The Language field needs to contain the language file prefix, such as “ron” for Romanian, “ita” for Italian, and “fra” for French.
- Scale is important here – specify higher values when you are reading from scanned images. The default value is 2.
- Profile –
- None – if no preprocessing is required.
- Screen – Required for RDP application automation
- Scan – To be used with scanned images
- Legacy – Default settings for pre-processing images.
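If you want to experiment with similar settings outside Studio, the tesseract command-line program accepts comparable options. The sketch below only builds the argument list and does not run anything. The mapping of Language to `-l`, AllowedCharacters to `tessedit_char_whitelist`, and DeniedCharacters to `tessedit_char_blacklist` is my assumption about the closest command-line equivalents; Scale and Profile have no direct counterpart here and are left out.

```python
from typing import List, Optional


def build_tesseract_command(image_path: str, output_base: str,
                            language: str = "eng",
                            allowed_chars: Optional[str] = None,
                            denied_chars: Optional[str] = None) -> List[str]:
    """Build (but do not run) a tesseract CLI invocation."""
    # Basic form: tesseract <image> <output base name> -l <language prefix>
    command = ["tesseract", image_path, output_base, "-l", language]
    if allowed_chars:
        # Assumed equivalent of AllowedCharacters.
        command += ["-c", f"tessedit_char_whitelist={allowed_chars}"]
    if denied_chars:
        # Assumed equivalent of DeniedCharacters.
        command += ["-c", f"tessedit_char_blacklist={denied_chars}"]
    return command
```

With Tesseract installed, the returned list can be passed to `subprocess.run(command, check=True)`.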
In the output you get Text, the extracted string, and Result, which contains the words along with their screen positions.
You can read more details in the official documentation – here
You might be wondering what the difference is between the two OCR variants provided by Google. The details are below.
What is the difference between Google Cloud OCR and Google OCR in UiPath?
The difference is the engine: Google Cloud OCR uses the Google Cloud OCR engine, while Google OCR uses the Tesseract OCR engine.
Google OCR uses a Tesseract engine that is deployed locally (it comes with UiPath Studio), so the image processing and text extraction are done locally, on your computer.
Google Cloud OCR, on the other hand, uploads the image to Google’s servers (cloud) for processing and you get back the resulting text. All the processing is done remotely on Google’s servers and you just get the result.
Google OCR is free, while you need to pay for Google Cloud OCR (a free trial is available with usage limits).
Google Cloud Vision OCR works in the same way; the only difference is in the properties you set when invoking the UiPath.Core.Activities.GoogleCloudOCR activity. Most of the properties are the same, except for ApiKey (required, obtained from the cloud console) and ResizeToMaxLimitIfNecessary, which attempts to downsize the target image so that it does not exceed the size limit of the Google Cloud Vision engine.
The remaining properties are the same as for Google OCR.
Before going to the workflow, note that you need to perform a few steps on GCP to enable the Google Cloud Vision API.
- Sign up at https://console.cloud.google.com/
- Provide credit card details for billing (don’t be afraid, you get 300 USD of free credit on signup)
- Set up a new project, for example My First Project
- Enable the Cloud Vision API to get the API key.
- Pass this key to your workflow through the ApiKey property (ApiKey=API_KEY), along with the other properties.
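To check that a new API key works before wiring it into Studio, you can call the Cloud Vision REST endpoint directly. The sketch below only builds the URL and JSON body for an `images:annotate` request with the `TEXT_DETECTION` feature and sends nothing. It shows a direct REST call, not necessarily what the UiPath activity does internally, and the function name is illustrative.

```python
import base64
import json

VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_vision_request(image_bytes: bytes, api_key: str):
    """Return (url, json_body) for a TEXT_DETECTION call; nothing is sent."""
    url = f"{VISION_ENDPOINT}?key={api_key}"
    body = {
        "requests": [{
            # The image travels base64-encoded inside the JSON body.
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "TEXT_DETECTION"}],
        }]
    }
    return url, json.dumps(body)
```

The pair can then be posted with any HTTP client, using `Content-Type: application/json`.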
Sample workflow will look like this –
You can read more details at official documentation Link – here
#3 Using Microsoft OCR –
Microsoft OCR uses the MODI OCR engine, which is also free to use, and the processing is done locally, as with Google OCR.
The MODI (Microsoft Office Document Imaging) engine used by the Microsoft OCR activity relies on Microsoft technical support for Windows 7 and Windows 10.
Most of the input properties are similar to those of the Google OCR engine, and the activity outputs the extracted words along with their on-screen positions as KeyValuePair.
Sample workflow will look like this –
#4 Using Microsoft Azure ComputerVision OCR –
Similar to Google cloud vision you need to perform certain steps on Azure before you can start using the
You need to perform the following steps before you work with the Computer Vision API –
- Sign up for a free account on Azure or log in using your pay-as-you-go account
- Sign in to the Azure portal and add Computer Vision
- Check how to embed Computer Vision using the quickstarts and documentation.
Using the Computer Vision API to extract printed and handwritten text from images and PDFs into a machine-readable character stream is super easy. All you need to know is the
- Endpoints of the Vision API
- Keys to connect to those services
A bit of caution here, as Azure provides two variants of the Computer Vision API
- Read API
- OCR API
The Azure Computer Vision OCR API recognizes printed text and supports a large variety of languages.
The Azure Computer Vision Read API recognizes handwritten and printed text, but is currently available only in English.
The major difference between the two is that the Read API uses a model that supports only English as of now, while the OCR API supports more than 25 languages, with auto-detection and rotation of the recognized text from the image.
Image Requirements –
- The image must be presented in JPEG, PNG, GIF, or BMP format
- The file size of the image must be less than 4 megabytes (MB)
- The dimensions of the image must be greater than 50 x 50 pixels
- For the Read API, the dimensions of the image must be between 50 x 50 and 10000 x 10000 pixels.
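These requirements are easy to check before uploading anything. The following is a minimal sketch that takes the image format, file size, and pixel dimensions as plain arguments, so it needs no imaging library. The function name is illustrative, and treating exactly 50 pixels as acceptable and 4 MB as 4 * 1024 * 1024 bytes are my assumptions about the boundary cases.

```python
ALLOWED_FORMATS = {"JPEG", "PNG", "GIF", "BMP"}
MAX_BYTES = 4 * 1024 * 1024  # 4 MB, interpreted here as binary megabytes


def check_image(fmt: str, size_bytes: int, width: int, height: int,
                read_api: bool = False) -> list:
    """Return a list of problems; an empty list means the image passes."""
    problems = []
    if fmt.upper() not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if size_bytes >= MAX_BYTES:
        problems.append("file must be smaller than 4 MB")
    if width < 50 or height < 50:
        problems.append("image must be at least 50 x 50 pixels")
    # The upper bound applies to the Read API only.
    if read_api and (width > 10000 or height > 10000):
        problems.append("Read API images must be at most 10000 x 10000 pixels")
    return problems
```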
The free tier is limited to 5,000 transactions, at up to 20 per minute.
Note down the endpoints and the keys for further processing.
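The endpoint and key are all that a direct REST call needs. The sketch below builds the URL and headers for either variant without sending anything. It assumes the v3.2 REST paths (`/vision/v3.2/ocr` and `/vision/v3.2/read/analyze`); the API version available to your resource, or the one used by the UiPath activity, may differ, and the function name is illustrative.

```python
def build_azure_request(endpoint: str, key: str, use_read_api: bool = False):
    """Return (url, headers) for posting raw image bytes; nothing is sent."""
    base = endpoint.rstrip("/")
    # Assumed v3.2 paths: OCR API for printed text, Read API for handwriting.
    path = "/vision/v3.2/read/analyze" if use_read_api else "/vision/v3.2/ocr"
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # the key noted from the portal
        "Content-Type": "application/octet-stream",
    }
    return base + path, headers
```

Note that the Read API is asynchronous: the initial response points to a result URL that has to be polled, whereas the OCR API returns the text in the response itself.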
#5 Using Abbyy OCR
For the Abbyy OCR activity to work, you need to install ABBYY FineReader Engine and purchase a license for it. After installing ABBYY FineReader Engine you must activate it.
- You will need ABBYY FineReader Engine
- You will need a runtime key that is provided by UiPath (you must contact the sales team for this)
- You must follow the instructions to install it. Make sure you install the 32-bit (x86) version. If you use the provided installation instructions, the command uses x86 by default.
You can read more details on the steps needed to install and activate ABBYY FineReader – here
#6 Using Abbyy Cloud OCR
- Abbyy Cloud OCR SDK supports the recognition of printed text in more than 200 languages, including most Asian languages: Chinese, Japanese, Korean, Arabic, Farsi, Vietnamese, Thai and others using industry leading FineReader OCR technology.
- Abbyy Cloud OCR SDK recognizes both printed and hand-printed text within specific fields (zonal OCR).
- Its Cloud OCR recognition features are used for reading invoices, receipts, bills, business cards and many other document categories. It also supports extraction from handwritten or manually filled forms.
- Convert image/PDF to searchable PDF, PDF/A
- Convert image/PDF to Microsoft Word, Excel, PowerPoint
To start working with ABBYY Cloud OCR you need to set things up in a similar way to the Google and Microsoft vision APIs.
The activity is UiPath.Abbyy.Activities.AbbyyCloudOCR; if it is not available, you can enable it through Manage Packages in Studio.
- You need an ApplicationID, Password, and ServerUrl to use the AbbyyCloudOCR activity, so you need to create a new application after signing up on their cloud platform
- Once you have created the new application, note down –
- Display Name: RPABOTSWORLD
- Application ID: dd3410e2-e883-xxxx-xxxx-b4de7dd3d40f (something like this)
- Note down your password as well, along with the server URL, for the required property configuration in the workflow.
- You will not see the password in the web console; it is sent to you by email. However, you can reset it from the portal.
Your sample workflow will look like this –
The key properties are given below. The other common properties have the same meaning as for the other OCR engines.
- ApplicationID – The application ID provided when subscribing to the Abbyy Cloud OCR service.
- Password – The password provided when subscribing to the Abbyy Cloud OCR service.
- ServerUrl – The Server URL provided when subscribing to the Abbyy Cloud OCR service.
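Since these three values are secrets, it is safer to keep them out of the workflow file. The sketch below reads them from environment variables and builds an HTTP Basic Authorization header from the ApplicationID and Password. The environment variable names are hypothetical, and the use of Basic authentication is my understanding of how the ABBYY Cloud OCR SDK REST API authenticates, so verify it against ABBYY's documentation before relying on it.

```python
import base64
import os


def abbyy_auth_header(application_id: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header from the two credentials."""
    token = base64.b64encode(f"{application_id}:{password}".encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}


def load_abbyy_settings(env=os.environ) -> dict:
    """Read the three settings from hypothetical environment variable names."""
    return {
        "application_id": env["ABBYY_APPLICATION_ID"],
        "password": env["ABBYY_PASSWORD"],
        "server_url": env["ABBYY_SERVER_URL"],
    }
```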
Must Read – Comparison Cloud OCR SDK vs. FineReader Engine SDK https://abbyy.technology/en:features:comparisons:comp_onlinesdk-fre
You can visit the code repository to download the code from GitHub if you wish to try it yourself.
As mentioned above, the code contains the URLs, API keys, and so on as they were, to avoid confusing learners. However, those will not work, and you need to set your own values for the API key, URL, password, etc. for the code to run.
Download Link here
Result & Observations
You can check the raw output in Excel in the Output folder of the code.
Here are the observations of the RPABOTSWORLD team on trying different OCR engines with UiPath.
We faced a few issues while writing this UiPath OCR example. You might face similar issues, so we have listed them here to help.
#1 Microsoft Cloud OCR was not connecting with UiPath and threw the error below –
UiPath.10:43:03.4398 Fatal UiPath.Vision.OCR.OCRException: MicrosoftAzureComputerVisionErrorRunEngine ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
This happened because one of the required DLL assemblies (Emgu.CV.World in the log below) was missing, and we needed to update the Computer Vision package from the Manage Packages screen.
This can be identified by looking at the Studio logs.
21:20:34.8925 => [ERROR] [UiPath.Studio.exe]  $LoadAssembly: UiPath.CV, Version=220.127.116.11, Culture=neutral, PublicKeyToken=null: System.IO.FileNotFoundException: Could not load file or assembly 'Emgu.CV.World, Version=18.104.22.16884, Culture=neutral, PublicKeyToken=null' or one of its dependencies. The system cannot find the file specified.
File name: 'Emgu.CV.World, Version=22.214.171.12484, Culture=neutral, PublicKeyToken=null'
#2 Issue with the maximum size of the PDF file with the Google Cloud OCR engine
Read PDF With OCR: Error performing OCR: Request payload size exceeds the limit: 10485760 bytes. GoogleCloudErrorInvalidResponse
To fix this issue, check the ResizeToMaxLimitIfNecessary property of GoogleCloudOCR, which downsizes the image so that it fits within the engine’s size limit.
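You can also estimate in advance whether a file is likely to hit this limit. The sketch below assumes the image is sent base64-encoded inside a JSON body, as the Cloud Vision REST API expects; whether the UiPath activity packages the request the same way is an assumption. Base64 turns every 3 bytes into 4 characters, so a file well under 10 MB on disk can still exceed the limit. The overhead allowance for the JSON wrapper is a guess.

```python
GOOGLE_PAYLOAD_LIMIT = 10485760  # bytes, taken from the error message above


def base64_length(raw_bytes: int) -> int:
    """Length in characters of the base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_payload_limit(raw_bytes: int, overhead: int = 1024) -> bool:
    """Estimate whether an image of raw_bytes fits once base64-encoded.

    overhead is a guessed allowance for the JSON wrapper around the image.
    """
    return base64_length(raw_bytes) + overhead <= GOOGLE_PAYLOAD_LIMIT
```

By this estimate the practical ceiling for the raw image is a little under 7.9 million bytes.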
Key points to remember
- In many cases, to get better OCR results you’ll need to improve the quality of the image you give to the OCR engine.
- Unsurprisingly, the paid OCR engines performed the best, especially with scanned documents. None of the engines read low-quality scans perfectly, but the cloud options were closest.
- If OCR is a key part of your project, I recommend trying all of the available UiPath OCR options on the specific document types you’re working with, to find the best option that fits your project budget.
- OCR is all about experimenting with different settings, so you may need to modify the scale or DPI, or sometimes pre-process the image, for better results.