Text recognition (OCR)

Business
Business+
Document Management

Finding documents by their contents using the OCR (Optical Character Recognition) module for RetSoft Archive.
The OCR module provides a way to make the entire text of scanned documents searchable.

The OCR module

With the OCR Add-on in use, searching is greatly simplified. To a computer, scanned documents are like photos, so their contents cannot normally be copied or edited with a text editor.
Using OCR, the contents of a scanned document can be recognized (refer to 'Manual OCR' or 'Automatic OCR' for more information).
Scanned as well as imported MS Word, MS Excel, PDF, HTML and other documents will be made searchable by their contents.
With this module you can:

  • Make RetSoft automatically recognize every scanned page.
  • Select text within a document and convert it into editable text.
  • Find documents by simply searching for a word that occurs in their contents.

Visit our website for more information on this module and how to order it.

OCR Settings

From the settings window (opened from the main screen) you can change your OCR settings.
Select the tab 'Text recognition' in the screen that appears. The following settings will appear:


Recognition Languages
'Primary and Secondary language' - The languages used to recognize text in documents. You can set a primary and a secondary language. Several languages are available for this functionality.
After installation, Dutch and English are available, but more languages can be added. Contact RetSoft for this.

Options
'Apply text recognition (OCR) after a scan session' - Three choices are available for this option.

  • 'Always' - Text recognition will always be executed after a page has been scanned without prompting the user.
  • 'Never' - Text recognition will not be applied to the scanned pages.
  • 'Prompt' - RetSoft Archive will ask the user after every scan whether text recognition should be applied to the document or not.

'Display result after OCR is processed' - If marked, the recognized text is displayed after OCR text recognition has completed. Note: this only works when a single document has been processed.
'Remember last used OCR template' - Remembers the last used OCR template after a scan session. Disabled by default.
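
The three choices for 'Apply text recognition (OCR) after a scan session' can be pictured as a simple decision. The sketch below is only an illustration of the behaviour described above, not RetSoft Archive code; the names `OcrAfterScan`, `should_run_ocr` and `ask_user` are hypothetical.

```python
from enum import Enum
from typing import Callable


class OcrAfterScan(Enum):
    """The three choices described for the 'after a scan session' option."""
    ALWAYS = "Always"
    NEVER = "Never"
    PROMPT = "Prompt"


def should_run_ocr(setting: OcrAfterScan, ask_user: Callable[[], bool]) -> bool:
    """Return True when text recognition should run after a scan.

    ask_user is a hypothetical stand-in for the prompt shown to the user;
    it is only called when the setting is 'Prompt'.
    """
    if setting is OcrAfterScan.ALWAYS:
        return True
    if setting is OcrAfterScan.NEVER:
        return False
    return ask_user()


if __name__ == "__main__":
    # With 'Always' the user is never asked, so the callback is ignored.
    print(should_run_ocr(OcrAfterScan.ALWAYS, lambda: False))
    # With 'Prompt' the answer of the user decides.
    print(should_run_ocr(OcrAfterScan.PROMPT, lambda: True))
```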

Manual OCR

Manual OCR can be initiated in the following ways:

  1. Using the button 'Text recognition' in the main window. 
  2. Using the shortcut Ctrl+O. 
  3. By selecting 'Text recognition' from the 'Options' menu in the main window. 
  4. Through the different popup menus that appear when the right mouse button is pressed at one of the different locations in the main window. 

Text recognition can be applied to a single document, to multiple documents or to a selection made within a document. 
When text recognition has been applied to a document, the result will be attached to the respective document. 
Using text recognition allows the user to search documents by their contents. It is possible to save the results in a file and also to display the results after each text recognition process. 
The results will only be shown when the option 'Display result after OCR is processed' on the OCR settings window has been checked. 
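
To give an idea of why attached recognition results make documents searchable by contents, the sketch below builds a small word index over recognized text. It is a general illustration of the technique and an assumption on our part; how RetSoft Archive stores and searches the results internally is not described here. The document names and texts are invented examples.

```python
import re
from collections import defaultdict


def build_index(recognized: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the names of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in recognized.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], word: str) -> list[str]:
    """Return the sorted names of all documents whose text contains the word."""
    return sorted(index.get(word.lower(), set()))


if __name__ == "__main__":
    # Hypothetical recognition results, keyed by document name.
    results = {
        "invoice-001": "Invoice for Acme Ltd",
        "letter-007": "Dear Acme, thank you",
    }
    index = build_index(results)
    print(search(index, "acme"))
```

A document without recognition results contributes no words to such an index, which is why unprocessed scans cannot be found by their contents.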

OCR Templates are templates that define areas on a document where RetSoft Archive should apply the text recognition. 

When viewing the properties of a document, the tab 'Text recognition' shows whether the document has been processed and is therefore searchable by contents.

Making the entire archive searchable
To make an archive completely searchable by contents, select the main (first) folder in the folder view. Then select 'Text recognition' in the main screen.
Select OK to start the OCR process.
This may take some time, depending on the number of unprocessed documents.
Tip: Start this process at a time when you will not need the computer it runs on for a while.

Automatic OCR

Automatic OCR means that text recognition is applied to the documents immediately after the scan session completes. The window that appears is slightly different from the one for 'Manual OCR'.
In this window you can select a template or choose not to apply OCR.

If you set your choice as the default, you can still change it later in the 'Settings' window. To do this, select the option 'Prompt' on the 'Text recognition' tab.

Use recognized text as name

This option makes it possible to use text recognized in a RetSoft document as the new name of that document. First, create a selection in the document.
This is done by holding down the left mouse button while dragging a rectangle over the document.
After this, right-click the selection you have just made and choose 'Selection', followed by 'Use selection as document name'.
 
TIP: If you hold down 'Ctrl + Alt' while creating the selection, the 'Use recognized text as name' option is instantly activated.
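
Recognized text often contains line breaks or stray characters, so some clean-up is usually needed before it can serve as a name. The sketch below shows one possible clean-up. It is an assumption for illustration only: the rules RetSoft Archive actually applies are not documented here, and the function name, the length limit and the fallback name 'Untitled' are hypothetical.

```python
import re

# Characters that Windows does not allow in file names (assumed relevant here).
_INVALID = r'[<>:"/\\|?*]'


def name_from_recognized_text(text: str, max_length: int = 80) -> str:
    """Turn recognized text from a selection into a single-line name."""
    # Replace characters that are not allowed by a space.
    cleaned = re.sub(_INVALID, " ", text)
    # Collapse line breaks and repeated spaces into single spaces.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Limit the length and fall back to a placeholder when nothing is left.
    return cleaned[:max_length].rstrip() or "Untitled"


if __name__ == "__main__":
    print(name_from_recognized_text("  Invoice\n2023/0042  "))
```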

OCR Templates

OCR Templates filter out sections of documents which do not have to be processed by the OCR engine and therefore speed up the text recognition process. 
The use of templates is especially useful when scanning a lot of documents with the same layout, or documents that only have a small portion of relevant text. 
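
One way to picture a template is as a list of rectangles on the page; only those rectangles are passed to the OCR engine. The sketch below is an illustration under that assumption, not the format RetSoft Archive uses. The class `Region`, the fractional coordinates and the example page size are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A rectangular area given as fractions of the page (0.0 to 1.0)."""
    left: float
    top: float
    right: float
    bottom: float

    def to_pixels(self, page_width: int, page_height: int) -> tuple[int, int, int, int]:
        """Convert the region to a pixel box (left, top, right, bottom)."""
        return (
            round(self.left * page_width),
            round(self.top * page_height),
            round(self.right * page_width),
            round(self.bottom * page_height),
        )

    def area_fraction(self) -> float:
        """Share of the page covered by this region."""
        return (self.right - self.left) * (self.bottom - self.top)


def covered_fraction(template: list[Region]) -> float:
    """Share of the page to be processed, assuming the regions do not overlap."""
    return sum(region.area_fraction() for region in template)


if __name__ == "__main__":
    # Hypothetical template: a header strip and a totals box at the bottom right.
    template = [Region(0.0, 0.0, 1.0, 0.2), Region(0.5, 0.8, 1.0, 1.0)]
    print(template[0].to_pixels(2480, 3508))
    print(covered_fraction(template))
```

In this example the template covers roughly 30% of the page, which shows where the time saving comes from: the rest of the page is never offered to the OCR engine.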

More information about this extension of the OCR Add-on is available at our website and in the module's chapter.