While the GlobalCapture Optical Character Recognition (OCR) Engine produces excellent results, there are additional steps you can take, where possible, to improve them even further.
Test Templates Against Different Images
For best results, work with a large sample of documents; the larger and more diverse, the better. This lets you test the Template against a variety of factors that may affect Zone accuracy on some documents but not on others.
Plan Your Zones and Workflows For OCR
The default OCR settings in GlobalCapture work well in many cases, but due to the highly variable nature of documents, you should plan to test the results of your OCR settings.
- Use a variety of sample documents that are typical for your Templates; the more sample documents, the better.
- Plan for documents that the OCR engine cannot read with 100% accuracy, such as documents that are stained, wrinkled, or low-contrast.
- Remember that end users may adjust their OCR settings in KeyFree as well. Test your Workflow using the same settings as those used in your production environment or the results may differ.
Plan Your Documents For OCR
Just as the Template Zones should be optimized for OCR, it helps if the documents themselves have been optimized for the best OCR results. (In many cases, this also makes them easier for people to read.)
- Use a consistent dpi resolution when capturing documents, because changing the dpi changes the OCR results. Use a setting high enough to capture sufficient detail (300 dpi is generally a good setting).
- Use clear, simple fonts for your documents and avoid small font sizes (less than 10 point).
- Make sure that the characters and words have proper spacing (do not use “compressed” fonts).
- Allow plenty of space between text and borders or line separators on the page.
- Include a company name in plain text, not just in the company logo image.
- For the most reliable automated data extraction, use barcodes rather than plain text.
- Create sufficient contrast between the text and the background, and remove color whenever possible, particularly behind the text.