Limits Zone Properties
Use the Limits settings to specify parameters for the data to extract. If a Zone on the current page does not meet the parameters (such as not enough digits for a Social Security Number), then the Template will search any subsequent specified pages for data that does. Data in the Search Region must contain at least the minimum specified elements or it will not be validated. Results will be truncated for any characters, words, or lines past the maximum specified.
Note that the Min setting for the Characters limit is enforced per line read within the Zone, while the Min setting for the Words limit is enforced for the entire Zone. Also note that, in order to configure Word Spacing in Marker and Pattern Match Zones, the Lines setting value must be either zero (meaning no limit) or a number greater than one.
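The difference between the per-line Characters minimum and the per-Zone Words minimum can be illustrated with a minimal Python sketch. This is not the product's implementation: the function name meets_minimums is hypothetical, and the sketch assumes that zero means "no minimum" and that only non-space characters are counted.

```python
def meets_minimums(lines, min_chars=0, min_words=0):
    """Return True if the text lines read from a Zone satisfy both minimums.

    Assumptions (not confirmed by the product documentation):
    0 means "no minimum", and spaces do not count as characters.
    """
    # Characters minimum: every line read in the Zone must pass on its own.
    for line in lines:
        if len(line.replace(" ", "")) < min_chars:
            return False
    # Words minimum: counted once across the whole Zone.
    total_words = sum(len(line.split()) for line in lines)
    return total_words >= min_words


print(meets_minimums(["123-45-6789"], min_chars=9))    # full SSN passes
print(meets_minimums(["123-45"], min_chars=9))         # too few characters
print(meets_minimums(["John", "Smith"], min_words=2))  # two words across two lines
```

In this sketch, two one-word lines satisfy a Words minimum of two, but the same two lines would fail a Characters minimum of five because "John" alone has only four characters.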
Elements include:
- Characters – In the Characters subgroup, enter a number for the minimum and/or the maximum number of characters required for data extracted from the Zone.
- Words – In the Words subgroup, enter a number for the minimum and/or the maximum number of words required.
- Lines – In the Lines subgroup, enter a number for the minimum and/or the maximum number of lines of text required and for variable-height line extraction.
- Word Spacing – Sometimes “loose” regular expressions can return false-positive extraction results when used across documents with variable text-value lengths that include gaps between consecutive words or lines of text. You can adjust those results using the Word Spacing settings.
When you set your Zone to extract from different blocks of text, you can use the Word Spacing setting to control what should be considered for the Zone and what should not. The Word Spacing setting appears when the Lines Max setting is zero or greater than one. For multi-line pattern matching, you can specify the vertical and horizontal distances allowed between lines that are combined into one long, searchable string. By specifying the size of the gaps, you can, for example, extract two paragraphs together, or extract words in a fully justified paragraph, where the space between words may be larger than normal.
Setting either Vertical or Horizontal spacing to zero will bridge gaps of any size. Both settings at zero will combine all available words into a single searchable string. You can use the Measure feature to help determine the gap settings. To measure the distance, click the Measure icon in the Template Designer toolbar and then drag your mouse pointer on the Design Canvas from one point to another to create a line. The Measurement dialog will appear to display the line’s X and Y coordinates.
To limit the space between consecutive words and lines, use the Word Spacing subgroup that appears: in Vertical, specify the maximum height in pixels, and in Horizontal, specify the maximum width in pixels, of the space allowed between valid words or lines to extract.
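The effect of the Vertical and Horizontal settings can be sketched in Python as follows. This is only an illustration under stated assumptions: the Box class, the join_lines function, and the way gaps are measured between bounding boxes are all hypothetical and are not taken from the product. The sketch does follow the rule described above that a setting of zero bridges gaps of any size.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical bounding box (in pixels) of one OCR line."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def gap_allowed(gap, limit):
    # A limit of zero bridges gaps of any size.
    return limit == 0 or gap <= limit


def join_lines(boxes, horizontal=0, vertical=0):
    """Join consecutive OCR lines into searchable strings.

    Two consecutive lines are joined only when both the horizontal and
    the vertical gap between them are within the configured limits.
    """
    strings = []
    prev = None
    for box in boxes:
        if prev is not None:
            # Overlapping boxes count as a gap of zero (an assumption).
            h_gap = max(0, box.left - prev.right)
            v_gap = max(0, box.top - prev.bottom)
            if gap_allowed(h_gap, horizontal) and gap_allowed(v_gap, vertical):
                strings[-1] += " " + box.text
                prev = box
                continue
        strings.append(box.text)
        prev = box
    return strings


stacked = [Box("First paragraph", 0, 0, 300, 20),
           Box("Second paragraph", 0, 60, 300, 80)]
print(join_lines(stacked, vertical=50))  # 40-pixel gap is bridged
print(join_lines(stacked, vertical=30))  # 40-pixel gap is too large
```

With both settings left at zero, every line in the list ends up in one searchable string, which mirrors the behavior described for zero values.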
Control Extraction by Controlling Gaps
The OCR engine treats a group of words as a “line,” although the words may not necessarily all be on the same horizontal plane, as one might think of a line of text. To the OCR engine, both examples shown are two lines.
Use the Word Spacing settings to control your extraction results, based on the empty spaces between words. In this example, Horizontal Word Spacing has been set to 200 pixels. There is a 150-pixel wide gap between the right edge of the first line and the left edge of the second line. Since this falls within the specified spacing distance, the pattern match for the Zone is successful.
If the document has a larger, 500-pixel gap between the lines, the gap exceeds the specified spacing distance and the pattern match fails.
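The 150-pixel and 500-pixel cases can be reproduced with a short Python sketch. The function zone_matches, the sample text, and the regular expression are hypothetical and chosen only for illustration; the sketch assumes that lines separated by a gap larger than the Horizontal setting are searched separately.

```python
import re


def zone_matches(pattern, first_line, second_line, gap_px, horizontal_spacing):
    """Check a pattern against two OCR lines separated by gap_px pixels.

    If the gap is within the Horizontal Word Spacing value (or the value
    is zero), the lines are searched as one string; otherwise each line
    is searched on its own.
    """
    bridged = horizontal_spacing == 0 or gap_px <= horizontal_spacing
    if bridged:
        candidates = [f"{first_line} {second_line}"]
    else:
        candidates = [first_line, second_line]
    return any(re.search(pattern, text) for text in candidates)


# Hypothetical pattern spanning both lines.
pattern = r"Invoice Total\s+[\d,]+\.\d{2}"

print(zone_matches(pattern, "Invoice Total", "1,250.00", 150, 200))  # gap fits
print(zone_matches(pattern, "Invoice Total", "1,250.00", 500, 200))  # gap too wide
```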
Use Field Limits and Zone Limits Together
If you set an Index Field for an invoice description using the default maximum of 50 characters and then set a Search Region of 75 characters for a Zone, only the first 50 characters will be extracted. (This is true for all the repeating Zones that follow the first one as well.) So, either set your Index Field to a maximum number of characters high enough to encompass most scenarios or use it knowingly to eliminate some invoice descriptions that you do not want to capture.
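The interaction between the two limits amounts to taking the smaller of the two maximums. A minimal Python sketch, assuming the 75-character Search Region corresponds to the Zone's Characters maximum (the function name extracted_value is hypothetical):

```python
def extracted_value(text, zone_max_chars, field_max_chars):
    """Apply the Zone limit first, then the Index Field limit."""
    zone_result = text[:zone_max_chars]    # truncated by the Zone
    return zone_result[:field_max_chars]   # truncated again by the Index Field


description = "x" * 100  # stand-in for a long invoice description

print(len(extracted_value(description, zone_max_chars=75, field_max_chars=50)))
print(len(extracted_value(description, zone_max_chars=75, field_max_chars=100)))
```

In the first call the Index Field limit of 50 governs the result; in the second, where the Index Field allows 100 characters, the Zone limit of 75 governs instead.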