Leveraging Data Types With Extraction Rules
When extracting data from document pages, it is common to use character fields regardless of the type of data being extracted. This approach can be the safest when the quality of the source documents is expected to be low. It also carries the least potential for error, because typed fields strictly enforce their data types. For example, if a zero in a date is extracted as the letter O and the zone targets a Date field, the value would not be read, because Date fields do not accept character input. Of course, GlobalCapture has ways to navigate such issues, but there are also times when leveraging the data type makes a lot of sense.
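The effect of strict type enforcement can be illustrated outside of GlobalCapture. The following is a minimal Python sketch, not GlobalCapture's actual implementation: the helper name parse_date_strict and the MM/DD/YYYY format are assumptions chosen for illustration. It shows a cleanly extracted date being accepted while the same date with a letter O in place of a zero is rejected.

```python
from datetime import datetime
from typing import Optional


def parse_date_strict(text: str, fmt: str = "%m/%d/%Y") -> Optional[datetime]:
    """Return a datetime if text is a valid date in fmt, otherwise None.

    Hypothetical helper: it mimics a typed Date field that refuses
    any value it cannot interpret as a date.
    """
    try:
        return datetime.strptime(text.strip(), fmt)
    except ValueError:
        return None


clean = parse_date_strict("10/12/2023")    # digits only, parses as a date
misread = parse_date_strict("1O/12/2023")  # letter O instead of zero, rejected

print(clean)
print(misread)
```

A character field would have stored both strings as-is, which is why it is the more forgiving choice for poor-quality scans.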
Take the following image snippet as an example:
In this case, we know the source document is digitally generated, ensuring high readability and a high level of accuracy for extraction. We also know, from examining sample documents, that the Invoice Date can shift significantly left or right. This problem could be solved by using a “Marker” to look for the words Invoice Date, and using another zone to extract to the right of wherever the marker is found. We could also use a pattern matching zone, if licensed for Unstructured Extraction, to find the date using a regular expression.
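As a rough illustration of the regular expression approach, the Python sketch below searches page text for the Invoice Date label and captures the date that follows it, however far the value has shifted. The pattern, the sample text, and the function name find_invoice_date are all assumptions made for this example; the expression used in an actual pattern matching zone would depend on your documents.

```python
import re
from typing import Optional

# Assumed layout: the label "Invoice Date", an optional colon, any amount
# of whitespace, then a date written as M/D/YYYY or MM/DD/YYYY.
INVOICE_DATE = re.compile(
    r"Invoice\s+Date\s*:?\s*(\d{1,2}/\d{1,2}/\d{4})",
    re.IGNORECASE,
)


def find_invoice_date(page_text: str) -> Optional[str]:
    """Return the date text that follows the label, or None if not found."""
    match = INVOICE_DATE.search(page_text)
    return match.group(1) if match else None


# Hypothetical page text with the date pushed to the right by extra spacing.
sample = "Invoice No: 10457      Invoice Date:   03/15/2024\nTerms: Net 30"
print(find_invoice_date(sample))
```

Because the whitespace between the label and the value is matched flexibly, horizontal shifts in the value do not affect the result.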
Alternatively, we can leverage data types to find the date in this region of the page. Create a new positional zone, and bind the zone to a Date field. Either GlobalCapture or GlobalSearch fields will work, as long as the field is a Date. Next, draw the zone around the entire region where the value is expected to be found.
Be sure to click the Check on the Zone’s properties panel to apply the changes.
Notice that, despite numerous text elements, only the date is extracted from the entire drawn region. The data type of the selected field ensures that only date data may be extracted, and only one date is found in this large block.
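Conceptually, the data type acts as a filter over everything found in the zone. The Python sketch below models that idea under stated assumptions: the region text, the accepted date formats, and the function name dates_in_region are hypothetical, and GlobalCapture's internal logic is not documented here and may differ.

```python
from datetime import date, datetime
from typing import List

# Assumed set of date formats a Date field might accept.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")


def dates_in_region(region_text: str) -> List[date]:
    """Keep only the tokens in the region that parse as a date."""
    found = []
    for token in region_text.split():
        candidate = token.strip(",;:()")
        for fmt in DATE_FORMATS:
            try:
                found.append(datetime.strptime(candidate, fmt).date())
                break  # stop at the first format that fits
            except ValueError:
                continue  # not a date in this format, try the next
    return found


# Hypothetical text captured by a zone drawn around a large block.
region = "Invoice No: 10457 Invoice Date: 03/15/2024 Terms: Net 30 PO: 88-2041"
print(dates_in_region(region))
```

Labels, invoice numbers, and terms all fail the date check, so the single date in the block is the only value left.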
This concept may be leveraged in other ways. You might use data types to extract all dollar amounts from an entire document, or to control what data is extracted from a repeating zone. It works with any zone type, and can also be leveraged with multi-value and table data sets.
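The dollar amount case can be sketched the same way. This Python example assumes, purely for illustration, that a value counts as currency only when it has exactly two decimal places, with an optional dollar sign and thousands separators. The pattern and the function name amounts_in_text are hypothetical and do not describe how GlobalCapture defines a currency type.

```python
import re
from decimal import Decimal
from typing import List

# Assumed currency shape: optional "$", digits with optional thousands
# separators, and exactly two decimal places.
CURRENCY = re.compile(r"^\$?(\d{1,3}(?:,\d{3})+|\d+)\.\d{2}$")


def amounts_in_text(text: str) -> List[Decimal]:
    """Return every token in the text that looks like a dollar amount."""
    amounts = []
    for token in text.split():
        candidate = token.rstrip(",;:")
        if CURRENCY.match(candidate):
            amounts.append(Decimal(candidate.lstrip("$").replace(",", "")))
    return amounts


# Hypothetical document text mixing amounts with other numeric values.
document = "Subtotal: $1,250.00 Tax: $103.13 Total: $1,353.13 Qty: 12 Ref: 2024.1"
print(amounts_in_text(document))
```

The quantity and the reference number are numeric but do not fit the assumed currency shape, so they are left out.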