TransformAI retrieves data in character form, regardless of the prescribed field’s data type. This improves extraction accuracy, but it means the data must be sanitized after extraction.
Customer needs may differ, and customers with invoices from outside the US might need specific data cleansing options, particularly for date and amount fields. The TAI Retrieve node includes options that automate the cleanup process for common scenarios. These options are convenience features only. All replacement functionality can be implemented or augmented through replacements using the Set Process Field node. Some customers may find that, for their use case, replacements need to be more or less aggressive, or better tuned to regional or locale-based input.
For character-based fields only, enabling Remove Whitespace Characters performs the following character replacements:
Replace any LINE FEED characters with a single space
Replace any run of 2 or more SPACES with a single space
Replace any TAB characters with a single space
Replace any CARRIAGE RETURN characters with a single space
Replace any CARRIAGE RETURN + LINE FEED (CRLF) sequences with a single space
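The replacements listed above can be approximated outside the product, for example to preview how a value would look after cleanup. The following is a minimal Python sketch, not the node's actual implementation. The function name `remove_whitespace_characters` is hypothetical, and the order of operations (line breaks and tabs first, then collapsing spaces) is an assumption, since the order the node applies is not stated here.

```python
import re


def remove_whitespace_characters(value: str) -> str:
    """Approximate the Remove Whitespace Characters option (hypothetical helper)."""
    # Replace CRLF pairs, lone carriage returns, lone line feeds, and tabs
    # with a single space. CRLF is listed first so a pair becomes one space.
    value = re.sub(r"\r\n|\r|\n|\t", " ", value)
    # Collapse any run of two or more spaces into a single space.
    # Assumption: this runs last so spaces created above are also collapsed.
    value = re.sub(r" {2,}", " ", value)
    return value


if __name__ == "__main__":
    print(remove_whitespace_characters("Acme\r\nCorp\t\tLtd  Inc"))
```

Collapsing spaces as the final step is a design choice in this sketch: it keeps a tab followed by a line break from leaving two spaces behind.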
Sanitize Date Fields
For date fields only, enabling this option formats the output as follows:
Remove all occurrences of COMMAS
Replace each of the following with a single space:
Two or more SPACE characters in a row
CARRIAGE RETURN + LINE FEED (CRLF)
Remove all occurrences of the strings st, rd, th, and nd
These suffixes may appear in dates written like January 25th, 2023
Replace the letter O (uppercase or lowercase) with a zero whenever the letter is directly adjacent to a numeric digit.
You cannot safely be more aggressive with character replacements like this without risking damage to the format of some common dates. GlobalCapture allows you to perform more use-case-specific replacements in the workflow.
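The date cleanup steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's code. The function name `sanitize_date_field` is hypothetical. The sketch removes st, nd, rd, and th only when they directly follow a digit, which is an assumption made here because a literal removal of every occurrence would also alter month names such as August. The match is case-sensitive, which is also an assumption.

```python
import re


def sanitize_date_field(value: str) -> str:
    """Approximate the Sanitize Date Fields option (hypothetical helper)."""
    # Remove all commas.
    value = value.replace(",", "")
    # Replace CRLF sequences and runs of two or more spaces with one space.
    value = re.sub(r"\r\n| {2,}", " ", value)
    # Remove ordinal suffixes. Assumption: only when they directly follow a
    # digit, so that "August" is not turned into "Augu".
    value = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", value)
    # Replace the letter O (either case) with zero when it is directly
    # before or after a digit, for example "2O23" becomes "2023".
    value = re.sub(r"(?<=\d)[oO]|[oO](?=\d)", "0", value)
    return value


if __name__ == "__main__":
    print(sanitize_date_field("January 25th, 2O23"))
```

The last rule also shows why more aggressive replacements are risky: a value such as 25October, with no space before the month, would have its leading O changed to a zero.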
Sanitize Amount Fields
For amount type fields only, enabling this option formats the output as follows:
Remove all SPACE characters
Remove all occurrences of the character combination USD
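The two amount replacements listed above can be sketched in Python as shown below. This is a minimal sketch covering only those two steps. The function name `sanitize_amount_field` is hypothetical, and matching USD in uppercase only is an assumption.

```python
def sanitize_amount_field(value: str) -> str:
    """Approximate two Sanitize Amount Fields steps (hypothetical helper)."""
    # Remove all space characters.
    value = value.replace(" ", "")
    # Remove every occurrence of the character combination USD.
    # Assumption: the match is case-sensitive.
    value = value.replace("USD", "")
    return value


if __name__ == "__main__":
    print(sanitize_amount_field("USD 1 234.50"))
```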