Data Cleanup

TransformAI retrieves data in character form, regardless of the prescribed field’s data type. This allows for a higher degree of extraction accuracy, but it means the data must be sanitized once extracted.

Customer needs may shift, and customers with invoices from outside the US might need specific data cleansing options for date and amount fields. The TAI Retrieve node includes options for automating the cleanup process for common scenarios. These options are convenience features only. All replacement functionality can be implemented or augmented using the Set Process Field node. Some customers may find that, for their use case, replacements should be more or less aggressive, or better tuned to regional or locale-based input.

Remove Whitespace

Exclusively for character-based fields, enabling Remove Whitespace Characters performs the following character replacements:

  • Remove any LINE FEED characters and replace with a single space

  • Remove any occurrences of 2 or more SPACES in a row and replace with a single space

  • Remove any TAB characters and replace with a single space

  • Remove any occurrences of a CARRIAGE RETURN and replace with a single space

  • Remove any occurrences of a CARRIAGE RETURN + LINE FEED (CRLF) and replace with a single space
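The replacements above can be approximated outside the product with a short Python sketch. This is an illustration only, not GlobalCapture's implementation: the function name remove_whitespace is hypothetical, and the sketch assumes CRLF is treated as one unit and that leading or trailing spaces are left alone, since the list above does not mention trimming.

```python
import re


def remove_whitespace(value: str) -> str:
    """Approximate the Remove Whitespace Characters option (hypothetical helper)."""
    # CRLF, a lone CARRIAGE RETURN, a lone LINE FEED, and TAB each become one space.
    # CRLF is listed first so the pair is replaced by a single space, not two.
    value = re.sub(r"\r\n|\r|\n|\t", " ", value)
    # Runs of two or more spaces collapse to a single space.
    return re.sub(r" {2,}", " ", value)


print(remove_whitespace("ACME\r\nCorp\t Inc"))
```

Collapsing repeated spaces last also cleans up any doubled spaces created by the earlier replacements.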

Sanitize Date Fields

For date fields only, enabling this option formats the output as follows:

  • Remove all occurrences of COMMAS

  • Replace with a single space the following:

    • Two or more SPACE characters in a row

    • LINE FEED

    • CARRIAGE RETURN

    • CARRIAGE RETURN + LINE FEED (CRLF)

    • TAB

  • Remove all occurrences of the strings st, rd, th, and nd

    • These characters may show up in dates written like: January 25th, 2023

  • Replace the letter O (uppercase or lowercase) with zero whenever the letter is directly adjacent to a numeric digit.

    • You cannot safely be more aggressive with character replacements like this without risking changes to the format of some common dates. GlobalCapture allows you to perform more use-case-specific replacements in the workflow.
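As a rough illustration of the date rules above, the following Python sketch applies them in the listed order. It is not the product's code, and the function name sanitize_date is hypothetical. One assumption to note: the sketch strips st, nd, rd, and th only when they directly follow a digit, so that month names such as August are left intact. The list above does not say whether the built-in option is that conservative.

```python
import re


def sanitize_date(value: str) -> str:
    """Approximate the Sanitize Date Fields option (hypothetical helper)."""
    # Remove all commas.
    value = value.replace(",", "")
    # Line breaks and tabs become a single space, then repeated spaces collapse.
    value = re.sub(r"\r\n|\r|\n|\t", " ", value)
    value = re.sub(r" {2,}", " ", value)
    # Assumption: drop ordinal suffixes only when they follow a digit (25th -> 25).
    value = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", value, flags=re.IGNORECASE)
    # Replace the letter O with zero only when it touches a digit on either side.
    value = re.sub(r"(?<=\d)[oO]|[oO](?=\d)", "0", value)
    return value


print(sanitize_date("January 25th, 2023"))
print(sanitize_date("1O/O5/2O23"))
```

Because the letter O is replaced only next to a digit, month names such as October pass through unchanged.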

Sanitize Amount Fields

For amount type fields only, enabling this option formats the output as follows:

  • Remove all SPACE characters

  • Remove all occurrences of the character combination USD
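The two amount rules can be sketched in Python as follows. The function name sanitize_amount is hypothetical, and the sketch assumes the match on USD is case-sensitive, which the text above does not specify. Thousands separators and decimal marks are left untouched because the option as described does not change them.

```python
def sanitize_amount(value: str) -> str:
    """Approximate the Sanitize Amount Fields option (hypothetical helper)."""
    # Remove every space character first, then every occurrence of "USD".
    # Assumption: only the uppercase form "USD" is matched.
    return value.replace(" ", "").replace("USD", "")


print(sanitize_amount("USD 1 234.50"))
```

Locale-specific handling, such as amounts that use a comma as the decimal mark, would still need its own replacements in the workflow.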
