OCR Temp Directory Is Filling Up

Problem

The OCR Temp folder is very large and contains many documents that appear to be identical or nearly identical. The folder can grow by many gigabytes overnight, causing server slowdowns or crashes.

Cause

Archive PDF Conversion is trying to convert a corrupt PDF. The conversion fails and the document gets caught in a loop: the converter retries it indefinitely, and each attempt adds more files to the OCR Temp folder, so the folder keeps growing. If left alone, this will eventually fill the drive and crash the server.

Solution

The most important step is to find the document whose conversion message keeps arriving in the queue and remove it from the file structure. This stops the bad PDF from being converted over and over. To identify the document, you'll need to look in the message queue.

The OCR Temp folder typically lives in the service account user's temp directory, usually C:\Users\SSAdministrator\AppData\Local\Temp\Square9\OCR (if the service runs under a different account, substitute that account's profile folder).
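
To confirm how large the folder has grown before you start, a short script can total the files under it. This is a minimal Python sketch, not a Square 9 tool; the OCR_TEMP path is the default location given above and is an assumption for your server.

```python
import os
from pathlib import Path

# Assumed default location; change the profile folder if the service
# runs under an account other than SSAdministrator.
OCR_TEMP = Path(r"C:\Users\SSAdministrator\AppData\Local\Temp\Square9\OCR")


def folder_size_bytes(root: Path) -> int:
    """Return the total size, in bytes, of all files under root (0 if root is missing)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # The converter may delete a file between listing and sizing it.
                continue
    return total


def file_count(root: Path) -> int:
    """Return the number of files under root."""
    return sum(len(filenames) for _dirpath, _dirnames, filenames in os.walk(root))


if __name__ == "__main__":
    size = folder_size_bytes(OCR_TEMP)
    print(f"{OCR_TEMP}: {file_count(OCR_TEMP)} files, {size / 1024**3:.2f} GiB")
```

Running it a few minutes apart gives a rough idea of how fast the folder is growing.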

Locating the Conversion Message for the Stuck Document and Removing It from the File Structure

Open Computer Management (for example, by running compmgmt.msc).
Expand Services and Applications > Message Queuing > Private Queues > ss_fulltext.
Click ss_fulltext to display all the messages in the PDF Conversion queue. If a document is stuck, you will very likely see quite a few.
Open several of the messages and confirm that they all point to the same document (or the same few documents). Double-click a message and use the file path on its Body tab to locate the document in the file structure.
1. In some cases, the loop creates and removes messages so quickly that they disappear before you can open them. If this happens, enable journaling on the queue: right-click ss_fulltext, select Properties, and check the journaling checkbox. Messages removed from the queue are then kept in the journal, where you can read them after the fact. Be sure to set a size limit on the journal so it doesn't grow out of control over the following weeks or months.
The Body shows the file path of the document in the file structure, that is, where this particular file is stored. The converter takes a copy of the file from that location and tries to convert it; if the conversion fails, it may keep retrying indefinitely. To stop that process, remove the file from the file structure, at least temporarily.
1. Navigate to each file path specified in the messages and move the document out of the file structure.
2. Keep the document rather than deleting it, in case the client also wants to remove its metadata from GlobalSearch (GS).
Once all the corrupt files have been removed, clear the message queue: right-click ss_fulltext and select All Tasks > Purge.
1. If you enabled journaling, you can turn it off at this point as well.
Delete all files from the OCR Temp folder.
Monitor the OCR Temp folder for a while and make sure it isn't growing rapidly. Documents should spend very little time in this folder, so under normal circumstances it should stay small.
If the OCR Temp folder continues to grow rapidly, there may be more corrupt PDFs to remove. Repeat the process until the folder size stabilizes.
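
When checking whether the messages all point to the same document, it can help to tally which file paths recur. This Python sketch assumes you have already copied each message's Body text out as a plain string; the real layout of ss_fulltext message bodies is not described in this article, so PATH_RE is a hypothetical starting point that matches drive-letter (C:\...) and UNC (\\server\share\...) paths ending at a line break, quote, or angle bracket.

```python
import re
from collections import Counter
from typing import Iterable, List

# Hypothetical pattern: a drive-letter path (C:\...) or UNC path (\\server\...),
# running until a line break or a character Windows forbids in paths ("<>|*?).
PATH_RE = re.compile(r'(?:[A-Za-z]:\\|\\\\)[^\r\n"<>|*?]+')


def extract_paths(body: str) -> List[str]:
    """Return every path-like string found in one message body."""
    return [m.group(0).strip() for m in PATH_RE.finditer(body)]


def tally_paths(bodies: Iterable[str]) -> Counter:
    """Count how often each path appears across many message bodies."""
    counts: Counter = Counter()
    for body in bodies:
        counts.update(extract_paths(body))
    return counts
```

For example, tally_paths(bodies).most_common(5) lists the most frequent paths; in a stuck-conversion loop, one or a few paths should dominate.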
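
The step of moving (not deleting) each stuck document can be scripted so that the document keeps its relative folder path and can be put back later. In this Python sketch, storage_root (the root of your GlobalSearch file structure) and quarantine_root are hypothetical parameters you would supply; quarantine() refuses to move anything that is not under storage_root.

```python
import shutil
from pathlib import Path


def quarantine(file_path: Path, storage_root: Path, quarantine_root: Path) -> Path:
    """Move file_path out of storage_root into quarantine_root, preserving its
    relative folder path. Raises ValueError if file_path is not under storage_root."""
    relative = file_path.resolve().relative_to(storage_root.resolve())
    target = quarantine_root / relative
    if target.exists():
        raise FileExistsError(f"{target} already exists; not overwriting")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(file_path), str(target))
    return target


def restore(quarantined: Path, storage_root: Path, quarantine_root: Path) -> Path:
    """Reverse quarantine(): move a file back to its original location."""
    relative = quarantined.resolve().relative_to(quarantine_root.resolve())
    original = storage_root / relative
    original.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(quarantined), str(original))
    return original
```

Keeping the relative path means the client can later decide whether to restore the document or remove its metadata as well.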
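
The final steps, emptying the OCR Temp folder and watching whether it refills, could be scripted as below. This is a minimal Python sketch: it deletes files but leaves any subfolders in place (an assumption, since the steps above only say to delete files), skips files that another process still holds open, and samples the folder size at an interval you choose.

```python
import os
import time
from pathlib import Path
from typing import List


def clear_files(root: Path) -> int:
    """Delete every file under root, leaving folders in place. Returns the number deleted."""
    deleted = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.remove(path)
                deleted += 1
            except OSError as exc:
                # On Windows, a file still open in another process cannot be deleted.
                print(f"Skipped {path}: {exc}")
    return deleted


def folder_size_bytes(root: Path) -> int:
    """Return the total size, in bytes, of all files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished between listing and sizing
    return total


def sample_sizes(root: Path, samples: int, interval_s: float) -> List[int]:
    """Measure root's size `samples` times, interval_s seconds apart."""
    sizes: List[int] = []
    for i in range(samples):
        if i:
            time.sleep(interval_s)
        sizes.append(folder_size_bytes(root))
    return sizes
```

For example, calling clear_files() on the OCR Temp folder and then sample_sizes() on it with samples=12 and interval_s=300 takes twelve readings five minutes apart; a steadily rising list suggests another corrupt PDF is looping.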
