Cognitive Suite Crawling and Race Conditions
Problem
You are using multiple Cognitive Suite tools to crawl the same location(s) during the same scheduled interval(s), and the logs show many entries for files that were skipped because of file locks or permission errors. The tools may be running from scripts or scheduled tasks.
Cause
A race condition occurs when two or more processes or threads access the same shared resource at the same time, so the result depends on which one gets there first. For example, Script A crawls the destination source for new files while Script B is running OCR on files in that same location. A file that is being processed for OCR can't be crawled until that process completes, so Script A skips it and logs the skip.
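The collision can be illustrated with a minimal Python simulation. This is not Cognitive Suite code: a `threading.Lock` stands in for the operating system's file lock, and the file name `invoice.pdf` and the functions `ocr_job`, `crawl`, and `demo` are hypothetical. The crawler tries the lock without waiting, so it skips the file while OCR holds it and succeeds once OCR has finished.

```python
import threading

# Hypothetical stand-in for an OS-level file lock: one lock per file path.
file_locks = {"invoice.pdf": threading.Lock()}

def ocr_job(path, locked, finish):
    # OCR holds the file for the whole time it is processing it.
    with file_locks[path]:
        locked.set()   # signal that the file is now in use
        finish.wait()  # simulate long-running OCR work

def crawl(path, log):
    # Like a crawler that skips busy files instead of waiting for them.
    if not file_locks[path].acquire(blocking=False):
        log.append(f"skipped {path}: file in use")
        return False
    try:
        return True  # the file would be read and indexed here
    finally:
        file_locks[path].release()

def demo():
    log = []
    locked, finish = threading.Event(), threading.Event()
    ocr = threading.Thread(target=ocr_job, args=("invoice.pdf", locked, finish))
    ocr.start()
    locked.wait()                       # OCR now holds the file
    during = crawl("invoice.pdf", log)  # overlapping run: file is skipped
    finish.set()
    ocr.join()                          # OCR done, lock released
    after = crawl("invoice.pdf", log)   # sequential run: file is crawled
    return during, after, log
```

Calling `demo()` returns `(False, True, ["skipped invoice.pdf: file in use"])`: the same file is skipped when the two jobs overlap and crawled when they run one after the other.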
Solution
Run all scripts and scheduled tasks sequentially rather than concurrently, so that each one starts only after the previous one has finished. In the example above, wait for the crawl to complete, then start OCR.
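One way to enforce this ordering is a single wrapper that launches each step and waits for it to exit before starting the next. The sketch below is a hypothetical Python helper (`run_sequentially` is not part of Cognitive Suite) and assumes each tool can be started from the command line.

```python
import subprocess

def run_sequentially(commands):
    """Run each command to completion before the next one starts.

    subprocess.run blocks until the child process exits, so two steps
    never work on the same files at the same time. check=True raises
    CalledProcessError on a non-zero exit code, which stops the chain
    so OCR does not start after a failed crawl.
    """
    finished = []
    for cmd in commands:
        subprocess.run(cmd, check=True)
        finished.append(cmd)
    return finished
```

For example, one scheduled task could call `run_sequentially([["python", "crawl_job.py"], ["python", "ocr_job.py"]])`, where both script names are hypothetical placeholders for your own crawl and OCR scripts, instead of scheduling the two scripts as separate tasks in the same interval. Stopping on the first failure is a design choice; drop `check=True` if later steps should run regardless.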