Common CLI Errors and Exceptions
Permission/Access Errors:
These errors should be resolved so that the affected files and folders can be processed.
[16:23:31 ERR] Unable to get children for node \\server\path\folder (folder). An unexpected network error occurred.
 An unexpected network error occurred.
System.IO.IOException: An unexpected network error occurred.
[16:46:43 ERR] No authorization for \\server\path\folder (folder). Access is denied.
File is inaccessible (no action to take):
[17:10:25 ERR] There was an error processing 7a1bf54fca11a09e4bec13d38345577b0a0db1f2. The process cannot access the file because it is being used by another process.
The file is in use by another process.
[17:10:25 ERR] There was an error processing 7a1bf54fca11a09e4bec13d38345577b0a0db1f2. The process cannot access the file because it is encrypted.
The file cannot be opened because it is encrypted.
[17:10:25 ERR] There was an error processing 7a1bf54fca11a09e4bec13d38345577b0a0db1f2. The process cannot access the file because it is password protected.
The file cannot be opened because it is password protected.
File no longer exists (no action to take):
[16:43:06 ERR] Unable to get children for node \\server\path\folder (folder). The system cannot find the file specified.
This indicates that the file or folder no longer exists at the location recorded when it was crawled into the index.
Known issues (no action to take):
RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser
[00:02:47 ERR] An error occurred extracting text from 9e518fef4797b4130e72e8bfe4e108edcb3ad880. : Text extractor unable to get text.
Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@1a5ad35
Schemas (*.xsb) for CTTable can't be loaded - usually this happens when OSGI loading is used and the thread context classloader has no reference to the xmlbeans classes - use POIXMLTypeLoader.setClassLoader() to set the loader, e.g. with CTTable.class.getClassLoader() Text extractor unable to get text.
Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@1a5ad35
Schemas (*.xsb) for CTTable can't be loaded - usually this happens when OSGI loading is used and the thread context classloader has no reference to the xmlbeans classes - use POIXMLTypeLoader.setClassLoader() to set the loader, e.g. with CTTable.class.getClassLoader()
TextExtraction.TextExtractorException: Text extractor unable to get text.
Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@1a5ad35
Schemas (*.xsb) for CTTable can't be loaded - usually this happens when OSGI loading is used and the thread context classloader has no reference to the xmlbeans classes - use POIXMLTypeLoader.setClassLoader() to set the loader, e.g. with CTTable.class.getClassLoader()
   at TextExtraction.TextExtractor.GetText(Func`1 extractFunction) in D:\jenkins-agent\workspace\cognitive-toolkit_main\TextExtractor\TextExtractor.cs:line 87
   at Shinydocs.CognitiveToolkit.Tools.ScrollTools.HashAndText.AddHashAndExtractedTextHelper.GetText(String id, MemoryStream stream, Int32 maxCharacters, String filename)
   at Shinydocs.CognitiveToolkit.Tools.ScrollTools.HashAndText.AddHashAndExtractedTextHelper.<>c__DisplayClass6_0.<AddTextToDocument>b__0()
   at Shinydocs.CognitiveToolkit.Tools.ScrollTools.HashAndText.AddHashAndExtractedTextHelper.TimerAndExceptionWrapper[T](String actionName, Func`1 function, T fallback, String errorMessage)
This error occurs during text extraction. It is a known issue that cannot currently be resolved; Shinydocs is looking into solutions.
Handle does not support synchronous operations
[03:51:54 ERR] There was an error calculating the hash for 2440757724a7b58d2e1b657b4890ed04a1f2e1c6. : Handle does not support synchronous operations. The parameters to the FileStream constructor may need to be changed to indicate that the handle was opened asynchronously (that is, it was opened explicitly for overlapped I/O). Handle does not support synchronous operations. The parameters to the FileStream constructor may need to be changed to indicate that the handle was opened asynchronously (that is, it was opened explicitly for overlapped I/O).
   at Shinydocs.CognitiveToolkit.Tools.ScrollTools.HashAndText.AddHashAndExtractedTextHelper.TimerAndExceptionWrapper[T](String actionName, Func`1 function, T fallback, String errorMessage)
This error is currently being investigated by Shinydocs. Initial investigation shows that it occurs on some extensionless files that cannot be hashed or text extracted, possibly because of the state the file is in.
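When a log is long, it can help to sort entries into the categories above before deciding what needs attention. The following is a minimal sketch, not part of the Cognitive Toolkit: the function name `triage`, the category labels, and the assumption that every entry starts with a `[HH:MM:SS LVL]` prefix are all this example's own. The match strings are copied from the messages quoted above.

```python
import re

# Substrings copied from the log messages quoted in this article, paired with
# the category they are listed under. The label wording is this sketch's own.
RULES = [
    ("An unexpected network error occurred", "permission/access: resolve"),
    ("Access is denied", "permission/access: resolve"),
    ("being used by another process", "inaccessible: no action"),
    ("because it is encrypted", "inaccessible: no action"),
    ("because it is password protected", "inaccessible: no action"),
    ("The system cannot find the file specified", "no longer exists: no action"),
    ("Schemas (*.xsb) for CTTable can't be loaded", "known issue: no action"),
    ("Handle does not support synchronous operations", "known issue: no action"),
]

# Assumed entry shape: "[HH:MM:SS ERR] message", possibly spanning several lines.
ENTRY = re.compile(r"^\[(\d{2}:\d{2}:\d{2}) (ERR|FTL)\] (.*)$", re.DOTALL)


def triage(entry):
    """Return (timestamp, category) for one ERR/FTL log entry, else None."""
    match = ENTRY.match(entry.strip())
    if match is None:
        return None
    timestamp, _level, message = match.groups()
    # Collapse line wraps and repeated spaces so wrapped messages still match.
    message = " ".join(message.split())
    for needle, category in RULES:
        if needle in message:
            return timestamp, category
    return timestamp, "unclassified"
```

Entries that match no rule come back as "unclassified" and still need to be read by hand; the rule list covers only the messages shown in this article.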
Others:
No search context found for id [######]
[22:51:04 FTL] Tool addhashandextractedtext Exited: One or more errors occurred.
System.AggregateException: One or more errors occurred. ---> Elasticsearch.Net.ElasticsearchClientException: The remote server returned an error: (404) Not Found.. Call: Status code 404 from: POST /_search/scroll. ServerError: Type: search_phase_execution_exception Reason: "all shards failed" CausedBy: "Type: search_context_missing_exception Reason: "No search context found for id [167235]"" ---> System.Net.WebException: The remote server returned an error: (404) Not Found.
   at Shinydocs.CognitiveToolkit.Repositories.Elasticsearch.DocumentRepository.FindByQueryDynamicUsingScroll(String query, String includes, Int32 nodesPerRequest, TimeSpan scrollTimeout, String scrollId, String schemaTypeFilter, String sortField, Boolean sortAscending)
   at Shinydocs.CognitiveToolkit.QueryRunners.ScrollRepositorySearch.GetResults(Int32 nodesPerRequest)
   at Shinydocs.CognitiveToolkit.DocumentProcessors.MultiThreadedDocumentUpdater.ProcessSearchResults(Int32 nodesPerRequest)
---> (Inner Exception #0) Elasticsearch.Net.ElasticsearchClientException: The remote server returned an error: (404) Not Found.. Call: Status code 404 from: POST /_search/scroll. ServerError: Type: search_phase_execution_exception Reason: "all shards failed" CausedBy: "Type: search_context_missing_exception Reason: "No search context found for id [167235]"" ---> System.Net.WebException: The remote server returned an error: (404) Not Found.
This occurs when text extraction or hashing takes too long to build the next payload for the index. When the Cognitive Toolkit retrieves from the index the files it needs to extract text from, the search context stays alive only for a set time window. If it takes longer than that timeout (default 1 hour) to post anything to the index, for example because text extraction is slow, the tool fails. Text that was already extracted is not lost; this error indicates that the tool could not continue past that point because its queue of files expired.
To remedy:
- Try text extracting again.
- Try lowering the --nodes-per-request value for text extraction.
  - Fewer nodes per request means more frequent communication with the index.
  - This is slightly less performant, as there is more overhead in calls to the index.
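To see why a smaller --nodes-per-request value helps, the timing can be estimated with some rough arithmetic. The sketch below is illustrative only and rests on assumptions not stated in this article: every node takes the same time to process, the batch is split evenly across workers, and the whole batch must be posted before the timeout. The function names and the `workers` parameter are hypothetical; only the 1 hour default timeout comes from the text above.

```python
from datetime import timedelta


def estimate_batch_time(nodes_per_request, seconds_per_node, workers=1):
    """Rough time to process one batch, assuming uniform per-node cost."""
    if nodes_per_request < 1 or workers < 1 or seconds_per_node < 0:
        raise ValueError("nodes_per_request and workers must be >= 1, "
                         "seconds_per_node must be >= 0")
    return timedelta(seconds=nodes_per_request * seconds_per_node / workers)


def fits_scroll_window(nodes_per_request, seconds_per_node, workers=1,
                       scroll_timeout=timedelta(hours=1)):
    """True if the estimated batch time is shorter than the scroll timeout."""
    batch_time = estimate_batch_time(nodes_per_request, seconds_per_node, workers)
    return batch_time < scroll_timeout


# 500 nodes at 10 s each on one worker is 5000 s, longer than the 3600 s window.
print(fits_scroll_window(500, 10))   # False
# 200 nodes at 10 s each is 2000 s, which leaves room before the context expires.
print(fits_scroll_window(200, 10))   # True
```

Under this simplified model, lowering the batch size shortens the gap between calls to the index, which is what keeps the search context from expiring. Real per-file extraction times vary widely, so treat the result as a starting point rather than a guarantee.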
 
