Version 2.4.2
Product: Cognitive Toolkit
Version: 2.4.2
Date: April 7, 2021
OneDrive Support: The Cognitive Toolkit now includes a number of tools that support OneDrive. For each of these tools, see --help for the named tool for available parameters and options:
CrawlOneDrive: Use this tool for crawling your OneDrive accounts. (Ref: CT-1429, CT-1835, CT-1896)
AddHashAndExtractedText: This tool has been updated to support OneDrive. Use it to add hashes, extract text, or both at the same time for your crawled OneDrive documents. For performance reasons, we recommend the combined option for OneDrive where possible. (Ref: CT-1430)
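As a rough illustration of why the combined option can be faster, the Python sketch below downloads each document once and derives both a hash and extracted text from that single read. It assumes the saving comes from fetching each OneDrive document only once; hash_and_extract is a hypothetical helper, and SHA-256 and a plain UTF-8 decode are placeholders for the tool's real hash algorithm and text extractor.

```python
import hashlib
from typing import Callable, Tuple


def hash_and_extract(fetch: Callable[[], bytes]) -> Tuple[str, str]:
    """Read a document once and derive both its hash and its text.

    Hypothetical sketch: SHA-256 and a plain UTF-8 decode stand in for
    whatever hash algorithm and text extractor the real tool uses.
    """
    data = fetch()  # the single, potentially slow, remote download
    digest = hashlib.sha256(data).hexdigest()
    text = data.decode("utf-8", errors="replace")  # placeholder extraction
    return digest, text


if __name__ == "__main__":
    downloads = []

    def fake_fetch() -> bytes:
        downloads.append(1)  # count reads of the "remote" document
        return b"quarterly report"

    digest, text = hash_and_extract(fake_fetch)
    print(f"downloads={len(downloads)} text={text!r}")
```

Running hashing and extraction as separate passes would call fetch twice; the combined pass calls it once.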
AddPathValidation: This tool has been updated to support OneDrive. (Ref: CT-1796)
Dispose: This tool has been updated to support disposing of documents in OneDrive. (Ref: CT-1431)
Migrate: Our new Migrate tool can be used to migrate files from OneDrive to Content Server. (Ref: CT-1432)
Migrate to SharePoint Online: The Cognitive Toolkit now ships with a tool that can be used to migrate files from your file system to SharePoint Online. This is part of our new, single Migrate tool. See --help for Migrate for available parameters and options. (Ref: CT-541)
Crawl Content Server via REST: The Cognitive Toolkit now includes a tool that can be used to crawl Content Server via the REST API. Use this method when, for whatever reason, you are not able to crawl Content Server via our direct database connector. See --help for CrawlContentServerRest for available parameters and options. (Ref: CT-1753)
Single Migrate Tool: As mentioned above, the Cognitive Toolkit now ships with a single Migrate tool that can be used to either migrate files from your file system to SharePoint Online, or from OneDrive to Content Server. Note that we are planning on moving all of our previous migrate tools (MigrateFileNetToContentServer, MigrateSharePointToContentServer, MigrateShortcutsToContentServer, MigrateToContentServer) to this single, combined tool as of our 2.4.4 release, currently expected in the summer. (Ref: CT-1702)
AddHashAndExtractedText – Ability to OCR via Iron or Azure: The AddHashAndExtractedText tool can now perform optical character recognition via Iron OCR or Azure OCR. Note the new --ocr-utility parameter, which can be set to “iron” or “azure”. If using Azure, you will also need to set some additional parameters. See --help for AddHashAndExtractedText for available parameters and options. (Ref: CT-1236, CT-1489, CT-1807)
SaveValue: The Cognitive Toolkit now ships with a tool that can be used to save values for later use via substitution in tools like MigrateToContentServer. See --help for SaveValue for available parameters and options. Parameters previously saved via SaveParameter in your local saved-parameters.yaml file will continue to work with the new SaveValue tool. Note that with this release, the following tool is deprecated, as it has been replaced by SaveValue: (Ref: CT-1625)
SaveParameter
New Visualizer Dashboards: There is now a set of dashboards for use specifically with Content Server, OneDrive and/or SharePoint. To find them, look for “- Content Server, OneDrive, SharePoint” at the end of the dashboard name. (Ref: CT-1772, CT-1814)
CrawlSharePoint – Performance Improvements: The CrawlSharePoint tool has received a number of performance improvements: the number of threads was increased, Index ingestion was improved, and additional informational logging was added for the sites being crawled. The crawler can now also be restarted and resume where it left off; its progress is tracked in a sharepoint.sqlite file, which must be deleted before starting a fresh crawl. See --help for CrawlSharePoint for available parameters and options. (Ref: CT-1690, CT-1751, CT-1752)
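Restartable crawling of this kind typically follows a checkpointing pattern: each completed unit of work is recorded in a local SQLite file, and recorded items are skipped when the crawler starts again. The Python sketch below illustrates that pattern with the standard sqlite3 module. The table name, the schema, and tracking progress per site are assumptions made for illustration; they are not the actual layout of sharepoint.sqlite.

```python
import sqlite3
from typing import Callable, Iterable, List


def open_checkpoint(path: str) -> sqlite3.Connection:
    """Open (or create) a checkpoint database. Hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS crawled (site TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def crawl(sites: Iterable[str], crawl_site: Callable[[str], None],
          conn: sqlite3.Connection) -> List[str]:
    """Crawl every site not already recorded; return the sites crawled this run."""
    done = {row[0] for row in conn.execute("SELECT site FROM crawled")}
    crawled_now = []
    for site in sites:
        if site in done:
            continue  # finished in an earlier run, so skip it on restart
        crawl_site(site)
        # Record the site only after it completes, so an interrupted site is redone.
        conn.execute("INSERT INTO crawled (site) VALUES (?)", (site,))
        conn.commit()
        crawled_now.append(site)
    return crawled_now
```

Deleting the checkpoint file empties the set of completed sites. This is why a fresh crawl requires removing it first.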
CrawlBox – Converted to Source Settings: The CrawlBox tool has been converted to use --source-settings. See --help for CrawlBox for available parameters and options. (Ref: CT-1646)
MigrateToContentServer – Don’t Migrate if Invalid TKL: MigrateToContentServer has been improved so that if an attempt to set an invalid TKL for a document fails during migration, that document is NOT migrated. The logs record when this occurs and for which document, so that the issue can be fixed and the migration retried. (Ref: CT-1876)
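The control flow described here can be sketched as follows: a document whose TKL assignment would fail is logged and skipped rather than migrated. In this Python sketch, the document dictionaries, the set of valid TKL values, and the placeholder migration step are all hypothetical stand-ins for MigrateToContentServer's internals.

```python
import logging
from typing import Dict, Iterable, List, Set

log = logging.getLogger("migrate-sketch")


def migrate_documents(docs: Iterable[Dict[str, str]],
                      valid_tkls: Set[str]) -> List[str]:
    """Return the IDs of migrated documents; skip and log any with an invalid TKL.

    Hypothetical sketch: each doc is a dict with "id" and "tkl" keys, and the
    real upload is represented by appending the ID to the result list.
    """
    migrated = []
    for doc in docs:
        if doc["tkl"] not in valid_tkls:
            # Setting this TKL would fail, so the document is not migrated at all.
            log.warning("Skipping %s: invalid TKL %r", doc["id"], doc["tkl"])
            continue
        migrated.append(doc["id"])  # placeholder for the real migration call
    return migrated
```

Because the skip is logged with the document ID, the failing documents can be found, corrected, and passed through the migration again.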
Size Calculation Performance Improvement: We have found that calculating the size of the Index after a given tool runs can take a significant amount of time, so we are providing an option to disable that calculation. To change this behaviour, add the key <add key="CalculateSize" value="true" /> to the CognitiveToolkit.exe.config file. (Ref: CT-1836)
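If you need to apply this setting across several installations, the Python sketch below adds or updates an entry using only the standard library. It assumes the key belongs in the standard .NET <appSettings> section of CognitiveToolkit.exe.config. set_app_setting is a hypothetical helper, not part of the toolkit.

```python
import xml.etree.ElementTree as ET


def set_app_setting(config_path: str, key: str, value: str) -> None:
    """Add or update <add key=... value=.../> under <appSettings>.

    Assumes the standard .NET layout <configuration><appSettings>...</appSettings>.
    The file is rewritten and ElementTree drops XML comments, so back it up first.
    """
    tree = ET.parse(config_path)
    root = tree.getroot()
    settings = root.find("appSettings")
    if settings is None:
        settings = ET.SubElement(root, "appSettings")
    for entry in settings.findall("add"):
        if entry.get("key") == key:
            entry.set("value", value)  # update the existing entry in place
            break
    else:
        # No existing entry for this key, so append a new one.
        ET.SubElement(settings, "add", key=key, value=value)
    tree.write(config_path, encoding="utf-8", xml_declaration=True)
```

Usage, with the key and value given in this note: set_app_setting("CognitiveToolkit.exe.config", "CalculateSize", "true").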
Bug Fix: Fixed an issue where permission mapping for MigrateToContentServer was not working in some circumstances. (Ref: CT-1705)
Bug Fix: Fixed an issue where the --verify-hash option for the Dispose tool for Box was not working in some circumstances. (Ref: CT-1755)
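Conceptually, verifying a hash before disposal means re-hashing the item and refusing to dispose of it if the result no longer matches the hash recorded earlier. The Python sketch below shows that check. dispose_if_unchanged and the dispose callback are hypothetical, and SHA-256 is an assumption rather than the toolkit's documented algorithm.

```python
import hashlib
from typing import Callable


def dispose_if_unchanged(content: bytes, recorded_hash: str,
                         dispose: Callable[[], None]) -> bool:
    """Dispose only if the current content still matches the recorded hash.

    Hypothetical sketch: SHA-256 stands in for whatever algorithm the toolkit
    actually records. Returns True if the item was disposed.
    """
    if hashlib.sha256(content).hexdigest() != recorded_hash:
        return False  # content changed since it was hashed, so leave it alone
    dispose()
    return True
```

This check protects against disposing of a document that was modified after it was crawled and hashed.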
Bug Fix: Fixed an issue with AddHashAndExtractedText when being used with FileNet – the nodeID field was not being set correctly. (Ref: CT-1851)
Outlook Add-in: Data Loss Prevention (preview): 1.0.0
Data Loss Prevention: The Outlook Add-in: Data Loss Prevention works in concert with how documents have been tagged in your Shinydocs Analytics Engine. Leveraging settings configured in Shinydrive Server, you can either automatically warn users before they share, via Outlook, a document that has been tagged as confidential, or block them outright from sending such a document. Note that this is a “preview” release, provided so that interested customers can take this feature for a test drive and give us feedback on their experience. (Ref: SD-3753)