Version 2.3.10

Product: Cognitive Toolkit

Version: 2.3.10

Date: January 6, 2021

Find Similar Classification: The Cognitive Toolkit now ships with a powerful tool that uses artificial intelligence to assign classifications (Insights) to documents based on their similarity to already classified documents. See --help for FindSimilarClassification for available parameters and options. (Ref: CT-548, CT-1337)
Entity Extraction: The Cognitive Toolkit now ships with a powerful tool that can be used to extract “entities” from your documents, based on common word patterns. It can extract entities such as person, location, and organization, as well as money, date, and percent. Note that you must first download the stanford-named-entity-recognizer-2020-12-16.zip file and run the enclosed stanford-named-entity-recognizer-service.bat file to start the service that the ExtractEntities tool uses to extract these entities. See --help for ExtractEntities for available parameters and options. (Ref: CT-1287)
Extracted Entities Dashboard: The Extracted Entities dashboard has been updated to show the new fields created by the ExtractEntities tool. (Ref: CT-1669)
Improved Password Treatment + Encryption Support for Content Server Migrations: The existing plain text options of the Content Server migration tools (MigrateToContentServer, MigrateFileNetToContentServer, MigrateShortcutsToContentServer, MoveWithinContentServer) have been removed in favor of using a source file. Sample source files can be found in the External Resources\Sample Source Settings folder. Username and password values can be saved with the SaveParameter tool. (Ref: CT-1360)
Standardized Add Hash and Extracted Text Tool: We have completed standardizing on the AddHashAndExtractedText tool, which can be used to calculate the hash of a file, extract text from a file, or do both at the same time. See --help for AddHashAndExtractedText for available parameters and options. Note that with this release, the following tool is deprecated (as it is now replaced by this combined tool):
- AddExtractedText

(Ref: CT-1086)

Standardized Dispose Tool: We have completed standardizing on the Dispose tool, which can be used to dispose of documents in the source repository. See --help for Dispose for available parameters and options. Note that with this release, the following tool is deprecated (as it is now replaced by this combined tool):
- DisposeInBox

(Ref: CT-1212, CT-1215)

Content Server Migration App – Schedule for Later: Our new Content Server Migration app now has an option to schedule the migration for later. This is useful for when users might want to do migrations into Content Server during off-peak hours, for example. (Ref: CT-1146)
Content Server Migration App – Configuration to Skip Crawling: For the use case where your users are migrating data that has already been crawled, crawling can now be disabled in the Content Server Migration app. In your ContentServerMigration.exe.config file, use the <add key="Crawl" value="true" /> and <add key="Migrate" value="true" /> settings to specify how you want the app to behave. (Ref: CT-1610)
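As an illustration of the two settings above, the following minimal Python sketch reads them from a config fragment. It assumes the keys live in a standard .NET <appSettings> section and that setting Crawl to "false" is what skips crawling; neither detail is confirmed by these notes, and the helper names (read_app_settings, as_bool) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of ContentServerMigration.exe.config.
# Assumption: the keys sit inside a standard .NET <appSettings> section.
SAMPLE_CONFIG = """
<configuration>
  <appSettings>
    <add key="Crawl" value="false" />
    <add key="Migrate" value="true" />
  </appSettings>
</configuration>
"""


def read_app_settings(config_xml):
    """Return the appSettings key/value pairs as a dict of strings."""
    root = ET.fromstring(config_xml.strip())
    settings = {}
    for node in root.findall("./appSettings/add"):
        settings[node.get("key")] = node.get("value")
    return settings


def as_bool(value, default=True):
    """Interpret a config value as a boolean; a missing value uses the default."""
    if value is None:
        return default
    return value.strip().lower() == "true"


settings = read_app_settings(SAMPLE_CONFIG)
crawl_enabled = as_bool(settings.get("Crawl"))
migrate_enabled = as_bool(settings.get("Migrate"))
print("crawl:", crawl_enabled, "migrate:", migrate_enabled)
```

A check like this can be handy before a scheduled migration, to confirm which phases the app is configured to run.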
Crawl FileNet – Ability to Handle Huge Folders: Our CrawlFileNet tool now has improvements to handle folders that contain huge numbers of files (e.g., 5,000 or more). (Ref: CT-1467)
Crawl SharePoint – Ability to Handle Huge Folders: Our CrawlSharePoint tool now has improvements to handle folders that contain huge numbers of files (e.g., 5,000 or more). (Ref: CT-1501)
Ability to Crawl Content Server Subtypes (including Shortcuts): Our CrawlContentServer tool now has the ability to crawl Content Server subtypes. See --help for CrawlContentServer, specifically for the --subtypes parameter. Common subtypes to crawl are 144 and 749 (documents and emails). Other subtypes include 0 (folders) and 1 (shortcuts). (Ref: CT-652)
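For reference, the subtype codes listed above can be captured in a small lookup table. The Python sketch below is illustrative only: the item dictionaries and the filter_by_subtype helper are hypothetical and do not represent the CrawlContentServer implementation or its output format.

```python
# Subtype codes taken from the release note above.
SUBTYPE_NAMES = {
    0: "folder",
    1: "shortcut",
    144: "document",
    749: "email",
}


def filter_by_subtype(items, wanted):
    """Keep only items whose 'subtype' code is in the wanted set."""
    wanted = set(wanted)
    return [item for item in items if item["subtype"] in wanted]


# Hypothetical crawl candidates.
items = [
    {"name": "Contracts", "subtype": 0},
    {"name": "report.docx", "subtype": 144},
    {"name": "RE: budget", "subtype": 749},
    {"name": "link to report", "subtype": 1},
]

# Select documents and emails, the common case named in the note.
selected = filter_by_subtype(items, [144, 749])
for item in selected:
    print(item["name"], "->", SUBTYPE_NAMES[item["subtype"]])
```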
Relative Date Support for CrawlExchange: The CrawlExchange tool now has support for relative dates, so dates of the form “now-1d” are valid. See --help for CrawlExchange for available parameters and options. (Ref: CT-1053)
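To show how a relative date such as “now-1d” might resolve to a concrete timestamp, here is a minimal Python sketch. Only the “now-1d” form comes from these notes; the “h” unit, the “+” sign, and the function name resolve_relative_date are assumptions made for illustration, and the actual grammar accepted by CrawlExchange may differ.

```python
import re
from datetime import datetime, timedelta

# "now", optionally followed by a sign, an integer amount, and a unit.
# Assumption: "d" (days) is documented; "h" (hours) and "+" are illustrative.
_RELATIVE = re.compile(r"^now(?:([+-])(\d+)([dh]))?$")
_UNITS = {"d": "days", "h": "hours"}


def resolve_relative_date(expr, now):
    """Resolve an expression like 'now-1d' against the given reference time."""
    match = _RELATIVE.match(expr.strip())
    if match is None:
        raise ValueError("unsupported relative date: %r" % expr)
    sign, amount, unit = match.groups()
    if sign is None:
        # Bare "now" with no offset.
        return now
    delta = timedelta(**{_UNITS[unit]: int(amount)})
    return now - delta if sign == "-" else now + delta


reference = datetime(2021, 1, 6, 12, 0, 0)
print(resolve_relative_date("now-1d", reference))
```

Passing the reference time in explicitly, rather than calling datetime.now() inside the function, keeps the result reproducible.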
Confirm Number of Items to Dispose: The Dispose tool now informs you of the number of files that will be deleted before the dispose runs, as a verification step for this powerful tool (i.e. the administrator can confirm that the number is what was expected). (Ref: CT-1274)
Site-Collection Support for Crawl SharePoint: The CrawlSharePoint tool now supports the ability to crawl from the site-collection level. See --help for CrawlSharePoint for available parameters and options. (Ref: CT-1544)
Export from Index Improved Reporting: The ExportFromIndex tool now has the option to report on the Index where the primary duplicate hash is located. See --help for ExportFromIndex, specifically for the --calculate-master parameter. (Ref: CT-1551)
Content Server Metadata Crawl (PostgreSQL): The Cognitive Toolkit CrawlContentServer tool now includes support for crawling Content Server when it is backed by a PostgreSQL database. See --help for CrawlContentServer (for the “--database-type” parameter) for available options. (Ref: CT-644)
Tag Primary Duplicate Only Tags 1 Primary: The TagDuplicate tool has been improved so that, when run with the “--tag-primary” option, it now tags only a single primary item. (Ref: CT-1428)
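The idea of tagging exactly one primary item can be sketched in Python as follows. This is not the TagDuplicate implementation: the item layout, the helper name pick_primaries, and the tie-breaking rule (lowest id wins) are all assumptions chosen to make the example deterministic, and the sketch assumes duplicates are identified by a shared hash.

```python
from collections import defaultdict


def pick_primaries(items):
    """Return one primary id per group of items that share a hash.

    Assumption: the lowest id in each duplicate group is the primary.
    Items with a unique hash are not part of any duplicate group.
    """
    groups = defaultdict(list)
    for item in items:
        groups[item["hash"]].append(item["id"])

    primaries = {}
    for file_hash, ids in groups.items():
        if len(ids) > 1:
            primaries[file_hash] = min(ids)
    return primaries


# Hypothetical indexed items: three copies of one file plus one unique file.
items = [
    {"id": 12, "hash": "abc"},
    {"id": 7, "hash": "abc"},
    {"id": 30, "hash": "abc"},
    {"id": 5, "hash": "xyz"},
]

primaries = pick_primaries(items)
print(primaries)
```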
Improved Error Messages: We’ve improved the error messages presented to the user in a number of scenarios, such as fatal errors, incorrect identity errors, owner validation errors, and path-not-found errors. (Ref: CT-962, CT-1149, CT-1182, CT-1299, CT-1327, CT-1497, CT-1574)
Bug Fix: Fixed an issue where Table Key Lookup (TKL) validation was failing with the MigrateToContentServer tool. (Ref: CT-1233, CT-1586)
Bug Fix: Fixed an issue where the AddClassifications tool was not respecting the --class-field value in certain circumstances. (Ref: CT-1454)
Bug Fix: Fixed an issue where the ExportFromIndex tool was not properly handling periods. (Ref: CT-1462)
Bug Fix: Fixed an issue where CrawlFileSystem was not supporting very long file paths (>260 characters). (Ref: CT-1546)
Bug Fix: Fixed an issue where CacheFileSystemPermissions (and SetFileSystemPermissions) was not working correctly for very long file paths (>260 characters). (Ref: CT-1626)
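For background on the two long-path fixes above, a common technique on Windows is to add the extended-length prefix (\\?\, or \\?\UNC\ for network shares) to absolute paths that reach the classic MAX_PATH limit of 260 characters. The Python sketch below shows that general technique only; these notes do not say how the tools handle long paths internally, and the function name to_extended_length is hypothetical.

```python
MAX_PATH = 260  # classic Windows limit, including the terminating null

EXTENDED_PREFIX = "\\\\?\\"          # the literal characters \\?\
EXTENDED_UNC_PREFIX = "\\\\?\\UNC\\"  # the literal characters \\?\UNC\


def to_extended_length(path):
    """Prefix an absolute, backslash-separated Windows path if it is too long.

    Short paths and already-prefixed paths are returned unchanged.
    The path is assumed to be absolute and normalized, because the
    extended-length form disables the usual path normalization.
    """
    if path.startswith(EXTENDED_PREFIX):
        return path
    if len(path) < MAX_PATH:
        return path
    if path.startswith("\\\\"):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return EXTENDED_UNC_PREFIX + path[2:]
    return EXTENDED_PREFIX + path


long_local = "C:\\data\\" + "a" * 300
long_unc = "\\\\server\\share\\" + "a" * 300
print(to_extended_length(long_local)[:12])
print(to_extended_length(long_unc)[:24])
```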
Bug Fix: Fixed an issue where custom columns or properties from Content Server, FileNet or SharePoint were causing an issue with the Index. To resolve this, fields that start with “cc-” or “prop-” are now always treated as text fields. (Ref: CT-1666)
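The prefix rule described in the fix above can be expressed as a short Python sketch. The function name, the example field names, and the case-sensitive prefix match are assumptions for illustration; they are not taken from the toolkit itself.

```python
# Prefixes named in the release note; fields starting with these are text.
TEXT_PREFIXES = ("cc-", "prop-")


def field_type(name, inferred_type):
    """Return 'text' for custom-column/property fields, else the inferred type.

    Assumption: the prefix match is case-sensitive.
    """
    if name.startswith(TEXT_PREFIXES):
        return "text"
    return inferred_type


# Hypothetical fields with the types a naive inference might pick.
fields = {
    "cc-invoice-number": "long",
    "prop-review-date": "date",
    "size": "long",
}

resolved = {name: field_type(name, guess) for name, guess in fields.items()}
print(resolved)
```

Forcing one type for these fields avoids conflicts when the same custom column holds, for example, a number in one document and free text in another.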