Cognitive Toolkit February 2022/2.6.0.2 Release Notes
Details
Product: Cognitive Toolkit
Version: 2.6.0.2
Date: February 3, 2022
New Software Prerequisites
Shinydocs Extraction Service
- Requires Java/OpenJDK 11 
Shinydocs Cognitive Toolkit
- .NET Framework has been deprecated 
- .NET Core 5.0 is now included in the Cognitive Toolkit and does not need to be installed separately 
Features
- Architecture/Performance Improvements: Although not directly visible, this release includes a number of internal architecture changes within the Cognitive Toolkit: improved methods, modular connectors, library segregation, and improvements to support automated testing and quality assurance. (Ref: CT-1961, CT-2090) 
- Improved Crawl File System: Our CrawlFileSystem tool was overhauled to take advantage of multi-threading and to accept multiple paths to crawl in a single run (previously this was typically done via multiple instances of the tool). See --help for CrawlFileSystem for available parameters and options. (Ref: CT-2124) 
- Improved Crawl Box: Our CrawlBox tool now supports user filters and can read in a list of users from the API. The --box-config option was removed (now done via --source-settings), as was --folders (no longer necessary with the new architecture). See --help for CrawlBox for available parameters and options. (Ref: CT-2228) 
- Text Extraction Service: As of this release, Text Extraction has been added to our Entity Extraction service (now renamed to just “Shinydocs Extraction Service”). Our AddHashAndExtractedText and ExtractEntities tools have been updated to reference this new service. See --help for AddHashAndExtractedText and/or ExtractEntities for available parameters and options. To install the service, see our Shinydocs Extraction Service January 2022/2.0.0. (Ref: CT-2151, CT-2263) 
- Add Property Data: The Cognitive Toolkit now includes a tool that can be used to add property data to your Analytics Engine, pulled from an ECM via a direct database connector if applicable, or via the REST API. Currently, Content Server Categories & Attributes are supported. See --help for AddPropertyData for available parameters and options. Note that with this release, the following tools are deprecated, as they are now replaced by this tool (Ref: CT-2212):
  - AddCategoryData 
  - AddCategoryDataRest 
 
- Update Properties: The Cognitive Toolkit now includes a tool that can be used to update property data on your ECM via the REST API. Currently, Content Server Category Attributes, Content Server Classification values, and Content Server RM Classification values can be set. Files can also be renamed or moved in the ECM. See --help for UpdateProperties for available parameters and options. Note that with this release, the following tool is deprecated, as it is now replaced by this tool (Ref: CT-2211):
  - UpdateContentServerMetadata 
 
- Migrate FileNet to Content Server via REST: The functionality of our universal Migrate tool has been enhanced to support migration from FileNet to Content Server via REST. See --help for Migrate for available parameters and options. Note that with this release, the following tool is deprecated, as it is now replaced by this tool (Ref: CT-1965):
  - MigrateFileNetToContentServer 
 
- Scroll Framework: Previously, some of the Cognitive Toolkit tools had been converted to the “scroll” framework so that they would run faster when acting on large amounts of data. This release completes that work for the following tools so that they also run faster: AddPathValidation, CrawlExchange, CrawlFileNet, CrawlSharePoint, CrawlSharePointSites, Dispose, ExportFromIndex, ExtractEntities. Note that no change is required to take advantage of this improvement. (Ref: CT-2188, CT-2189, CT-2191, CT-2192, CT-2240, CT-2241, CT-2242, CT-2243) 
- Audit Logging: This release adds a comprehensive job logging Index (called shinydocs-jobs), to which each tool logs information such as tool name, options, start time, end time, quantity of files, files per second, and user name. Any errors a tool encounters are logged as well. To enable it, set the JobIndexUrl key in the CognitiveToolkit.exe.config file (<add key="JobIndexUrl" value="http://localhost:9200/" />). After running the Cognitive Toolkit, add the Index Pattern shinydocs-jobs and choose a Time Filter (such as startTime). To review a log of the jobs that were run, open Shinydocs Visualizer: Discover / shinydocs-jobs (note your Time Range interval). (Ref: CT-2232) 
- Removed Tools (Absorbed into Existing Tools): With this release, the following tools have been removed, since their functionality has been “absorbed” into another existing tool (see --help for the new tool for available parameters and options):
  - AddOwner: Functionality included within CrawlFileSystem. (Ref: CT-2187) 
  - CrawlContentServerRest: Functionality included within CrawlContentServer. (Ref: CT-2125) 
  - AddHashAndExtractedTextFromExchange: Now included within AddHashAndExtractedText. (Ref: CT-2193) 
  - AddBreadCrumbs: Now included within CrawlContentServer. (Ref: CT-2271) 
 
- Removed Tools: With this release, the following tools have been removed:
  - AddInsightByMapping. (Ref: CT-2244) 
  - IndexCompare. (Ref: CT-2190) 
  - MigrateShortcutsToContentServer. (Ref: CT-2276) 
 
- Removed Options: With this release, the following tool options have been removed. Make sure that any automations you have are no longer using these options:
  - AddPathValidation: The --field option is no longer available (we rely on the field “path-valid” for downstream processing). (Ref: CT-2198) 
  - CrawlFileSystem: The --index-type option is no longer available (we rely on this being set to “shinydocs” for downstream processing). (Ref: CT-2202) 
  - CrawlFileSystem: The --crawl-pst option is no longer available (for extracting PST files, use the ExtractAndCrawlPst tool instead). (Ref: CT-2206) 
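
One way to check existing automations against the removed tools and options listed above is a small script that scans command lines for them. The following is a minimal sketch, not part of the Cognitive Toolkit: the function names are hypothetical, it assumes each tool name and its options appear on the same line of the script being scanned, and the sample command lines are illustrative only. The CrawlBox options come from the Improved Crawl Box entry.

```python
import re

# Tools removed in this release, mapped to the tool that absorbed them
# (None where the notes name no replacement).
REMOVED_TOOLS = {
    "AddOwner": "CrawlFileSystem",
    "CrawlContentServerRest": "CrawlContentServer",
    "AddHashAndExtractedTextFromExchange": "AddHashAndExtractedText",
    "AddBreadCrumbs": "CrawlContentServer",
    "AddInsightByMapping": None,
    "IndexCompare": None,
    "MigrateShortcutsToContentServer": None,
}

# Options removed in this release, grouped by tool.
REMOVED_OPTIONS = {
    "AddPathValidation": ["--field"],
    "CrawlFileSystem": ["--index-type", "--crawl-pst"],
    "CrawlBox": ["--box-config", "--folders"],
}


def has_token(line, token):
    """True if token appears as a whole word, so --field does not match --field-x."""
    pattern = r"(?<![\w-])" + re.escape(token) + r"(?![\w-])"
    return re.search(pattern, line) is not None


def find_issues(script_text):
    """Return (line_number, message) pairs for removed tools and options."""
    issues = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        for tool, replacement in REMOVED_TOOLS.items():
            if has_token(line, tool):
                hint = f"use {replacement}" if replacement else "no replacement named"
                issues.append((number, f"{tool} was removed ({hint})"))
        for tool, options in REMOVED_OPTIONS.items():
            if has_token(line, tool):
                for option in options:
                    if has_token(line, option):
                        issues.append((number, f"{tool} no longer accepts {option}"))
    return issues


if __name__ == "__main__":
    # Illustrative command lines only; real automations will differ.
    sample = (
        "CognitiveToolkit.exe CrawlFileSystem --crawl-pst\n"
        "CognitiveToolkit.exe AddOwner\n"
    )
    for number, message in find_issues(sample):
        print(f"line {number}: {message}")
```

Whole-word matching keeps CrawlContentServer from being flagged as CrawlContentServerRest. Commands that span several lines would need to be joined before scanning.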
 
- Removed Fields: With this release, the following tools no longer create certain fields:
  - CrawlContentServer: The domain field is no longer created by this tool. When crawling a Content Server database, the fileType field is no longer created by this tool. Note that the extension field, which contains the file extension, is still created. (Ref: CT-2215, CT-2217) 
  - CrawlFileSystem: The domain field is no longer created by this tool. (Ref: CT-2217) 
 
- Updated Visualizations: Various improvements were made to existing dashboards and visualizations (Ref: CT-2323):
  - Last Accessed Date visualization was removed from all dashboards (the last accessed date is so easily updated that it could lead to misunderstanding your data). 
  - Classifications dashboard: Number of Files by Classification visualization was updated to use the “classification” field. 
  - Disposed Files dashboard: Number of Files by Disposition Success visualization was updated to use the “dispose” field. 
  - Duplicated Files dashboards: Number of Duplicated Files was updated to show “Primary Duplicates”. 
  - Progress dashboard: Number of Files Migrated to Content Server was updated to use the “prop-content-it.keyword” field. 
  - ROT dashboards: Records - ROT - Unknown was updated to change “Records” to match the “classification” field. 
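
For the Audit Logging entry above, the job log can also be read programmatically rather than through Shinydocs Visualizer. The sketch below is an assumption-heavy illustration: it assumes the Analytics Engine at JobIndexUrl answers Elasticsearch-style `_search` requests without authentication and that startTime is stored as a date field. The index name, URL, and startTime field come from the notes; the function names are hypothetical.

```python
import json
import urllib.request

JOB_INDEX_URL = "http://localhost:9200/"  # JobIndexUrl value from the example config
JOB_INDEX = "shinydocs-jobs"


def search_url(base_url=JOB_INDEX_URL):
    """Build the search endpoint for the job index (assumed Elasticsearch-style)."""
    return base_url.rstrip("/") + f"/{JOB_INDEX}/_search"


def build_jobs_query(start, end, size=20):
    """Search body for jobs whose startTime is in [start, end), newest first."""
    return {
        "size": size,
        "sort": [{"startTime": {"order": "desc"}}],
        "query": {"range": {"startTime": {"gte": start, "lt": end}}},
    }


def fetch_jobs(start, end, base_url=JOB_INDEX_URL):
    """Send the query and return the logged job documents."""
    request = urllib.request.Request(
        search_url(base_url),
        data=json.dumps(build_jobs_query(start, end)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        hits = json.load(response)["hits"]["hits"]
    return [hit["_source"] for hit in hits]


if __name__ == "__main__":
    # Print the request that would be sent, without contacting a server.
    print(search_url())
    print(json.dumps(build_jobs_query("2022-02-01", "2022-02-04"), indent=2))
```

Keeping the query builder separate from the network call makes it possible to inspect the request before pointing it at a live index.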
 
Fixes
- Bug Fix: Fixed an issue with the Migrate tool where, when migrating to Content Server, characters such as blanks (e.g., “ ”) in the name caused the file name in Content Server to be set incorrectly. Note that this issue did not exist with the MigrateToContentServer tool. (Ref: CT-2387) 
Known Issues
The following items are known issues and are flagged for resolution in a later release:
- SharePoint on-prem is not currently supported by version 2.6.0.1 of the Cognitive Toolkit. We are working on a hotfix that will include a fix for all tools for SharePoint on-prem, except Migrate (which is expected to be fixed in version 2.7.0). For this reason, if you require support for SharePoint on-prem, we recommend staying with version 2.5.1.5 of the Cognitive Toolkit until release 2.6.0.1 is ready. 
