Product: Cognitive Toolkit
Date: June 8, 2020
Open Clustering: The Cognitive Toolkit now ships with Open Clustering. To use it, load the new visualizations and dashboards included with this release and open the “Open Clustering” dashboard. If your Index is named something other than “shiny”, update the visualization so that it references your Index name instead. The dashboard works best when combined with a path-based filter, because it shows the words used more often in the selected path than in the rest of the Index. (Ref: CT-918, CT-105, CT-1002)
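These notes do not say how Open Clustering scores words. The following is only an illustrative sketch of the comparison described above: it ranks words by how much more often they occur in documents under the selected path than in the rest of the Index. The function name, the whitespace tokenization and the add-one smoothing are all assumptions made for this example. They are not the dashboard's actual algorithm.

```python
from collections import Counter


def overrepresented_words(selected_docs, rest_docs, top_n=5):
    """Rank words by how much more frequent they are in the selected path
    than in the rest of the index (hypothetical scoring, add-one smoothing)."""
    # Count lower-cased, whitespace-separated words in each document set.
    sel = Counter(w for doc in selected_docs for w in doc.lower().split())
    rest = Counter(w for doc in rest_docs for w in doc.lower().split())
    sel_total = sum(sel.values())
    rest_total = sum(rest.values())
    vocab = len(set(sel) | set(rest))
    scores = {}
    for word, count in sel.items():
        # Smoothed relative frequency inside the selected path vs. elsewhere.
        p_sel = (count + 1) / (sel_total + vocab)
        p_rest = (rest[word] + 1) / (rest_total + vocab)
        scores[word] = p_sel / p_rest
    # Highest ratio first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


selected = ["invoice payment due", "invoice overdue payment"]
rest = ["meeting notes agenda", "payment schedule meeting"]
print(overrepresented_words(selected, rest))  # ['invoice', 'due', 'overdue', 'payment']
```

In this example "payment" ranks last among the selected-path words, even though it appears twice there. It also appears elsewhere in the Index, which is the kind of word a path-based filter helps de-emphasize.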
Duplicate Tagging Improvements: The Cognitive Toolkit now ships with two new tools for tagging duplicates. For each tool, see --help for available parameters and options. (Ref: CT-360)
TagDuplicate: This tool is an updated version of the TagDuplicateField tool, which is deprecated as of this release. It checks whether the value in a field is duplicated in the Shinydocs Index, limited to the documents matched by the query supplied when the tool is run. It is normally used on the “hash” field so that you can subsequently filter the Index to show only duplicated files. If you previously used the TagDuplicateField tool on the Index, specify a different field name via the --duplicate-field-name option when you run TagDuplicate.
TagPrimaryDuplicate: This tool identifies the “primary” (main) file within a set of duplicates, for example the one with the oldest creation date.
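To make the difference between the two tools concrete, here is a conceptual sketch. It is not the toolkit's implementation. Documents are modeled as plain dictionaries with hypothetical "id", "hash" and "created" keys. TagDuplicate's behavior corresponds to flagging every document whose field value is shared. TagPrimaryDuplicate's corresponds to choosing one document per duplicated value, here using the oldest creation date, which is the example criterion given above.

```python
from collections import defaultdict
from datetime import datetime


def group_by_field(docs, field="hash"):
    """Group documents by the value of `field`, skipping documents without it."""
    groups = defaultdict(list)
    for doc in docs:
        value = doc.get(field)
        if value is not None:
            groups[value].append(doc)
    return groups


def tag_duplicates(docs, field="hash"):
    """Map each document id to True if another document shares its field value."""
    groups = group_by_field(docs, field)
    # .get() on a defaultdict does not create entries, so missing values map to [].
    return {doc["id"]: len(groups.get(doc.get(field), [])) > 1 for doc in docs}


def pick_primaries(docs, field="hash"):
    """For each duplicated value, choose the document with the oldest creation date."""
    groups = group_by_field(docs, field)
    return {
        value: min(members, key=lambda d: d["created"])["id"]
        for value, members in groups.items()
        if len(members) > 1
    }


docs = [
    {"id": "a", "hash": "h1", "created": datetime(2019, 5, 1)},
    {"id": "b", "hash": "h1", "created": datetime(2018, 1, 1)},
    {"id": "c", "hash": "h2", "created": datetime(2020, 1, 1)},
]
print(tag_duplicates(docs))  # {'a': True, 'b': True, 'c': False}
print(pick_primaries(docs))  # {'h1': 'b'}
```

Because duplicate groups are keyed on a single field, tagging on “hash” finds byte-identical files regardless of their names or locations.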
Support for Commands on Multiple Indexes: The Cognitive Toolkit now supports the wildcard “*” for running a single command across multiple indexes. For example, running a command against shiny* covers both shiny1 and shiny2. (Ref: CT-951)
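The matching rule for a pattern like shiny* can be illustrated with Python's standard fnmatch module. This sketch only shows which names a trailing “*” would cover. The function name is hypothetical, and it is an assumption that the toolkit resolves patterns in an equivalent way. Note that “*” also matches an empty suffix, so an index named exactly shiny would be included as well.

```python
from fnmatch import fnmatchcase


def resolve_indexes(pattern, index_names):
    """Return the index names that a wildcard pattern such as 'shiny*' covers."""
    # fnmatchcase is case-sensitive; '*' matches any run of characters, including none.
    return [name for name in index_names if fnmatchcase(name, pattern)]


print(resolve_indexes("shiny*", ["shiny1", "shiny2", "archive"]))  # ['shiny1', 'shiny2']
```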
Add Hash and Extracted Text from SharePoint: The Cognitive Toolkit now ships with a tool for calculating hashes and extracting full text from SharePoint. See --help for AddHashAndExtractedTextFromSharePoint for available parameters and options. (Ref: CT-370)
Add Hash and Extracted Text from FileNet: The Cognitive Toolkit now ships with a tool for calculating hashes and extracting full text from FileNet. See --help for AddHashAndExtractedTextFromFileNet for available parameters and options. (Ref: CT-373)
Add Extracted Text from Box: The Cognitive Toolkit now ships with a tool for extracting full text from Box. See --help for AddExtractedTextFromBox for available parameters and options. (Ref: CT-662)
Crawl FileNet: The Cognitive Toolkit now ships with a tool for crawling FileNet, including directories, documents, and document properties. See --help for CrawlFileNet for available parameters and options. (Ref: CT-372)
Migrate FileNet to Content Server: The Cognitive Toolkit now ships with a tool for migrating from FileNet to Content Server. See --help for MigrateFileNetToContentServer for available parameters and options. (Ref: CT-374)
OCR Support: The Cognitive Toolkit now supports OCR (Optical Character Recognition). See the --useIronOcr option in the --help output for AddExtractedText. To suppress system messages coming from OCR, append 2> nul (which redirects standard error to nul in the Windows command prompt) to the end of the command line that calls the Cognitive Toolkit. (Ref: CT-438, CT-974, CT-1025)
Support for Multiple Classifications: Multiple classifications are now supported for migrations into Content Server. (Ref: CT-968)
Relative Date Support for Crawl Filters: Crawl filters now support relative dates, so dates of the form “now-1d” are valid. (Ref: CT-1011)
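Only the “now-1d” form is confirmed above. As an illustration of how such an expression resolves to an absolute date, here is a minimal sketch. The function name is hypothetical, and the hour (h) and week (w) units and the “+” direction are assumptions added for the example. They are not a statement of what crawl filters accept.

```python
import re
from datetime import datetime, timedelta

# Hypothetical unit letters; only "d" (days) is confirmed by the release note.
_UNITS = {"h": "hours", "d": "days", "w": "weeks"}
_PATTERN = re.compile(r"^now(?:([+-])(\d+)([hdw]))?$")


def resolve_relative_date(expr, now=None):
    """Turn an expression such as 'now-1d' into an absolute datetime."""
    if now is None:
        now = datetime.now()
    match = _PATTERN.match(expr.strip())
    if not match:
        raise ValueError(f"unsupported relative date: {expr!r}")
    sign, amount, unit = match.groups()
    if sign is None:  # plain "now"
        return now
    delta = timedelta(**{_UNITS[unit]: int(amount)})
    return now + delta if sign == "+" else now - delta


print(resolve_relative_date("now-1d", now=datetime(2020, 6, 8, 12, 0)))  # 2020-06-07 12:00:00
```

Resolving against a fixed “now” when the filter is evaluated is what makes a filter such as “now-1d” reusable across repeated crawls.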
Export From Index Now Includes Index Name: The ExportFromIndex command now includes the name of the Index that it was exported from. (Ref: CT-1016)
Bug Fix: Fixed an issue where the fullText field could be created even though no full text was extracted by an AddHashAndExtractedTextFromContentServer command. (Ref: CT-540)
Bug Fix: Fixed an issue with crawling Box when starting at an invalid start node. (Ref: CT-881)
Bug Fix: Fixed an issue where 0-byte and very large files were not getting a hash value with the AddHashFromBox tool. (Ref: CT-956)
Bug Fix: Fixed an issue where RestoreCachedFilePermissions was not restoring original permissions on files in some circumstances. (Ref: CT-967, CT-1007)
Bug Fix: Fixed an issue where, when extracting email addresses, the fully qualified form (name@domain) was not used in some circumstances. (Ref: CT-1008)