Version 2.3.8
Product: Cognitive Toolkit
Version: 2.3.8
Date: October 7, 2020
Move Within Content Server: The Cognitive Toolkit now ships with a tool that can be used to move content within Content Server, while also setting Content Server Category Attribute values, as found in the Index. See --help for MoveWithinContentServer for available parameters and options. (Ref: CT-689)
Remove Items from Index: The Cognitive Toolkit now ships with a tool that can be used to remove entire items from the Index. A good use case arises after you have identified ROT and deleted the files from your file system: once you have moved those records from your main Index to another Index (“Disposed”, for example), you can fully delete them from the main Index while keeping a record of them in the other Index for audit tracking. See --help for RemoveItems for available parameters and options. (Ref: CT-1276, CT-1304)
Create File System Shortcuts: The Cognitive Toolkit now ships with a tool that can be used to create shortcuts (“Shell links”) on the file system for any duplicate files found as a result of running the TagPrimaryDuplicate tool. The tool creates a .lnk file that links to the original file and, if desired, can also delete the duplicate document. See --help for ReplaceWithShortcut for available parameters and options. (Ref: CT-1052)
New Visualizer Dashboards: There is a new “Disposition – Combined Index View” dashboard that gives you a combined view across multiple Indexes. It is intended to include an Index of Disposed files, that is, files whose metadata has been moved from the main Index to the Disposed one. Because it includes both, this dashboard is also useful for determining how much data has been crawled in total and how much has been disposed of. There is also a new “Validated Files” dashboard that can be used to review the results of running the AddPathValidation tool. Follow the instructions here to upgrade your dashboards and visualizations. (Ref: CT-233, CT-1272, CT-1273)
Content Server Migration App – Query Support: The Content Server Migration App now supports an associated query, which can be used to limit what will be migrated to Content Server (for example, anything tagged with the Insight “do_not_migrate”). In your ContentServerMigration.exe.config file, specify the query in the value attribute of the <add key="Query" value="" /> line. See (Ref: CT-1252)
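As an illustration of the configuration change described above, the following Python sketch sets the value of the Query key in an appSettings-style XML document. The surrounding <configuration> and <appSettings> elements, the set_query helper, and the placeholder query text are assumptions made for this example; the query syntax the app expects is not shown in these notes, so check your own ContentServerMigration.exe.config before adapting it.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal config used only for this example; a real
# ContentServerMigration.exe.config will contain other settings as well.
SAMPLE_CONFIG = """<configuration>
  <appSettings>
    <add key="Query" value="" />
  </appSettings>
</configuration>"""


def set_query(config_xml: str, query: str) -> str:
    """Return config_xml with the value of the "Query" key replaced."""
    root = ET.fromstring(config_xml)
    for node in root.iter("add"):
        if node.get("key") == "Query":
            node.set("value", query)
            return ET.tostring(root, encoding="unicode")
    raise KeyError('no <add key="Query" ... /> entry found')


# Placeholder text only: substitute the query you actually want to use.
updated = set_query(SAMPLE_CONFIG, "PLACEHOLDER_QUERY")
print(updated)
```

Editing the file by hand works just as well; a script like this is only useful when the same change has to be applied repeatedly.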
Migrate To Content Server – Shortcut Support: The MigrateToContentServer tool now includes support for creating Content Server Shortcuts instead of migrating files. After running the TagPrimaryDuplicate tool, perform two Content Server migrations in sequence. The first migration covers all documents that have no duplicates or are Primary Duplicates. The second migration uses the MigrateShortcutsToContentServer tool, which only creates shortcuts in Content Server that point to the original documents. See --help for MigrateShortcutsToContentServer for available parameters and options. (Ref: CT-535)
Consolidation of Add Hash: We have completed the consolidation of the tool listed below into the combined AddHashAndExtractedText tool, which can calculate the hash of a file, extract text from a file, or do both at the same time. The --action-keyword option determines which mode the tool operates in, and --source-settings specifies the connector file (leave blank for file system). See --help for available parameters and options. Note that with this release, the following tool is deprecated (as it is now replaced by this combined tool):
AddHash
(Ref: CT-1178)
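For readers unfamiliar with file hashing, the sketch below shows one common way to compute a content hash in Python by reading the file in chunks. It is not the AddHashAndExtractedText implementation: the file_hash helper, the SHA-256 default, and the chunk size are assumptions chosen for illustration, and these notes do not state which algorithm the tool uses.

```python
import hashlib
from pathlib import Path


def file_hash(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in fixed-size chunks.

    The algorithm is an assumption for this example; the tool's own
    choice of hash function is not documented in these notes.
    """
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical content produce the same digest regardless of name or location, which is what makes a hash usable as a duplicate key.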
PDF Page Count: The AddHashAndExtractedText tool now calculates and stores the total page count for PDF documents (in the Index field pageCount). No special instructions are needed to trigger this: simply run the tool against a repository containing PDF documents. (Ref: CT-1283)
Consolidation of Dispose: We have consolidated the following tool into the combined Dispose tool, which can be used to dispose of documents in the source repository. The --source-settings option specifies the connector file (leave blank for file system). Supported sources are currently File System and Content Server (new this release). See --help for available parameters and options. (Ref: CT-530, CT-1212, CT-1213)
Identify Duplicate Email Messages: You can now leverage the TagDuplicate and TagPrimaryDuplicate tools to find duplicate email messages. After text extraction, email messages have the field internetMessageId defined, which can be used in the same way as the field hash is used for determining file duplicates. See --help for TagDuplicate and TagPrimaryDuplicate for how to use these tools for this use case. (Ref: CT-1238)
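The idea of treating internetMessageId like hash can be sketched in Python as a simple group-by on a key field. This is only a conceptual illustration, not how TagDuplicate or TagPrimaryDuplicate work internally; the find_duplicates helper, the record layout, and the sample paths and message IDs are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_duplicates(records: Iterable[dict], key_field: str) -> Dict[str, List[dict]]:
    """Group records by key_field and keep only groups with more than one member."""
    groups = defaultdict(list)
    for record in records:
        value = record.get(key_field)
        if value:  # records without the field cannot be compared
            groups[value].append(record)
    return {value: items for value, items in groups.items() if len(items) > 1}


# Hypothetical records: two copies of one message and one unrelated message.
emails = [
    {"path": "inbox/a.msg", "internetMessageId": "<id-1@example.com>"},
    {"path": "archive/a-copy.msg", "internetMessageId": "<id-1@example.com>"},
    {"path": "inbox/b.msg", "internetMessageId": "<id-2@example.com>"},
]
duplicates = find_duplicates(emails, "internetMessageId")
```

The same function works for files by passing "hash" as the key field, which mirrors the parallel the release note draws between the two fields.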
Path Validation – Scroll Option: The Cognitive Toolkit AddPathValidation tool now supports the scroll method, which is useful when running against large datasets. See --use-scroll for more information. (Ref: CT-1190)
Path Validation – SharePoint Subsite Support: The AddPathValidation tool was updated to support SharePoint subsites. (Ref: CT-1295)
Crawl Content Server – Modified Date Support: The CrawlContentServer tool now supports crawling by modified date. See the --modified-after option in --help for CrawlContentServer. (Ref: CT-1157, CT-1339)
Export From Index – Main Index Name Support: The ExportFromIndex tool now supports the ability to include the Index name where the Primary Duplicate record exists (which is generated after TagPrimaryDuplicate). This is particularly useful when dealing with a large number of different Indexes, where the Primary Duplicate is not in the Index you are exporting from. This tool now also supports splitting the exported data into multiple files, based on --max-file-size. See --help for ExportFromIndex for available parameters and options. (Ref: CT-1197)
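To illustrate what size-based splitting means in general, the Python sketch below groups exported lines into batches that each stay within a byte limit. It is a conceptual example only and does not describe how ExportFromIndex implements --max-file-size; the split_by_size helper and the choice to measure UTF-8 bytes are assumptions.

```python
from typing import Iterable, Iterator, List


def split_by_size(lines: Iterable[str], max_bytes: int) -> Iterator[List[str]]:
    """Yield batches of lines whose combined UTF-8 size stays within max_bytes.

    A single line larger than max_bytes is emitted in a batch of its own
    rather than being cut in the middle.
    """
    batch: List[str] = []
    size = 0
    for line in lines:
        line_bytes = len(line.encode("utf-8"))
        # Start a new batch when adding this line would exceed the limit.
        if batch and size + line_bytes > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += line_bytes
    if batch:
        yield batch
```

Each yielded batch would correspond to one output file in an export of this kind.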
Move Index – Suppress Prompt: The MoveIndex tool now allows prompts to be suppressed (via the “--force” option) so that it can be run unattended as a scheduled task. See --help for MoveIndex for available parameters and options. (Ref: CT-1279)
All Tools Default to Threads = 1: All of the Command Line Interface (CLI) tools now use a default of 1 thread. Some tools allow a higher value, but increase it only with knowledge of the performance implications for the repository in question. See the --threads option in --help for the tool in question. (Ref: CT-1266)
Improved Error Messages: Improved error messages were implemented across many of our tools. (Ref: CT-1065, CT-1066, CT-1208, CT-1219, CT-1313)
Bug Fix: Fixed an issue with the AddHashAndExtractedText tool where, when running with --action-keyword both (which is also the default), the hash was not calculated correctly. (Ref: CT-1270)
Bug Fix: Fixed an issue where the “new_name” field in the Index was not used when migrating to Content Server via the Content Server REST API. (Ref: CT-1089)