Version 2.5.1

Product: Cognitive Toolkit

Version: 2.5.1

Date: July 26, 2021

Crawl Content Server via REST – Subtype Support: Our CrawlContentServerRest tool now supports subtypes. Use the new –allowed-types option to control which type IDs are crawled. See –help for CrawlContentServerRest for available parameters and options. (Ref: CT-1942)
Migrate to Content Server via REST: The functionality of our universal Migrate tool has been expanded to support migration from your file system to Content Server via REST. See –help for Migrate for available parameters and options. (Ref: CT-1963)
Add Category Data via REST: The Cognitive Toolkit now includes a tool that can be used to add Category Data via the REST API. Use this tool when you cannot use our standard AddCategoryData tool, which relies on a direct database connector. See –help for AddCategoryDataRest for available parameters and options. (Ref: CT-1945)
RunScript – Enrich Document Data: The Cognitive Toolkit RunScript tool now includes support for the BulkDocumentEnricher script, with which you can use a .CSV file to enrich an Index. The performance of this tool has also been optimized to support very large .CSV files. Please work with Shinydocs Implementation Services to configure and set up this use case. (Ref: CT-1997)
Documentum Support: The Cognitive Toolkit now includes a number of tools that support Documentum. For each of these tools, see –help for the named tool for available parameters and options:
- CrawlDocumentum: Use this tool for crawling your Documentum instance. (Ref: CT-695, CT-1959)
- AddHashAndExtractedText: This tool has been updated to support Documentum. Use it to add hash values, extract text, or do both at the same time for your crawled Documentum instance. For performance reasons, we recommend the combined option for Documentum where possible. (Ref: CT-698)
- AddPathValidation: This tool has been updated to support Documentum. (Ref: CT-1867)
- Dispose: This tool has been updated to support disposing of documents in Documentum. (Ref: CT-1868)
Add Extracted Text From Engineering Drawings: We are pleased to announce that via an integration with Bentley MicroStation (64 bit) we are able to extract text and title block information from native (digital) engineering drawings (.dwg and/or .dgn file extension). See our Help Desk for installation & configuration instructions. See –help for AddExtractedTextFromEngineeringDrawings for available parameters and options. (Ref: CT-1925)
Crawl File System – Dry Run Option: The Cognitive Toolkit CrawlFileSystem tool now includes a –dry-run option, which executes the crawl without updating the Index. See –help for CrawlFileSystem for available parameters and options. (Ref: CT-2080)
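To illustrate the general idea of a dry run, the sketch below walks a directory tree and only records entries when dry_run is off. This is not the CrawlFileSystem implementation: the crawl function, the plain dict standing in for an Index, and the "size" field are all hypothetical and exist only for this example.

```python
import os


def crawl(root, index, dry_run=False):
    """Walk `root` and return every file path found.

    Hypothetical illustration: `index` is a plain dict standing in for a
    real Index. It is only updated when dry_run is False.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            found.append(path)
            if not dry_run:
                # A real crawl would write richer metadata; size is a stand-in.
                index[path] = {"size": os.path.getsize(path)}
    return found
```

In this sketch a dry run still does all of the walking, so it reports what a real crawl would find while leaving the dict untouched.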
Auto-Upgrade Content Server Category Version on Migrate: When migrating to Content Server, you now have the option to automatically upgrade the inherited Category version to the latest. This can also be set to fail if you do not want to automatically upgrade inherited Category versions. To enable the automatic upgrade, specify –auto-upgrade-category true in your command line (default is false). (Ref: CT-1268, CT-1803)
New Visualizer Dashboards: There are a number of changes/additions to our dashboards, as outlined below:
- Engineering Drawings: Use this dashboard to view engineering drawings from which you have extracted text with the “AddExtractedTextFromEngineeringDrawings” tool. (Ref: CT-1937)
- Migrated Files – File System to Content Server: Use this dashboard after running the “Migrate” tool (to Content Server) to view a record of the migrated files. (Ref: CT-2053)
- Migrated Files – OneDrive/SharePoint to Content Server: Use this dashboard after running the “Migrate” tool (from OneDrive or SharePoint to Content Server) to view a record of the migrated files. (Ref: CT-2130)
Crawl OneDrive Improvements: We have implemented a number of improvements for the Cognitive Toolkit CrawlOneDrive tool:
- Mixed Case Email Address Support: Our CrawlOneDrive tool now supports email addresses with both upper and lower case characters. (Ref: CT-2068)
- Crawl Individual Accounts: Our CrawlOneDrive tool now supports the ability to crawl one individual account. See –help for CrawlOneDrive, specifically the –specific-accounts parameter. (Ref: CT-1862)
- Progress Bar: Our CrawlOneDrive tool now has a handy progress bar which updates as the crawl is progressing. (Ref: CT-2101)
Find Similar Classification: The Cognitive Toolkit FindSimilarClassification tool has been improved to use the size of the fullText field instead of the actual size of documents, for more accurate matches. (Ref: CT-1909)
AddClassifications – Support for Separate Multiple Classifications: The AddClassifications tool now supports creating separate, multiple classifications via the –source-settings option. (Ref: CT-1634)
Dispose – Add FileNet Support: The Dispose tool has been updated to support FileNet. (Ref: CT-1216)
CrawlExchange – Improved Control Over Object Types: Via the –item-types parameter, you can now control whether to crawl “emails”, “appointments”, “contacts”, “tasks” or “all”. The default is “emails”. See –help for CrawlExchange for available parameters and options. (Ref: CT-1442)
AddBreadCrumbs – Various Improvements: Various improvements were made to the Content Server AddBreadCrumbs tool, including recognizing an incorrect start node ID, an incorrect database type, and an incorrect source settings file. (Ref: CT-1509, CT-1511, CT-1520)
CrawlSharePoint – Option to Remove Cache (sqlite database): The CrawlSharePoint tool with –crawl-site-collection uses a sqlite database, which can now be automatically deleted (if desired). See –help for CrawlSharePoint, specifically the –remove-cache parameter. (Ref: CT-1788)
CrawlSharePoint – Don’t Retry Forever: Our Cognitive Toolkit CrawlSharePoint tool was modified so that it will only retry the number of times specified in the CognitiveToolkit.exe.config file (in the event there is a network error, for example). (Ref: CT-1944)
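As a rough illustration of bounded retrying, the sketch below retries a failing operation a fixed number of times and then gives up. It is not the CrawlSharePoint implementation: call_with_retries, max_retries, and delay_seconds are hypothetical names, and these notes do not state which setting in CognitiveToolkit.exe.config holds the retry count.

```python
import time


def call_with_retries(operation, max_retries, delay_seconds=0.0):
    """Call `operation`, retrying on ConnectionError at most `max_retries` times.

    The first attempt is not counted as a retry, so `operation` runs at most
    max_retries + 1 times. Once retries are used up, the last error is re-raised.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except ConnectionError:
            if attempt >= max_retries:
                raise  # Retries exhausted: surface the error instead of looping forever.
            attempt += 1
            time.sleep(delay_seconds)
```

Re-raising the final error, rather than swallowing it, keeps a persistent network problem visible to whoever is running the crawl.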
AddHashAndExtractedTextFromExchange – Action Keyword: The AddHashAndExtractedTextFromExchange tool now has the –action-keyword option that allows you to choose between adding hash, extracting text, or performing both. See –help for AddHashAndExtractedTextFromExchange for available parameters and options. (Ref: CT-1798)
TagDuplicate – Unique Tagging: The TagDuplicate tool now has the added option “–tag-unique”, which can be used to explicitly tag unique documents. Note that uniqueness is calculated at the time the TagDuplicate tool is run, so a document tagged as unique may no longer be unique after a later crawl. (Ref: CT-2052)
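The point about uniqueness being a snapshot can be sketched as follows. This is an assumption-laden illustration, not the TagDuplicate implementation: it assumes documents are compared by a content hash, and the tag_duplicates function and the tag values "primary", "duplicate", and "unique" are hypothetical.

```python
from collections import defaultdict


def tag_duplicates(docs, tag_unique=False):
    """Return {doc_id: tag} for an iterable of (doc_id, content_hash) pairs.

    Hypothetical illustration: the first document seen for a shared hash is
    tagged "primary", the rest "duplicate". Documents whose hash appears once
    are tagged "unique" only when tag_unique is True.
    """
    groups = defaultdict(list)
    for doc_id, digest in docs:
        groups[digest].append(doc_id)

    tags = {}
    for ids in groups.values():
        if len(ids) == 1:
            if tag_unique:
                tags[ids[0]] = "unique"
        else:
            tags[ids[0]] = "primary"
            for other in ids[1:]:
                tags[other] = "duplicate"
    return tags
```

Running this over a first crawl and then again after a later crawl adds a document with the same hash shows why a "unique" tag is only valid for the run that produced it.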
Removed Tool – Add Insight to First Hash: Note that as of this release, the AddInsightToFirstHash tool is no longer included in the cognitive-toolkit.exe. (Ref: CT-1994)
Inline json No Longer Supported: Note that including an inline json query via –query <json> is no longer supported (Ref: CT-1605). For example, a CrawlMaximo command that previously used the inline query –query “select * from work_orders” would now use –sql-query “<path>\maximoquery.txt”, where that text file contains the following:

SELECT *
FROM work_orders

Improved Error Messages: We’ve improved the error messages presented to the user in a number of scenarios, such as no authorization for crawling, missing files for add hash and extracted text, missing items for add classifications, and incorrect index name for AddPathValidation. (Ref: CT-822, CT-1793, CT-1794, CT-1886)
Bug Fix: Fixed an issue where the AddBreadCrumbs tool would fail with a database connector error in some circumstances. (Ref: CT-2025)
Bug Fix: Fixed an issue where the AddCategoryData tool would fail to update some items. (Ref: CT-1984)
Bug Fix: Fixed an issue where, once 31 log files had been created, creating additional log files caused the previously created ones to be deleted. (Ref: CT-2100)
Bug Fix: Fixed an issue where some custom column fields were not being populated in the Index with the CrawlSharePoint tool. (Ref: CT-1989)
Bug Fix: Fixed an issue where a Source Settings File for SharePoint did not work with the “up” auth-type. (Ref: CT-2026)
Bug Fix: Fixed an issue where the AddPathValidation tool for SharePoint was not properly invalidating items when an entire library was deleted. (Ref: CT-2041)
Bug Fix: Fixed an issue where the TagDuplicate tool would, in certain circumstances, tag more than one Primary Duplicate. (Ref: CT-2052)
Bug Fix: Fixed an issue where, when crawling a Content Server that uses major/minor versioning, the version number stored in the Index was not correct. (Ref: CT-2118)