Cognitive Suite 2.9.0 (October 2022)

This page lists the feature enhancements, fixed issues, and known issues in the Cognitive Suite 2.9.0 (October 2022) release.

Features

Crawl Exchange Email Addresses

Description

You can crawl all the email addresses in Microsoft® Exchange without entering them manually.

How to enable

When running the CrawlExchange operation, you can crawl all the email addresses in Exchange without entering them manually. The email address option is no longer required unless you are specifically crawling public folders. Leaving the email address option blank, or omitting it, crawls all searchable mailboxes. The user credentials must have the “Discovery Management” access role.

Exchange On-Premise Optimization Improvements

Description

Crawling performance for Exchange On-Premise 2019 has been improved through threading.

How to enable

Run CrawlExchange with the --threads option to crawl using multiple threads.

Migrate index data to custom columns in SharePoint® Online

Description

You can migrate the metadata residing in the index into custom columns in SharePoint Online.

How to enable

Generate a metadata mapping file.

Note: The custom column names and the index field names are not case-sensitive at this time (see Known Issues).

Sample Mapping File

[
  {
    "name": "",
    "values": {
      "SharePoint Custom Column Name": "Index Field Name"
    }
  }
]
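
A mapping file of this shape can also be generated rather than written by hand. The following is a minimal Python sketch and not part of Cognitive Suite: the function names and the column and field names are hypothetical, and the "name" value is left empty as in the sample because its meaning is not described here.

```python
import json


def build_mapping(column_to_field, name=""):
    """Return the structure used by the sample mapping file.

    column_to_field maps a SharePoint custom column name to an index field name.
    """
    return [{"name": name, "values": dict(column_to_field)}]


def mapping_json(column_to_field, name=""):
    """Serialize the mapping as indented JSON text, ready to save to a file."""
    return json.dumps(build_mapping(column_to_field, name), indent=2)


if __name__ == "__main__":
    # Hypothetical column and field names, for illustration only.
    print(mapping_json({"Department": "department", "Owner": "owner"}))
```

Building the file from a dictionary keeps each custom column paired with exactly one index field and guarantees that the output is valid JSON.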

CrawlSharePointOnPrem

Description

The CrawlSharePointOnPrem operation has a new option that allows you to specify various content types to crawl via a comma-separated list.

How to enable

When building the runscript command for CrawlSharePointOnPrem, ensure the option --content-types is included in the command. If this option is not included in the command, all content types will be crawled. (Default: all)
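
The value for --content-types is a comma-separated list. As a minimal Python sketch (the helper name and the content type names are hypothetical, and the rest of the runscript command is deliberately not shown), the option and its value could be assembled like this:

```python
def content_types_args(content_types):
    """Build the --content-types option from a list of content type names.

    Returns an empty list when no names are given, so the option is omitted
    and the crawl falls back to its default of all content types.
    """
    cleaned = [name.strip() for name in content_types if name.strip()]
    if not cleaned:
        return []
    return ["--content-types", ",".join(cleaned)]


if __name__ == "__main__":
    # Hypothetical content type names, for illustration only.
    print(content_types_args(["Document", "Folder"]))
```

Append the returned items to the rest of the command arguments for the operation.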

Fixed Issues

ExtractFromIndex: Errors while extracting from index to CSV causing corruption

Description

Previously, errors were reported when extracting data from the index to a CSV file. This issue has been fixed, and a maximum file size of 1 GB has been set.

CrawlSharePointOnPrem: Custom columns with a ‘period’ in the name break the crawler

Description

Previously, the CrawlSharePointOnPrem operation wouldn’t work if custom column names contained a period. This has been fixed.

CrawlSharePointOnPrem: Fix the 'string cannot be zero length' error

Description

Previously, crawling SharePoint Discussion Lists could return a 'string cannot be zero length' error. This issue has been resolved.

CrawlSharePointOnPrem: Fix the 'name is missing' error

Description

Previously, Link Lists were causing a ‘name is missing’ error when running the CrawlSharePointOnPrem operation. This issue has been fixed.

CrawlContentServer: Fix --modified-after date parameter

Description

Previously, the --modified-after date parameter did not work as expected during a delta crawl. If a specific time was set for the --modified-after option, it was not recognized. Instead, the operation brought in everything modified since the default start time of 12 AM on that day. Now, the delta crawl runs from the specific time you set.

Known Issues

ExportFromIndex --threads option disabled

Description

ExportFromIndex returns a CsvHelper.WriterException when using threads. Threading has been disabled until we can confirm that CsvHelper is thread-safe.

AddHashAndExtractedText: Prevent updating LastAccessDate for file system

Description

When any bytes are read from a file, the LastAccessDate will be updated. To prevent the last access date from changing, we have patched the filesystem calls.

How to enable

This BETA feature is behind a feature flag in the CognitiveToolkit.exe.config file. The key is PreventLastAccessDateChange. Set this key to true to prevent the last access date from changing.

Note: You must have write access to the files in question.
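
One way to check the effect of the flag is to compare last access times before and after a run. The Python sketch below is an illustration only and is not part of Cognitive Suite: the function names are hypothetical, and the operating system itself may be configured not to update access times, so an unchanged value does not prove on its own that the flag is working.

```python
import os


def snapshot_access_times(paths):
    """Record the last access time (in nanoseconds) of each file."""
    return {path: os.stat(path).st_atime_ns for path in paths}


def changed_access_times(before, after):
    """Return the sorted paths whose access time differs between snapshots."""
    return sorted(
        path for path in before
        if path in after and before[path] != after[path]
    )
```

Take one snapshot before running AddHashAndExtractedText and another afterwards. An empty result from changed_access_times means that no access time changed for the files checked.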

Migrate to SharePoint Online: Field names are NOT case-sensitive

Description

When a mapping file for custom columns contains a field name entered in the wrong case, the file still migrates and the column is populated in SharePoint. The field names are not case-sensitive.

CrawlFileSystem: Crawling with option --validate in a folder with more than 1024 folders produces an error

Description

Running CrawlFileSystem with the --validate option in a folder that contains more than 1024 folders produces an error.

Workaround

In the elasticsearch.yml file, set the following parameter to a number that exceeds the number of folders you have to crawl:

indices.query.bool.max_clause_count
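
To choose a value, it can help to count the folders under the crawl root first. The Python sketch below is an illustration only: the function names and the 10% headroom are assumptions made for this example, not a documented recommendation, and the count includes every subfolder recursively.

```python
import os
import sys


def count_folders(root):
    """Count every subfolder under root, recursively (root itself excluded)."""
    total = 0
    for _dirpath, dirnames, _filenames in os.walk(root):
        total += len(dirnames)
    return total


def suggested_max_clause_count(folder_count, headroom_percent=10):
    """Return a value that exceeds folder_count and is never below 1024.

    headroom_percent is an arbitrary safety margin chosen for this sketch.
    """
    margin = max(1, folder_count * headroom_percent // 100)
    return max(1024, folder_count + margin)


if __name__ == "__main__":
    # Pass the crawl root as the first argument; defaults to the current folder.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    value = suggested_max_clause_count(count_folders(root))
    # Line to place in elasticsearch.yml.
    print(f"indices.query.bool.max_clause_count: {value}")
```

The script prints a line in the form used by elasticsearch.yml, for example indices.query.bool.max_clause_count: 2200 for a root that contains 2000 folders.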