Cognitive Suite 2.9.0 (October 2022)
Here is where you will find information on feature enhancements and fixed issues that are part of the Cognitive Suite 2.9.0 (October 2022) release.
Features
Crawl Exchange Email Addresses
Description
You can crawl all the email addresses in Microsoft® Exchange without entering them manually.
How to enable
When running the CrawlExchange operation, you can crawl all the email addresses in Exchange without entering them manually. The email address option is no longer required unless you are specifically crawling public folders. Leaving the email address option blank, or omitting it, crawls all searchable mailboxes. The user credentials must have the “Discovery Management” access role.
Exchange On-Premise Optimization Improvements
Description
Crawling performance for Exchange On-Premise 2019 has been improved through threading.
How to enable
The --threads option for CrawlExchange will also crawl using multiple threads.
Migrate index data to custom columns in SharePoint® Online
Description
You can migrate the metadata residing in the index into custom columns in SharePoint Online.
How to enable
Generate a metadata mapping file.
Note: The custom column names and the index field names are not case-sensitive at this time (see Known Issues).
Sample Mapping File
[
  {
    "name": "",
    "values": {
      "SharePoint Custom Column Name": "Index Field Name"
    }
  }
]
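A mapping file in this shape could be generated with a short script. The following is a minimal Python sketch, not part of the product: the function name and the example column and field names are hypothetical, and the "name" key is left empty as in the sample because its meaning is not described here.

```python
import json


def build_mapping(columns_to_fields, name=""):
    """Return the structure shown in the sample mapping file.

    columns_to_fields maps a SharePoint custom column name to an index field name.
    """
    return [{"name": name, "values": dict(columns_to_fields)}]


if __name__ == "__main__":
    # Hypothetical column and field names, for illustration only.
    mapping = build_mapping({"Department": "department", "Review Date": "review_date"})
    print(json.dumps(mapping, indent=2))
```

Redirect the printed JSON to a file to produce a mapping file in the same shape as the sample.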
CrawlSharePointOnPrem
Description
The CrawlSharePointOnPrem operation has a new option that allows you to specify various content types to crawl via a comma-separated list.
How to enable
When building the runscript command for CrawlSharePointOnPrem, ensure the option --content-types is included in the command. If this option is not included in the command, all content types will be crawled. (Default: all)
Fixed Issues
ExtractFromIndex: Errors while extracting from index to csv causing corruption
Description
Previously, errors were reported when extracting data from the index to a CSV file. This issue has been fixed and a maximum file size of 1 GB has been set.
CrawlSharePointOnPrem: Custom columns with a ‘period’ in the name break the crawler
Description
Previously, the CrawlSharePointOnPrem operation wouldn’t work if custom column names contained a period. This has been fixed.
CrawlSharePointOnPrem: Fix the 'string cannot be zero length' error
Description
Previously, crawling SharePoint Discussion Lists could return errors. These issues have been resolved.
CrawlSharePointOnPrem: Fix the 'name is missing' error
Description
Previously, Link Lists were causing a ‘name is missing’ error when running the CrawlSharePointOnPrem operation. This issue has been fixed.
CrawlContentServer: Fix --modified-after date parameter
Description
Previously, when performing a delta crawl, the --modified-after date parameter would not work as expected. If a specific time was set for the --modified-after option, it was not recognized. Instead, the operation brought in everything starting from the default of 12 a.m. on that day. Now, the delta crawl can be run from the specific time you set.
Known Issues
ExportFromIndex --threads option disabled
Description
ExportFromIndex returns a CsvHelper.WriterException when using threads. Threading has been disabled until we can be sure that CsvHelper is thread-safe.
AddHashAndExtractedText: Prevent updating LastAccessDate for file system
Description
When any bytes are read from a file, the LastAccessDate will be updated. To prevent the last access date from changing, we have patched the filesystem calls.
How to enable
This BETA feature is hidden behind a feature flag in the CognitiveToolkit.exe.config. The key is PreventLastAccessDateChange. Setting this key to true prevents the last access date from changing.
Note: You must have write access to the files in question.
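The flag could be set by hand or with a small script. The Python sketch below is an illustration only and rests on an assumption these notes do not confirm: that the key is stored as an <add key="..." value="..." /> entry under <appSettings>, as is common in .NET .config files. The function name and the sample XML are hypothetical; check the real CognitiveToolkit.exe.config before relying on it.

```python
import xml.etree.ElementTree as ET


def set_app_setting(config_xml, key, value):
    """Return config_xml with the given key set under <appSettings>.

    Assumes the .NET appSettings layout. Comments and the XML declaration
    in the input are not preserved by this simple round trip.
    """
    root = ET.fromstring(config_xml)
    settings = root.find("appSettings")
    if settings is None:
        settings = ET.SubElement(root, "appSettings")
    for entry in settings.findall("add"):
        if entry.get("key") == key:
            entry.set("value", value)
            break
    else:
        # Key not present yet: append a new entry.
        ET.SubElement(settings, "add", {"key": key, "value": value})
    return ET.tostring(root, encoding="unicode")


# Hypothetical config content, for illustration only.
sample = (
    "<configuration><appSettings>"
    '<add key="PreventLastAccessDateChange" value="false" />'
    "</appSettings></configuration>"
)
updated = set_app_setting(sample, "PreventLastAccessDateChange", "true")
```

Because this round trip drops comments, editing the file by hand may be preferable for a production configuration.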
Migrate to SharePoint Online: Field names are NOT case-sensitive
Description
When a mapping file for custom columns contains a field name entered in the wrong case, the file still migrates and the column is populated in SharePoint. The field names are not case-sensitive.
CrawlFileSystem: Crawling with the --validate option in a folder with more than 1024 folders produces an error
Description
CrawlFileSystem: Crawling with the --validate option in a folder that has more than 1024 folders produces an error.
Workaround
In the elasticsearch.yml file, set the following parameter to a number that exceeds the number of folders you have to crawl:
indices.query.bool.max_clause_count
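To pick a value for this parameter, it can help to count the folders first. The following Python sketch is a hypothetical helper, not part of the product. It assumes that folders at every depth count toward the limit, which these notes do not state, and the headroom value is an arbitrary illustration.

```python
import os


def count_folders(root):
    """Count every folder beneath root at any depth (root itself excluded)."""
    total = 0
    for _dirpath, dirnames, _filenames in os.walk(root):
        total += len(dirnames)
    return total


def suggested_setting(folder_count, default_limit=1024, headroom=256):
    """Return an elasticsearch.yml line, or None if the default limit suffices.

    default_limit reflects the 1024-folder threshold described above;
    headroom is an arbitrary margin chosen for this sketch.
    """
    if folder_count <= default_limit:
        return None
    return f"indices.query.bool.max_clause_count: {folder_count + headroom}"
```

For example, suggested_setting(2000) returns the line "indices.query.bool.max_clause_count: 2256", which could then be added to elasticsearch.yml.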