Encyclopedia of Cognitive Toolkit Operations

This guide explains the operations available within Cognitive Toolkit and how to use them.

Basics

Cognitive Toolkit contains 35 operations to help you understand and manage your data. These operations are runscript-based commands which can be run using a minimum set of required parameters or configured for more complex applications.

Run a command

Every operation in Cognitive Toolkit starts with the same initial steps:

Type cmd in the Windows search box or charms bar, and then click Run as administrator. Alternatively, you may open a CMD prompt using a Service Account with the appropriate permissions.
Change directory (cd) to your Cognitive Toolkit root folder (i.e., the location of the extracted shinydocs-cognitive-toolkit-[version]-[date].zip file).
At the CognitiveToolkit.exe prompt, enter the command for the operation you wish to run, either with the minimum required parameters or edited as required.
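
The steps above can also be scripted. The following is a minimal Python sketch, not part of Cognitive Toolkit itself: the helper names `build_argv` and `run_operation`, the root folder `C:\CognitiveToolkit`, and the license path are all hypothetical. Arguments are passed as a list, so no shell quoting is involved.

```python
import subprocess
from pathlib import PureWindowsPath


def build_argv(toolkit_root, operation, options):
    """Return the argument list for one Cognitive Toolkit operation."""
    exe = str(PureWindowsPath(toolkit_root) / "CognitiveToolkit.exe")
    argv = [exe, operation]
    for flag, value in options.items():
        argv.append(flag)
        # Switches such as --dry-run take no value; pass None for those.
        if value is not None:
            argv.append(str(value))
    return argv


def run_operation(toolkit_root, operation, options):
    """Run the operation from the root folder and return its exit code."""
    argv = build_argv(toolkit_root, operation, options)
    # cwd mirrors the "change directory to the root folder" step.
    completed = subprocess.run(argv, cwd=toolkit_root)
    return completed.returncode


# Hypothetical paths; nothing is executed here, the command is only printed.
print(build_argv(r"C:\CognitiveToolkit", "activate", {"-p": r"C:\license.lic"}))
```

Calling `run_operation` only makes sense on a Windows machine where the toolkit has been extracted to the given folder.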

Edit a command

To build a command you must provide, at a minimum, the required parameters identified in the Options Table for that command.

The command can be adapted to your specific requirements and environment by editing the command:

Add optional parameters as indicated in the command’s Options Table.
Option values are variables based on your environment, as well as the setup/location of source files.
Values should be surrounded by straight double quotation marks ("") within the command.

For example:

DO: --query "C:\query-match-path-no-classification.json"

DO NOT: --query C:\query-match-path-no-classification.json
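
When commands are generated by a script, the quoting rule can be applied in one place. This is a small illustrative Python sketch; the helper name `format_option` is hypothetical and not part of the toolkit.

```python
def format_option(flag, value):
    """Return 'flag "value"' with the value in straight double quotes."""
    # Remove quotes the caller may already have added so they are not doubled.
    cleaned = value.strip().strip('"')
    return f'{flag} "{cleaned}"'


print(format_option("--query", r"C:\query-match-path-no-classification.json"))
```

The result is a text fragment meant to be pasted into a CMD prompt or a batch file.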

Tips

The first time you use Cognitive Toolkit, you are required to Activate your license.

Begin using Cognitive Toolkit by crawling an ECM and/or file system. This will build out an index in the Analytics Engine that can then be utilized to perform other operations.

Certain characters can negatively impact operation of the Cognitive Toolkit. Learn more about using special characters.

Source Setting Files

Source setting files provide the login credentials required to access a content source.

The following source setting files can be edited with your organization’s administrative login credential information:

Encyclopedia

🇦 A-B

Activate

Example

Activates the license and must be performed to initiate use of the Cognitive Toolkit.

Command using minimum required inputs:

CognitiveToolkit.exe activate -p "VALUE"

Use the Activate Options Table to edit the command.

Activate Options Table

OPTION	VALUE	CONDITION
-p <VALUE>	Path to the license file	Required

📚 Further reading

Activating the Cognitive Toolkit License

AddClassifications

Example

Records Content Server classifications data in an index within the Analytics Engine. Fields created will depend on the fields created within the Content Server.

The CrawlContentServer operation must be performed before running AddClassifications.

Command using minimum required inputs:

CognitiveToolkit.exe AddClassifications --query "C:\match_all.json" -u "VALUE" -i "VALUE"

Use the AddClassifications Options Table to edit the command.

AddClassifications Options Table

OPTION	VALUE	CONDITION
--source-settings <VALUE>	Path to the settings file containing access information such as the username and password for the data source (Possible data sources: Box, Content Server, Documentum, Exchange, Filenet)	Optional (Other parameters will be ignored.)
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 1000
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional

AddExtractedTextFromEngineeringDrawings

Example

Extracts full text from engineering drawings.

Command using minimum required inputs:

CognitiveToolkit.exe AddExtractedTextFromEngineeringDrawings --query "C:\match_all.json" -u "VALUE" -i "VALUE"

Use the AddExtractedTextFromEngineeringDrawings Options Table to edit your runscript command.

AddExtractedTextFromEngineeringDrawings Options Table

OPTION	VALUE	CONDITION
--source-settings <VALUE>	Path to the settings file containing access information such as the username and password for the data source (Content Server, SharePoint, Box, etc...)	Optional *If not included, default: filesystem
--path-to-micro-station <VALUE>	Fully Qualified Path to MicroStation	Optional *If not included, default: ‘C:\Program Files\Bentley\MicroStation CONNECT Edition\MicroStation\microstation.exe’
--is-v8i	MicroStation is V8i	Optional *If not included, default: false
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 1000
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional

📚 Further reading

How to AddExtractedTextFromEngineeringDrawings

AddFromSqlDatabase

Example

Migrates data using a SQL query to an index within the Analytics Engine.

Command using minimum required inputs:

CognitiveToolkit.exe AddFromSqlDatabase --database-type "oracle" --username "USERNAME" --password ****** --data-source "192.1.1.1:1521" --sql "C:\sqlquery.sql" --sql-parameters "id=tags" --column-prefix "sql" --query "C:\match_all.json" --index-server-url "http://localhost:9200" --index-name "INDEXNAME"

Use the AddFromSqlDatabase Options Table to edit your runscript command.

AddFromSqlDatabase Options Table

OPTION	VALUE	CONDITION
--database-type <VALUE>	The database type to which to connect. Supported values: “oracle”, “postgres”, “sqlserver”	Required
--username <VALUE>	The database username (login credentials)	Required
--password <VALUE>	The database password (login credentials)	Required
--data-source <VALUE>	The data source	Required
--sql <VALUE>	The SQL to run or path to .sql file	Required
--sql-parameters <VALUE>	A comma separated list of keys and fields from the index to replace values in the SQL Example: sqlParamId=person.id, sqlParamName=person.name	Optional
-c\|--column-prefix <VALUE>	The prefix added to column names	Required
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 1000
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional
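
The value for --sql-parameters is a comma-separated list of key=field pairs. As an illustration, the Python sketch below builds that value from a mapping; the helper name `format_sql_parameters` is hypothetical, and joining without a space after each comma is an assumption (the table's example shows a space).

```python
def format_sql_parameters(mapping):
    """Join SQL parameter keys and index fields as key=field pairs."""
    # Assumption: pairs are joined with a bare comma and no whitespace.
    return ",".join(f"{key}={field}" for key, field in mapping.items())


# Keys and fields taken from the example in the Options Table.
params = {"sqlParamId": "person.id", "sqlParamName": "person.name"}
print(f'--sql-parameters "{format_sql_parameters(params)}"')
```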

AddHashAndExtractedText

Example

Generates the hash value for each file specified while also extracting full text.

Note: When a folder is renamed and the file path changes as a result, AddHashAndExtractedText checks whether the file itself has changed. If the file is found to be the same, text extraction is not performed on that file again.

Command using minimum required inputs:

CognitiveToolkit.exe AddHashAndExtractedText --query "C:\match_all.json" -u "http://localhost:9200" -i "INDEXNAME"

Use the AddHashAndExtractedText Options Table to edit your runscript command.

AddHashAndExtractedText Options Table

OPTION	VALUE	CONDITION
--source-settings <VALUE>	Path to the settings file containing access information such as the username and password for the data source (Box, Content Server, Documentum, OneDrive, SharePoint)	Optional *If not included, default: leave blank for filesystem
--algorithm <VALUE>	Algorithm to apply Supported values: “sha1”, “sha256”, “sha512”, “md5”	Optional *If not included, default: md5
--extraction-service-url <VALUE>	URL for the extraction service --extraction-service-url must be included if the default location was changed during installation.	Optional *If not included, default: “http://localhost:55555”
--max-characters <VALUE>	The maximum number of characters for the extracted text field	Optional *If not included, default: 0 [all characters]
--debug-level <VALUE>	The level of depth of exception messages	Optional *If not included, default: 20
--action-keyword <VALUE>	The AddHashAndExtractedText command automatically generates hash and extracts text from the source parameter, but it can also be modified to either generate hash OR extract text. Include the --action-keyword option to specify which of the two actions should be performed: hash or text --action-keyword “hash” will apply hash, but will not extract full text. --action-keyword “text” will extract full text, but will not apply hash.	Optional *If not included, default: both
--time-out <VALUE>	The timeout in seconds for each batch of nodes being processed	Optional *If not included, default: 60 seconds
--ocr-utility <VALUE>	OCR Utility to use for text extraction Supported values: “iron”, “azure”, “none”	Optional *If not included, default: none
--azure-subscription-key <VALUE>	Azure Computer Vision Key is found on the Keys and Endpoint page for your Cognitive Services resource in the Azure Portal	Optional* *Required, if ocr-utility is "azure"
--azure-subscription-endpoint <VALUE>	Azure Computer Vision Endpoint, this is found on the Keys and Endpoint page for your Cognitive Services resource in the Azure Portal	Optional* *Required, if ocr-utility is "azure"
--text-timeout <VALUE>	The number of seconds to wait before cancelling text extraction for an item (0 for unlimited). To ensure that the OCR process completes on each file, set the `--text-timeout` value in the exe.config file higher than the default of 60 seconds.	Optional *If not included, default: 60 seconds
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request. For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional
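
Several options in this table accept only a fixed set of values, and the two Azure options become required when --ocr-utility is "azure". The Python sketch below checks those rules before a command is run. It is an illustration only: the helper name `check_hash_text_options` is hypothetical, and the toolkit performs its own validation, whose messages may differ.

```python
SUPPORTED_ALGORITHMS = {"sha1", "sha256", "sha512", "md5"}
SUPPORTED_OCR_UTILITIES = {"iron", "azure", "none"}
SUPPORTED_ACTIONS = {"hash", "text"}


def check_hash_text_options(options):
    """Return a list of problems found in an option-to-value mapping."""
    problems = []
    # Defaults follow the Options Table: md5 and no OCR utility.
    algorithm = options.get("--algorithm", "md5")
    if algorithm not in SUPPORTED_ALGORITHMS:
        problems.append(f"unsupported --algorithm: {algorithm}")
    ocr = options.get("--ocr-utility", "none")
    if ocr not in SUPPORTED_OCR_UTILITIES:
        problems.append(f"unsupported --ocr-utility: {ocr}")
    if ocr == "azure":
        for flag in ("--azure-subscription-key", "--azure-subscription-endpoint"):
            if not options.get(flag):
                problems.append(f"{flag} is required when --ocr-utility is azure")
    # Leaving --action-keyword out means both hash and text are produced.
    action = options.get("--action-keyword")
    if action is not None and action not in SUPPORTED_ACTIONS:
        problems.append(f"unsupported --action-keyword: {action}")
    return problems


print(check_hash_text_options({"--ocr-utility": "azure"}))
```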

AddPathValidation

Example

The AddPathValidation operation checks whether items in the index within the Analytics Engine still match the data source. An item that no longer validates is given a false value; which changes are detected depends on the data source itself.

AddPathValidation for File system: checks for files that have been moved, deleted or had a name change.
AddPathValidation for Content Server: uses nodeID to check for files that have been deleted.
AddPathValidation for SharePoint: checks for files that have been moved or deleted.
AddPathValidation for Box: uses fileID to check for files that have been deleted.
AddPathValidation for Documentum: uses nodeID to check for files that have been deleted.
AddPathValidation for OneDrive: checks for files that have been moved or deleted.

Command using minimum required inputs:

CognitiveToolkit.exe AddPathValidation --query "C:\match_all.json" -u "http://localhost:9200" -i "INDEXNAME"

Use the AddPathValidation Options Table to edit your runscript command.

AddPathValidation Options Table

OPTION	VALUE	CONDITION
--source-settings <VALUE>	Path to the settings file containing access information such as the username and password for the data source (Box, Content Server, Documentum, SharePoint, OneDrive)	Optional *If not included, default: leave blank for filesystem
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional

📚 Further reading

How to add a path validation and item removal

AddPropertyData

Example

Pulls property data from an ECM and adds it to the index in the Analytics Engine. Content Server categories and attributes are currently supported.

Category attributes - data is pulled via a direct database connector and/or REST API
Classification values - data is pulled via REST API
Records Management (RM) Classification values - data is pulled via REST API

Command using minimum required inputs:

CognitiveToolkit.exe AddPropertyData --query "C:\match_all.json" -u "http://localhost:9200" -i "INDEXNAME"

Use the AddPropertyData Options Table to edit your runscript command.

AddPropertyData Options Table

OPTION	VALUE	CONDITION
--source-settings <VALUE>	Path to the settings file containing access information such as the username and password for the data source (Content Server, SharePoint, Box, etc...)	Optional *If not included, default: leave blank for filesystem
--legacy-naming	Use legacy naming (CS data only) If the --legacy-naming option is not used, the fields created in the index are prefixed with prop-	Optional *If not included, default: false
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional

🇨 C-D

CacheFileSystemPermissions

Example

Caches the permissions on a file system item and creates the following fields within an index in the Analytics Engine:

CachedPermsFieldName = "cached-permissions"
CachedInheritanceFieldName = "cached-inheritance"
CachedOwnerFieldName = "cached-owner"

Command using minimum required inputs:

CognitiveToolkit.exe CacheFileSystemPermissions --query "C:\match_all.json" -u "http://localhost:9200" -i "INDEXNAME"

Use the CacheFileSystemPermissions Options Table to edit your runscript command.

CacheFileSystemPermissions Options Table

OPTION	VALUE	CONDITION
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional

📚 Further reading

Cache, set and restore file system permissions

CopyItems

Example

Used to copy an object from one index to another index within the Analytics Engine.

Command using minimum required inputs:

CognitiveToolkit.exe CopyItems --destination-index-url "http://localhost:9200" --destination-index-name "INDEXNAME_1" -u "http://localhost:9200" -i "INDEXNAME_2" --query "C:\match_all.json"

Use the CopyItems Options Table to edit your runscript command.

CopyItems Options Table

OPTION	VALUE	CONDITION
--destination-index-url <VALUE>	The destination index server	Required
--destination-index-name <VALUE>	The destination index name	Required
--destination-index-type <VALUE>	The destination index type	Optional
--destination-index-shards <VALUE>	The destination index number of shards	Optional *If not included, default: 5
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional

CrawlBox

Example

Crawls for the metadata within Box and adds it to an index in the Analytics Engine.

Command using minimum required inputs:

CognitiveToolkit.exe CrawlBox --source-settings "C:\SourceFiles\Box.json" --query "C:\box.json" -u "http://localhost:9200" -i "INDEXNAME"

Use the CrawlBox Options Table to edit your runscript command.

CrawlBox Options Table

OPTION	VALUE	CONDITION
--source-settings <VALUE>	Path to the settings file containing access information such as the username and password for the Box data source	Required
--start-folder-id <VALUE>	Default Start Folder Id	Optional *If not included, default: 0
--users <VALUE>	Comma delimited list of user logins and folder ids in format <user_login>:<start_folder_id>	Optional *If not included, default: all users
--include-shared-files	Include folders shared by other users	Optional *If not included, default: false
--crawl-collaborators	Crawl and capture Box groups and their users	Optional *If not included, default: false
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional
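
The --users value pairs each user login with a start folder ID in the form <user_login>:<start_folder_id>, separated by commas. A short Python sketch of assembling that value follows; the helper name `format_box_users` and the logins shown are hypothetical.

```python
def format_box_users(start_folders):
    """Build the --users value from a login-to-folder-id mapping."""
    return ",".join(
        f"{login}:{folder_id}" for login, folder_id in start_folders.items()
    )


# Hypothetical logins; 0 is the default start folder ID from the table.
users = {"user1@example.com": 0, "user2@example.com": 12345}
print(f'--users "{format_box_users(users)}"')
```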

CrawlContentServer

Example

Crawls for the metadata within Content Server and adds it to an index in the Analytics Engine.

Use this tool to crawl the Content Server database directly or via REST API.

Crawling the Content Server database directly leaves REST API available for other applications.

Command using minimum required inputs:

CognitiveToolkit.exe CrawlContentServer --source-settings "C:\SourceFiles\ContentServer.json" --query "C:\content_server.json" -u "http://localhost:9200" -i "INDEXNAME"

Use the CrawlContentServer Options Table to edit your runscript command.

CrawlContentServer Options Table

OPTION	VALUE	CONDITION
--source-settings <VALUE>	Path to the settings file containing access information such as the username and password for the Content Server database See Resources for: sample --source-settings file for crawling Content Server database directly sample --source-settings file for crawling via REST API	Optional *If not included, default: crawl Content Server via the REST API
--starting-folder-id <VALUE>	The ID of the folder from which to begin traversing	Optional *If not included, default: 0
--allowed-types <VALUE>	A comma delimited list of content type ids you wish to crawl	Optional *If not included, default: 1,144,736,749
--modified-after <VALUE>	Items changed on/after this date. Note: To use the --modified-after option with Content Server, the date of the documents has to be within 30 days of the date you run the tool. Example: Today’s date: 2023-03-28 Modified-after-date: 2023-03-01 (This will crawl items after March 1st) Modified-after-date: 2023-01-25 (This will fail with error: `date_exceeds allowed number of days`) Supported date formats are: yyyy-MM-dd, yyyy-MM-dd HH:mm, yyyy-MM-ddTHH:mm. Example: 2018-12-20 or 2018-12-20 19:42. Supported relative date formats are: now, now+/-1d[/d], now+/-1m[/d], now+/-1y[/d]	Optional
--delta	Use the audit tables to detect changes based on the --modified-after option with a default of today’s date.	Optional *If included, crawl is performed via REST
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 1000
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional
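
The 30-day limit on --modified-after can be checked before a crawl is started. The Python sketch below reproduces the example from the table; the helper name `modified_after_is_allowed` is hypothetical, and treating exactly 30 days as allowed is an assumption. Dates in the future are not checked.

```python
from datetime import date

MAX_AGE_DAYS = 30  # limit stated in the --modified-after note


def modified_after_is_allowed(modified_after, run_date):
    """Return True if modified_after is at most 30 days before run_date."""
    age_in_days = (run_date - modified_after).days
    # Assumption: a date exactly 30 days back is still accepted.
    return age_in_days <= MAX_AGE_DAYS


run_date = date(2023, 3, 28)
print(modified_after_is_allowed(date(2023, 3, 1), run_date))
print(modified_after_is_allowed(date(2023, 1, 25), run_date))
```

With the run date 2023-03-28, the first date is 27 days back and passes the check, while the second is 62 days back and does not.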

CrawlContentServerWorkflows

Example

Crawls for the metadata within Content Server workflows and adds it to an index in the Analytics Engine.

Use this tool to crawl the Content Server database directly or via REST API.

Crawling the Content Server database directly leaves REST API available for other applications.

Command using minimum required inputs:

CognitiveToolkit.exe CrawlContentServerWorkflows --source-settings "C:\SourceFiles\ContentServer.json" --query "C:\content_server.json" -u "http://localhost:9200" -i "INDEXNAME"

Use the CrawlContentServerWorkflows Options Table to edit your runscript command.

CrawlContentServerWorkflows Options Table

OPTION	VALUE	CONDITION
--source-settings <VALUE>	Path to the settings file containing access information such as the username and password for the Content Server database See Resources for: sample --source-settings file for crawling Content Server database directly sample --source-settings file for crawling via REST API	Optional *If not included, default: crawl Content Server via the REST API
--process-status <VALUE>	Process archived status. Supported values: “archived”, “noarchive”	Optional
--initiated-after <VALUE>	Items changed on/after this date. Supported date formats are: yyyy-MM-dd, yyyy-MM-dd HH:mm, yyyy-MM-ddTHH:mm. Example: 2018-12-20 or 2018-12-20 19:42. Supported relative date formats are: now, now+/-1d[/d], now+/-1m[/d], now+/-1y[/d]	Optional
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 1000
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional
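
The absolute date formats listed for --initiated-after map directly onto standard date patterns. The Python sketch below checks a value against them. The helper name `parse_absolute_date` is hypothetical, relative formats such as now-1d are not handled, and the check is slightly looser than the toolkit's formats because `strptime` also accepts months and days without a leading zero.

```python
from datetime import datetime

# yyyy-MM-dd, yyyy-MM-dd HH:mm, yyyy-MM-ddTHH:mm
SUPPORTED_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M", "%Y-%m-%dT%H:%M")


def parse_absolute_date(value):
    """Return a datetime if value matches a supported format, else None."""
    for fmt in SUPPORTED_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next supported format
    return None


for candidate in ("2018-12-20", "2018-12-20 19:42", "20/12/2018"):
    print(candidate, "->", parse_absolute_date(candidate))
```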

CrawlDocumentum

Example

Crawls for the metadata within Documentum and adds it to an index in the Analytics Engine.

Command using minimum required inputs:

Cognitivetoolkit.exe CrawlDocumentum --source-settings "C:\SourceFiles\Documentum.json" --query "C:\documentum.json" -u "http://localhost:9200" -i INDEXNAME

Use the CrawlDocumentum Options Table to edit your runscript command.

CrawlDocumentum Options Table

OPTION	VALUE	CONDITION
--source-settings <VALUE>	Path to the settings file containing access information such as the username and password for the Documentum data source	Required
--use-single-index	Use single index	Optional *If not included, default: false
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 1000
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional
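
The quoting rule from the Basics section (surround every value with double quotation marks) can be applied mechanically when commands are generated by a script. The sketch below is one possible way to do that in Python; build_command is a hypothetical helper, the paths are placeholders, and the sketch assumes no value itself contains a double quotation mark.

```python
def build_command(operation, options, flags=()):
    """Assemble a command line with every option value wrapped in double quotes.

    options maps option names to values; flags lists switches that take no value.
    Assumption: values do not contain double quotation marks themselves.
    """
    parts = ["CognitiveToolkit.exe", operation]
    for name, value in options.items():
        parts.append(f'{name} "{value}"')
    parts.extend(flags)
    return " ".join(parts)


command = build_command(
    "CrawlDocumentum",
    {
        "--source-settings": r"C:\SourceFiles\Documentum.json",  # placeholder path
        "--query": r"C:\documentum.json",  # placeholder path
        "--index-server-url": "http://localhost:9200",
        "--index-name": "INDEXNAME",
    },
    flags=["--dry-run"],
)
print(command)
```

Adding --dry-run here follows the table's description of that option as a quick way to see how many items would be processed.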

CrawlExchange

Example

Crawls for the metadata within Microsoft Exchange and adds it to an index in the Analytics Engine.

Command using minimum required inputs:

Cognitivetoolkit.exe CrawlExchange --source-settings "C:\SourceFiles\Exchange.json" --email tester1@exchange.local -u http://localhost:9200 -i INDEXNAME

Use the CrawlExchange Options Table to edit your runscript command.

CrawlExchange Options Table

OPTION	VALUE	CONDITION
--page-size <VALUE>	Number of exchange items to retrieve in a single request	Optional *If not included, default: 500
--max-characters <VALUE>	The maximum number of characters for the extracted email text field Note: Setting this option when crawling Exchange Online will restrict the number of characters displayed in Enterprise Search results.	Optional *If not included, default: 0 [all characters]
--source-settings <VALUE>	Path to the settings file containing access information such as the username and password for the Exchange data source	Required
--crawl-public-folders	Crawl public folders	Optional *If not included, default: false
--email <VALUE>	Comma separated list of email addresses to crawl Note: Leave blank to crawl all mailboxes Use when crawling public folders	Required
--exclude-auto-replies	Exclude auto-replies (For example, out-of-office replies) from the index	Optional *If not included, default: false
--ignore-inline-attachments	Excludes all inline attachments	Optional *If not included, default: false
--ignore-body	Excludes the body text of exchange items from being indexed	Optional *If not included, default: false
--ignore-attachment-extensions <VALUE>	Comma separated list of extensions of the inline attachments that should be ignored	Optional
--ignore-folders <VALUE>	Comma separated list of folder names to be excluded from crawling Supported values are: drafts, deleted items, junk email	Optional
--after-last-modified-date <VALUE>	Items changed on/after this date Supported date formats are: yyyy-MM-dd, yyyy-MM-dd HH:mm, yyyy-MM-ddTHH:mm. Example: 2018-12-20 or 2018-12-20 19:42. Supported relative date formats are: now, now+/-1d[/d], now+/-1m[/d], now+/-1y[/d]	Optional
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 1000
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional
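
Several CrawlExchange options take comma separated lists, and --ignore-folders accepts only the three folder names shown in the table. The Python sketch below illustrates preparing such values before they are placed in a command. The helper names are hypothetical, and comparing folder names without regard to case is an assumption of the sketch, not documented behaviour.

```python
# Folder names listed as supported values for --ignore-folders.
SUPPORTED_IGNORE_FOLDERS = {"drafts", "deleted items", "junk email"}


def comma_separated(values):
    """Join values into a comma separated string, trimming stray whitespace."""
    return ",".join(value.strip() for value in values)


def unsupported_folders(folders):
    """Return the folder names that are not in the documented supported set.

    Assumption: names are compared case-insensitively.
    """
    return [f for f in folders if f.strip().lower() not in SUPPORTED_IGNORE_FOLDERS]


emails = comma_separated(["tester1@exchange.local", " tester2@exchange.local "])
print(emails)
print(unsupported_folders(["drafts", "Junk Email", "inbox"]))
```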

CrawlFileNet

Example

Crawls for the metadata within FileNet and adds it to an index in the Analytics Engine.

Command using minimum required inputs:

Cognitivetoolkit.exe CrawlFileNet --source-settings "C:\SourceFiles\FileNet.json" -u http://localhost:9200 -i INDEXNAME

Use the CrawlFileNet Options Table to edit your runscript command.

CrawlFileNet Options Table

OPTION	VALUE	CONDITION
--source-settings <VALUE>	Path to the settings file containing access information such as the username and password for the FileNet data source	Required
--class-definition <VALUE>	The document class you want to filter for	Optional *If not included, default: 'All'
--exclude-subclasses	Exclude subclasses	Optional *If not included, default: false
--crawl-hidden	Crawl hidden document classes	Optional *If not included, default: false
--where-clause <VALUE>	FileNet SQL WHERE clause. When used, it overrides the date options	Optional
--before-date-last-modified <VALUE>	Crawl everything before this date	Optional *If not included, default: Now
--after-last-modified-date <VALUE>	Items changed on/after this date Supported date formats: yyyy-MM-dd, yyyy-MM-dd HH:mm, yyyy-MM-ddTHH:mm Example: 2018-12-20 or 2018-12-20 19:42	Optional *If not included, default: 1970-01-01
--interval <VALUE>	The number of months to crawl at a time	Optional *If not included, default: 3
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 1000
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional

CrawlFileSystem

Example

Base operation for data discovery. Generally performed prior to running any other Cognitive Toolkit operation. This operation crawls the specified path (or multiple paths) for metadata. The metadata is then stored in an index within the Analytics Engine where it can be further mined for insights.

Command using minimum required inputs:

CognitiveToolkit.exe CrawlFileSystem --path-file "C:\path.json" -u http://localhost:9200 -i INDEXNAME

Use the CrawlFileSystem Options Table to edit your runscript command.

CrawlFileSystem Options Table

OPTION	VALUE	CONDITION
--path <VALUE> OR --path-file <VALUE>	Single path to crawl OR Text file that contains multiple paths to crawl *At least one of these two options must be included in the runscript command. If --path is not used, --path-file must be used. If --path-file is not used, --path must be used.	Optional* Optional*
--include-hidden	Includes hidden files in the crawl	Optional *If not included, default: false
--include-system	Includes system files in the crawl	Optional *If not included, default: false
-a\|--add-field-owner	Add the Owner field to the index	Optional *If not included, default: false
--include-reparse	Includes reparse items A file or directory can contain a reparse point, which is a collection of user-defined data. The format of this data is understood by the application which stores the data, and a file system filter, which interprets the data and processes the file. When an application sets a reparse point, it stores this data, plus a reparse tag, which uniquely identifies the data it is storing.	Optional *If not included, default: false
--after-date-last-modified <VALUE>	Crawls everything after this date Supported date formats: yyyy-MM-dd, yyyy-MM-dd HH:mm, yyyy-MM-ddTHH:mm Example: 2018-12-20 or 2018-12-20 19:42 Supported relative date formats: now, now+/-1d[/d], now+/-1m[/d], now+/-1y[/d]	Optional *If not included, default: all
--validate	Validates file paths Issue: Using the option --validate in a folder with more than 1024 folders produces an error. Solution: In the elasticsearch.yml file, set the following parameter to a number that exceeds your folder count: `indices.query.bool.max_clause_count`	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 1000
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional
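
Before running a crawl with --validate, it can help to know whether the target contains more than 1024 folders, the point at which the table says an error occurs. The Python sketch below counts folders under a root path. The function names are hypothetical, and counting recursively (rather than only direct children) is an assumption, so treat the result as an upper bound.

```python
import os

# Folder count named in the --validate note of the options table.
DEFAULT_CLAUSE_LIMIT = 1024


def count_folders(root):
    """Count every directory below root, recursively (root itself is not counted)."""
    total = 0
    for _dirpath, dirnames, _filenames in os.walk(root):
        total += len(dirnames)
    return total


def exceeds_clause_limit(root, limit=DEFAULT_CLAUSE_LIMIT):
    """Return True when the folder count under root is above the limit."""
    return count_folders(root) > limit
```

If exceeds_clause_limit returns True for the path you plan to crawl, the count from count_folders gives a starting point for the value of `indices.query.bool.max_clause_count`.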

CrawlMaximo

Example

Crawls for the metadata within Maximo and adds it to an index in the Analytics Engine.

Command using minimum required inputs:

Cognitivetoolkit.exe CrawlMaximo --database-type postgres --schema-type work-order --connection-string "User ID=postgres;Password=mypassword;Host=localhost;Port=5435;Database=Maximodatabase" --sql-query "select assetnum,workorderid,worktype from work_orders" --key-fields workorderid --index-server-url http://localhost:9200 --index-name INDEXNAME

Use the CrawlMaximo Options Table to edit your runscript command.

CrawlMaximo Options Table

OPTION	VALUE	CONDITION
--database-type <VALUE>	The type of database Supported types: 'oracle', 'postgres', 'sqlserver'	Required
--schema-type <VALUE>	The type of items being crawled Supported types: 'work-order', 'condition-report', 'item', 'location', 'oem', 'company'	Required
--connection-string <VALUE>	The database Connection String, used to connect to the database	Required
--sql-query <VALUE>	The SQL query to retrieve records	Required
--key-fields <VALUE>	A comma separated list of fields that produce a unique key	Required
--database-timeout <VALUE>	The length of time (in seconds) to wait for a connection to the server before terminating the attempt and generating an error	Optional *If not included, default: 120
--connection-string-password <VALUE>	A password to replace '{{password}}' in the connection string	Optional
--chunk-size <VALUE>	The number of items sent to the index in a single request	Optional *If not included, default: 1000
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
-n\|--nodes-per-request <VALUE>	Number of nodes per request. For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional
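
The --connection-string-password option is described as replacing '{{password}}' in the connection string, which keeps the password out of a stored connection string. The Python sketch below only illustrates that substitution. fill_password is a hypothetical name, and how Cognitive Toolkit performs the replacement internally is not documented here.

```python
PLACEHOLDER = "{{password}}"


def fill_password(connection_string, password):
    """Replace the {{password}} placeholder with the supplied password."""
    if PLACEHOLDER not in connection_string:
        raise ValueError("connection string has no {{password}} placeholder")
    return connection_string.replace(PLACEHOLDER, password)


# Connection string from the example command, with the password swapped for the placeholder.
template = "User ID=postgres;Password={{password}};Host=localhost;Port=5435;Database=Maximodatabase"
print(fill_password(template, "mypassword"))
```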

CrawlOneDrive

Example

Crawls for the metadata within OneDrive and adds it to an index in the Analytics Engine.

Command using minimum required inputs:

Cognitivetoolkit.exe CrawlOneDrive --source-settings "C:\onedrive.json" -u http://localhost:9200 -i INDEXNAME --use-single-index

Use the CrawlOneDrive Options Table to edit your runscript command.

CrawlOneDrive Options Table

OPTION	VALUE	CONDITION
--source-settings <VALUE>	Path to the settings file containing access information such as the username and password for the OneDrive data source	Required
--use-single-index	Use single index	Optional *If not included, default: false
--specific-accounts <VALUE>	Specify email addresses to crawl (comma-separated) Example: 'email1@domain.com', 'email2@domain.com', 'email3@domain.com', 'email4@domain.com'	Optional
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 1000
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional

CrawlSharePointOnline

Example

Crawls for the metadata within SharePoint Online and adds it to an index in the Analytics Engine.

Command using minimum required inputs:

Cognitivetoolkit.exe CrawlSharePoint --source-settings "C:\sharepoint.json" -u http://localhost:9200 -i "shinydrive index" --use-single-index

Use the CrawlSharePointOnline Options Table to edit your runscript command.

CrawlSharePointOnline Options Table

OPTION	VALUE	CONDITION
--source-settings <VALUE>	Path to the settings file containing access information such as the username and password for the SharePoint Online data source	Required
--crawl-subsites	Crawl all of the subsites	Optional *If not included, default: false
--remove-standard-filter	Remove filter for non-standard document libraries (Hidden, System, etc)	Optional *If not included, default: false
--remove-document-library-type-filter	Remove filter for non-document library types	Optional *If not included, default: false
--remove-site-assets-filter	Remove filter for site asset libraries	Optional *If not included, default: false
--crawl-site-collection	Crawl from the site collection (overrides the --crawl-subsites option)	Optional *If not included, default: false
--crawl-from-index <VALUE>	Crawl from the index created by a previous CrawlSharePointSites run	Optional
--filter <VALUE>	SharePoint Filter File/JSON Note: The --filter option for CrawlSharePointOnline is only used in conjunction with --crawl-site-collection. Results will not be filtered unless the --crawl-site-collection option is also used.	Optional
--use-single-index	Use single index	Optional *If not included, default: false
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 1000
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
--index-type <VALUE>	Type name for index objects	Optional *If not included, default: shinydocs
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
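
Two rows in the table describe interactions between options: --filter only takes effect together with --crawl-site-collection, and --crawl-site-collection overrides --crawl-subsites. The Python sketch below flags those combinations before a command is run. It is a hypothetical pre-flight check, not a feature of Cognitive Toolkit.

```python
def check_sharepoint_options(options):
    """Return warnings for option combinations the table describes as ineffective.

    options is a set of option names that will appear in the command.
    """
    warnings = []
    if "--filter" in options and "--crawl-site-collection" not in options:
        warnings.append("--filter has no effect without --crawl-site-collection")
    if "--crawl-site-collection" in options and "--crawl-subsites" in options:
        warnings.append("--crawl-site-collection overrides --crawl-subsites")
    return warnings


for warning in check_sharepoint_options({"--filter", "--crawl-subsites"}):
    print(warning)
```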

📚 Further reading

Add Fields to Crawl from the SharePoint Property Bag

CrawlSharePointOnPrem

Example

Crawls for the metadata within SharePoint On-Premise and adds it to an index in the Analytics Engine.

Command using minimum required inputs:

Cognitivetoolkit.exe CrawlSharePointOnPrem --source-settings "C:\sharepoint.json" -u http://localhost:9200 -i "shinydrive index" --use-single-index

Use the CrawlSharePointOnPrem Options Table to edit your runscript command.

CrawlSharePointOnPrem Options Table

OPTION	VALUE	CONDITION
--source-settings <VALUE>	Path to the settings file containing access information such as the username and password for the SharePoint On-Premise data source	Required
--crawl-subsites	Crawl all of the subsites.	Optional *If not included, default: false
--hidden	Crawl hidden lists	Optional *If not included, default: false
--catalog	Crawl catalog lists	Optional *If not included, default: false
--application	Crawl application lists	Optional *If not included, default: false
--private	Crawl private lists	Optional *If not included, default: false
--use-single-index	Use single index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 1000
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional

CrawlSharePointOnlineSites

Example

Crawls SharePoint Online sites to create a list of all the site names and adds them to an index in the Analytics Engine. This information can then be used to crawl specific subsites using the CrawlSharePointOnline or CrawlSharePointOnPrem operations.

Command using minimum required inputs:

Cognitivetoolkit.exe CrawlSharePointSites --source-settings "C:\sharepoint.json" -u http://localhost:9200 -i INDEXNAME

Use the CrawlSharePointOnlineSites Options Table to edit your runscript command.

CrawlSharePointOnlineSites Options Table

OPTION	VALUE	CONDITION
--source-settings <VALUE>	Path to the settings file containing access information such as the username and password for the SharePoint Online data source	Required
--keyword-query <VALUE>	Additional Keyword Query parameters	Optional
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 1000
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional

Dispose

Example

Deletes the specified data/files based on the query. If the deletion succeeds, this adds a field called [dispose] with a value of true to the index in the Analytics Engine.

For confirmation, Dispose identifies the number of files that will be deleted before the dispose runs.

Command using minimum required inputs:

Cognitivetoolkit.exe Dispose --query "C:\disposeQuery.json" -u http://localhost:9200 -i INDEXNAME

Use the Dispose Options Table to edit your runscript command.

Dispose Options Table

OPTION	VALUE	CONDITION
--source-settings <VALUE>	Path to the settings file containing access information such as the username and password for the data source (Supported data sources: Content Server, Documentum, File System, OneDrive, SharePoint Online and SharePoint On-Premise)	Optional *If not included, default: filesystem
--verify-hash	Verifies that the file still matches the hash value before file deletion	Optional *If not included, default: false
--hash-field <VALUE>	The hash field	Optional, but required if --verify-hash is specified
--hash-algorithm <VALUE>	The hash algorithm to use when verifying Supported values: “sha1”, “sha256”, “sha512”, “md5”	Optional, but required if --verify-hash is specified
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional
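
The --verify-hash option checks that a file still matches its recorded hash before deletion. The Python sketch below illustrates that kind of check using the four algorithms named in the table. The function names are hypothetical, and the sketch assumes the index stores the hash as a hexadecimal string, which this guide does not confirm.

```python
import hashlib

# Algorithms listed as supported values for --hash-algorithm.
SUPPORTED_ALGORITHMS = ("sha1", "sha256", "sha512", "md5")


def file_hash(path, algorithm):
    """Hash a file in 1 MB chunks with one of the algorithms named in the table."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def still_matches(path, algorithm, indexed_hash):
    """Return True when the file's current hash equals the value recorded earlier.

    Assumption: indexed_hash is a hexadecimal digest string.
    """
    return file_hash(path, algorithm) == indexed_hash.lower()
```

A file whose content changed after it was crawled would fail this comparison, which is the situation --verify-hash is meant to catch before a deletion.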

🇪 E-K

ExportFromIndex

Example

Specify and export fields/values from an index in the Analytics Engine into a comma-separated values (CSV) file.

Command using minimum required inputs:

Cognitivetoolkit.exe ExportFromIndex --fields creationTimeUtc,name,extension,path --query "C:\match_all.json" -u http://localhost:9200 -i INDEXNAME

Use the ExportFromIndex Options Table to edit your runscript command.

ExportFromIndex Options Table

OPTION	VALUE	CONDITION
--fields <VALUE>	Comma-delimited list of fields to include in the export	Required
--filename <VALUE>	File to export the index to	Optional *If not included, default: export.csv
--search-index-name <VALUE>	Index to search for the duplicates	Optional
--max-file-size <VALUE>	Maximum file size in MB, limited to 1GB	Optional *If not included, default: 1GB
--inspected-field <VALUE>	The name of the field that the duplicates were tagged on	Optional *If not included, default: hash
--duplicate-field <VALUE>	The name of the field that identifies the duplicate	Optional *If not included, default: duplicate-hash
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional
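
The example command passes a file named match_all.json to --query. This guide does not show the contents of that file. The Python sketch below writes one plausible version, assuming the query file holds an Elasticsearch-style query body; both that assumption and the helper name write_query_file are illustrative only.

```python
import json

# Assumption: the query file contains an Elasticsearch query body.
# match_all is the Elasticsearch query that selects every document.
match_all = {"query": {"match_all": {}}}


def write_query_file(path, query):
    """Write a query dictionary to path as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(query, handle, indent=2)


print(json.dumps(match_all, indent=2))
```

Calling write_query_file with the path passed to --query would produce the file; check the result against a query file known to work in your environment before relying on it.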

📚 Further reading

Best Practices for Finding and Removing Duplicates

ExtractAndCrawlPst

Example

Extracts text and performs a crawl of pst (email) files.

Command using minimum required inputs:

Cognitivetoolkit.exe ExtractAndCrawlPst --query "C:\query-match-extension-pst-not-extracted.json" --index-server-url http://localhost:9200 --index-name INDEXNAME

Use the ExtractAndCrawlPst Options Table to edit your runscript command.

ExtractAndCrawlPst Options Table

OPTION	VALUE	CONDITION
--create-duplicates	Allow duplicate files If this option is used, pst files that are crawled more than once will be duplicated on the file share and in the index. By default, if an existing pst file is found, it will be skipped during the operation.	Optional *If not included, default: false
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional

📚 Further reading

How to Extract and Crawl PST files

ExtractEntities

Example

An information extraction technique whereby key elements from text are identified and classified into predefined categories. This transforms unstructured data to structured data that is machine readable and available for standard processing.

Command using minimum required inputs:

Cognitivetoolkit.exe ExtractEntities --extraction-service-url http://localhost:8181/ --query "C:\query-match-fullText-no-entities.json" --index-server-url http://localhost:9200 --index-name INDEXNAME

Use the ExtractEntities Options Table to edit your runscript command.

ExtractEntities Options Table

OPTION	VALUE	CONDITION
--field <VALUE>	The name of the index field to store the extracted entities	Optional *If not included, default: entities
-c\|--classes <VALUE>	Comma separated list of entity classes to extract from text. The classes that can be extracted depend on the classifier model used to perform entity extraction. Example: "LOCATION,PERSON,ORGANIZATION"	Optional *If not included, default: all classes
-e\|--extract-from <VALUE>	The name of the index field from which to extract entities	Optional *If not included, default: fullText
--preserve-spacing	Preserve spacing from extracted entities. Depending on the source file, there may be unwanted line breaks in extracted entities.	Optional *If not included, default: false
--extraction-service-url <EXTRACTION_SERVICE_URL>	URL for the entity extraction service	Optional *If not included, default: http://localhost:55555
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional

📚 Further reading

How to Extract Entities from Full Text

FindSimilarClassification

Example

Adds classifications to documents based on their similarity to other, already-classified documents in the Analytics Engine.

For example, choose 5-10 documents of a similar kind and classify them by their document type, such as offer letters or purchase orders. The Shinydocs Cognitive Suite will “learn” from those examples and will be able to find other similar documents for classification.

Command using minimum required inputs:

Cognitivetoolkit.exe FindSimilarClassification --classification-field classification --query "C:\query-match-path-no-classification.json" --tokens 100 --threshold 75 --min-docs 5 --min-terms 1 --match 1 --index-server-url http://localhost:9200 --index-name INDEXNAME

Use the FindSimilarClassification Options Table to edit your runscript command.

FindSimilarClassification Options Table

OPTION	VALUE	CONDITION
--field-list <VALUE>	Fields to compare	Optional *If not included, default: fullText
--classification-field <VALUE>	Name of the field where classifications are found	Required
--tokens <VALUE>	Number of tokens to compare	Optional *If not included, default: 500
--min-docs <VALUE>	Minimum document frequency	Optional *If not included, default: 5
--min-terms <VALUE>	Minimum term frequency	Optional *If not included, default: 2
--max-docs <VALUE>	Maximum document frequency	Optional
--min-word-length <VALUE>	Minimum word length [number of characters]	Optional
--threshold <VALUE>	Similarity threshold [minimum-should-match]	Optional *If not included, default: 90 (%)
--match <VALUE>	Number of documents to match	Optional *If not included, default: 5
--size-similarity <VALUE>	Size similarity threshold (the percent delta between sizes)	Optional *If not included, default: 20
--inclusion <VALUE>	File extension inclusion list (Comma delimited)	Optional
--exclusion <VALUE>	File extension exclusion list (Comma delimited)	Optional
--print-query	Print the Elasticsearch query in the logs. Does not run the operation!	Optional *If not included, default: false
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional
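
When only some options are supplied, the remaining ones fall back to the defaults in the table. The sketch below, written in Python on the assumption that it is available on the machine, merges the values from the minimum command above onto those defaults to show which settings would be in effect. The dictionary and function names are illustrative; only the option names and default values come from the Options Table.

```python
# Defaults copied from the FindSimilarClassification Options Table.
DEFAULTS = {
    "--field-list": "fullText",
    "--tokens": 500,
    "--min-docs": 5,
    "--min-terms": 2,
    "--threshold": 90,        # percent
    "--match": 5,
    "--size-similarity": 20,  # percent delta between sizes
}


def effective_settings(overrides):
    """Return the defaults with any supplied option values applied on top."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"Options without a documented default: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


def changed_from_default(overrides):
    """Return only the supplied options whose value differs from the default."""
    return {opt: val for opt, val in overrides.items() if DEFAULTS[opt] != val}


# Values taken from the minimum command shown above.
minimum_command = {
    "--tokens": 100,
    "--threshold": 75,
    "--min-docs": 5,
    "--min-terms": 1,
    "--match": 1,
}

for option, value in effective_settings(minimum_command).items():
    print(f"{option} = {value}")
```

Note that --min-docs 5 in the minimum command matches the default, so it could be omitted without changing the result.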

📚 Further reading

Best Practice: Classifying via FindSimilarClassification

🇱 L-O

Migrate

Example

Migrates data/files from one source to another.

NOTES:

A crawl of the origin source data is required before performing a migration.
When migrating to SharePoint Online, field names are NOT case-sensitive at this time.
When migrating from File Share to File Share (internal servers only), permissions and ownership are not carried over.

Command using minimum required inputs:

Cognitivetoolkit.exe Migrate --destination-source-settings "C:\contentserver.json" --start-location 123456 --query "C:\match_all.json" -u http://localhost:9200 -i INDEXNAME --path-prefix-to-remove "\\TestFolder" --classification-mapping "C:\ClassificationsMapping.json"

Use the Migrate Options Table to edit your runscript command.

Migrate Options Table

OPTION	VALUE	CONDITION
--origin-source-settings <VALUE>	The origin settings file	Optional *If not included, default: File System Source
--destination-source-settings <VALUE>	The destination settings file	Required
--start-location <VALUE>	The default starting location	Optional depending on other input. Ignored otherwise.
--location-field <VALUE>	The field name in the index indicating the destination location for the file migration	Optional
--migrate-versions	Migrate all versions (Source dependent)	Optional *If not included, default: false
--migrate-permissions	Migrate all permissions (Source dependent)	Optional *If not included, default: false
--path-prefix-to-remove <VALUE>	Removes the provided text from the beginning of the path	Optional *If not included, default: Smart Prefix Removal (e.g. 'c:\' or '\computer\')
--description-field-name <VALUE>	The name of the field where the description value is found (only use with Content Server as destination source)	Optional
--name-field-name <VALUE>	The name of the field where the name value is found	Optional
--metadata-mapping <VALUE>	Location of the metadata mapping file (Source dependent)	Optional
--user-mapping <VALUE>	Location of the user mapping file (Source dependent)	Optional
--user-mapping-type <VALUE>	User mapping-type to use (Source dependent) Supported values: off, file	Optional *If not included, default: off
--default-owner <VALUE>	The default owner of the documents being uploaded (Source dependent defaults)	Optional
--classification-mapping <VALUE>	Location of the classification mapping file (Content Server Only)	Optional
--is-records-management	Records management enabled (Content Server Only). This option is used in conjunction with the --classification-mapping option for Content Server. Including --is-records-management ensures that records management classifications are included in the migration. Excluding --is-records-management means that any associated records management classifications are excluded from the migration.	Optional *If not included, default: false
--auto-upgrade-category	Auto category upgrade (Content Server Only)	Optional *If not included, default: false
--disable-over-write	Disable over-write (SharePoint Only)	Optional *If not included, default: false
--site-url	Site to migrate to (SharePoint Only)	Optional *If not included, default: site specified in the source settings
-s\|--scroll-size	The page size of the query results	Optional *If not included, default: false
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request. For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional
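
The effect of --path-prefix-to-remove can be pictured with a small Python sketch. It is an illustration under assumptions, not the tool's implementation: the function name remove_path_prefix is hypothetical, the comparison is assumed to ignore case (as Windows paths do), and the Smart Prefix Removal default is not modelled.

```python
def remove_path_prefix(path: str, prefix: str) -> str:
    """Strip `prefix` from the beginning of `path` if it is present.

    Assumption: matching ignores case; a path that does not start with the
    prefix is returned unchanged.
    """
    if prefix and path[:len(prefix)].lower() == prefix.lower():
        return path[len(prefix):]
    return path


# The prefix from the minimum command above, applied to two sample paths.
print(remove_path_prefix(r"\\TestFolder\Contracts\offer.docx", r"\\TestFolder"))
print(remove_path_prefix(r"C:\Other\offer.docx", r"\\TestFolder"))
```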

🇵 P-S

RemoveField

Example

Removes the specified field from the specified index.

Command using minimum required inputs:

Cognitivetoolkit.exe RemoveField --field parent --query "C:\match_all.json" -u http://localhost:9200 -i INDEXNAME

Use the RemoveField Options Table to edit your runscript command.

RemoveField Options Table

OPTION	VALUE	CONDITION
-f\|--field <FIELD>	The name of the index field	Required
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional

RemoveIndex

Example

Removes the index from your Analytics Engine, but does not remove the index pattern.

Command using minimum required inputs:

Cognitivetoolkit.exe RemoveIndex -u http://localhost:9200 -i INDEXNAME

Use the RemoveIndex Options Table to edit your runscript command.

RemoveIndex Options Table

OPTION	VALUE	CONDITION
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 1000
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional

RemoveItems

Example

Removes items from the Index.

Use Case: There are times when a dataset has been crawled, but then files have later been deleted from the dataset. This will result in files having an invalid path in the index (path-valid = false).

Apply the RemoveItems operation to remove items from the index that are displaying data for files that users have deleted.

Command using minimum required inputs:

Cognitivetoolkit.exe RemoveItems --query "C:\match_field_keyword.json" -u http://localhost:9200 -i INDEXNAME

Use the RemoveItems Options Table to edit your runscript command.

RemoveItems Options Table

OPTION	VALUE	CONDITION
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters). In this case, a match_field_keyword.json file could be applied. For example: `{ "match" : { "path-valid" : "false" } }`	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional
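
The query file and a trial run can be prepared with a short script. The following is a minimal sketch, assuming Python 3 is available on the machine; the function names and the temporary folder are illustrative and not part of the Cognitive Toolkit. It writes the path-valid query from the table to a JSON file and assembles a RemoveItems command that includes --dry-run, so that the number of affected items can be checked before anything is removed. The script only prints the command; it does not run it.

```python
import json
import tempfile
from pathlib import Path

# Query from the Options Table: items whose path is no longer valid.
QUERY = {"match": {"path-valid": "false"}}


def write_query_file(folder: Path) -> Path:
    """Write the query to match_field_keyword.json inside `folder`."""
    query_path = folder / "match_field_keyword.json"
    query_path.write_text(json.dumps(QUERY, indent=2), encoding="utf-8")
    return query_path


def build_remove_items_command(query_path, index_url, index_name, dry_run=True):
    """Assemble the command text using only options listed in the table."""
    parts = [
        "CognitiveToolkit.exe",
        "RemoveItems",
        f'--query "{query_path}"',  # values are wrapped in double quotes
        f"-u {index_url}",
        f"-i {index_name}",
    ]
    if dry_run:
        parts.append("--dry-run")
    return " ".join(parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_query_file(Path(tmp))
        print(build_remove_items_command(path, "http://localhost:9200", "INDEXNAME"))
```

Once the dry run reports the expected number of items, the same command can be built with dry_run=False.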

📚 Further reading

How to remove incomplete items from the index

RestoreCachedFileSystemPermissions

Example

Restores the file system permissions. Can be used in conjunction with the CacheFileSystemPermissions and SetFileSystemPermissions tools.

CachedPermsFieldName = "cached-permissions"
CachedInheritanceFieldName = "cached-inheritance"
CachedOwnerFieldName = "cached-owner"

If you run into problems running SetFileSystemPermissions, you can perform a restore, which uses the cached fields listed above to restore the original permissions on the actual files in the file system.

Command using minimum required inputs:

Cognitivetoolkit.exe RestoreCachedFileSystemPermissions --query "C:\match_all.json" -u http://localhost:9200 -i INDEXNAME

Use the RestoreCachedFileSystemPermissions Options Table to edit your runscript command.

RestoreCachedFileSystemPermissions Options Table

OPTION	VALUE	CONDITION
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional

📚 Further reading

Cache, set and restore file system permissions

RunWithCredentials

Example

Allows you to run the Cognitive Toolkit as a different user.

Command using minimum required inputs:

Cognitivetoolkit.exe RunWithCredentials

Use the RunWithCredentials Options Table to edit your runscript command.

RunWithCredentials Options Table

OPTION	VALUE	CONDITION
--save-credentials	Save credentials on the machine	Optional *If not included, default: false
--hide	Hide the window	Optional *If not included, default: false

SaveValue

Example

Saves values so that they can be substituted into other tools. This tool also encrypts the passwords and user names required by options within the --source-settings file and/or entered directly on the command line.

Encrypting repository passwords with SaveValue

Command using minimum required inputs:

Cognitivetoolkit.exe SaveValue --save NAME --value "VALUE"

Use the SaveValue Options Table to edit your runscript command.

SaveValue Options Table

OPTION	VALUE	CONDITION
--list	List all of the saved values	Optional
--save <SAVE>	Name of saved value	Optional
--value <VALUE>	Value	Optional
--remove <REMOVE>	Remove from saved values	Optional
--no-encryption	Do not encrypt value	Optional *If not included, default: false

SetFileSystemPermissions

Example

Resets permissions on the file system. Make sure you retain Administrator rights on the file system.

CacheFileSystemPermissions must be performed before running this operation.

Command using minimum required inputs:

Cognitivetoolkit.exe SetFileSystemPermissions --exclusions "Administrator,Administrators" -q "C:\match_all.json" -u http://localhost:9200 -i INDEXNAME --identity SHINYAD\identity1,identity2\identity3 --access-control-reason quarantine --rights none

Use the SetFileSystemPermissions Options Table to edit your runscript command.

SetFileSystemPermissions Options Table

OPTION	VALUE	CONDITION
--exclusions <VALUE>	A comma-separated list of users/groups you wish to exclude	Optional
--identity <VALUE>	Identities to add file access control (Comma separated)	Required
--access-control-reason <VALUE>	File access control change identifier. For example: legal-hold, destruction, public-record	Required
--rights <VALUE>	The level of permissions that will remain on the object (ie. none, read, modify)	Optional *If not included, default: read
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional

📚 Further reading

Cache, set and restore file system permissions

🇹 T-Z

TagDuplicate

Example

Tags any files that are considered duplicates, with the option to identify the primary duplicate. This tool must be run after the AddHashAndExtractedText tool has added hash values.

Command using minimum required inputs:

Cognitivetoolkit.exe TagDuplicate --tag-primary --date-field lastCreationUtc --inspected-field hash --sort-order descending --items-to-process-query "C:\match_all.json" -u http://localhost:9200 -i INDEXNAME

Use the TagDuplicate Options Table to edit your runscript command.

TagDuplicate Options Table

OPTION	VALUE	CONDITION
--duplicate-field-name <VALUE>	The name of the field that will identify the duplicate	Optional *If not included, default: duplicate-{inspected-field}
--tag-primary	Tag a primary duplicate. Primary duplicates are defined subjectively. Your data management strategy could identify primary duplicates by, for example: creation date = original document; status = published document.	Optional *If not included, default: false
--tag-unique	Tag unique documents. The unique status of an item is calculated at the time the TagDuplicate command is run. The item may no longer be considered unique in a later crawl. There is no guarantee an item will remain unique.	Optional *If not included, default: false
--date-field <VALUE>	The name of the date field that will be used to determine the primary duplicate	Optional *If not included, default: creationTimeUtc
--sort-order <VALUE>	The sort order to be used in conjunction with --date-field. Supported values are: ‘ascending’, ‘descending’	Optional *If not included, default: ascending
--inspected-field <VALUE>	The name of the field that will be compared	Optional *If not included, default: ‘hash’
--use-keyword <VALUE>	Use the keyword field to filter	Optional *If not included, default: true
--aggregate	Use the aggregate method (recommended for large datasets)	Optional *If not included, default: false
-q\|--items-to-process-query <VALUE>	Query for items to process (File or JSON input)	Required
--match-against-query	Query for items to match against	Optional *If not included, default: match everything
--overwrite	Items that are no longer duplicates will be erased from the index.	Optional *If not included, default: false
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--use-shinydocs-jobs	Query will be used against the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional
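
To illustrate how --inspected-field, --date-field and --sort-order can work together to pick a primary duplicate, here is a minimal sketch in Python. It is a conceptual model only, not the tool's implementation: it assumes that items sharing the same inspected value (the hash) are duplicates of one another, and that the primary is the first item once a group is sorted by the date field in the chosen order. The item dictionaries, the "id" key, the tag_duplicates function and the returned labels are all hypothetical.

```python
from collections import defaultdict


def tag_duplicates(items, inspected_field="hash",
                   date_field="creationTimeUtc", sort_order="ascending"):
    """Return {item id: label}, where label is 'primary', 'duplicate' or 'unique'.

    Assumption: after sorting each group by `date_field` in `sort_order`,
    the first item is treated as the primary duplicate.
    """
    if sort_order not in ("ascending", "descending"):
        raise ValueError("sort_order must be 'ascending' or 'descending'")

    # Group items that share the same inspected value (for example, the hash).
    groups = defaultdict(list)
    for item in items:
        groups[item[inspected_field]].append(item)

    labels = {}
    for group in groups.values():
        if len(group) == 1:
            labels[group[0]["id"]] = "unique"
            continue
        # ISO 8601 timestamps in the same format sort correctly as strings.
        ordered = sorted(group, key=lambda i: i[date_field],
                         reverse=(sort_order == "descending"))
        labels[ordered[0]["id"]] = "primary"
        for item in ordered[1:]:
            labels[item["id"]] = "duplicate"
    return labels


items = [
    {"id": "a", "hash": "h1", "creationTimeUtc": "2020-01-01T00:00:00Z"},
    {"id": "b", "hash": "h1", "creationTimeUtc": "2021-06-15T00:00:00Z"},
    {"id": "c", "hash": "h2", "creationTimeUtc": "2019-03-10T00:00:00Z"},
]
print(tag_duplicates(items))                           # oldest copy is primary
print(tag_duplicates(items, sort_order="descending"))  # newest copy is primary
```

Under these assumptions, ascending order favours the earliest copy as the primary and descending order favours the latest, which is why the choice of date field and sort order should follow your data management strategy.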

📚 Further reading

Best Practices for Finding and Removing Duplicates

UpdateProperties

Example

Once you have migrated items into Content Server, this tool will help you update the items with categories and attributes (if not completed during migration).

It also allows you to move and/or rename documents within Content Server.

Command using minimum required inputs:

Cognitivetoolkit.exe UpdateProperties --source-settings "C:\sourcesettings.json" -q "C:\match_all.json" -rm -cm "C:\ContentServerRMClassificationsMapping.json" -u http://localhost:9200 -i INDEXNAME

Use the UpdateProperties Options Table to edit your runscript command.

UpdateProperties Options Table

OPTION	VALUE	CONDITION
--source-settings <VALUE>	Path to the settings file containing access information such as the username and password for the data source (Possible data sources: Box, Content Server, Documentum, Exchange, Filenet)	Required
--name	The field name that contains the updated node name	Optional
--description	The field name that contains the updated node description	Optional
--duplicate-resolution	Duplicate name conflict resolution. Supported values are: ‘duplicate’, ‘skip’, ‘version’	Optional *If not included, default: skip
--parent	The field name that contains the updated node parent	Optional
--records-management-classification	Include this option if you are using records management for classification	Optional *If not included, default: false
--metadata-mapping	Metadata mapping file	Optional
--classification-mapping	Classifications file	Optional
-q\|--query <VALUE>	The path to the search query (File or JSON defining input parameters)	Required
-n\|--nodes-per-request <VALUE>	Number of nodes per request For recommendations on setting this number value, see Setting the "--nodes-per-request" option	Optional *If not included, default: 100
--use-shinydocs-jobs	Send logging data to the shinydocs-jobs index For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index	Optional *If not included, default: false
-u\|--index-server-url <VALUE>	URL of the index server If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.	Required
-i\|--index-name <VALUE>	Name of the index	Required
-t\|--threads <VALUE>	Number of parallel processes to start For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520	Optional *If not included, default: 1
--force	Forcefully remove/suppress prompt for confirmation	Optional *If not included, default: false
--dry-run	Runs everything but doesn’t send nodes to the Analytics Engine The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.	Optional *If not included, default: false
-s\|--silent	Turn off the progress bar	Optional

To access the complete list of available operations from within the Cognitive Toolkit, type the following at the root folder of the Cognitive Toolkit: CognitiveToolkit.exe -h (or CognitiveToolkit.exe --help)