
Encyclopedia of Cognitive Toolkit Operations

This guide explains the operations available within Cognitive Toolkit and how to use them.


Basics

Cognitive Toolkit contains 35 operations to help you understand and manage your data. These operations are runscript-based commands that can be run with only the minimum required parameters or configured for more complex applications.

Run a command

Every operation in Cognitive Toolkit begins with the same steps:

  1. Type cmd in the Windows search box or charms bar, and then click Run as administrator. Alternatively, you may open a CMD prompt using a Service Account with the appropriate permissions.

  2. Change directory (cd) to your Cognitive Toolkit root folder (i.e., the location of the extracted shinydocs-cognitive-toolkit-[version]-[date].zip file).

  3. At the CognitiveToolkit.exe prompt, enter the command for the operation you wish to run, either with the minimum required parameters or edited as required.

Edit a command

To build a command, you must provide, at a minimum, the required parameters identified in the Options Table for that command.

The command can be adapted to your specific requirements and environment by editing the command:

  • Add optional parameters as indicated in the command’s Options Table.

  • Option values depend on your environment and on the setup and location of your source files.

  • Values should be surrounded by straight double quotation marks ("") within the command.

For example:

DO: --query "C:\query-match-path-no-classification.json"

DO NOT: --query C:\query-match-path-no-classification.json
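
When commands are generated by a script rather than typed by hand, the quoting rule can be applied automatically. The following Python sketch is illustrative only: build_command is a hypothetical helper, not part of Cognitive Toolkit, and it assumes every option value is a string that does not itself contain a double quotation mark.

```python
def build_command(operation, options, exe="CognitiveToolkit.exe"):
    """Return a command line with every option value wrapped in double quotes.

    `options` maps an option name (for example "--query") to its value.
    Hypothetical helper for illustration; not part of Cognitive Toolkit.
    """
    parts = [exe, operation]
    for name, value in options.items():
        if '"' in value:
            raise ValueError(f"value for {name} must not contain a double quote")
        parts.append(f'{name} "{value}"')
    return " ".join(parts)


# Placeholder values taken from the examples in this guide.
command = build_command(
    "AddClassifications",
    {
        "--query": r"C:\match_all.json",
        "-u": "http://localhost:9200",
        "-i": "INDEXNAME",
    },
)
print(command)
```

The printed line can be pasted at the prompt described under Run a command.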

Tips

  • The first time you use Cognitive Toolkit, you are required to Activate your license.

  • Begin using Cognitive Toolkit by crawling an ECM and/or file system. This will build out an index in the Analytics Engine that can then be utilized to perform other operations.

  • Certain characters can negatively impact operation of the Cognitive Toolkit. Learn more about using special characters.

Source Setting Files

Source setting files provide the login credentials required to access a content source.

The following source setting files can be edited with your organization’s administrative login credential information:


Encyclopedia

🇦 A-B

Activate

Example

Activates the license. This operation must be run before you can begin using Cognitive Toolkit.

Command using minimum required inputs:

CognitiveToolkit.exe activate -p "VALUE"

Use the Activate Options Table to edit the command.

Activate Options Table

OPTION

VALUE

CONDITION

-p <VALUE>

Path to the license file

Required


AddClassifications

Example

Records Content Server classification data in an index within the Analytics Engine. The fields created depend on the fields that exist within Content Server.

The CrawlContentServer operation must be performed before running AddClassifications.

Command using minimum required inputs:

CognitiveToolkit.exe AddClassifications --query "C:\match_all.json" -u "VALUE" -i "VALUE"

Use the AddClassifications Options Table to edit the command.

AddClassifications Options Table

OPTION

VALUE

CONDITION

--source-settings <VALUE>

Path to the settings file containing access information such as the username and password for the data source (Possible data sources: Box, Content Server, Documentum, Exchange, FileNet)

Optional

(Other parameters will be ignored.)

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 1000

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional
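
Because --dry-run reports how many items would be processed without sending nodes to the Analytics Engine, a script can prepare a preview command alongside the real one. This Python sketch only builds the two argument lists and does not launch anything; with_dry_run is a hypothetical helper, and the query file, URL, and index name are placeholder values.

```python
def with_dry_run(args):
    """Return a copy of `args` with --dry-run appended exactly once."""
    if "--dry-run" in args:
        return list(args)
    return list(args) + ["--dry-run"]


# Placeholder values; substitute your own query file, server URL, and index.
real_run = [
    "CognitiveToolkit.exe", "AddClassifications",
    "--query", r"C:\match_all.json",
    "-u", "http://localhost:9200",
    "-i", "INDEXNAME",
]
preview_run = with_dry_run(real_run)

print(preview_run[-1])
print(len(real_run), len(preview_run))
```

Running the preview first and the real command second keeps the two invocations identical apart from the one flag.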


AddExtractedTextFromEngineeringDrawings

Example

Extracts full text from engineering drawings.

Command using minimum required inputs:

CognitiveToolkit.exe AddExtractedTextFromEngineeringDrawings --query "C:\match_all.json" -u "VALUE" -i "VALUE"

Use the AddExtractedTextFromEngineeringDrawings Options Table to edit your runscript command.

AddExtractedTextFromEngineeringDrawings Options Table

OPTION

VALUE

CONDITION

--source-settings <VALUE>

Path to the settings file containing access information such as the username and password for the data source (Content Server, SharePoint, Box, etc.)

Optional

*If not included, default: filesystem

--path-to-micro-station <VALUE> 

Fully qualified path to MicroStation

Optional

*If not included, default: ‘C:\Program Files\Bentley\MicroStation CONNECT Edition\MicroStation\microstation.exe’

--is-v8i

MicroStation is V8i

Optional

*If not included, default: false

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 1000

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional


AddFromSqlDatabase

Example

Migrates data using a SQL query to an index within the Analytics Engine.

Command using minimum required inputs:

CognitiveToolkit.exe AddFromSqlDatabase --database-type "oracle" --username "USERNAME" --password "PASSWORD" --data-source "192.1.1.1:1521" --sql "C:\sqlquery.sql" --sql-parameters "id=tags" --column-prefix "sql" --query "C:\match_all.json" --index-server-url "http://localhost:9200" --index-name "INDEXNAME"

Use the AddFromSqlDatabase Options Table to edit your runscript command.

AddFromSqlDatabase Options Table

OPTION

VALUE

CONDITION

--database-type <VALUE>

The database type to which to connect.

Supported values: “oracle”, “postgres”, “sqlserver”

Required

  --username <VALUE>          

The database username (login credentials)

Required

  --password <VALUE>                  

The database password (login credentials)

Required

  --data-source <VALUE>                

The data source

Required

  --sql <VALUE>                                

The SQL to run or path to .sql file

Required

  --sql-parameters <VALUE>      

A comma-separated list of keys and fields from the index to replace values in the SQL

Optional

  -c|--column-prefix <VALUE>         

The prefix added to column names

Required

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 1000

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional
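
The example command passes a single pair, "id=tags", to --sql-parameters. Assuming that additional pairs follow the same key=field shape and are joined with commas, as the option description suggests, the value could be assembled as in the Python sketch below. format_sql_parameters is a hypothetical helper, and the second pair is invented for illustration.

```python
def format_sql_parameters(pairs):
    """Join (key, index_field) pairs into a comma-separated key=field string."""
    for key, field in pairs:
        if any(ch in key or ch in field for ch in ",="):
            raise ValueError(f"unsupported character in pair {key!r}, {field!r}")
    return ",".join(f"{key}={field}" for key, field in pairs)


# "id=tags" comes from the example command; "owner=creator" is made up.
value = format_sql_parameters([("id", "tags"), ("owner", "creator")])
print(value)
```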


AddHashAndExtractedText

Example

Generates the hash value for each file specified while also extracting full text.

Note: When a folder is renamed, the file path changes. AddHashAndExtractedText then checks whether the file itself has changed; if the file is found to be the same, text extraction is not performed again on that file.

Command using minimum required inputs:

CognitiveToolkit.exe AddHashAndExtractedText --query "C:\match_all.json" -u "http://localhost:9200" -i "INDEXNAME"

Use the AddHashAndExtractedText Options Table to edit your runscript command.

AddHashAndExtractedText Options Table

OPTION

VALUE

CONDITION

--source-settings <VALUE>

Path to the settings file containing access information such as the username and password for the data source (Box, Content Server, Documentum, OneDrive, SharePoint)

Optional

*If not included, default: filesystem

--algorithm <VALUE>

Algorithm to apply

Supported values: “sha1”, “sha256”, “sha512”, “md5”

Optional

*If not included, default: md5

 --extraction-service-url <VALUE>           

URL for the extraction service

--extraction-service-url must be included if the default location was changed during installation.

Optional

*If not included, default: "http://localhost:55555"

--max-characters <VALUE>

The maximum number of characters for the extracted text field

Optional

*If not included, default: 0 [all characters]

--debug-level <VALUE>

The level of depth of exception messages

Optional

*If not included, default: 20

--action-keyword <VALUE>

By default, the AddHashAndExtractedText command both generates a hash and extracts text, but it can be limited to one of the two actions.

Include the --action-keyword option to specify which of the two actions should be performed: hash or text.

--action-keyword "hash" will apply hash, but will not extract full text.

--action-keyword "text" will extract full text, but will not apply hash.

Optional

*If not included, default: both

--time-out <VALUE>

The timeout in seconds for each batch of nodes being processed

Optional

*If not included, default: 60 seconds

--ocr-utility <VALUE>

OCR Utility to use for text extraction

Supported values: “iron”, “azure”, “none”

Optional

*If not included, default: none

--azure-subscription-key <VALUE>

Azure Computer Vision Key is found on the Keys and Endpoint page for your Cognitive Services resource in the Azure Portal

Optional*

*Required, if ocr-utility is "azure"

--azure-subscription-endpoint <VALUE>

Azure Computer Vision Endpoint is found on the Keys and Endpoint page for your Cognitive Services resource in the Azure Portal

Optional*

*Required, if ocr-utility is "azure"

  --text-timeout <VALUE>                    

The number of seconds to wait before cancelling text extraction for an item (0 for unlimited).

To ensure that the OCR process is completed on each file, raise the --text-timeout value above the default setting of 60 seconds in the exe.config file.

Optional

*If not included, default: 60 seconds

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request.

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional
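
The Options Table makes the two Azure options conditionally required. The Python sketch below shows one way a wrapper script might check that rule before launching the command. check_ocr_options is a hypothetical helper, it checks only the rules stated in the table, and the key and endpoint values in any real call would come from your own Azure resource.

```python
SUPPORTED_OCR = {"iron", "azure", "none"}


def check_ocr_options(options):
    """Return a list of problems with the OCR-related options, empty if none.

    `options` maps option names to values as they would appear on the
    command line. Hypothetical helper; not part of Cognitive Toolkit.
    """
    problems = []
    ocr = options.get("--ocr-utility", "none")  # documented default: none
    if ocr not in SUPPORTED_OCR:
        problems.append(f"unsupported --ocr-utility value: {ocr}")
    if ocr == "azure":
        for required in ("--azure-subscription-key", "--azure-subscription-endpoint"):
            if not options.get(required):
                problems.append(f"{required} is required when --ocr-utility is azure")
    return problems


print(check_ocr_options({"--ocr-utility": "azure"}))
print(check_ocr_options({"--ocr-utility": "iron"}))
```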


AddPathValidation

Example

The AddPathValidation operation checks whether items recorded in the index within the Analytics Engine have since changed in the data source. A detected change results in a false value, and the changes that can be detected depend on the data source itself.

  • AddPathValidation for File system: checks for files that have been moved, deleted or had a name change.

  • AddPathValidation for Content Server: uses nodeID to check for files that have been deleted.

  • AddPathValidation for SharePoint: checks for files that have been moved or deleted.

  • AddPathValidation for Box: uses fileID to check for files that have been deleted.

  • AddPathValidation for Documentum: uses nodeID to check for files that have been deleted.

  • AddPathValidation for OneDrive: checks for files that have been moved or deleted.

Command using minimum required inputs:

CognitiveToolkit.exe AddPathValidation --query "C:\match_all.json" -u "http://localhost:9200" -i "INDEXNAME"

Use the AddPathValidation Options Table to edit your runscript command.

AddPathValidation Options Table

OPTION

VALUE

CONDITION

--source-settings <VALUE>

Path to the settings file containing access information such as the username and password for the data source (Box, Content Server, Documentum, SharePoint, OneDrive)

Optional

*If not included, default: filesystem

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional


AddPropertyData

Example

Pulls property data from an ECM and adds it to the index in the Analytics Engine. Content Server categories and attributes are currently supported.

  • Category attributes - data is pulled via a direct database connector and/or REST API

  • Classification values - data is pulled via REST API

  • Records Management (RM) Classification values - data is pulled via REST API

Command using minimum required inputs:

CognitiveToolkit.exe AddPropertyData --query "C:\match_all.json" -u "http://localhost:9200" -i "INDEXNAME"

Use the AddPropertyData Options Table to edit your runscript command.

AddPropertyData Options Table

OPTION

VALUE

CONDITION

--source-settings <VALUE>

Path to the settings file containing access information such as the username and password for the data source (Content Server, SharePoint, Box, etc.)

Optional

*If not included, default: filesystem

--legacy-naming

Use legacy naming (CS data only)

If the --legacy-naming option is not used, the fields created in the index are prefixed with prop-

Optional

*If not included, default: false

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional


🇨 C-D

CacheFileSystemPermissions

Example

Caches the permissions on a file system item and creates the following fields within an index in the Analytics Engine:

  • CachedPermsFieldName = "cached-permissions"

  • CachedInheritanceFieldName = "cached-inheritance"

  • CachedOwnerFieldName = "cached-owner"

Command using minimum required inputs:

CognitiveToolkit.exe CacheFileSystemPermissions --query "C:\match_all.json" -u "http://localhost:9200" -i "INDEXNAME"

Use the CacheFileSystemPermissions Options Table to edit your runscript command.

CacheFileSystemPermissions Options Table

OPTION

VALUE

CONDITION

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional


CopyItems

Example

Copies items from one index to another index within the Analytics Engine.

Command using minimum required inputs:

CognitiveToolkit.exe CopyItems --destination-index-url "http://localhost:9200" --destination-index-name "INDEXNAME_1" -u "http://localhost:9200" -i "INDEXNAME_2" --query "C:\match_all.json"

Use the CopyItems Options Table to edit your runscript command.

CopyItems Options Table

OPTION

VALUE

CONDITION

--destination-index-url <VALUE>

The destination index server

Required

--destination-index-name <VALUE>

The destination index name

Required

--destination-index-type <VALUE>

The destination index type

Optional

--destination-index-shards <VALUE>

The destination index number of shards

Optional

*If not included, default: 5

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional


CrawlBox

Example

Crawls for the metadata within Box and adds it to an index in the Analytics Engine.

Command using minimum required inputs:

CognitiveToolkit.exe CrawlBox --source-settings "C:\SourceFiles\Box.json" --query "C:\box.json" -u "http://localhost:9200" -i "INDEXNAME"

Use the CrawlBox Options Table to edit your runscript command.

CrawlBox Options Table

OPTION

VALUE

CONDITION

--source-settings <VALUE>

Path to the settings file containing access information such as the username and password for the Box data source

Required

--start-folder-id <VALUE>

Default start folder ID

Optional

*If not included, default: 0

--users <VALUE>

Comma-delimited list of user logins and folder IDs in the format <user_login>:<start_folder_id>

Optional

*If not included, default: all users

--include-shared-files

Include folders shared by other users

Optional

*If not included, default: false

--crawl-collaborators

Crawl and capture Box groups and their users

Optional

*If not included, default: false

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional
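
The --users option in the table above expects a comma-delimited list of <user_login>:<start_folder_id> pairs. A minimal Python sketch of assembling that value follows. format_users is a hypothetical helper, and the logins and folder IDs are invented for illustration.

```python
def format_users(start_folders):
    """Build a --users value from a mapping of user login to start folder ID."""
    return ",".join(
        f"{login}:{folder_id}" for login, folder_id in start_folders.items()
    )


# Hypothetical logins and folder IDs, for illustration only.
users_value = format_users({"alice@example.com": 0, "bob@example.com": 12345})
print(users_value)
```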


CrawlContentServer

Example

Crawls for the metadata within Content Server and adds it to an index in the Analytics Engine.

Use this tool to crawl the Content Server database directly or via REST API.

Crawling the Content Server database directly leaves REST API available for other applications.

Command using minimum required inputs:

CognitiveToolkit.exe CrawlContentServer --source-settings "C:\SourceFiles\ContentServer.json" --query "C:\content_server.json" -u "http://localhost:9200" -i "INDEXNAME"

Use the CrawlContentServer Options Table to edit your runscript command.

CrawlContentServer Options Table

OPTION

VALUE

CONDITION

--source-settings <VALUE>

Path to the settings file containing access information such as the username and password for the Content Server database

See Resources for:

  • sample --source-settings file for crawling Content Server database directly

  • sample --source-settings file for crawling via REST API

Optional

*If not included, default: crawl Content Server via the REST API

--starting-folder-id <VALUE>

The ID of the folder from which to begin traversing

Optional

*If not included, default: 0

--allowed-types <VALUE>

A comma-delimited list of the content type IDs you wish to crawl

Optional

*If not included, default: 1,144,736,749

--modified-after <VALUE>

Items changed on / after this date

Note: To use the --modified-after option with Content Server, the date must be within 30 days of the date you run the tool.

Example: Today’s date: 2023-03-28
Modified-after-date: 2023-03-01 (This will crawl items changed on or after March 1st)

Modified-after-date: 2023-01-25 (This will fail with error: date_exceeds allowed number of days)

Supported date formats are:

yyyy-MM-dd, yyyy-MM-dd HH:mm, yyyy-MM-ddTHH:mm. Example: 2018-12-20 or 2018-12-20 19:42.

Supported relative date formats are:

now, now+/-1d[/d], now+/-1m[/d], now+/-1y[/d]

Optional

--delta

Use the audit tables to detect changes based on the --modified-after option with a default of today’s date.

Optional

*If included, crawl is performed via REST

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 1000

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional
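
The 30-day limit on --modified-after for Content Server can be checked with simple date arithmetic before the crawl is started. The Python sketch below reproduces the two dates from the example in the table. within_modified_after_window is a hypothetical helper, and treating a gap of exactly 30 days as allowed is an assumption.

```python
from datetime import date


def within_modified_after_window(modified_after, run_date, max_days=30):
    """Return True if `modified_after` is at most `max_days` before `run_date`."""
    age_in_days = (run_date - modified_after).days
    return age_in_days <= max_days


run_date = date(2023, 3, 28)
print(within_modified_after_window(date(2023, 3, 1), run_date))   # 27 days back
print(within_modified_after_window(date(2023, 1, 25), run_date))  # 62 days back
```

The first date falls inside the window and the second does not, matching the outcome described for each in the example.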


CrawlContentServerWorkflows

Example

Crawls for the metadata within Content Server workflows and adds it to an index in the Analytics Engine.

Use this tool to crawl the Content Server database directly or via REST API.

Crawling the Content Server database directly leaves REST API available for other applications.

Command using minimum required inputs:

CognitiveToolkit.exe CrawlContentServerWorkflows --source-settings "C:\SourceFiles\ContentServer.json" --query "C:\content_server.json" -u "http://localhost:9200" -i "INDEXNAME"

Use the CrawlContentServerWorkflows Options Table to edit your runscript command.

CrawlContentServerWorkflows Options Table

OPTION

VALUE

CONDITION

--source-settings <VALUE>

Path to the settings file containing access information such as the username and password for the Content Server database

See Resources for:

  • sample --source-settings file for crawling Content Server database directly

  • sample --source-settings file for crawling via REST API

Optional

*If not included, default: crawl Content Server via the REST API

--process-status <VALUE>

Process archived status

Supported values: “archived”, “noarchive”

Optional

--initiated-after <VALUE>

Items changed on/after this date.

Supported date formats are:

yyyy-MM-dd, yyyy-MM-dd HH:mm, yyyy-MM-ddTHH:mm. Example: 2018-12-20 or 2018-12-20 19:42.

Supported relative date formats are:

now, now+/-1d[/d], now+/-1m[/d], now+/-1y[/d]

Optional

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 1000

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional
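
The absolute date formats listed for --initiated-after can be checked before a command is run. The following is a minimal Python sketch, not part of Cognitive Toolkit: the helper name is hypothetical, and the mapping from yyyy-MM-dd style patterns to Python format codes is an assumption, so the toolkit's own parser may treat edge cases differently. Relative values such as now are not handled.

```python
from datetime import datetime

# Assumed Python equivalents of yyyy-MM-dd, yyyy-MM-dd HH:mm and yyyy-MM-ddTHH:mm.
SUPPORTED_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M", "%Y-%m-%dT%H:%M")


def parse_absolute_date(value: str) -> datetime:
    """Return the parsed date, or raise ValueError if no listed format matches."""
    for fmt in SUPPORTED_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next supported format
    raise ValueError(f"unsupported date format: {value!r}")
```

Calling parse_absolute_date("2018-12-20 19:42") succeeds, while a value such as "20/12/2018" raises ValueError, which may help catch a mistyped date before a long crawl starts.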


CrawlDocumentum

Example

Crawls for the metadata within Documentum and adds it to an index in the Analytics Engine.

Command using minimum required inputs:

CognitiveToolkit.exe CrawlDocumentum --source-settings "C:\SourceFiles\Documentum.json" --query "C:\documentum.json" -u "http://localhost:9200" -i INDEXNAME

Use the CrawlDocumentum Options Table to edit your runscript command.

CrawlDocumentum Options Table

OPTION

VALUE

CONDITION

--source-settings <VALUE>

Path to the settings file containing access information such as the username and password for the Documentum data source

Required

--use-single-index

Use single index

Optional

*If not included, default: false

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 1000

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional


CrawlExchange

Example

Crawls for the metadata within Microsoft Exchange and adds it to an index in the Analytics Engine.

Command using minimum required inputs:

CognitiveToolkit.exe CrawlExchange --source-settings "C:\SourceFiles\Exchange.json" --email tester1@exchange.local -u http://localhost:9200 -i INDEXNAME

Use the CrawlExchange Options Table to edit your runscript command.

CrawlExchange Options Table

OPTION

VALUE

CONDITION

--page-size <VALUE>

Number of exchange items to retrieve in a single request

Optional

*If not included, default: 500

--max-characters <VALUE>

The maximum number of characters for the extracted email text field

Note: Setting this option when crawling Exchange Online will restrict the number of characters displayed in Enterprise Search results.

Optional

*If not included, default: 0 [all characters]

--source-settings <VALUE>

Path to the settings file containing access information such as the username and password for the Exchange data source

Required

--crawl-public-folders

Crawl public folders

Optional

*If not included, default: false

--email <VALUE>

Comma separated list of email addresses to crawl

Note: Leave blank to crawl all mailboxes

Use when crawling public folders

Required

--exclude-auto-replies

Exclude auto-replies (For example, out-of-office replies) from the index

Optional

*If not included, default: false

--ignore-inline-attachments

Excludes all inline attachments

Optional

*If not included, default: false

--ignore-body

Excludes the body text of exchange items from being indexed

Optional

*If not included, default: false

--ignore-attachment-extensions <VALUE>

Comma separated list of extensions of the inline attachments that should be ignored

Optional

--ignore-folders <VALUE>

Comma separated list of folder names to be excluded from crawling

Supported values are:

drafts, deleted items, junk email

Optional

--after-last-modified-date <VALUE>

Items changed on/after this date

Supported date formats are:

yyyy-MM-dd, yyyy-MM-dd HH:mm, yyyy-MM-ddTHH:mm. Example: 2018-12-20 or 2018-12-20 19:42.

Supported relative date formats are:

now, now+/-1d[/d], now+/-1m[/d], now+/-1y[/d]

Optional

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 1000

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional


CrawlFileNet

Example

Crawls for the metadata within FileNet and adds it to an index in the Analytics Engine.

Command using minimum required inputs:

CognitiveToolkit.exe CrawlFileNet --source-settings "C:\SourceFiles\FileNet.json" -u http://localhost:9200 -i INDEXNAME

Use the CrawlFileNet Options Table to edit your runscript command.

CrawlFileNet Options Table

OPTION

VALUE

CONDITION

--source-settings <VALUE>

Path to the settings file containing access information such as the username and password for the FileNet data source

Required

--class-definition <VALUE>

The document class you want to filter for

Optional

*If not included, default: 'All'

--exclude-subclasses

Exclude subclasses

Optional

*If not included, default: false

--crawl-hidden

Crawl hidden document classes

Optional

*If not included, default: false

--where-clause <VALUE>

FileNet SQL WHERE clause. When used, it overrides the date options.

Optional

--before-date-last-modified <VALUE>

Crawl everything before this date

Optional

*If not included, default: Now

--after-last-modified-date <VALUE>

Items changed on/after this date

Supported date formats:

yyyy-MM-dd, yyyy-MM-dd HH:mm, yyyy-MM-ddTHH:mm Example: 2018-12-20 or 2018-12-20 19:42

Optional

*If not included, default: 1970-01-01

--interval <VALUE>

The number of months to crawl at a time

Optional

*If not included, default: 3

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 1000

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional


CrawlFileSystem

Example

Base operation for data discovery. Generally performed prior to running any other Cognitive Toolkit operation. This operation crawls the specified path (or multiple paths) for metadata. The metadata is then stored in an index within the Analytics Engine where it can be further mined for insights.

Command using minimum required inputs:

CognitiveToolkit.exe CrawlFileSystem --path-file "C:\path.json" -u http://localhost:9200 -i INDEXNAME

Use the CrawlFileSystem Options Table to edit your runscript command.

CrawlFileSystem Options Table

OPTION

VALUE

CONDITION

--path <VALUE>

OR

--path-file <VALUE>

Single path to crawl

OR

Text file that contains multiple paths to crawl

*At least one of these two options must be included in the runscript command.
If --path is not used, --path-file must be used. If --path-file is not used, --path must be used.

Optional*

 

Optional*

--include-hidden

Includes hidden files in the crawl

Optional

*If not included, default: false

--include-system

Includes system files in the crawl

Optional

*If not included, default: false

-a|--add-field-owner

Add the Owner field to the index

Optional

*If not included, default: false

--include-reparse

Includes reparse items

A file or directory can contain a reparse point, which is a collection of user-defined data. The format of this data is understood by the application which stores the data, and a file system filter, which interprets the data and processes the file. When an application sets a reparse point, it stores this data, plus a reparse tag, which uniquely identifies the data it is storing.

Optional

*If not included, default: false

--after-date-last-modified <VALUE>

Crawls everything after this date*

Supported date formats:

yyyy-MM-dd, yyyy-MM-dd HH:mm, yyyy-MM-ddTHH:mm Example: 2018-12-20 or 2018-12-20 19:42

Supported relative date formats:

now, now+/-1d[/d], now+/-1m[/d], now+/-1y[/d]

Optional

*If not included, default: all

--validate

Validates file paths

Issue: Using the --validate option on a folder that contains more than 1024 folders produces an error.
Solution: In the elasticsearch.yml file, set the following parameter to a number greater than the number of folders:

indices.query.bool.max_clause_count

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 1000

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional
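
The rule that at least one of --path or --path-file must be supplied can be enforced when a command is assembled by a script. The sketch below is a hypothetical Python helper, not something shipped with Cognitive Toolkit; it only builds the argument list from options named in the table above and raises an error when both path options are missing.

```python
from typing import List, Optional


def crawl_file_system_args(index_url: str, index_name: str,
                           path: Optional[str] = None,
                           path_file: Optional[str] = None) -> List[str]:
    """Build CrawlFileSystem arguments, requiring --path or --path-file."""
    if not path and not path_file:
        raise ValueError("provide --path or --path-file")
    args = ["CognitiveToolkit.exe", "CrawlFileSystem"]
    if path:
        args += ["--path", path]
    if path_file:
        args += ["--path-file", path_file]
    # Required index options from the Options Table.
    args += ["-u", index_url, "-i", index_name]
    return args
```

Passing the resulting list to Python's subprocess.run, rather than joining it into one string by hand, leaves the quoting of values that contain spaces to Python.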


CrawlMaximo

Example

Crawls for the metadata within Maximo and adds it to an index in the Analytics Engine.

Command using minimum required inputs:

CognitiveToolkit.exe CrawlMaximo --database-type postgres --schema-type work-order --connection-string "User ID=postgres;Password=mypassword;Host=localhost;Port=5435;Database=Maximodatabase" --sql-query "select assetnum,workorderid,worktype from work_orders" --key-fields workorderid --index-server-url http://localhost:9200 --index-name INDEXNAME

Use the CrawlMaximo Options Table to edit your runscript command.

CrawlMaximo Options Table

OPTION

VALUE

CONDITION

--database-type <VALUE>

The type of database

Supported types: 'oracle', 'postgres', 'sqlserver'

Required

--schema-type <VALUE>

The type of items being crawled

Supported types: 'work-order', 'condition-report', 'item', 'location', 'oem', 'company'

Required

--connection-string <VALUE>

The database Connection String, used to connect to the database

Required

--sql-query <VALUE>

The SQL query to retrieve records

Required

--key-fields <VALUE>

A comma separated list of fields that produce a unique key

Required

--database-timeout <VALUE>

The length of time (in seconds) to wait for a connection to the server before terminating the attempt and generating an error

Optional

*If not included, default: 120

--connection-string-password <VALUE>

A password to replace '{{password}}' in the connection string

Optional

--chunk-size <VALUE>

The number of items sent to the index in a single request

Optional

*If not included, default: 1000

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

-n|--nodes-per-request <VALUE>

Number of nodes per request.

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional
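
The --connection-string-password option makes it possible to keep the database password out of the connection string itself. The following Python sketch shows one way a script might do this, using the option names from the table above. The environment variable name MAXIMO_DB_PASSWORD and the helper function are hypothetical, and the connection details are the sample values from the example command.

```python
import os
from typing import List

# Connection string carrying the literal {{password}} placeholder.
CONNECTION_TEMPLATE = (
    "User ID=postgres;Password={{password}};Host=localhost;"
    "Port=5435;Database=Maximodatabase"
)


def maximo_args(password: str) -> List[str]:
    """Build CrawlMaximo arguments; the password is passed as its own option."""
    return [
        "CognitiveToolkit.exe", "CrawlMaximo",
        "--database-type", "postgres",
        "--schema-type", "work-order",
        "--connection-string", CONNECTION_TEMPLATE,
        "--connection-string-password", password,
        "--sql-query", "select assetnum,workorderid,worktype from work_orders",
        "--key-fields", "workorderid",
        "-u", "http://localhost:9200",
        "-i", "INDEXNAME",
    ]


if __name__ == "__main__":
    # MAXIMO_DB_PASSWORD is a hypothetical environment variable name.
    args = maximo_args(os.environ.get("MAXIMO_DB_PASSWORD", ""))
    print(len(args), "arguments prepared")
```

Because the connection string and the SQL query contain spaces, each is kept as a single list element so that it reaches the toolkit as one value.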


CrawlOneDrive

Example

Crawls for the metadata within OneDrive and adds it to an index in the Analytics Engine.

Command using minimum required inputs:

CognitiveToolkit.exe CrawlOneDrive --source-settings "C:\onedrive.json" -u http://localhost:9200 -i INDEXNAME --use-single-index

Use the CrawlOneDrive Options Table to edit your runscript command.

CrawlOneDrive Options Table

OPTION

VALUE

CONDITION

--source-settings <VALUE>

Path to the settings file containing access information such as the username and password for the OneDrive data source

Required

--use-single-index

Use single index

Optional

*If not included, default: false

--specific-accounts <VALUE>

Specify email addresses to crawl (comma-separated)

Example: 'email1@domain.com', 'email2@domain.com', 'email3@domain.com', 'email4@domain.com'

Optional

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 1000

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional


CrawlSharePointOnline

Example

Crawls for the metadata within SharePoint Online and adds it to an index in the Analytics Engine.

Command using minimum required inputs:

CognitiveToolkit.exe CrawlSharePoint --source-settings "C:\sharepoint.json" -u http://localhost:9200 -i "shinydrive index" --use-single-index

Use the CrawlSharePointOnline Options Table to edit your runscript command.

CrawlSharePointOnline Options Table

OPTION

VALUE

CONDITION

--source-settings <VALUE>

Path to the settings file containing access information such as the username and password for the SharePoint Online data source

Required

--crawl-subsites

Crawl all of the subsites

Optional

*If not included, default: false

--remove-standard-filter

Remove filter for non-standard document libraries (Hidden, System, etc) 

Optional

*If not included, default: false

--remove-document-library-type-filter

Remove filter for non-document library types

Optional

*If not included, default: false

--remove-site-assets-filter

Remove filter for site asset libraries 

Optional

*If not included, default: false

--crawl-site-collection

Crawl from the site collection (overrides the --crawl-subsites option)

Optional

*If not included, default: false

--crawl-from-index <VALUE>

Crawl from the index created by a previous 'CrawlSharePointSites' run

Optional

--filter <VALUE>

SharePoint Filter File/JSON

Note: The --filter option for CrawlSharePointOnline is only used in conjunction with --crawl-site-collection. Results will not be filtered unless the --crawl-site-collection option is also used.

Optional

 --use-single-index

Use single index

Optional

*If not included, default: false

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 1000

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

--index-type <VALUE>

Type name for index objects

Optional

*If not included, default: shinydocs

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false


CrawlSharePointOnPrem

Example

Crawls for the metadata within SharePoint On-Premise and adds it to an index in the Analytics Engine.

Command using minimum required inputs:

CognitiveToolkit.exe CrawlSharePointOnPrem --source-settings "C:\sharepoint.json" -u http://localhost:9200 -i "shinydrive index" --use-single-index

Use the CrawlSharePointOnPrem Options Table to edit your runscript command.

CrawlSharePointOnPrem Options Table

OPTION

VALUE

CONDITION

--source-settings <VALUE>

Path to the settings file containing access information such as the username and password for the SharePoint On-Premise data source

Required

--crawl-subsites

Crawl all of the subsites.

Optional

*If not included, default: false

--hidden

Crawl hidden lists

Optional

*If not included, default: false

--catalog

Crawl catalog lists

Optional

*If not included, default: false

--application

Crawl application lists

Optional

*If not included, default: false

--private

Crawl private lists

Optional

*If not included, default: false

 --use-single-index

Use single index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 1000

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional


CrawlSharePointOnlineSites

Example

Crawls SharePoint Online to create a list of all site names and adds them to an index in the Analytics Engine. This information can then be used to crawl specific subsites using the CrawlSharePointOnline or CrawlSharePointOnPrem operations.

Command using minimum required inputs:

Cognitivetoolkit.exe CrawlSharePointSites --source-settings "C:\sharepoint.json" -u http://localhost:9200 -i INDEXNAME

Use the CrawlSharePointOnlineSites Options Table to edit your runscript command.

CrawlSharePointOnlineSites Options Table

OPTION

VALUE

CONDITION

--source-settings <VALUE>

Path to the settings file containing access information such as the username and password for the SharePoint Online data source

Required

--keyword-query <VALUE>

Additional Keyword Query parameters

Optional

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 1000

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional


Dispose

Example

Deletes the specified data/files based on the query. If the deletion is successful, a field called [dispose] with a value of true is added to the index in the Analytics Engine.

For confirmation, Dispose identifies the number of files that will be deleted before the dispose runs.

Command using minimum required inputs:

CognitiveToolkit.exe Dispose --query "C:\disposeQuery.json" -u http://localhost:9200 -i INDEXNAME

Use the Dispose Options Table to edit your runscript command.

Dispose Options Table

OPTION

VALUE

CONDITION

--source-settings <VALUE>

Path to the settings file containing access information such as the username and password for the data source (Supported data sources: Content Server, Documentum, File System, OneDrive, SharePoint Online and SharePointOnPrem)

Optional

*If not included, default: filesystem

--verify-hash

Verifies that the file still matches the hash value before file deletion

Optional

*If not included, default: false

--hash-field <VALUE>

The hash field

Optional, but required if --verify-hash is specified

--hash-algorithm <VALUE>

The hash algorithm to use when verifying

Supported values: “sha1”, “sha256”, “sha512”, “md5”

Optional, but required if --verify-hash is specified

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove / Suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional
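
The check described for --verify-hash, confirming that a file still matches its recorded hash before it is deleted, can be illustrated with a short Python sketch. This is not the toolkit's implementation: the function name is hypothetical, and it assumes the recorded hash is a hexadecimal string produced by one of the supported algorithms listed above.

```python
import hashlib

# Algorithms listed as supported values for --hash-algorithm.
SUPPORTED_ALGORITHMS = ("sha1", "sha256", "sha512", "md5")


def file_matches_hash(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Return True if the file's current digest equals the recorded hex digest."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in 1 MB chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest().lower() == expected_hex.strip().lower()
```

A file that was modified after it was hashed would fail this comparison, which is the situation the option is intended to guard against.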

🇪 E-K

ExportFromIndex

Example

Specify and export fields/values from an index in the Analytics Engine into a comma-separated value (csv) file.

Command using minimum required inputs:

Cognitivetoolkit.exe ExportFromIndex --fields creationTimeUtc,name,extension,path --query "C:\match_all.json" -u http://localhost:9200 -i INDEXNAME

Use the ExportFromIndex Options Table to edit your runscript command.

ExportFromIndex Options Table

OPTION

VALUE

CONDITION

--fields <VALUE>

Comma-delimited list of fields to include in the export

Required

--filename <VALUE>

File to export the index to

Optional

*If not included, default: export.csv

--search-index-name <VALUE>

Index to search for the duplicates

Optional

--max-file-size <VALUE>

Maximum file size in MB, limited to 1GB

Optional

*If not included, default: 1GB

--inspected-field <VALUE>

The name of the field that the duplicates were tagged on

Optional

*If not included, default: hash

--duplicate-field <VALUE>

The name of the field that identifies the duplicate

Optional

*If not included, default: duplicate-hash

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional
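
Once an export has been written, the csv file can be read by a script for further processing. The Python sketch below is a minimal example under stated assumptions: it assumes the first row of the export holds the field names passed to --fields and that the file is UTF-8 encoded, neither of which is confirmed here. The function name is hypothetical.

```python
import csv
from typing import Dict, List


def read_export(filename: str = "export.csv") -> List[Dict[str, str]]:
    """Load exported rows, assuming the first row holds the field names."""
    # utf-8-sig also accepts files that start with a byte order mark.
    with open(filename, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))
```

Each returned row is a dictionary keyed by field name, so a value such as row["path"] can be read directly if path was one of the exported fields.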


ExtractAndCrawlPst

Example

Extracts text and performs a crawl of PST (email) files.

Command using minimum required inputs:

Cognitivetoolkit.exe ExtractAndCrawlPst --query "C:\query-match-extension-pst-not-extracted.json" --index-server-url http://localhost:9200 --index-name INDEXNAME

Use the ExtractAndCrawlPst Options Table to edit your runscript command.

ExtractAndCrawlPst Options Table

OPTION

VALUE

CONDITION

--create-duplicates

Allow duplicate files

If this option is used, PST files that are crawled more than once will be duplicated on the file share and in the index.

By default, if an existing pst file is found, it will be skipped during the operation.

Optional

*If not included, default: false

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional


ExtractEntities

Example

An information extraction technique whereby key elements from text are identified and classified into predefined categories. This transforms unstructured data to structured data that is machine readable and available for standard processing.

Command using minimum required inputs:

Cognitivetoolkit.exe ExtractEntities --extraction-service-url http://localhost:8181/ --query "C:\query-match-fullText-no-entities.json" --index-server-url http://localhost:9200 --index-name INDEXNAME

Use the ExtractEntities Options Table to edit your runscript command.

ExtractEntities Options Table

OPTION

VALUE

CONDITION

 --field <VALUE>

 The name of the index field to store the extracted entities

Optional

*If not included, default: entities

-c|--classes <VALUE>

Comma-separated list of entity classes to extract from text. The classes that can be extracted depend on the classifier model used to perform entity extraction.

Optional

*If not included, default: all classes. Example: "LOCATION,PERSON,ORGANIZATION"

-e|--extract-from <VALUE>

The name of the index field from which to extract entities

Optional

*If not included, default: fullText

--preserve-spacing

Preserve spacing from extracted entities. Depending on the source file, there may be unwanted line breaks in extracted entities.

Optional

*If not included, default: false

--extraction-service-url <EXTRACTION_SERVICE_URL>

URL for the entity extraction service

Optional

*If not included, default: http://localhost:55555

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional
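
As a rough illustration of combining the required options with an optional one from the table above, the following Python sketch assembles an ExtractEntities command line as a string. The helper name build_extract_entities_command and its parameters are hypothetical and are not part of Cognitive Toolkit; only the option names and example values come from this section.

```python
def build_extract_entities_command(index_name, query_path, classes=None,
                                   index_server_url="http://localhost:9200",
                                   extraction_service_url="http://localhost:8181/"):
    """Assemble an ExtractEntities command line (hypothetical helper)."""
    parts = [
        "CognitiveToolkit.exe", "ExtractEntities",
        "--extraction-service-url", extraction_service_url,
        "--query", f'"{query_path}"',          # values are wrapped in double quotes
        "--index-server-url", index_server_url,
        "--index-name", index_name,
    ]
    if classes:
        # --classes takes a single comma-separated value
        parts += ["--classes", '"' + ",".join(classes) + '"']
    return " ".join(parts)


command = build_extract_entities_command(
    "INDEXNAME",
    r"C:\query-match-fullText-no-entities.json",
    classes=["LOCATION", "PERSON", "ORGANIZATION"],
)
print(command)
```

Leaving out classes produces the minimum command shown earlier, in which case all classes are extracted.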


FindSimilarClassification

Example

Adds classifications to documents based on their similarity to other, already-classified documents in the Analytics Engine.

For example, choose 5-10 documents of a similar kind and classify them by their document type, such as offer letters or purchase orders. The Shinydocs Cognitive Suite will “learn” from those examples and will be able to find other similar documents for classification.

Command using minimum required inputs:

Cognitivetoolkit.exe FindSimilarClassification --classification-field classification --query "C:\query-match-path-no-classification.json" --tokens 100 --threshold 75 --min-docs 5 --min-terms 1 --match 1 --index-server-url http://localhost:9200 --index-name INDEXNAME

Use the FindSimilarClassification Options Table to edit your runscript command.

FindSimilarClassification Options Table

OPTION

VALUE

CONDITION

 --field-list <VALUE>

Fields to compare

Optional

*If not included, default: fullText

--classification-field <VALUE>

Name of the field where classifications are found

Required

--tokens <VALUE>

Number of tokens to compare

Optional

*If not included, default: 500

--min-docs <VALUE>

Minimum document frequency

Optional

*If not included, default: 5

--min-terms <VALUE>

Minimum term frequency

Optional

*If not included, default: 2

--max-docs <VALUE>

Maximum document frequency

Optional

--min-word-length <VALUE>

Minimum word length [number of characters]

Optional

--threshold <VALUE>

Similarity threshold [minimum-should-match]

Optional

*If not included, default: 90 (%)

--match <VALUE>

Number of documents to match

Optional

*If not included, default: 5

--size-similarity <VALUE>

Size similarity threshold (the percent delta between sizes)

Optional

*If not included, default: 20

--inclusion <VALUE>

File extension inclusion list (Comma delimited)

Optional

--exclusion <VALUE>

File extension exclusion list (Comma delimited)

Optional

--print-query

Print the Elasticsearch query in the logs. Does not run operation!

Optional

*If not included, default: false

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional
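
Several option descriptions in the table above (minimum document frequency, minimum term frequency, minimum-should-match) resemble the parameters of an Elasticsearch more_like_this query. The Python sketch below builds such a query body purely as an illustration. The mapping from each command option to a query parameter is an assumption and is not confirmed behaviour of Cognitive Toolkit; the function name and the example document ID are hypothetical. To see the query the tool actually generates, use the print-query option listed in the table.

```python
import json


def more_like_this_query(index_name, doc_id, fields=("fullText",), tokens=500,
                         min_docs=5, min_terms=2, threshold=90):
    """Build an Elasticsearch more_like_this query body (illustrative mapping only)."""
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                # "like" points at an already-classified example document
                "like": [{"_index": index_name, "_id": doc_id}],
                "max_query_terms": tokens,                 # assumed: --tokens
                "min_doc_freq": min_docs,                  # assumed: --min-docs
                "min_term_freq": min_terms,                # assumed: --min-terms
                "minimum_should_match": f"{threshold}%",   # assumed: --threshold
            }
        }
    }


# Values taken from the minimum command shown earlier in this section.
body = more_like_this_query("INDEXNAME", "example-doc-id",
                            tokens=100, threshold=75, min_terms=1)
print(json.dumps(body, indent=2))
```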


🇱 L-O

Migrate

Example

Migrates data/files from one source to another.

NOTES:

  • A crawl of the origin source data is required before performing a migration.

  • When migrating to SharePoint Online, field names are NOT case-sensitive at this time.

  • When migrating File Share to File Share (internal servers only), permissions and ownership are not carried over.

Command using minimum required inputs:

Cognitivetoolkit.exe Migrate --destination-source-settings "C:\contentserver.json" --start-location 123456 --query "C:\match_all.json" -u http://localhost:9200 -i INDEXNAME --path-prefix-to-remove "\\TestFolder" --classification-mapping "C:\ClassificationsMapping.json"

Use the Migrate Options Table to edit your runscript command.

Migrate Options Table

OPTION

VALUE

CONDITION

--origin-source-settings <VALUE>

The origin settings file

Optional

*If not included, default: File System Source

--destination-source-settings <VALUE>

The destination settings file

Required

--start-location <VALUE>

The default starting location

Optional depending on other input. Ignored otherwise.

--location-field <VALUE>

The field name in the index indicating the destination location for the file migration

Optional

--migrate-versions

Migrate all versions (Source dependent)

Optional

*If not included, default: false

--migrate-permissions

Migrate all permissions (Source dependent)

Optional

*If not included, default: false

--path-prefix-to-remove <VALUE>

Removes the provided text from the beginning of path

Optional

*If not included, default: Smart Prefix Removal (e.g. 'c:\' or '\computer\')

--description-field-name <VALUE>

The name of the field where the description value is found (only use with Content Server as destination source)

Optional

--name-field-name <VALUE>

The name of the field where the name value is found

Optional

--metadata-mapping <VALUE>

Location of the metadata mapping file (Source dependent)

Optional

--user-mapping <VALUE>

Location of the user mapping file (Source dependent)

Optional

--user-mapping-type <VALUE>

User mapping-type to use (Source dependent)

Supported values: off, file

Optional

*If not included, default: off

--default-owner <VALUE>

The default owner of the documents being uploaded (Source dependent defaults)

Optional

--classification-mapping <VALUE>

Location of the classification mapping file (Content Server Only)

Optional

--is-records-management

Records management enabled (Content Server Only)

This option is used in conjunction with the --classification-mapping option for Content Server.

Including --is-records-management ensures that records management classifications are included in the migration.

Excluding --is-records-management means that any associated records management classifications are excluded from the migration.

Optional

*If not included, default: false

--auto-upgrade-category

Auto category upgrade (Content Server Only)

Optional

*If not included, default: false

--disable-over-write

Disable over-write (SharePoint Only)

Optional

*If not included, default: false

--site-url

Site to migrate to (SharePoint Only)

Optional

*If not included, default: site specified in the source settings

-s|--scroll-size

The page size of the query results

Optional

*If not included, default: false

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request.

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional
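
To make the effect of --path-prefix-to-remove more concrete, the following Python sketch strips a given prefix from the beginning of a path. It is only an approximation for illustration: the function name remove_path_prefix is hypothetical, and the case-insensitive comparison and the trimming of the leftover leading backslash are assumptions rather than documented behaviour of the Migrate operation.

```python
def remove_path_prefix(path, prefix):
    """Return path without the leading prefix; leave it unchanged if the prefix is absent."""
    # Assumption: compare case-insensitively, as Windows paths usually are.
    if path.lower().startswith(prefix.lower()):
        remainder = path[len(prefix):]
        # Assumption: drop the separator left at the start of the remainder.
        return remainder.lstrip("\\")
    return path


print(remove_path_prefix(r"\\TestFolder\Contracts\offer.docx", r"\\TestFolder"))
print(remove_path_prefix(r"C:\Other\file.txt", r"\\TestFolder"))
```

The first call prints the path relative to the removed prefix, and the second prints its input unchanged because the prefix does not match.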


🇵 P-S

RemoveField

Example

Removes the specified field from the specified index.

Command using minimum required inputs:

Cognitivetoolkit.exe RemoveField --field parent --query "C:\match_all.json" -u http://localhost:9200 -i INDEXNAME

Use the RemoveField Options Table to edit your runscript command.

RemoveField Options Table

OPTION

VALUE

CONDITION

-f|--field <FIELD>

The name of the index field

Required

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional


RemoveIndex

Example

Removes the index from your Analytics Engine, but does not remove the index pattern.

Command using minimum required inputs:

Cognitivetoolkit.exe RemoveIndex -u http://localhost:9200 -i INDEXNAME

Use the RemoveIndex Options Table to edit your runscript command.

RemoveIndex Options Table

OPTION

VALUE

CONDITION

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 1000

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional


RemoveItems

Example

Removes items from the Index.

Use Case: There are times when a dataset has been crawled, but then files have later been deleted from the dataset. This will result in files having an invalid path in the index (path-valid = false).

Apply the RemoveItems operation to remove items from the index that are displaying data for files that users have deleted.

Command using minimum required inputs:

Cognitivetoolkit.exe RemoveItems --query "C:\match_field_keyword.json" -u http://localhost:9200 -i INDEXNAME

Use the RemoveItems Options Table to edit your runscript command.

RemoveItems Options Table

OPTION

VALUE

CONDITION

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters). In this case, a match_field_keyword.json query could be applied. For example:

CODE
{
  "match" : {
    "path-valid" : "false"
  }
}

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional
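
The use case above can be sketched in Python as follows: write the path-valid query from the table to a file, preview the removal with --dry-run, and then run the same command without it. This is a minimal sketch under stated assumptions. The temporary directory is used only so the example runs anywhere, and the index URL and INDEXNAME are the placeholder values from this section, not values from your environment.

```python
import json
import os
import tempfile

# Query from the table above: match items whose path is no longer valid.
query = {"match": {"path-valid": "false"}}

# Write the query to a file (a temporary folder here; use your own location).
query_dir = tempfile.mkdtemp()
query_path = os.path.join(query_dir, "match_field_keyword.json")
with open(query_path, "w", encoding="utf-8") as handle:
    json.dump(query, handle, indent=2)

# Build the command once, then derive a preview variant with --dry-run.
remove_command = (
    f'CognitiveToolkit.exe RemoveItems --query "{query_path}" '
    "-u http://localhost:9200 -i INDEXNAME"
)
preview_command = remove_command + " --dry-run"

print(preview_command)  # run this first to see how many items would be processed
print(remove_command)   # then run this to remove the items
```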


RestoreCachedFileSystemPermissions

Example

Restores the file system permissions. Can be used in conjunction with the following tools: CacheFileSystemPermissions and SetFileSystemPermissions.

  • CachedPermsFieldName = "cached-permissions"

  • CachedInheritanceFieldName = "cached-inheritance"

  • CachedOwnerFieldName = "cached-owner"

If you run into problems running SetFileSystemPermissions, you can perform a restore, which uses the cached fields listed above to restore the original permissions on the actual files in the file system.

Command using minimum required inputs:

Cognitivetoolkit.exe RestoreCachedFileSystemPermissions --query "C:\match_all.json" -u http://localhost:9200 -i INDEXNAME

Use the RestoreCachedFileSystemPermissions Options Table to edit your runscript command.

RestoreCachedFileSystemPermissions Options Table

OPTION

VALUE

CONDITION

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional


RunWithCredentials

Example

Allows you to run the Cognitive Toolkit as a different user.

Command using minimum required inputs:

Cognitivetoolkit.exe RunWithCredentials

Use the RunWithCredentials Options Table to edit your runscript command.

RunWithCredentials Options Table

OPTION

VALUE

CONDITION

--save-credentials

Save credentials on the machine

Optional

*If not included, default: false

--hide

Hide the window

Optional

*If not included, default: false


SaveValue

Example

Saves values so that they can be used via substitution in other tools. This tool also encrypts the passwords and user names required by options within the --source-settings file and/or supplied directly on the command line.

Encrypting repository passwords with SaveValue

Command using minimum required inputs:

Cognitivetoolkit.exe SaveValue --query "C:\match_all.json" -u http://localhost:9200 -i INDEXNAME

Use the SaveValue Options Table to edit your runscript command.

SaveValue Options Table

OPTION

VALUE

CONDITION

--list

List all of the saved values

Optional

--save <SAVE>

Name of saved value

Optional

--value <VALUE>

Value

Optional

--remove <REMOVE>

Remove from saved values

Optional

--no-encryption

Do not encrypt value

Optional

*If not included, default: false


SetFileSystemPermissions

Example

Resets permissions on the file system. Make sure you retain Administrator rights on the file system.

CacheFileSystemPermissions must be performed before running this operation.

Command using minimum required inputs:

Cognitivetoolkit.exe SetFileSystemPermissions --exclusions "Administrator,Administrators" -q "C:\match_all.json" -u http://localhost:9200 -i INDEXNAME --identity SHINYAD\identity1,identity2\identity3 --access-control-reason quarantine --rights none

Use the SetFileSystemPermissions Options Table to edit your runscript command.

SetFileSystemPermissions Options Table

OPTION

VALUE

CONDITION

 --exclusions <VALUE>

A comma-separated list of users/groups you wish to exclude

Optional

--identity <VALUE>

Identities to add file access control (Comma separated)

Required

--access-control-reason <VALUE>

File access control change identifier (e.g. legal-hold, destruction, public-record)

Required

--rights <VALUE>

The level of permissions that will remain on the object (ie. none, read, modify)

Optional

*If not included, default: read

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional
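
Because RestoreCachedFileSystemPermissions is described as the fallback when SetFileSystemPermissions runs into problems, the two commands can be chained. The Python sketch below shows one possible way to do that. The function names are hypothetical, and treating a non-zero exit code as failure is an assumption about the tool, not documented behaviour. The runner is passed in as a parameter so the logic can be tried without the executable.

```python
def set_permissions_args(identities, reason, rights="read"):
    """Argument list for SetFileSystemPermissions (option names from the table above)."""
    return ["CognitiveToolkit.exe", "SetFileSystemPermissions",
            "-q", r"C:\match_all.json",
            "-u", "http://localhost:9200", "-i", "INDEXNAME",
            "--identity", ",".join(identities),   # comma-separated identities
            "--access-control-reason", reason,
            "--rights", rights]


RESTORE_ARGS = ["CognitiveToolkit.exe", "RestoreCachedFileSystemPermissions",
                "--query", r"C:\match_all.json",
                "-u", "http://localhost:9200", "-i", "INDEXNAME"]


def set_with_restore(run, set_args, restore_args=RESTORE_ARGS):
    """Run the set step; if it fails (assumed: non-zero exit code), run the restore step.

    `run` is any callable that takes an argument list and returns an exit code,
    for example: lambda args: subprocess.run(args).returncode
    """
    if run(set_args) == 0:
        return "set"
    run(restore_args)
    return "restored"


# Try the logic with a stand-in runner that simulates a failed set step.
calls = []

def fake_run(args):
    calls.append(args[1])
    return 1 if args[1] == "SetFileSystemPermissions" else 0

outcome = set_with_restore(
    fake_run, set_permissions_args([r"SHINYAD\identity1"], "quarantine", "none"))
print(outcome, calls)
```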


🇹 T-Z

TagDuplicate

Example

Tags any files that are considered duplicates, with the option to identify the primary duplicate. Run this operation only after the AddHashAndExtractedText tool has added hash values to the index.

Command using minimum required inputs:

Cognitivetoolkit.exe TagDuplicate --tag-primary --date-field lastCreationUtc --inspected-field hash --sort-order descending --items-to-process-query "C:\match_all.json" -u http://localhost:9200 -i INDEXNAME

Use the TagDuplicate Options Table to edit your runscript command.

TagDuplicate Options Table

OPTION

VALUE

CONDITION

--duplicate-field-name <VALUE>

The name of the field that will identify the duplicate

Optional

*If not included, default: duplicate-{inspected-field}

--tag-primary

Tag a primary duplicate

Primary duplicates are defined subjectively. Your data management strategy could identify primary duplicates by:

  • creation date = original document

  • status = published document

Optional

*If not included, default: false

--tag-unique

Tag unique documents

The unique status of an item is calculated at the time the TagDuplicate command is run. The item may no longer be considered unique in a later crawl. There is no guarantee an item will remain unique.

Optional

*If not included, default: false

--date-field <VALUE>

The name of the date field that will be used to determine the primary duplicate

Optional

*If not included, default: creationTimeUtc

--sort-order <VALUE>

The sort order to be used in conjunction with date-field

Supported values are:

‘ascending’, ‘descending’

Optional

*If not included, default: ascending

--inspected-field <VALUE>

The name of the field that will be compared

Optional

*If not included, default: ‘hash’

--use-keyword <VALUE>

Use the keyword field to filter

Optional

*If not included, default: true

--aggregate

Use the aggregate method (recommended for large datasets)

Optional

*If not included, default: false

-q|--items-to-process-query <VALUE>

Query for items to process (File or JSON input)

Required

--match-against-query

Query for items to match against

Optional

*If not included, default: match everything

--overwrite

Items that are no longer duplicates will be erased from the index.

Optional

*If not included, default: false

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--use-shinydocs-jobs

Query will be used against the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional
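
The interplay of --inspected-field, --date-field and --sort-order can be pictured with a small Python sketch: items are grouped by the inspected field, each group is sorted by the date field, and the first item in each group is treated as the primary duplicate. This is one plausible reading of the options, not the tool's actual implementation. The function name, the id key and the tag labels (primary, duplicate, unique) are hypothetical.

```python
from collections import defaultdict


def tag_duplicates(items, inspected_field="hash", date_field="creationTimeUtc",
                   sort_order="ascending"):
    """Return {item id: tag} where tag is 'primary', 'duplicate' or 'unique'."""
    groups = defaultdict(list)
    for item in items:
        groups[item[inspected_field]].append(item)

    tags = {}
    for group in groups.values():
        if len(group) == 1:
            tags[group[0]["id"]] = "unique"
            continue
        # ISO 8601 date strings sort chronologically as plain strings.
        ordered = sorted(group, key=lambda it: it[date_field],
                         reverse=(sort_order == "descending"))
        for position, item in enumerate(ordered):
            tags[item["id"]] = "primary" if position == 0 else "duplicate"
    return tags


items = [
    {"id": "a", "hash": "h1", "creationTimeUtc": "2020-01-01T00:00:00Z"},
    {"id": "b", "hash": "h1", "creationTimeUtc": "2021-06-01T00:00:00Z"},
    {"id": "c", "hash": "h2", "creationTimeUtc": "2019-03-15T00:00:00Z"},
]
print(tag_duplicates(items))
print(tag_duplicates(items, sort_order="descending"))
```

With ascending order the oldest item in each group becomes the primary, which matches the "creation date = original document" strategy mentioned in the table. With descending order the newest item becomes the primary instead.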


UpdateProperties

Example

Once you have migrated items into Content Server, this tool helps you update the items with category and attributes (if not completed during migration).

It also allows you to move and/or rename documents within Content Server.

Command using minimum required inputs:

Cognitivetoolkit.exe UpdateProperties --source-settings "C:\sourcesettings.json" -q "C:\match_all.json" -rm -cm "C:\ContentServerRMClassificationsMapping.json" -u http://localhost:9200 -i INDEXNAME

Use the UpdateProperties Options Table to edit your runscript command.

UpdateProperties Options Table

OPTION

VALUE

CONDITION

--source-settings <VALUE>

Path to the settings file containing access information such as the username and password for the data source (Possible data sources: Box, Content Server, Documentum, Exchange, Filenet)

Required

--name

The field name that contains the updated node name

Optional

--description

The field name that contains the updated node description

Optional

--duplicate-resolution

Duplicate name conflict resolution

Supported values are:

‘duplicate’, ‘skip’, ‘version’

Optional

*If not included, default: skip

--parent

The field name that contains the updated node parent

Optional

--records-management-classification

Include this option if you are using records management for classification

Optional

*If not included, default: false

--metadata-mapping

Metadata mapping file

Optional

--classification-mapping

Classifications file

Optional

-q|--query <VALUE>

The path to the search query (File or JSON defining input parameters)

Required

-n|--nodes-per-request <VALUE>

Number of nodes per request

For recommendations on setting this number value, see Setting the "--nodes-per-request" option

Optional

*If not included, default: 100

--use-shinydocs-jobs

Send logging data to the shinydocs-jobs index

For recommendations on setting up the shinydocs-jobs index, see Setting up the shinydocs-jobs index

Optional

*If not included, default: false

-u|--index-server-url <VALUE>

URL of the index server

If the Cognitive Toolkit and Index are running on different servers, the value of the --index-server-url option should be set to the IP address of the index server rather than the localhost.

Required

-i|--index-name <VALUE>

Name of the index

Required

-t|--threads <VALUE>

Number of parallel processes to start

For recommendations on setting this number value, see https://shinydocs.atlassian.net/wiki/spaces/SHINY/pages/2356117520

Optional

*If not included, default: 1

--force

Forcefully remove/suppress prompt for confirmation

Optional

*If not included, default: false

--dry-run

Runs everything but doesn’t send nodes to the Analytics Engine

The --dry-run option allows you to quickly see how many items will be processed without actually creating the index.

Optional

*If not included, default: false

 -s|--silent

Turn off the progress bar

Optional


To access the complete list of available operations from within the Cognitive Toolkit, type the following at the root folder of the Cognitive Toolkit: CognitiveToolkit.exe -h!|--help!
