Runscript Commands

Overview

Cognitive Toolkit includes a Command Line Interface (CLI) tool for Shinydocs Cognitive Suite. From the CLI tool, you can run several runscript commands for the most common use cases, and you can create your own if needed.

Runscript commands can only be run by an Administrator.

Getting Started

Prerequisites

Cognitive Toolkit must be installed and available.  

Runscript Commands

Using the CLI tool, administrators can use the following runscript commands for specialized applications.

Runscript | Use case
BulkDocumentEnricher | Use a CSV file to enrich an index
ImportCsv | Create an index and import data based on the fields provided
MoveValues | Move values from one field to another
NormalizeExtension | Standardize irregular file extension values
RegexEntityExtractor | Extract values from a pattern, such as a driver’s license, to populate corresponding fields in an index
TagQueryResult | Tag all items that match a given query
UpdateInd | Update indices when upgrading to Shinydocs Cognitive Suite 2.5.1
FlagFieldBasedOnRegex | Use a pattern, such as PII (for example, a credit card number), to tag fields in an index

BulkDocumentEnricher

Description

The BulkDocumentEnricher runscript command enriches documents in the index using a comma-separated values (CSV) file that maps each search term to the data to import into the index.

Running the script

To run the BulkDocumentEnricher script, provide the following parameters to the runscript tool:

Option | Details | Required
-p <path> | The path to the script file | Yes
-i <indexName> | Name of the index | Yes
-u <URL> | Server URL of the index | Yes
-q <path to query file> | The path to the JSON query file | Yes
--csv <path to csv file> | Path to the comma-separated value (csv) file | Yes
--column-names | A comma-separated list of the column names specified in the csv file | Yes
--date-modified-field | Name of field to create or update which contains the datetime the document was last updated. If not specified, no field is created. | No
--date-format | Specify a custom date format for CSV input. Default is yyyy-MM-dd HH:mm | No
--threads | The number of threads. If not specified, defaults to 1 | No

Example:

POWERSHELL
CognitiveToolkit.exe RunScript -p "C:\Users\username\Desktop\Scripts\Crawling Files\resources\cognitive-toolkit-executable\Scripts\General\BulkDocumentEnricher.cs" -u "http://localhost:9200" -i tagging_index --csv "C:\Users\username\Desktop\Scripts\Crawling Files\CSV\department_enrichment.csv" -q "C:\Users\username\Desktop\Queries\fulltext.json" --column-names business_group,business_unit,department

Format of the query file

The query file is a JSON file that uses the standard Elasticsearch query language. When a field name is surrounded by curly braces, BulkDocumentEnricher replaces that placeholder with the value of that field.

Format of the comma-separated file

The first line lists the field name you would like to search against, followed by the field names you would like to decorate the documents with.

Subsequent lines list the term to search for, and if found, the value to set the decorator field to.

Query Example

For example, given the following query:

CODE
  {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "fullText"
          }
        },
        {
          "match": {
            "fullText": "{fullText}"
          }
        }
      ]
    }
  }

and the following CSV file:

fullText, category
Mickey, mouse
Donald, duck
Pluto, dwarf planet

Running the BulkDocumentEnricher will search the fullText of each document.

  • If the term “Mickey” is found, a category field is added to the document in the index and set to “mouse”.

  • If the term “Donald” is found, a category field is added to the document in the index and set to “duck”.

  • If the term “Pluto” is found, a category field is added to the document in the index and set to “dwarf planet”.
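The behaviour described above can be illustrated with a short Python sketch. This is not the BulkDocumentEnricher implementation; it only shows, under the assumption that the placeholder is filled from the first CSV column of each row, how one query and one set of field values could be derived per row. The function name `build_enrichment_plan` is hypothetical.

```python
import csv
import io
import json

# Query template from the example above; "{fullText}" is the placeholder.
QUERY_TEMPLATE = """
{
  "bool": {
    "must": [
      {"exists": {"field": "fullText"}},
      {"match": {"fullText": "{fullText}"}}
    ]
  }
}
"""

CSV_TEXT = """fullText, category
Mickey, mouse
Donald, duck
Pluto, dwarf planet
"""


def build_enrichment_plan(query_template, csv_text):
    """Return one (query, fields_to_set) pair per CSV data row.

    Assumption: the first column is the search field and the remaining
    columns are the fields to add to matching documents.
    """
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader)
    search_field, decorator_fields = header[0], header[1:]
    plan = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        term, values = row[0], row[1:]
        # JSON-escape the term before substituting it into the template.
        escaped = json.dumps(term)[1:-1]
        placeholder = "{" + search_field + "}"
        query = json.loads(query_template.replace(placeholder, escaped))
        plan.append((query, dict(zip(decorator_fields, values))))
    return plan


plan = build_enrichment_plan(QUERY_TEMPLATE, CSV_TEXT)
for query, fields in plan:
    print(query["bool"]["must"][1]["match"], "->", fields)
```

Each pair holds the query that would be sent for that row and the field values that matching documents would receive.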

You can tag documents with more than one field by adding additional columns to the CSV file. For example:

fullText, category, video, serial-number
Mickey, mouse, steamboat, 0001

  • If the term “Mickey” is found in the full text, three fields (category, video, serial-number) are added to the document in the index and set to “mouse,” “steamboat,” and “0001” respectively.

Adding an asterisk (*) to the column name indicates that the corresponding field should be treated as a single value (a string) rather than a list of values (an array) and that value will overwrite the previous value rather than be appended to the list.
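The difference between the two modes can be sketched in Python. This is only an illustration of the overwrite-versus-append rule stated above, not the tool's code. It assumes the asterisk is written as a suffix on the column name (for example `owner*`), which the text does not specify, and the helper `apply_column` and the sample field names are hypothetical.

```python
def apply_column(document, column_name, value):
    """Apply one CSV column value to a document (a plain dict).

    Assumption: a trailing "*" marks a single-value field.
    """
    if column_name.endswith("*"):
        # Single-value field: the new value overwrites the previous one.
        document[column_name[:-1]] = value
    else:
        # List field: the new value is appended to the existing values.
        document.setdefault(column_name, []).append(value)
    return document


doc = {}
apply_column(doc, "category", "mouse")
apply_column(doc, "category", "cartoon")
apply_column(doc, "owner*", "alice")
apply_column(doc, "owner*", "bob")
print(doc)  # {'category': ['mouse', 'cartoon'], 'owner': 'bob'}
```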

ImportCsv

Description

The ImportCsv runscript command reads a CSV file and creates an index from it. You can add a custom field name and field value. The command also records a timestamp indicating when the index was created.

Running the script

To run the ImportCsv script, provide the following parameters to the runscript tool:

Option | Details | Required
-p <path> | The path to the script file | Yes
-i <indexName> | Name of the index | Yes
-u <URL> | URL of the index | Yes
--csv <path to csv file> | The path to the CSV file | Yes
--field-name | Name of the field that will appear in the index | Yes
--field-value | The value of the field name | Yes
--id-fields | Comma-separated list of field names (must be lower case) | Yes
--threads | The number of threads. If not specified, defaults to 1 | No

Example

For the following CSV file (a list of major league baseball teams):

teamname,city,league,division
Arizona Diamondbacks,"Phoenix, Arizona",National,West
Atlanta Braves,"Atlanta, Georgia",National,East
Baltimore Orioles,"Baltimore, Maryland",American,East
Boston Red Sox,"Boston, Massachusetts",American,East

the corresponding ImportCsv runscript parameters might look like this:

Runscript:

-p <path to runscript goes here>

-i <index name goes here>

-u <index URL goes here>

--csv <path to CSV file goes here>

--field-name <field name goes here> For example: --field-name schemaType

--field-value <field value goes here> For example: --field-value baseballteams

--id-fields <comma-separated list of fields goes here> For example: --id-fields "teamname,city,league,division"
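As a rough illustration, the Python sketch below parses the sample CSV and adds the custom field to each row. It shows how the quoted city values (which contain commas) stay in one column. It does not reproduce how ImportCsv builds documents or uses the ID fields; the function `rows_to_documents` is hypothetical, and `schemaType` and `baseballteams` are the example values from the parameters above.

```python
import csv
import io

CSV_TEXT = '''teamname,city,league,division
Arizona Diamondbacks,"Phoenix, Arizona",National,West
Atlanta Braves,"Atlanta, Georgia",National,East
Baltimore Orioles,"Baltimore, Maryland",American,East
Boston Red Sox,"Boston, Massachusetts",American,East
'''


def rows_to_documents(csv_text, field_name, field_value):
    """Turn each CSV row into a dict and add the custom field to it."""
    documents = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = dict(row)
        doc[field_name] = field_value  # custom field added to every row
        documents.append(doc)
    return documents


docs = rows_to_documents(CSV_TEXT, "schemaType", "baseballteams")
print(docs[0])
```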

MoveValues

Description

The MoveValues script allows you to move the values of an existing field to a new field, optionally clearing the previous value of the original field.
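The effect on a single document can be pictured with this minimal Python sketch. It is an illustration only: it assumes that clearing means removing the original field entirely, which the description does not confirm, and the function `move_values` is hypothetical.

```python
def move_values(document, old_field, new_field, clear=False):
    """Copy the value of old_field to new_field, optionally clearing old_field."""
    if old_field not in document:
        return document  # nothing to move
    document[new_field] = document[old_field]
    if clear:
        # Assumption: "clear" removes the source field altogether.
        del document[old_field]
    return document


kept = move_values({"Address": "1 Main St"}, "Address", "Location")
cleared = move_values({"Address": "1 Main St"}, "Address", "Location", clear=True)
print(kept)     # {'Address': '1 Main St', 'Location': '1 Main St'}
print(cleared)  # {'Location': '1 Main St'}
```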

Prerequisites

Before running the MoveValues script:

  1. Create an index: Run an indexing tool such as CrawlExchange or CrawlFileSystem to create an index.

  2. Add hash and extract text: Run a hashing tool such as AddHashAndExtractedText to add hash values and extracted text to the index.

Running the script

To run the MoveValues script, provide the following parameters to the runscript tool:

Option | Details | Required
-p <path> | The path to the script file | Yes
-i <indexName> | Name of the index | Yes
-u <URL> | URL of the index | Yes
-q <query> | The path to the JSON query file | Yes
--old-field-name | Name of the field which contains the value to move | Yes
--new-field-name | Name of the field which the source field value will be moved to | Yes
--clear <true> | Clears the values from the source field. If not specified, defaults to false. | No
--threads | The number of threads. If not specified, defaults to 1 | No

Example

Below is an example of what the parameters might look like to move the values from an old field called Address to a new field called Location:

Runscript:

-p <path to runscript goes here>

-i <index name goes here>

-u <index URL goes here>

-q <path to query file>

--old-field-name Address

--new-field-name Location

--clear true

CODE
CognitiveToolkit.exe RunScript -p "C:\Users\ldekker\Desktop\Scripts\Crawling Files\resources\cognitive-toolkit-executable\Scripts\General\MoveValues.cs" -u "http://localhost:9200" -i Shinydocs_index -q "C:\Users\ldekker\Desktop\Queries\fulltext.json" --old-field-name Address --new-field-name Location --clear true

Format of the query file

The query file is a JSON file that uses the standard Elasticsearch query language. When a field name is surrounded by curly braces, MoveValues replaces that placeholder with the value of that field.

Query Example

CODE
  {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "fullText"
          }
        },
        {
          "match": {
            "fullText": "{fullText}"
          }
        }
      ]
    }
  }

NormalizeExtension

Description

The NormalizeExtension script standardizes the format of the file extension field for all documents in an index by removing any leading "." and converting the entire value to lowercase.

This command is useful in situations where you’ve previously crawled your file system and the following types of nonstandard extensions were recorded in the index:

  • oranges.pdf

  • apples.PDF

  • bananas.Pdf

In the index, the “extension” field would have recorded the following values: “pdf”, “PDF”, and “Pdf” respectively, causing issues with grouping the results.

Running the NormalizeExtension runscript command will standardize them all in lowercase as “pdf”, “pdf”, and “pdf”.
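The normalization rule itself is simple enough to express in a few lines of Python. This is a sketch of the rule as described (strip any leading "." and lowercase the value), not the script's source; the function name `normalize_extension` is hypothetical.

```python
def normalize_extension(extension):
    """Remove any leading "." characters and lowercase the value."""
    return extension.lstrip(".").lower()


values = ["pdf", "PDF", "Pdf", ".PDF"]
print([normalize_extension(v) for v in values])  # ['pdf', 'pdf', 'pdf', 'pdf']
```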

Prerequisite

An initial crawl must have been performed, and the index should contain nonstandard file extension values.

Running the script

To run the NormalizeExtension script, provide the following parameters to the runscript tool:

Option | Details | Required
-p <path> | The path to the script file | Yes
-i <indexName> | Name of the index | Yes
-u <URL> | URL of the index | Yes
-q <query> | The path to the JSON query file | Yes
--threads | The number of threads. If not specified, defaults to 1 | No

Example

Below is an example of what the parameters might look like to normalize the extensions for all items in an index:

Runscript:

-p <path to runscript goes here>

-i <index name goes here>

-u <index URL goes here>

-q <path to query file>

CODE
CognitiveToolkit.exe RunScript -p "C:\Users\ldekker\Desktop\Scripts\Crawling Files\resources\cognitive-toolkit-executable\Scripts\General\NormalizeExtension.cs" -u "http://localhost:9200" -i Shinydocs_index -q "C:\Users\ldekker\Desktop\Queries\match_all.json"

Format of the query file

The query file is a JSON file that uses the standard Elasticsearch query language. When a field name is surrounded by curly braces, NormalizeExtension replaces that placeholder with the value of that field.

Query Example

CODE
  {
    "match_all" : {}
  }

RegexEntityExtractor

Description

The RegexEntityExtractor runscript command allows you to extract values from a pattern, such as a driver’s license, to populate corresponding fields in an index.

Prerequisites

Before running the RegexEntityExtractor script:

  1. Create an index: Run an indexing tool such as CrawlExchange or CrawlFileSystem and create an index.

  2. Add hash and extract text: Run a text extraction tool such as AddHashAndExtractedText to add extracted text to the index.

Running the script

To run the RegexEntityExtractor script, provide the following parameters to the runscript tool:

Option | Details | Required
-p <path> | The path to the script file | Yes
-i <indexName> | Name of the index | Yes
-u <URL> | URL of the index | Yes
--csv <path to csv file> | The path to the comma-separated value (csv) file | Yes
-q <path to query file> | The path to the JSON query file | Yes
--regex-column-name | The name of the CSV column that contains the regex patterns | Yes
--tag-column-name | The name of the CSV column that contains the names of the index fields to populate | Yes
--search-columns | Allows multiple comma-separated column names | No
--threads | The number of threads. If not specified, defaults to 1 | No
--nodes-per-request | The number of nodes | No

Example

Below is an example of what parameters might look like to find and extract Ontario Drivers License numbers (Canada) from the fullText field, and place the extracted value in an index field called “Ontario Drivers License”.

CODE
CognitiveToolkit.exe RunScript -p "C:\Users\ldekker\Desktop\Scripts\Crawling Files\resources\cognitive-toolkit-executable\Scripts\General\RegexEntityExtractor.cs" -u "http://localhost:9200" -i Shinydocs_index --csv "C:\Users\ldekker\Desktop\Scripts\Crawling Files\CSV.csv" -q "C:\Users\ldekker\Desktop\Queries\fulltext_not_OntarioDriversLicense.json" --regex-column-name PII_regex --tag-column-name PII_type

Format of the CSV file

CODE
PII_type,PII_regex
Ontario Drivers License,\b[a-zA-Z]\d{4}[\s-]*\d{5}[\s-]*\d{5}\b
[Image: how the CSV appears in Microsoft Excel. Column A is titled PII_type and column B is titled PII_regex; the first data row contains Ontario Drivers License and its regex pattern.]
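Before running the script against an index, it can help to check a pattern against sample text. The Python sketch below reads the CSV shown above and applies each pattern to a string. It is only a local test aid, not the RegexEntityExtractor logic: the tool's scripts are C# files and may use a different regex engine, the function `extract_entities` is hypothetical, and the licence number in the sample is made up.

```python
import csv
import io
import re

CSV_TEXT = r"""PII_type,PII_regex
Ontario Drivers License,\b[a-zA-Z]\d{4}[\s-]*\d{5}[\s-]*\d{5}\b
"""


def extract_entities(csv_text, full_text):
    """Return {PII_type: [matches]} for every pattern that matches the text."""
    found = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        matches = re.findall(row["PII_regex"], full_text)
        if matches:
            found[row["PII_type"]] = matches
    return found


sample = "Licence on file: A1234-56789-01234 (verified)."
print(extract_entities(CSV_TEXT, sample))
# {'Ontario Drivers License': ['A1234-56789-01234']}
```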

Format of the query file

The query file is a JSON file that uses the standard Elasticsearch query language.

Query Example

Full text must exist, “Ontario Drivers License” field must not exist:

JSON
  {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "fullText"
          }
        }
      ],
      "must_not": [
        {
          "exists": {
            "field": "Ontario Drivers License"
          }
        }
      ]
    }
  }

TagQueryResult

Description

The TagQueryResult script tags every document that matches the query with the given field name and field value. If the field already exists, the script adds the value to the existing field rather than creating another field with the same name. This runscript can be used to tag documents with any field name and value. For example, by specifying ROT (redundant, obsolete, trivial) rules in the query file, you can tag each matching document with the appropriate value.

Running the script

To run the TagQueryResult script, provide the following parameters to the runscript tool:

Option | Details | Required
-p <path> | The path to the script file | Yes
-i <indexName> | Name of the index | Yes
-u <URL> | URL of the index | Yes
-q <path to query file> | The path to the JSON query file | Yes
--field-name | The name of the field that should exist or will be created | Yes
--field-values | The value the document will be tagged as | Yes
--date-modified-field | Name of field to create or update which contains the datetime the document was last updated. If not specified, no field is created. | No
--threads | The number of threads. If not specified, defaults to 1 | No

Example

Below is an example of what the parameters might look like to check the “path” field for anything that has “finance” in it and set the field “department” to “Finance”.

In this example, the TagQueryResult script will skip any item in “backup” folder(s) and it will skip any items that already have the field “department” set.

CODE
CognitiveToolkit.exe RunScript -p "scripts\General\TagQueryResult.cs" --field-values "Finance" --field-name "department" -q "COG Batch Files\ACME RunScripts - 3 dept+offer-status+offer-year\department-finance.json" -i shiny -u http://localhost:9200

Runscript:

-p <path to runscript goes here>

-i <index name goes here>

-u <index URL goes here>

-q <path to query file>

--field-name <field name goes here> For example: --field-name "department"

--field-values <field value goes here> For example: --field-values "Finance"
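The contents of department-finance.json are not shown in this article. The Python sketch below builds one query that would express the three conditions described in the example (path contains “finance”, item is not in a “backup” folder, “department” is not already set) and prints it as JSON. The use of wildcard clauses on the path field is an assumption; how the field is mapped in your index determines whether a wildcard, match, or other clause is appropriate, and the function `build_department_query` is hypothetical.

```python
import json


def build_department_query(path_term, skip_folder, tag_field):
    """Build a bool query: path contains path_term, excluding skip_folder
    and any document where tag_field already exists."""
    return {
        "bool": {
            "must": [
                {"wildcard": {"path": "*" + path_term + "*"}}
            ],
            "must_not": [
                {"wildcard": {"path": "*" + skip_folder + "*"}},
                {"exists": {"field": tag_field}},
            ],
        }
    }


query = build_department_query("finance", "backup", "department")
print(json.dumps(query, indent=2))
```

The printed JSON follows the same top-level shape as the other query examples on this page and could be saved to a file and passed with -q.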

UpdateInd

The UpdateInd runscript command allows you to update indices when upgrading to Shinydocs Cognitive Suite 2.5.0. Specifically, this command resolves keyword issues with prop-* and cc-* fields in indices created with previous versions of Shinydocs Cognitive Suite.

Prerequisites

In some instances, clients created index fields prefixed with prop-* and cc-*. Running the UpdateInd runscript command updates the keyword property on these fields to ensure they are included in searches.

For a given index, this command only needs to be run once. There is no need to run this command again, unless someone restores an old index.

Running the script

To run the UpdateInd script, provide the following parameters to the runscript tool:

Option | Details | Required
-c <class> | UpdateInd | Yes
-p <path> | The path to the script file | Yes
-i <indexName> | Name of the index | Yes
-u <URL> | URL of the index | Yes

Example

CODE
CognitiveToolkit.exe RunScript -c UpdateInd -p "C:\Users\ldekker\Desktop\Scripts\Crawling Files\resources\cognitive-toolkit-executable\Scripts\General\UpdateInd.cs" -u "http://localhost:9200" -i Shinydocs_index

FlagFieldBasedOnRegex

Description

The FlagFieldBasedOnRegex runscript command allows you to enrich documents in the index by adding fields based on given patterns and values.

Prerequisites

Before running the FlagFieldBasedOnRegex script:

  1. Create an index: Run an indexing tool such as CrawlExchange or CrawlFileSystem and create an index.

  2. Add hash and extract text: Run a hashing tool such as AddHashAndExtractedText to add hash values and extracted text to the index.

Running the script

To run the FlagFieldBasedOnRegex script, provide the following parameters to the runscript tool:

Option | Details | Required
-p <path> | The path to the script file | Yes
-i <indexName> | Name of the index | Yes
-u <URL> | URL of the index | Yes
-q <path to query file> | The path to the JSON query file | Yes
--regex-pattern | The regex pattern the tool will look for in the document | Yes
--search-field | Name of the field the tool will search to find a match | Yes
--field-name | Name of the field that will appear in the document if a match is found | Yes
--value | The value that will be displayed beside the field name | Yes

Examples

Below is a full Ontario Health Card example, followed by the regex patterns for the Ontario Driver's License, Canadian passport, and Canadian postal code.

Ontario Health Card Number

Runscript

-p path to the script goes here

--regex-pattern "\d{4}[\s-]\d{3}[\s-]\d{3}[\s-]*[a-zA-Z]{2}"

--value "Ontario Health Card Number"

--field-name potential_pii

--search-field name

-q query file goes here

-u index URL goes here

-i index name goes here
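To see what the health card pattern matches, the Python sketch below applies it to a sample document. It mirrors the parameters above (search field `name`, field name `potential_pii`) but is not the FlagFieldBasedOnRegex implementation: it assumes the flag field is set to a single value, the function `flag_document` is hypothetical, and the card number in the sample is made up.

```python
import re

HEALTH_CARD_PATTERN = r"\d{4}[\s-]\d{3}[\s-]\d{3}[\s-]*[a-zA-Z]{2}"


def flag_document(document, search_field, pattern, field_name, value):
    """Set field_name to value when pattern matches the text in search_field."""
    text = document.get(search_field, "")
    if re.search(pattern, text):
        document[field_name] = value  # assumption: single value, not a list
    return document


flagged = flag_document(
    {"name": "scan 1234-567-890 AB.pdf"},
    "name", HEALTH_CARD_PATTERN, "potential_pii", "Ontario Health Card Number",
)
untouched = flag_document(
    {"name": "meeting notes.docx"},
    "name", HEALTH_CARD_PATTERN, "potential_pii", "Ontario Health Card Number",
)
print(flagged)
print(untouched)
```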

Driver's License Number Ontario

Runscript

--regex-pattern "\b[a-zA-Z]\d{4}[\s-]*\d{5}[\s-]*\d{5}\b"

--value "Driver's License Number Ontario"

Canadian Passport Number

Runscript

--regex-pattern "\b[a-zA-Z]\d{4}[\s-]*\d{5}[\s-]*\d{5}\b"

--value "Canadian Passport Number"

Canadian Postal Code

Runscript

--regex-pattern "\b[a-zA-Z]\d[a-zA-Z][\s-]?\d[a-zA-Z]\d\b"

--value "Canadian Postal Code"
