
Extracting full text (Advanced)

Advanced instructions for AddHashAndExtractedText.

To see all the options available for this tool:

  1. Open a Windows command prompt in Administrator mode.

  2. Change directory (cd) to the folder where you extracted the shinydocs-cognitive-toolkit-yyyy-mm-dd.zip file.

  3. From the root folder of the Cognitive Toolkit, type the following command:

    CognitiveToolkit.exe AddHashAndExtractedText --help

Command options:

Tool: AddHashAndExtractedText
Usage: CognitiveToolkit AddHashAndExtractedText [options]

Command

Required/Optional

Info

--source-settings <SOURCE_SETTINGS>

Required for non-file-system sources

Default: leave blank for a file system source

Do not use this parameter if the source is a file system!

Path to the JSON file that contains the connection details for the repository (source). Templates for these files are in the Cognitive Toolkit download under External Resources\Example Source Settings.

--action-keyword <ACTION_KEYWORD>

Optional

Default: both

Action to perform (hash, text, both).

--debug-level <DEBUG_LEVEL>

Optional

Default: 20

The depth of detail included in exception messages.

--force

Optional
Default: false

Forcefully remove; suppresses the prompt for confirmation.

--index-type <INDEX_TYPE>

Optional
Default: shinydocs

The type name for the index objects. If you do not specify one, “shinydocs” is used.

The index type cannot easily be changed later. Only use this option if you know what you are doing.

--max-characters <MAX_CHARACTERS>

Optional

Default: all characters

The maximum number of characters stored in the extracted text field.

Warning: Setting this value too high can cause timeouts or problems loading the index if the supporting hardware cannot handle the load. Shinydocs recommends 30,000 characters for a performant index.

--ocr-utility <OCR_UTILITY>

Optional

Default: none

OCR utility to use for text extraction (iron, none).

-a|--algorithm <ALGORITHM>

Optional

Default: md5

Hashing algorithm to use (available algorithms: md5, sha1, sha256, sha512).

-i|--index-name <INDEX_NAME>

Required

Name of the index.

Note the value used here: it must match the index name you use with other Cognitive Toolkit tools later (such as addHash).

-q|--query <QUERY>

Required

Path to a JSON file containing your query. The items returned by this query are the input for this tool. You can also pass escaped JSON directly in the command.

Paths that contain spaces, and inline JSON, must be enclosed in double quotes.

--skip-errors

Optional
Default: false

Skip re-processing items that previously failed, that is, items marked as:

addhashandextractedtext:error

An error indicates a problem either generating the item's hash ID or extracting its text. Check your logs for details.

-s|--silent

Optional
Default: false

Turn off the progress bar.

Note: For scheduled tasks, --silent is preferred because it gives a slight performance increase.

-t|--threads <THREADS>

Optional
Default: 1

Number of parallel processes to start

-u|--index-server-url <INDEX_SERVER_URL>

Required

URL of the index server
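If you script runs of this tool (for example from a scheduler instead of a .bat file), it can help to assemble and validate the arguments in one place. The sketch below is a hypothetical Python helper, not part of the Cognitive Toolkit: it uses only the options listed above, checks the documented value lists, and omits any option left at its documented default so the tool applies its own. The index server URL, index name, and query path in the usage lines are placeholder values.

```python
from typing import List, Optional

# Allowed values as listed in the options table.
ACTIONS = ("hash", "text", "both")
ALGORITHMS = ("md5", "sha1", "sha256", "sha512")
OCR_UTILITIES = ("iron", "none")


def build_command(
    index_server_url: str,
    index_name: str,
    query: str,
    *,
    source_settings: Optional[str] = None,  # leave None for a file system source
    action: str = "both",
    algorithm: str = "md5",
    max_characters: Optional[int] = None,  # None = all characters (tool default)
    ocr_utility: str = "none",
    threads: int = 1,
    silent: bool = False,
    exe: str = "CognitiveToolkit.exe",
) -> List[str]:
    """Return an argument list for AddHashAndExtractedText (hypothetical helper)."""
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {ACTIONS}")
    if algorithm not in ALGORITHMS:
        raise ValueError(f"algorithm must be one of {ALGORITHMS}")
    if ocr_utility not in OCR_UTILITIES:
        raise ValueError(f"ocr_utility must be one of {OCR_UTILITIES}")
    if threads < 1:
        raise ValueError("threads must be at least 1")

    # The three required options come first.
    args = [exe, "AddHashAndExtractedText",
            "-u", index_server_url, "-i", index_name, "-q", query]
    # Optional settings are only added when they differ from the documented defaults.
    if source_settings is not None:
        args += ["--source-settings", source_settings]
    if action != "both":
        args += ["--action-keyword", action]
    if algorithm != "md5":
        args += ["-a", algorithm]
    if max_characters is not None:
        args += ["--max-characters", str(max_characters)]
    if ocr_utility != "none":
        args += ["--ocr-utility", ocr_utility]
    if threads != 1:
        args += ["-t", str(threads)]
    if silent:
        args.append("-s")
    return args


# Placeholder values for illustration only.
cmd = build_command(
    "http://localhost:9200",
    "my-index",
    r"C:\Cognitive Queries\query-match-path-no-hash.json",
    max_characters=30000,
    threads=4,
    silent=True,
)
# From the toolkit's root folder on Windows, this could then be run with:
# subprocess.run(cmd, check=True)
```

Passing a list rather than one string lets Python handle the quoting of paths with spaces, so the command does not have to be quoted by hand.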

Sample .json file:

query-match-path-no-hash.json
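The -q option accepts either a path to a query file like this one or escaped inline JSON, and both need double quotes when they contain spaces. As a rough illustration, Python's subprocess.list2cmdline applies the Windows command-line quoting rules that subprocess uses on Windows, so it shows how each form would be written on the command line. The folder path is hypothetical, and the match_all query is a placeholder (Elasticsearch-style syntax is assumed here); confirm that the toolkit accepts your inline form before relying on it.

```python
import json
import subprocess

# Hypothetical folder containing the sample query file.
query_path = r"C:\Cognitive Queries\query-match-path-no-hash.json"
# Placeholder query body (assumed Elasticsearch-style); replace with your own.
inline_query = json.dumps({"query": {"match_all": {}}})

# list2cmdline wraps arguments that contain spaces in double quotes and
# escapes embedded double quotes as \" (Microsoft C runtime rules).
path_cmd = subprocess.list2cmdline(["-q", query_path])
inline_cmd = subprocess.list2cmdline(["-q", inline_query])

print(path_cmd)    # -q "C:\Cognitive Queries\query-match-path-no-hash.json"
print(inline_cmd)  # -q "{\"query\": {\"match_all\": {}}}"
```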

Sample .bat file:

COG-Query-AddHash-md5 (1).bat
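To spot-check a hash recorded with the -a|--algorithm option, you can compute the same digest independently. The sketch below uses Python's hashlib. It assumes that the toolkit hashes the raw file bytes and stores a lowercase hexadecimal digest. This page does not confirm that, so compare the result against a known item in your index before relying on it.

```python
import hashlib

# Algorithm names accepted by -a|--algorithm, as listed in the options table.
SUPPORTED = ("md5", "sha1", "sha256", "sha512")


def _new_hasher(algorithm: str):
    """Create a hashlib object for one of the supported algorithms."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    return hashlib.new(algorithm)


def digest_bytes(data: bytes, algorithm: str = "md5") -> str:
    """Hex digest of an in-memory byte string."""
    h = _new_hasher(algorithm)
    h.update(data)
    return h.hexdigest()


def file_digest(path: str, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hex digest of a file's bytes, read in 1 MiB chunks to limit memory use.

    Assumption: this matches how the toolkit hashes files; verify first.
    """
    h = _new_hasher(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


print(digest_bytes(b"abc", "md5"))  # 900150983cd24fb0d6963f7d28e17f72
```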
