Skip to main content
Skip table of contents

Setting up the shinydocs-jobs index

Compatible with Cognitive Toolkit 2.7.0 and higher.

You will need to configure each Cognitive Toolkit to use the shinydocs-jobs index. See below for instructions.

What is the shinydocs-jobs index?

An index called shinydocs-jobs can be created and used to log data by enabling the --use-shinydocs-jobs option during the following Cognitive Toolkit operations:

  • AddClassifications

  • AddExtractedTextFromEngineeringDrawings

  • AddFromSqlDatabase

  • AddHashAndExtractedText

  • AddPathValidation

  • AddPropertyData

  • CacheFileSystemPermissions

  • CopyItems

  • Dispose

  • ExportFromIndex

  • ExtractAndCrawlPst

  • ExtractEntities

  • FindSimilarClassification

  • Migrate

  • RemoveField

  • RemoveItems

  • RestoreCachedFileSystemPermissions

  • SetFileSystemPermissions

  • TagDuplicate

  • UpdateProperties

Enabling the --use-shinydocs-jobs option prompts the Cognitive Toolkit to send logging data to the shinydocs-jobs index during an operation. This will help you track current crawl activities, gather metrics and visualize errors, such as access and text extraction errors.

Visualizations & Dashboard

See: Shinydocs Jobs Visualizations & Dashboard

Key fields in the shinydocs-jobs index

Once this feature is enabled, there are key fields you should be aware of and their purpose. There are two types of entries in this index:

Crawl operations and metrics

Contains information about the crawl operations and metrics from completed crawl operations. There are more fields available, these are some of the key fields to know about:

Index Field

Use

commandArguments

This field is only present on crawl operations as commandArguments: <arguments used with CognitiveToolkit.exe>

startTime

The date and time that the process started

endTime

The date and time that the process ended.

If this field is not present on an entry, the process is either not complete or was terminated unexpectedly

duration

Crawl operation duration in seconds.

If this field is not present on an entry, the process is either not complete or was terminated unexpectedly

files-per-second

The rate at which the crawl operation was able to crawl or process files.

If this field is not present on an entry, the process is either not complete or was terminated unexpectedly

file-quantity

The number of files crawled or processed.

If this field is not present on an entry, the process is either not complete or was terminated unexpectedly

toolType

The name of the tool used in Cognitive Toolkit

machine

The name of the computer running the Cognitive Toolkit

Errors

Contains errors encountered while crawling, hashing, text extracting, etc. There are more fields available, these are some of the key fields to know about:

Index Field

Use

type

This field is only present on errors as type: error

toolType

The name of the tool used in Cognitive Toolkit

exception

If there is an exception raised during the crawl operation, the exception will be noted here.

Not all errors have an exception

message

The error raised during the crawl operation.

All errors have a message

time

The date and time the error occured

machine

The name of the computer running the Cognitive Toolkit

Configure Cognitive Toolkit to log to the index

With logging to the index enabled, the Cognitive Toolkit will attempt to update the shinydocs-jobs index with log entries as well as the log file. This can be helpful in visualizing current crawl activities, gathering metrics, and visualizing errors (access errors, errors in text extraction, etc.). If the connection between the Cognitive Toolkit and the Index breaks or Cognitive Toolkit is terminated unexpectedly, it will be unable to log to the index.

This must be configured before performing any crawl operations, otherwise previous and current running processes will not be logged to the index (but will still be logged to the log file).

  1. In the Cognitive Toolkit directory, which contains CognitiveToolkit.exe, open CognitiveToolkit.exe.config in a notepad-like editor

  2. On Line 20, uncomment <add key="JobIndexUrl" value="http://localhost:9200" /> by removing the leading <!-- and the trailing -->

  3. Set the value to the URL and port of your Analytics Engine (index) endpoint, replacing http://localhost:9200

  4. Save the file

Set up the index pattern

Once your first crawl has started, a new index will be created called shinydocs-jobs. To set up the index pattern in the Visualizer:

  1. In the Visualizer, click Management

  2. Select Index Patterns

  3. Select Create index pattern

  4. In the field Index pattern, enter shinydocs-jobs

  5. Select Next

  6. In the Time Filter field name drop-down, select “I don’t want to use the Time Filter

  7. Select Create index pattern

  8. Wait for the index pattern to be created (less than 1 minute)

  9. Once created, you will see a list of fields in your index pattern

  10. In the Filter search bar, enter duration, and select the pencil (edit) button

  11. In the Format drop-down, select Duration

  12. The Input format should be Seconds and the Output format should be Human Readable

  13. Select Save field, all done!

Removing tasks that did not complete

Occasionally, there will be tasks that do not complete fully and therefore remain in the index incomplete. This can be caused by:

  • System shutting down while process is running

  • Application crash

  • Terminating the process before completing

  • Connection to the index breaks

To remove these items from the index you will need to use the Cognitive Toolkit or the Visualizer’s Dev Tools section. You will need the Task IDs (indexed as _id) of the entries to remove them (example ID: 5f0d242f-eafe-487d-982c-9250bbed4ecf). Either method will require a query to tell the index what to delete. You can use this query for either method:

Remove ID query

Replace <_id_here> with the ID(s) you want to remove

CODE
{
  "query": {
    "ids": {
      "values": [
        "<_id_here>"
      ]
    }
  }
}

Remove_ID_query.json

Cognitive Toolkit Method

Replace http://<index_url:port> with your index URL and port & <path_to_Remove_ID_query.json>with the path to the Remove ID query

CODE
CognitiveToolkit.exe RemoveItems --index-name shinydocs-jobs --index-server-url http://<index_url:port> --query <path_to_Remove_ID_query.json>

After Cognitive Toolkit has gathered the items to remove from the index, you will be prompted in the terminal to confirm the removal of these items. If you do not want to be prompted, use the --force argument in the command

Visualizer Method

Navigate to Dev Tools in the left sidebar menu.

Replace <_id_here> with the ID(s) you want to remove. Then run the command:

CODE
POST shinydocs-jobs/_delete_by_query
{
  "query": {
    "ids": {
      "values": [
        "<_id_here>"
      ]
    }
  }
}

The output after completion will look like this:

CODE
{
  "took" : 81,
  "timed_out" : false,
  "total" : 1,
  "deleted" : 1,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

"deleted": reflects the number of deleted items

JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.