
Initial Discovery - File Share (Advanced)

This document is a companion to Initial Discovery - File Share. It provides advanced instructions for fine-tuning the initial discovery of your file share.

  • Open a Windows command prompt in Administrator mode. The Cognitive Toolkit is run from this prompt.

  • Change directory (cd) to the folder where you extracted the shinydocs-cognitive-toolkit-yyyy-mm-dd.zip file.

  • To see all of the available options for the CrawlFileSystem tool, run the following command in the Administrator: Command Prompt window:

CognitiveToolkit.exe CrawlFileSystem --help
  • A list of command options is displayed.
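If you launch crawls from a script rather than typing them at the prompt, the CrawlFileSystem command line can be assembled programmatically. The Python sketch below is a minimal, hypothetical helper: build_crawl_command and run_crawl are not part of the Cognitive Toolkit. It uses only the options from the Command options list, and it assumes that each long option is written with a double dash (as --source-id is) and that the on/off options (add-field-owner, folders, include-hidden, include-system, crawl-pst, include-reparse) take no value.

```python
# Hypothetical helper, not part of the Cognitive Toolkit: builds and runs a
# CrawlFileSystem command line from the documented options.
# Assumption: long options use a double dash and switches take no value.
import subprocess
from pathlib import Path
from typing import List, Optional


def build_crawl_command(
    path: str,
    index_server_url: str,
    index_name: str,
    *,
    filter_file: Optional[str] = None,
    source_id: Optional[str] = None,
    index_type: Optional[str] = None,
    add_field_owner: bool = False,
    folders: bool = False,
    include_hidden: bool = False,
    include_system: bool = False,
    crawl_pst: bool = False,
    include_reparse: bool = False,
    exe: str = "CognitiveToolkit.exe",
) -> List[str]:
    """Return the argument list for CognitiveToolkit.exe CrawlFileSystem."""
    # The three required options must be non-empty.
    required = (("path", path), ("index_server_url", index_server_url),
                ("index_name", index_name))
    for name, value in required:
        if not value:
            raise ValueError(f"{name} is required")

    args = [exe, "CrawlFileSystem",
            "--path", path,
            "--index-server-url", index_server_url,
            "--index-name", index_name]

    # Value-taking options are added only when set, so the tool falls back
    # to its documented defaults (source id "filesystem", type "shinydocs").
    if filter_file:
        args += ["--filter", filter_file]
    if source_id:
        args += ["--source-id", source_id]
    if index_type:
        args += ["--index-type", index_type]

    # Switches: present means enabled; absent keeps the default (false).
    switches = {
        "--add-field-owner": add_field_owner,
        "--folders": folders,
        "--include-hidden": include_hidden,
        "--include-system": include_system,
        "--crawl-pst": crawl_pst,
        "--include-reparse": include_reparse,
    }
    args += [flag for flag, enabled in switches.items() if enabled]
    return args


def run_crawl(toolkit_dir: str, args: List[str]) -> int:
    """Run the command from the folder where the toolkit zip was extracted."""
    exe_path = Path(toolkit_dir) / args[0]  # full path to the executable
    completed = subprocess.run([str(exe_path), *args[1:]], cwd=toolkit_dir)
    return completed.returncode
```

For example, calling build_crawl_command with a share path, an index server URL, an index name and include_hidden=True yields the three required options followed by --include-hidden; the path, URL and index name you pass are your own values. Start such a script from an Administrator command prompt, as the steps above require.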

Command options:

Command

Required/Optional

Info

-p|--path <PATH>

Required

Path to crawl

--filter <FILTER FILE LOCATION>

Optional

Use a filter to limit which directories or files are crawled. For example, you can use this option to explicitly list folders to exclude, or to list specifically the folders and/or file types to include. See the full documentation on filters for more information.

--source-id <SOURCE_ID>

Optional

Default: filesystem

Each record in the Shinydocs Index contains the "sourceId" field, which is set to the value specified here. If you omit "--source-id", the value "filesystem" will be recorded here.

-a|--add-field-owner

Optional

If you want file owner information included for each record in the Index, include this option. Note that this adds overhead to the time required to do the crawl. You can also add owner information to the Shinydocs Index later with the "AddOwner" tool, so include this option only if you are sure that you need owner information in the Index and are willing to accept the slightly longer crawl time.

--folders

Optional

Default: false

If you wish to ensure that folders are included in the file share crawl, include this option in the command.

--include-hidden

Optional

Default: false

If you wish to ensure that Windows hidden items are included in the file share crawl, include this option in the command.

--include-system

Optional

Default: false

If you wish to ensure that Windows system items are included in the file share crawl, include this option in the command.

--crawl-pst

Optional

Default: false

If you wish to crawl inside PST (Outlook data) files, include this option in the command.

--include-reparse

Optional

Default: false

If you wish to ensure that reparse items are included in the file share crawl, include this option in the command.

Reparse items (NTFS reparse points) are an NTFS feature that allows file system filter drivers to intercept a file access request and potentially rewrite it. Reparse points are the mechanism behind several other NTFS features:

  • Volume mount points

  • Directory junctions

  • Symbolic links

  • Single Instance Storage

  • Native Structured Storage

  • Hierarchical Storage Management

-u|--index-server-url <INDEX_SERVER_URL>

Required

URL of the index server.

-i|--index-name <INDEX_NAME>

Required

Name of the index. Make a note of this value, because later Cognitive Toolkit tools (such as addHash) must use the same index name.

--index-type <INDEX>

Optional

Default: shinydocs

Type name assigned to the objects in the index. If you omit this option, the name “shinydocs” is recorded.
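The include-hidden, include-system and include-reparse options refer to Windows file attributes. Before enabling them, it may help to know how many items on the share actually carry those attributes. The Python sketch below is a hypothetical helper that is not part of the Cognitive Toolkit; it assumes the toolkit recognizes hidden, system and reparse items by the standard FILE_ATTRIBUTE_HIDDEN, FILE_ATTRIBUTE_SYSTEM and FILE_ATTRIBUTE_REPARSE_POINT bits, and the names classify and survey are illustrative.

```python
# Hypothetical helper, not part of the Cognitive Toolkit: counts items on a
# share that carry the Windows hidden, system or reparse-point attributes.
import os
import stat
from collections import Counter

# Standard Windows attribute bits, as exposed by Python's stat module.
# Assumption: the toolkit identifies these item types by the same bits.
ATTRIBUTE_FLAGS = {
    "hidden": stat.FILE_ATTRIBUTE_HIDDEN,
    "system": stat.FILE_ATTRIBUTE_SYSTEM,
    "reparse": stat.FILE_ATTRIBUTE_REPARSE_POINT,
}


def classify(attributes: int) -> set:
    """Return which of hidden/system/reparse apply to an attribute mask."""
    return {name for name, bit in ATTRIBUTE_FLAGS.items() if attributes & bit}


def survey(root: str) -> Counter:
    """Walk root iteratively and count hidden, system and reparse items.

    Reparse points (junctions, symbolic links, mount points) are counted but
    not descended into, so link loops cannot cause endless traversal.
    st_file_attributes only exists on Windows; elsewhere it defaults to 0.
    """
    counts: Counter = Counter()
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as it:
                entries = list(it)
        except OSError:
            counts["unreadable_dirs"] += 1  # e.g. access denied
            continue
        for entry in entries:
            try:
                info = entry.stat(follow_symlinks=False)
            except OSError:
                counts["unreadable_items"] += 1
                continue
            kinds = classify(getattr(info, "st_file_attributes", 0))
            counts.update(kinds)
            counts["total"] += 1
            # Descend only into ordinary (non-reparse) directories.
            if "reparse" not in kinds and entry.is_dir(follow_symlinks=False):
                stack.append(entry.path)
    return counts
```

Running survey against the share path on a Windows machine returns counts such as total, hidden, system and reparse. Comparing those counts with the total gives a rough idea of how much each option would add to the crawl.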
