Initial Discovery - File Share (Advanced)

This document is a companion to Initial Discovery - File Share. It provides advanced instructions for fine-tuning the initial discovery of your file share.

To use the Cognitive Toolkit, open a Windows Command Prompt as Administrator.
Change directory (cd) to the folder where you extracted the shinydocs-cognitive-toolkit-yyyy-mm-dd.zip file.
To see all available options for the CrawlFileSystem tool, run the following command in the Administrator: Command Prompt window:

CognitiveToolkit.exe CrawlFileSystem --help

A list of command options will be displayed.

Command options:

Option	Required/Optional	Description
-p|--path <PATH>	Required	Path of the file share (or folder) to crawl.
--filter <FILTER FILE LOCATION>	Optional	Path to a filter file that limits which directories or files are crawled. For example, a filter can explicitly list folders to exclude, or list only the folders and/or file types to include. See the full documentation on filters for more information.
--source-id <SOURCE_ID>	Optional (default: filesystem)	Each record in the Shinydocs Index contains a "sourceId" field, which is set to the value specified here. If you omit --source-id, the value "filesystem" is recorded.
-a|--add-field-owner	Optional	Add this option to include file owner information in each record in the Index. This adds overhead to the crawl time. You can also add owner information to the Shinydocs Index later with the AddOwner tool, so include this option only if you are sure you need owner information in the Index and can accept a slightly longer crawl.
--folders	Optional (default: false)	Add this option to include folders in the file share crawl.
--include-hidden	Optional (default: false)	Add this option to include Windows hidden items in the file share crawl.
--include-system	Optional (default: false)	Add this option to include Windows system items in the file share crawl.
--crawl-pst	Optional (default: false)	Add this option to crawl the contents of PST (Outlook data) files.
--include-reparse	Optional (default: false)	Add this option to include reparse points in the file share crawl. Reparse points are an NTFS feature that lets file system filter drivers intercept a file access request and potentially rewrite it. They are the mechanism behind several other NTFS features: volume mount points, directory junctions, symbolic links, Single Instance Storage, Native Structured Storage, and Hierarchical Storage Management.
-u|--index-server-url <INDEX_SERVER_URL>	Required	URL of the index server.
-i|--index-name <INDEX_NAME>	Required	Name of the index. Make a note of this value: later Cognitive Toolkit tools (such as addHash) must use the same index name.
--index-type <INDEX>	Optional (default: shinydocs)	Name for the index objects. If you omit this option, "shinydocs" is used.
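If you crawl several shares, scripting the command line can help keep the options consistent. The following is a minimal Python sketch, assuming Python 3 is installed on the machine that runs the toolkit and that the long options use the double-dash spellings shown in the table above (confirm them against the --help output for your toolkit version). CrawlOptions and build_crawl_args are hypothetical helpers for illustration; they are not part of the Cognitive Toolkit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CrawlOptions:
    """Hypothetical container for one CrawlFileSystem run (mirrors the options table)."""
    path: str                       # -p|--path (required)
    index_server_url: str           # -u|--index-server-url (required)
    index_name: str                 # -i|--index-name (required)
    filter_file: str | None = None  # --filter
    source_id: str | None = None    # --source-id (tool default: filesystem)
    index_type: str | None = None   # --index-type (tool default: shinydocs)
    add_field_owner: bool = False   # -a|--add-field-owner
    folders: bool = False           # --folders
    include_hidden: bool = False    # --include-hidden
    include_system: bool = False    # --include-system
    crawl_pst: bool = False         # --crawl-pst
    include_reparse: bool = False   # --include-reparse


def build_crawl_args(opts: CrawlOptions, exe: str = "CognitiveToolkit.exe") -> list[str]:
    """Return the argument list for a CrawlFileSystem run.

    Optional values left as None are omitted, so the tool applies its own defaults.
    """
    args = [exe, "CrawlFileSystem",
            "--path", opts.path,
            "--index-server-url", opts.index_server_url,
            "--index-name", opts.index_name]
    # Options that take a value: pass them only when a value was given.
    for flag, value in (("--filter", opts.filter_file),
                        ("--source-id", opts.source_id),
                        ("--index-type", opts.index_type)):
        if value is not None:
            args += [flag, value]
    # Switches: present means true; absent leaves the default (false).
    for flag, enabled in (("--add-field-owner", opts.add_field_owner),
                          ("--folders", opts.folders),
                          ("--include-hidden", opts.include_hidden),
                          ("--include-system", opts.include_system),
                          ("--crawl-pst", opts.crawl_pst),
                          ("--include-reparse", opts.include_reparse)):
        if enabled:
            args.append(flag)
    return args
```

For example, with the placeholder values path=r"\\fileserver\share", index_server_url="http://localhost:9200", index_name="fileshare-discovery" and folders=True, build_crawl_args returns the arguments for CognitiveToolkit.exe CrawlFileSystem --path \\fileserver\share --index-server-url http://localhost:9200 --index-name fileshare-discovery --folders. You could pass that list to subprocess.run(args, check=True) from an Administrator prompt in the toolkit folder. Leaving optional values unset lets the tool apply its documented defaults ("filesystem" for --source-id, "shinydocs" for --index-type) instead of hard-coding them in the script.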