Initial Discovery - File Share (Advanced)
This document is a companion to Initial Discovery - File Share. Here you will find advanced instructions for fine-tuning the initial discovery of your file share.
To use the Cognitive Toolkit, open a Windows Command Prompt in Administrator mode.
Change directory (cd) to the folder where you extracted the shinydocs-cognitive-toolkit-yyyy-mm-dd.zip file.
To see all available options for the CrawlFileSystem tool, run the following command in the Administrator: Command Prompt window:
CognitiveToolkit.exe CrawlFileSystem --help
A list of command options will be displayed.
Command options:
Option | Required/Optional | Description |
--- | --- | --- |
-p|--path <PATH> | Required | Path to crawl |
--filter <FILTER FILE LOCATION> | Optional | Use a filter file to limit which directories or files are crawled. For example, you can explicitly list folders to exclude, or list only the folders and/or file types to include. See the full documentation on filters for more information. |
--source-id <SOURCE_ID> | Optional (default: filesystem) | Each record in the Shinydocs Index contains a "sourceId" field, which is set to the value specified here. If you omit --source-id, the value "filesystem" is recorded. |
-a|--add-field-owner | Optional | Include this option to record file owner information for each record in the index. This adds overhead to the crawl. Because you can also add owner information to the Shinydocs Index later with the AddOwner tool, only include this option if you are sure you need owner information in the index and can accept a slightly longer crawl time. |
--folders | Optional (default: false) | Include this option to ensure that folders are included in the file share crawl. |
--include-hidden | Optional (default: false) | Include this option to ensure that Windows hidden items are included in the file share crawl. |
--include-system | Optional (default: false) | Include this option to ensure that Windows system items are included in the file share crawl. |
--crawl-pst | Optional | Include this option to crawl inside PST files. |
--include-reparse | Optional (default: false) | Include this option to ensure that reparse items are included in the file share crawl. Reparse items (reparse points) are an NTFS feature that lets file system filter drivers intercept a file access request and potentially rewrite it. They are the mechanism behind several other NTFS features, such as symbolic links, junction points, and volume mount points. |
-u|--index-server-url <INDEX_SERVER_URL> | Required | URL of the index server. |
-i|--index-name <INDEX_NAME> | Required | Name of the index. Make a note of this value, because it must match the index name used with later Cognitive Toolkit tools (such as addHash). |
--index-type <INDEX> | Optional (default: shinydocs) | A name for the index objects. If you omit this option, the name "shinydocs" is recorded. |
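Putting the table together, a full crawl command combines the three required options with whichever optional switches you need. The Python sketch below assembles that argument list so the combination is easy to see and reuse. It is a minimal illustration under stated assumptions, not part of the Cognitive Toolkit: the helper name `build_crawl_command` is hypothetical, every long option is assumed to take two leading dashes (as `--path` and `--source-id` do in the table), the on/off options are assumed to be bare switches that take no value, and the share path, index server URL, and index name in the usage block are placeholders.

```python
import subprocess
from typing import List, Optional


def build_crawl_command(
    path: str,
    index_server_url: str,
    index_name: str,
    source_id: Optional[str] = None,
    filter_file: Optional[str] = None,
    index_type: Optional[str] = None,
    add_owner: bool = False,
    folders: bool = False,
    include_hidden: bool = False,
    include_system: bool = False,
    crawl_pst: bool = False,
    include_reparse: bool = False,
    exe: str = "CognitiveToolkit.exe",
) -> List[str]:
    """Hypothetical helper: build a CrawlFileSystem argument list.

    Assumes the long-option spellings from the options table and that the
    on/off options are valueless switches.
    """
    # The three required options.
    cmd = [
        exe, "CrawlFileSystem",
        "--path", path,
        "--index-server-url", index_server_url,
        "--index-name", index_name,
    ]
    # Optional options that take a value are only emitted when set, so the
    # tool's own defaults (sourceId "filesystem", index type "shinydocs") apply.
    if source_id is not None:
        cmd += ["--source-id", source_id]
    if filter_file is not None:
        cmd += ["--filter", filter_file]
    if index_type is not None:
        cmd += ["--index-type", index_type]
    # Switches: present means enabled; absent leaves the default (off).
    switches = [
        (add_owner, "--add-field-owner"),
        (folders, "--folders"),
        (include_hidden, "--include-hidden"),
        (include_system, "--include-system"),
        (crawl_pst, "--crawl-pst"),
        (include_reparse, "--include-reparse"),
    ]
    cmd += [flag for enabled, flag in switches if enabled]
    return cmd


if __name__ == "__main__":
    # Placeholder share, index server, and index name: replace with your own.
    args = build_crawl_command(
        path=r"\\fileserver\share",
        index_server_url="http://localhost:9200",
        index_name="fileshare-discovery",
        folders=True,
    )
    # Show the command line as it would be typed at the Command Prompt.
    print(subprocess.list2cmdline(args))
    # To start the crawl from an Administrator prompt in the toolkit folder:
    # subprocess.run(args, check=True)
```

Leaving optional values unset, rather than passing their documented defaults explicitly, keeps the generated command as short as the one you would type by hand and lets the tool apply its own defaults.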