
Running CrawlFileSystem

CrawlFileSystem is the base operation for data discovery and is generally performed prior to running any other Cognitive Toolkit operation. It crawls the specified path (or multiple paths) for metadata. The metadata is then stored in an index where it can be further mined for insights.


Cognitive Toolkit is a command-line tool. Every operation is run using the same sequence:

  1. Open a Command Prompt (CMD) as an administrator or as a service account that has the appropriate permissions.

  2. Use the cd command to navigate to the folder where you extracted the shinydocs-extraction-[Version]-[YYYY-MM-DD].zip file (the Cognitive Toolkit root folder).

  3. At the prompt, type CognitiveToolkit.exe followed by the runscript command (CrawlFileSystem) and its required parameters.
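If you launch operations from a script instead of typing them at the prompt, the three steps above can be reduced to a single process call. The following is a minimal Python 3.9+ sketch, not part of Cognitive Toolkit: the helper names build_argv and run_operation are hypothetical, and the sketch assumes the script already runs under an account with the appropriate permissions (step 1), because it does not elevate.

```python
import os
import subprocess
from typing import Sequence


def build_argv(toolkit_root: str, operation: str, params: Sequence[str]) -> list[str]:
    """Hypothetical helper: argv for 'CognitiveToolkit.exe <operation> <params...>'."""
    # Use an absolute path so the executable is found regardless of how cwd is resolved.
    exe = os.path.join(os.path.abspath(toolkit_root), "CognitiveToolkit.exe")
    return [exe, operation, *params]


def run_operation(toolkit_root: str, operation: str, params: Sequence[str]) -> int:
    """Hypothetical helper: run an operation from the toolkit root folder (step 2)
    and return the process exit code. It does not request elevation (step 1)."""
    argv = build_argv(toolkit_root, operation, params)
    completed = subprocess.run(argv, cwd=toolkit_root)
    return completed.returncode
```

For example, run_operation(r"C:\CognitiveToolkit", "CrawlFileSystem", ["--path", r"\\Server2.companyABC.local\Files\HR\Offer Letters", ...]) would crawl a single path; the root folder shown is a hypothetical location. Because each parameter is its own list element, subprocess handles quoting of values that contain spaces.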

Running the command

To run the CrawlFileSystem command, you must provide, at a minimum, the required parameters identified in the following table:

Option values are based on your environment setup and location of source files.

CrawlFileSystem Options

--path <VALUE>
Single path to crawl.

--path-file <VALUE>
Text file that contains multiple paths to crawl.

* At least one of these two options must be included in the runscript command: if --path is not used, --path-file must be used, and if --path-file is not used, --path must be used.

Includes hidden files in the crawl.
(Default: false)

Includes system files in the crawl.
(Default: false)

--add-field-owner
Adds the Owner field to the index.
(Default: false)

Includes reparse items.
A file or directory can contain a reparse point, which is a collection of user-defined data. The format of this data is understood by the application that stores the data and by a file system filter, which interprets the data and processes the file. When an application sets a reparse point, it stores this data plus a reparse tag, which uniquely identifies the data it is storing.
(Default: false)

--after-date-last-modified <VALUE>
Crawls all files modified after this date and ignores anything modified before it. This is useful for differentiating between crawls or indices.
Supported date formats: yyyy-MM-dd, yyyy-MM-dd HH:mm, yyyy-MM-ddTHH:mm
Example: 2018-12-20 or 2018-12-20 19:42
Supported relative date formats: now, now+/-ld[/d], now+/-lm[/d], now+/-ly[/d]
(Default: all)

--index-server-url <VALUE>
URL of the index server.
If the Cognitive Toolkit and the index are running on different servers, set the --index-server-url option to the IP address of the index server rather than the URL.

-i|--index-name <VALUE>
Name of the index.

-t|--threads <VALUE>
Number of parallel processes to start.
For recommendations on setting this number value, see
(Default: 1)

-n|--nodes-per-request <VALUE>
Number of nodes per request.
For recommendations on setting this number value, see Setting the "--nodes-per-request" option.
(Default: 1000)

Forcefully remove / suppress the prompt for confirmation.
When running batch files, this option allows the operation to continue without interruption.
(Default: false)

--dry-run
Runs everything but doesn’t send nodes to the index.
The --dry-run option lets you quickly see how many items will be indexed without actually creating the index.
(Default: false)

Hides the progress bar while the operation runs.
(Default: false)
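Two rules in the table lend themselves to a check before you launch a long crawl: at least one of --path or --path-file must be present, and an absolute --after-date-last-modified value must use one of the three listed formats. The following minimal Python sketch is hypothetical (it is not how Cognitive Toolkit validates its own input). It passes relative values that start with "now" through without checking them, and because strptime also accepts single-digit months and days, it is slightly more lenient than yyyy-MM-dd.

```python
from datetime import datetime
from typing import Optional

# Absolute formats from the options table (yyyy-MM-dd, yyyy-MM-dd HH:mm,
# yyyy-MM-ddTHH:mm) translated to strptime patterns.
SUPPORTED_DATE_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M", "%Y-%m-%dT%H:%M")


def is_supported_absolute_date(value: str) -> bool:
    """Return True if value parses with one of the supported absolute formats."""
    for fmt in SUPPORTED_DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue  # try the next format
    return False


def check_crawl_options(path: Optional[str], path_file: Optional[str],
                        after_date_last_modified: Optional[str] = None) -> None:
    """Hypothetical pre-flight check; raises ValueError on the first problem found."""
    # The table requires at least one of --path / --path-file.
    if not path and not path_file:
        raise ValueError("Provide --path or --path-file (at least one is required).")
    # Relative values (starting with "now") are not validated by this sketch.
    if (after_date_last_modified is not None
            and not after_date_last_modified.startswith("now")
            and not is_supported_absolute_date(after_date_last_modified)):
        raise ValueError(f"Unsupported date format: {after_date_last_modified!r}")
```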


The following runscript command is an example of the input required for crawling multiple paths in a file system:

CognitiveToolkit.exe CrawlFileSystem --path-file "<VALUE>" --index-server-url <VALUE> --index-name <VALUE> --add-field-owner --after-date-last-modified now-l/d
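The example command above can also be assembled as an argument list, which is convenient when the values come from a script or configuration. The following is a minimal Python sketch; the function name crawl_file_system_args is hypothetical, and it covers only the options that appear in the example. --add-field-owner is treated as a switch with no value, as it appears in the example command.

```python
from typing import Optional


def crawl_file_system_args(path_file: str, index_server_url: str, index_name: str,
                           add_field_owner: bool = False,
                           after_date_last_modified: Optional[str] = None) -> list[str]:
    """Hypothetical helper mirroring the example runscript command."""
    args = [
        "CrawlFileSystem",
        "--path-file", path_file,
        "--index-server-url", index_server_url,
        "--index-name", index_name,
    ]
    if add_field_owner:
        # Switch with no value: its presence adds the Owner field (default: false).
        args.append("--add-field-owner")
    if after_date_last_modified is not None:
        args += ["--after-date-last-modified", after_date_last_modified]
    return args
```

The returned list can follow the path to CognitiveToolkit.exe in a subprocess.run call. Because each value is a separate list element, a path file location that contains spaces does not need the manual quotes used at the CMD prompt.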



--path-file option

Cognitive Suite 2.6 and later eliminates the need to run the CrawlFileSystem operation more than once by letting you list multiple paths to crawl in a path input file.

Example of a path input file (.txt file):

\\Server2.companyABC.local\Files\HR\Offer Letters
\\Server2.companyABC.local\Files\HR\Acceptance Letters
\\Server2.companyABC.local\Files\Finance\Purchase Orders
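If the list of paths is generated rather than typed by hand, a path input file in the layout shown above (one path per line) can be written with a short script. The following is a minimal Python sketch; the helper names are hypothetical, and the UTF-8 encoding is an assumption, since the required encoding of the file is not stated here.

```python
from pathlib import Path
from typing import Iterable


def format_path_file(paths: Iterable[str]) -> str:
    """Return the file contents: one path per line, blank entries dropped."""
    cleaned = (p.strip() for p in paths)
    return "".join(p + "\n" for p in cleaned if p)


def write_path_file(destination: str, paths: Iterable[str]) -> Path:
    """Write the path input file (UTF-8 is an assumption, not a documented requirement)."""
    out = Path(destination)
    # Text mode translates "\n" to the platform line ending (CRLF on Windows).
    out.write_text(format_path_file(paths), encoding="utf-8")
    return out
```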

