Scheduling CrawlFileSystem
Setting up a scheduled crawl requires the use of batch files and Windows Task Scheduler for execution.
The user running the Scheduled Task will need permission to run batch jobs, along with other permissions required to use the Cognitive Toolkit.
Creating Batch Files for Crawling the File System
Open Shinydocs Cognitive Analysis Toolkit. You can use the following command to see all options available for Crawl File System:
CognitiveToolkit.exe CrawlFileSystem -h
Using your preferred text editor (e.g., Notepad++), create a new file.
Craft your batch file command.
Sample batch file:
CognitiveToolkit.exe CrawlFileSystem -p <PATH_TO_CRAWL> -u http://localhost:9200 -i <INDEX_NAME>
Note: Prior to running the command, replace:
<PATH_TO_CRAWL> with one of the paths you wish to crawl
http://localhost:9200 with the URL and port your Indexer is running on (if different)
<INDEX_NAME> with the name of the index you want to use
If you have several shares or paths to crawl, add one command line per path. The batch file can contain as many lines as you need.
Save your file with the extension .bat
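As an illustration of a batch file that crawls more than one path, here is a minimal sketch. The install folder (C:\CognitiveToolkit), the share paths, and the index name (shiny_files) are hypothetical placeholders, and the -p, -u, and -i options are assumed to behave as in the sample command; run CognitiveToolkit.exe CrawlFileSystem -h to confirm the options for your version.

```batch
@echo off
rem Hypothetical install folder; change this to where CognitiveToolkit.exe lives.
cd /d "C:\CognitiveToolkit"

rem Shared settings, defined once so every crawl line uses the same values.
set "INDEXER_URL=http://localhost:9200"
set "INDEX_NAME=shiny_files"

rem One command per share/path. The paths below are placeholders.
CognitiveToolkit.exe CrawlFileSystem -p "\\fileserver01\finance" -u %INDEXER_URL% -i %INDEX_NAME%
CognitiveToolkit.exe CrawlFileSystem -p "\\fileserver01\hr" -u %INDEXER_URL% -i %INDEX_NAME%
```

The commands in a batch file run one after another, so the second crawl starts only when the first has finished. Keeping the Indexer URL and index name in variables means a change to either only has to be made in one place.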
Creating a Scheduled Task with Windows Task Scheduler
Windows offers great flexibility for scheduling tasks in its OS ecosystem. It can be used to schedule virtually any operation within the Shinydocs Cognitive Toolkit with the combination of executables and scripts.
Launch Task Scheduler
Under Actions, select “Create Task”
In the “Create Task” window, give the task a name that will make sense contextually to your organization.
If you are using a service account for crawling purposes, use the “Change User or Group…” option to run the task as that user/account. Please note that the user/account will need read and write access to the directories that contain the batch files and executables being used.
Select “Run whether user is logged in or not”
Under the “Triggers” tab, create a new trigger for this task.
Select the time and frequency at which the crawl(s) should run.
Under the “Actions” tab, configure which batch file(s) this task should run. The action should be “Start a program”.
Set the “Start in (optional)” field to the folder that contains CognitiveToolkit.exe. This is usually the root of the Cognitive Toolkit folder.
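The same kind of task can also be created from a command prompt with the built-in schtasks tool, which may be convenient when setting up several servers. This is only a sketch of an alternative to the Task Scheduler steps above: the task name, batch file path, schedule, and service account are hypothetical placeholders.

```batch
rem Create a task that runs the crawl batch file every day at 02:00.
rem Task name, batch file path, and account are hypothetical placeholders.
rem "/RP *" makes schtasks prompt for the account password instead of
rem putting it on the command line.
schtasks /Create /TN "Shinydocs Crawl" /TR "C:\CognitiveToolkit\crawl.bat" /SC DAILY /ST 02:00 /RU "DOMAIN\svc_crawl" /RP *

rem Show the task's settings, then trigger a run immediately to test it.
schtasks /Query /TN "Shinydocs Crawl"
schtasks /Run /TN "Shinydocs Crawl"
```

schtasks /Create has no equivalent of the “Start in (optional)” field, so with this approach the batch file itself would need to change to the toolkit folder (for example with cd /d) before calling CognitiveToolkit.exe.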