How to Set the "--threads" Option
Certain operations in the Cognitive Toolkit support a --threads option. To set this option for the best performance, it helps to understand what it controls and why it exists.
--threads
The --threads option specifies the maximum number of parallel threads an operation can start.
Some important things to note:
If a --threads <VALUE> option is NOT specified, the default is --threads 1.
The --threads value is a ceiling, not a fixed count. For example, with --threads 1000 the operation runs as many threads as it can, up to 1000.
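The two notes above describe a default and an upper bound. As a minimal sketch (not the toolkit's actual implementation), the Python below models that behavior: an argparse option that defaults to 1, and a hypothetical `effective_threads` helper that treats the requested value as a ceiling. The assumption that the only other limit is the number of available work items is for illustration only; the toolkit may be limited by other resources as well.

```python
import argparse
from typing import List


def parse_threads(argv: List[str]) -> int:
    """Hypothetical parser that mirrors the documented default of --threads 1."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--threads", type=int, default=1)
    args = parser.parse_args(argv)
    if args.threads < 1:
        parser.error("--threads must be at least 1")
    return args.threads


def effective_threads(requested: int, available_work: int) -> int:
    """Treat --threads as a ceiling ("up to"), not a forced count.

    Assumption for illustration: the only other limit is how many
    independent work items exist. At least one thread always runs.
    """
    return max(1, min(requested, available_work))


if __name__ == "__main__":
    print(parse_threads([]))  # → 1 (option omitted, default applies)
    print(effective_threads(parse_threads(["--threads", "1000"]), 12))  # → 12
```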
The thread count has different implications depending on the Cognitive Toolkit operation being run:
Operation | Potential Impact |
---|---|
Migration | The value of the --threads option affects performance of the Content Server to which you are migrating. |
Database Crawl | The value of the --threads option affects the database. For example, setting the --threads value too high can cause the database to shut down the operation. |
File System Crawl | The value of the --threads option can be increased gradually until performance peaks; beyond that point, more threads yield diminishing returns. |
Setting the --threads value
If the operation completes quickly enough with one thread, keep the default setting.
If performance isn’t acceptable at the default setting, increase the --threads value in increments of two until CPU or RAM usage reaches capacity, or until further increases produce diminishing returns.
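The tuning procedure above can be sketched as a loop that steps up from the default of 1 in increments of two. This is a minimal Python sketch under stated assumptions: `measure(threads)` is a hypothetical callable that runs a representative sample of the operation with that --threads value and returns the elapsed seconds, `saturated(threads)` is a hypothetical check for CPU or RAM being at capacity, and the 5% `min_gain` threshold for "diminishing returns" is an arbitrary choice.

```python
from typing import Callable, Optional


def tune_threads(
    measure: Callable[[int], float],
    saturated: Optional[Callable[[int], bool]] = None,
    step: int = 2,
    max_threads: int = 64,
    min_gain: float = 0.05,
) -> int:
    """Return the best --threads value found by stepping up from the default.

    measure(n)   -> elapsed seconds for a test run with --threads n (hypothetical)
    saturated(n) -> True if CPU or RAM is at capacity at n threads (hypothetical)
    """
    best_threads = 1
    best_time = measure(1)  # baseline: the default, --threads 1
    threads = 1 + step
    while threads <= max_threads:
        elapsed = measure(threads)
        if elapsed > best_time * (1 - min_gain):
            break  # diminishing returns: less than min_gain faster than the best so far
        best_threads, best_time = threads, elapsed
        if saturated is not None and saturated(threads):
            break  # CPU or RAM at capacity: stop increasing
        threads += step
    return best_threads


# Made-up timings (seconds), for illustration only.
fake_timings = {1: 100.0, 3: 40.0, 5: 25.0, 7: 24.0, 9: 23.0}
print(tune_threads(fake_timings.__getitem__))  # → 5
```

In practice, `measure` could wrap `time.perf_counter()` around a test run on a small, representative data set so that each step finishes quickly; if run times vary noticeably between runs, repeat each measurement before comparing.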
Examples
For CPU-intensive operations, set a lower thread count; threads that compete for the same CPU cores can slow each other down.
addhashandextractedtext is a CPU-intensive operation. You may want to set the option to --threads 3.
For lightweight operations, a higher thread count can improve performance.
addapathvalidation is a lightweight operation. You may want to set the option to --threads 20.
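As a rough starting point before tuning, the CPU-intensive versus lightweight distinction can be turned into a simple heuristic. The sketch below is not taken from the Cognitive Toolkit: it assumes CPU-intensive work should use at most one thread per core (leaving one core free), and that lightweight work spends most of its time waiting on I/O, so a multiple of the core count is reasonable. The `suggested_threads` name, the multiplier of 4, and the cap of 32 are hypothetical choices; refine the result with the incremental procedure under Setting the --threads value.

```python
import os
from typing import Optional


def suggested_threads(cpu_intensive: bool, cores: Optional[int] = None) -> int:
    """Hypothetical starting value for --threads; not a toolkit default."""
    cores = cores or os.cpu_count() or 1
    if cpu_intensive:
        # One thread per core, leaving one core for the OS and other services.
        return max(1, cores - 1)
    # Lightweight work: assumed to be mostly waiting on I/O, so use more
    # threads than cores. Multiplier 4 and cap 32 are arbitrary assumptions.
    return min(32, cores * 4)


print(suggested_threads(True, cores=4))   # → 3
print(suggested_threads(False, cores=4))  # → 16
```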