Cognitive Toolkit September 2021/2.5.1.2 Release Notes
Product: Cognitive Toolkit
Version: 2.5.1.2
Date: September 13, 2021
- Improved Duplicate Tagging: We have improved the TagDuplicate tool. It is now more efficient on both large and small datasets, and it gives more control over which Index records are matched against and which items are processed. The following options are new or changed:
  - “--dry-run”: runs as it does in our other tools. Nothing is updated, but the tool reports how many items would have been affected.
  - “--items-to-process-query”: defines the items to process. This option replaces the old “--query” option.
  - “--match-against-query”: defines the initial “pool” of Index records to consider. The default is everything.
  - “--aggregate”: instead of finding all hashes that match each item's hash, the tool first “buckets” all of the hashes and then runs normal TagDuplicate processing on each bucket. This approach is much faster on huge datasets, especially on the first run of the tool. It generally should not be used for small updates, such as incremental updates.
  Run TagDuplicate with “--help” to see all available parameters and options. (Ref: CT-2183)
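To illustrate the difference between the default matching and “--aggregate”, here is a minimal Python sketch. It is not the tool's actual implementation. The names `Record`, `find_duplicates`, and `tag_duplicates`, and the use of Python predicates in place of the real query syntax, are all hypothetical. The sketch only mirrors the behavior described above: an optional match-against pool (defaulting to the whole Index), a set of items to process, per-item matching versus bucketing hashes first, and a dry run that reports a count without updating anything.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

# Hypothetical model of an Index record; not the TagDuplicate data model.
@dataclass(frozen=True)
class Record:
    item_id: str
    content_hash: str

# Hypothetical stand-in for a query: a predicate over records.
Query = Callable[[Record], bool]


def find_duplicates(
    index: List[Record],
    items_to_process_query: Query,
    match_against_query: Optional[Query] = None,
    aggregate: bool = False,
) -> Dict[str, List[str]]:
    """Map each processed item's ID to the IDs of pool records sharing its hash."""
    # With no match-against query, the pool is the whole Index.
    pool = [r for r in index if match_against_query is None or match_against_query(r)]
    items = [r for r in index if items_to_process_query(r)]
    result: Dict[str, List[str]] = {}

    if aggregate:
        # Bucket every pool hash once, then match each item within its bucket.
        buckets: Dict[str, List[str]] = defaultdict(list)
        for r in pool:
            buckets[r.content_hash].append(r.item_id)
        for item in items:
            matches = [i for i in buckets.get(item.content_hash, []) if i != item.item_id]
            if matches:
                result[item.item_id] = matches
    else:
        # "Find all hashes that match this hash": rescan the pool for each item.
        for item in items:
            matches = [
                r.item_id
                for r in pool
                if r.content_hash == item.content_hash and r.item_id != item.item_id
            ]
            if matches:
                result[item.item_id] = matches
    return result


def tag_duplicates(
    index: List[Record],
    tags: Dict[str, Set[str]],
    items_to_process_query: Query,
    match_against_query: Optional[Query] = None,
    aggregate: bool = False,
    dry_run: bool = False,
) -> int:
    """Tag duplicates (unless dry_run) and return how many items are affected."""
    duplicates = find_duplicates(index, items_to_process_query, match_against_query, aggregate)
    if not dry_run:
        for item_id in duplicates:
            tags.setdefault(item_id, set()).add("duplicate")
    return len(duplicates)


if __name__ == "__main__":
    index = [Record("a", "h1"), Record("b", "h1"), Record("c", "h2"), Record("d", "h1")]
    new_items = lambda r: r.item_id in {"b", "c"}
    tags: Dict[str, Set[str]] = {}
    print(tag_duplicates(index, tags, new_items, dry_run=True))  # 1
    print(tags)  # {} because nothing is written in a dry run
```

Both strategies return the same matches in this sketch. The aggregate path spends one up-front pass over the pool so that each item's lookup is cheap. A real Index would presumably answer a single hash query through an index rather than a linear scan (an assumption). In that case, bucketing the whole pool costs more than a few targeted lookups when only a handful of items change, which is consistent with the guidance to reserve “--aggregate” for large first runs.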
