(Archived) Bulk Document Enricher
README
Description
The BulkDocumentEnricher enriches documents in the index using a comma-separated value (CSV) file that maps each search term to the data to write into the index.
Prerequisites
Because this is a Runscript-based script, ensure that Cognitive-toolkit is installed and available.
How to run
To run the BulkDocumentEnricher.cs script, you must provide the following parameters to the Runscript tool:
| Option | Details | Required |
|---|---|---|
| | This is always “BulkDocumentEnricher” | Yes |
| | The path to the script file | Yes |
| | Name of the index | Yes |
| | Server URL of the index | Yes |
| | The path to the JSON query file | Yes |
| | Path to the comma-separated value (csv) file | Yes |
| | A comma-separated list of the column names specified in the csv file | Yes |
| | The number of threads. If not specified, defaults to 1 | No |
Format of the query file
The query file is a JSON file that uses the standard Elasticsearch/OpenSearch query language. When a field name is surrounded by curly braces, for example {fullText}, the BulkDocumentEnricher replaces that placeholder with the value of that field.
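The placeholder substitution described above can be sketched in Python. This is an illustration only, not the code of BulkDocumentEnricher.cs: the function name `fill_placeholders` is hypothetical, and it assumes placeholders appear only in string values of the query and are filled from a dictionary of field values.

```python
import json


def fill_placeholders(node, row):
    """Recursively replace "{field}" placeholders in string values with row[field].

    Hypothetical helper: the real tool's substitution rules may differ.
    """
    if isinstance(node, dict):
        return {key: fill_placeholders(value, row) for key, value in node.items()}
    if isinstance(node, list):
        return [fill_placeholders(item, row) for item in node]
    if isinstance(node, str):
        for field, value in row.items():
            node = node.replace("{" + field + "}", value)
        return node
    # Numbers, booleans, and null pass through unchanged.
    return node


# Parse the template once, then fill it for a given set of field values.
template = json.loads('{"match": {"fullText": "{fullText}"}}')
query = fill_placeholders(template, {"fullText": "Mickey"})
print(json.dumps(query))
```

Substituting into the parsed JSON rather than the raw text means a value containing quotes or backslashes cannot break the structure of the query.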
Format of the comma-separated file
The first line lists the field name you would like to search against, followed by the field names you would like to decorate the documents with.
Each subsequent line lists the term to search for, followed by the value to assign to each decorator field when that term is found.
For example, given the following query:
{
  "bool": {
    "must": [
      {
        "exists": {
          "field": "fullText"
        }
      },
      {
        "match": {
          "fullText": "{fullText}"
        }
      }
    ]
  }
}
And the following CSV file:
fullText, category
Mickey, mouse
Donald, duck
Pluto, dwarf planet
Running the BulkDocumentEnricher will search the fullText of each document. If it finds the term “Mickey”, it will add a category field to the document and set it to “mouse”. If it finds the term “Donald”, it will add a category field set to “duck”, and if it finds the term “Pluto”, it will add a category field set to “dwarf planet”.
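As a rough sketch of how such a file can be read, the Python below turns each CSV row into a search field, a search term, and the decorator values to apply. The name `read_enrichment_rows` is hypothetical, and trimming the space after each comma is an assumption made to match the sample file; the actual script may parse the file differently.

```python
import csv
import io

CSV_TEXT = """fullText, category
Mickey, mouse
Donald, duck
Pluto, dwarf planet
"""


def read_enrichment_rows(text):
    """Return (search_field, term, decorations) for every data row.

    The first header column is the field to search; the remaining
    columns are the decorator fields to set on matching documents.
    """
    # skipinitialspace drops the blank that follows each comma in the sample.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = next(reader)
    search_field, decorator_fields = header[0], header[1:]
    plans = []
    for row in reader:
        if not row:  # ignore blank lines
            continue
        decorations = dict(zip(decorator_fields, row[1:]))
        plans.append((search_field, row[0], decorations))
    return plans


for search_field, term, decorations in read_enrichment_rows(CSV_TEXT):
    print(f"if {search_field} matches {term!r}: set {decorations}")
```

The same function handles a header with more than one decorator column, because every column after the first is paired with the value in the same position of the row.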
You can tag documents with more than one field by adding additional columns to the CSV file. For example:
fullText, category, video, serial-number
Mickey, mouse, steamboat, 0001
This would apply three tags (category, video, and serial-number) if the term “Mickey” was found in the fullText.
Adding an asterisk (*) to a column name indicates that the corresponding field should be treated as a single value (a string) rather than a list of values (an array). In that case, each new value overwrites the previous value instead of being appended to the list.
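The difference between the two modes can be sketched as follows. This is a hedged illustration, not the script's implementation: `apply_decoration` is a hypothetical helper, the document is modeled as a plain dictionary, and the asterisk is assumed to be a suffix on the column name (for example `video*`), which the text above does not specify.

```python
def apply_decoration(document, column, value):
    """Apply one CSV cell to a document, honoring the '*' marker.

    Assumption: the asterisk trails the column name, e.g. "video*".
    """
    if column.endswith("*"):
        # Single-valued field: store a string and overwrite any previous value.
        document[column[:-1]] = value
    else:
        # Multi-valued field: keep a list and append the new value.
        existing = document.get(column)
        if existing is None:
            document[column] = [value]
        elif isinstance(existing, list):
            existing.append(value)
        else:
            document[column] = [existing, value]
    return document


doc = {}
apply_decoration(doc, "category", "mouse")
apply_decoration(doc, "category", "mascot")
apply_decoration(doc, "video*", "steamboat")
apply_decoration(doc, "video*", "fantasia")
print(doc)
```

After these calls, `category` holds both values as a list, while `video` holds only the most recent string.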