Disposing your data
Use the Dispose tool to delete files from their source location using a query-based script. The tool locates the files through the paths stored in the index. Your disposal query might target items that match specific ROT rules, items in specific folders, or items carrying a tag you have applied in the index.
Before we begin, be sure to have completed the following:
Initial Discovery (crawl) has been completed
If not, refer to Initial Discovery - File Share
You have identified what you would like to dispose of
You have chosen whether to delete the files as is or to use an audit index to track deletions
Using an audit index to track deletions
Your organization may require that deletions are tracked for auditing purposes. This can be done manually or configured to run on a schedule via Windows Task Scheduler.
Dispose
Create a query based on your ROT rules and/or the tags you have set.
Below is an example query:
{ "bool": { "must": [ { "match_phrase": { "rot_trivial": "log" } } ] } }
The query above matches any item whose “rot_trivial” field has the value “log”; those are the items that will be disposed of.
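The query structure above can also be generated rather than typed by hand, which helps when you keep several disposal queries. The following is a minimal Python sketch, not part of Cognitive Toolkit: the helper names and the output file name are hypothetical, and it assumes only the bool/must/match_phrase shape shown in the example.

```python
import json
from pathlib import Path


def build_match_phrase_query(field: str, value: str) -> dict:
    """Return a bool/must query with a single match_phrase clause."""
    return {"bool": {"must": [{"match_phrase": {field: value}}]}}


def write_query_file(query: dict, path: Path) -> Path:
    """Save the query as a .json file that can be passed to the tool as the query file."""
    path.write_text(json.dumps(query, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Same field and value as the example query above
    query = build_match_phrase_query("rot_trivial", "log")
    print(json.dumps(query, indent=2))
    # To save it, call for example:
    # write_query_file(query, Path("dispose-query.json"))  # hypothetical file name
```

Review the generated file before passing it to the Dispose tool, since the query decides which files are deleted.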
Open Command Prompt
Navigate to the CognitiveToolkit root folder
Run the Dispose tool with the following command, replacing the placeholder values to suit your environment:
CognitiveToolkit.exe Dispose --hash-field <VALUE> --hash-algorithm <ALGORITHM> -q <QUERY_FILE> -u <INDEX_SERVER_URL> -i <INDEX_NAME>
Below is an example of the command used for this use case:
CognitiveToolkit.exe Dispose --hash-algorithm md5 -q "C:\ShinyDocs\shinydocs-cognitive-toolkit-2020-03-18 (2.3.4.1)\External Resources\Sample Queries\COG-Dispose-Example.json" -u "http://localhost:9200" -i "testindex"
After running the command above, follow the on-screen instructions and type Confirm if you are comfortable proceeding.
💡 Use --force to suppress the confirmation message.
You have now disposed of the file(s) matched by your query. The disposed items will still appear in the current index.
Copy
Now that the data has been disposed of, you can copy the successful disposals to an audit index.
We recommend using a date-based naming convention for your audit indices:
audit-dispose-yyyy-MM-dd
ex. audit-dispose-2022-01-01
This allows for an aggregate index pattern of audit-* for all audit indices and/or audit-dispose-* for all disposal audit indices.
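As an illustration of the naming convention, here is a minimal Python sketch that builds a dated audit index name and checks it against the two aggregate patterns. The function name is hypothetical, and fnmatchcase is used only as a local stand-in for wildcard matching; it is an assumption that the index server's pattern matching behaves the same way for these simple names.

```python
from datetime import date
from fnmatch import fnmatchcase


def audit_index_name(day: date, action: str = "dispose") -> str:
    """Build an index name of the form audit-<action>-yyyy-MM-dd."""
    return f"audit-{action}-{day:%Y-%m-%d}"


if __name__ == "__main__":
    name = audit_index_name(date(2022, 1, 1))
    print(name)  # audit-dispose-2022-01-01
    # Local stand-in for how a wildcard index pattern selects index names
    for pattern in ("audit-*", "audit-dispose-*"):
        print(pattern, fnmatchcase(name, pattern))
```

Because the date is zero-padded and ordered year, month, day, the index names also sort chronologically.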
Create a query that isolates successfully disposed data. You can use the query below as is; save it as a .json file:
{ "bool": { "must": [ { "match_phrase": { "dispose": "true" } } ] } }
Using Cognitive Toolkit, run the CopyItems tool:
CognitiveToolkit.exe CopyItems --index-url http://localhost:9200 --index-name shiny --destination-index-url http://localhost:9200 --destination-index-name audit-dispose-2022-01-01 --query <path_to_query.json>
💡 For automation, consider using PowerShell to append the current date to the index name automatically. Replace <...> with your index name and paths.
# Today's date in the yyyy-MM-dd form used by the audit index name
$CurrentDate = Get-Date -Format yyyy-MM-dd
# CopyItems arguments; the destination index name receives the date suffix
$CognitiveToolkitArguments = @("CopyItems", "--index-url", "http://localhost:9200", "--index-name", "<source_index_name>", "--destination-index-url", "http://localhost:9200", "--destination-index-name", "audit-dispose-$CurrentDate", "--query", """<path_to_query.json>""")
& "<path_to_CognitiveToolkit.exe>" $CognitiveToolkitArguments
Once this process is complete, you will have a new index with the name you gave it. You can then create an aggregate index pattern (audit-dispose-*) in the Visualizer to browse the audit indices.
It is good practice to delete audit indices after some time (depending on your organization's retention rules). Using the RemoveIndex tool in Cognitive Toolkit, you can use the same aggregate name logic to delete audit indices for a given year (ex. --index-name audit-dispose-2021*).
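If your retention rule is expressed as a number of days rather than a calendar year, the date embedded in each audit index name can be used to work out which indices are due for deletion. The Python sketch below is a hypothetical helper, not a Cognitive Toolkit feature: it assumes the audit-dispose-yyyy-MM-dd convention recommended above, and the retention period and index names are made-up values. It only lists candidates; the deletion itself would still be done with RemoveIndex.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List, Optional

PREFIX = "audit-dispose-"


def index_date(name: str) -> Optional[date]:
    """Return the date encoded in an audit index name, or None if it does not follow the convention."""
    if not name.startswith(PREFIX):
        return None
    try:
        return datetime.strptime(name[len(PREFIX):], "%Y-%m-%d").date()
    except ValueError:
        return None


def expired_indices(names: Iterable[str], today: date, retention_days: int) -> List[str]:
    """List audit indices whose date is older than the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in names:
        day = index_date(name)
        if day is not None and day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    # Hypothetical index names and a hypothetical 365-day retention period
    names = ["audit-dispose-2021-03-01", "audit-dispose-2022-01-01", "shiny"]
    print(expired_indices(names, today=date(2022, 6, 1), retention_days=365))
```

Names that do not follow the convention are skipped rather than treated as expired, so a source index is never listed by accident.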
Remove
You’ve disposed of the data and copied the disposals to the audit index. Now you need to remove the disposed data from the source index.
You can use the same query mentioned in the Copy section:
{ "bool": { "must": [ { "match_phrase": { "dispose": "true" } } ] } }
Using Cognitive Toolkit, run the RemoveItems tool:
CognitiveToolkit.exe RemoveItems --index-url http://localhost:9200 --index-name shiny --query <path_to_query.json>
💡 Use --force to suppress the confirmation message.
Once the process is complete, you will have successfully removed the disposed of data from the source index.
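The three stages (Dispose, Copy, Remove) must run in that order, because the copy and remove queries rely on the dispose field set during disposal. The Python sketch below assembles the three command lines as argument lists, using only the options shown in this article. The function name, parameter names, and example paths are hypothetical, and the sketch prints the commands instead of running them, so they can be reviewed first.

```python
from typing import List


def build_disposal_steps(
    index_url: str,
    index_name: str,
    dispose_query: str,
    disposed_query: str,
    audit_index: str,
    hash_algorithm: str = "md5",
    force: bool = False,
    exe: str = "CognitiveToolkit.exe",
) -> List[List[str]]:
    """Return the Dispose, CopyItems, and RemoveItems command lines in run order."""
    dispose = [exe, "Dispose", "--hash-algorithm", hash_algorithm,
               "-q", dispose_query, "-u", index_url, "-i", index_name]
    copy = [exe, "CopyItems", "--index-url", index_url, "--index-name", index_name,
            "--destination-index-url", index_url,
            "--destination-index-name", audit_index, "--query", disposed_query]
    remove = [exe, "RemoveItems", "--index-url", index_url,
              "--index-name", index_name, "--query", disposed_query]
    if force:
        # --force suppresses the confirmation prompt (shown above for Dispose and RemoveItems)
        dispose.append("--force")
        remove.append("--force")
    return [dispose, copy, remove]


if __name__ == "__main__":
    steps = build_disposal_steps(
        index_url="http://localhost:9200",
        index_name="shiny",
        dispose_query="dispose-query.json",    # hypothetical path
        disposed_query="disposed-items.json",  # hypothetical path
        audit_index="audit-dispose-2022-01-01",
    )
    for step in steps:
        print(step)
```

Leaving force off by default keeps the confirmation prompt in place, which is the safer choice for a tool that deletes files.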
Did you know… There are two ways to run tools in the Shinydocs Cognitive Toolkit:
Example .bat file |