
Disposing your data

Use the Dispose tool when you want to delete files from their source location (using the paths stored in the index) based on a query. Your disposal query can target items that match specific ROT rules, items in specific folders, or items with a tag you have applied in the index.

Before you begin, make sure that:

  1. Initial Discovery (crawl) has been completed

    1. If it has not, refer to Initial Discovery - File Share

  2. You have identified what you want to dispose of

  3. You have decided whether to delete the files as is or to use an audit index to track deletions

Using an audit index to track deletions

Your organization may require that deletions are tracked for auditing purposes. You can run the steps below manually, or configure them to run on a schedule using Windows Task Scheduler.

Dispose

  1. Create a query based on your ROT rules and/or the tags you have applied.

    1. Below is an example query

      JSON
      {
      	"bool": {
      		"must": [
      			{
      				"match_phrase": {
      					"rot_trivial": "log"
      				}
      			}
      		]
      	}
      }
  2. This query selects for disposal every item whose “rot_trivial” field has the value “log”.

  3. Open Command Prompt

  4. Navigate to the CognitiveToolkit root folder

  5. Run the Dispose tool with the following command, replacing the placeholder values to match your environment:

    CognitiveToolkit.exe Dispose --hash-field <VALUE> --hash-algorithm <ALGORITHM> -q <QUERY_FILE> -u <INDEX_SERVER_URL> -i <INDEX_NAME>

    1. Below is the command used for this use case:

      CODE
      CognitiveToolkit.exe Dispose --hash-algorithm md5 -q "C:\ShinyDocs\shinydocs-cognitive-toolkit-2020-03-18 (2.3.4.1)\External Resources\Sample Queries\COG-Dispose-Example.json" -u "http://localhost:9200" -i "testindex"  
  6. After you run the command, follow the on-screen instructions and type Confirm if you want to proceed.
    💡Use --force to suppress the confirmation message

  7. The file(s) matching your query have now been disposed of. The records for the disposed items remain in the current index.
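If you script the Dispose step (for example, to run it from Windows Task Scheduler), the query file and the command line can be generated instead of edited by hand. The following is a minimal Python 3 sketch, not part of Cognitive Toolkit: the helper names are hypothetical, and any field other than rot_trivial is a placeholder for a field in your own index. It uses only the Dispose options shown above; hash_field is optional in this sketch because the example command omits --hash-field. Because every clause under "must" has to match in a bool query, passing several field/value pairs narrows the disposal to items that match all of them.

```python
import json
import subprocess
from pathlib import Path
from typing import Dict, List, Optional


def build_dispose_query(conditions: Dict[str, str]) -> dict:
    """Build a bool query with one match_phrase clause per field/value pair.

    Every clause under "must" has to match, so adding pairs narrows the selection.
    Fields other than rot_trivial are placeholders for fields in your own index.
    """
    return {
        "bool": {
            "must": [{"match_phrase": {field: value}} for field, value in conditions.items()]
        }
    }


def write_query_file(query: dict, path: Path) -> Path:
    """Save the query as a .json file that can be passed to Dispose with -q."""
    path.write_text(json.dumps(query, indent=2), encoding="utf-8")
    return path


def build_dispose_args(exe: str, query_file: Path, index_url: str, index_name: str,
                       hash_algorithm: str = "md5", hash_field: Optional[str] = None,
                       force: bool = False) -> List[str]:
    """Assemble a Dispose command line from the options shown in this article."""
    args = [exe, "Dispose"]
    if hash_field:
        args += ["--hash-field", hash_field]
    args += ["--hash-algorithm", hash_algorithm,
             "-q", str(query_file), "-u", index_url, "-i", index_name]
    if force:
        # Skips the interactive Confirm prompt: only use with a reviewed query
        args.append("--force")
    return args


def run_dispose(args: List[str]) -> int:
    """Run the command on the Windows host and return its exit code."""
    # stdin is inherited, so the Confirm prompt still works when --force is omitted
    return subprocess.run(args).returncode
```

For example, build_dispose_args("CognitiveToolkit.exe", Path("dispose-query.json"), "http://localhost:9200", "testindex") produces the same shape as the example command above, with a different query file. Leaving force at False keeps the confirmation step, which is the safer default for a destructive operation.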

Copy

Now that the data has been disposed of, you can copy the successful disposals to an audit index.

We recommend using a date-based naming convention for your audit indices:

audit-dispose-yyyy-MM-dd
For example: audit-dispose-2022-01-01

This lets you use the aggregate index pattern audit-* to cover all audit indices, or audit-dispose-* to cover only disposal audit indices.
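The naming convention is easy to generate in a script, and a wildcard check shows how the aggregate patterns group the names. This Python sketch assumes the audit-dispose-yyyy-MM-dd format above; the function names are hypothetical. It uses Python's fnmatch purely to illustrate the grouping; the Visualizer and the index server apply their own pattern matching.

```python
from datetime import date
from fnmatch import fnmatchcase
from typing import List


def audit_index_name(day: date, prefix: str = "audit-dispose") -> str:
    """Return a date-based audit index name, e.g. audit-dispose-2022-01-01."""
    # date.isoformat() always produces a zero-padded yyyy-MM-dd string
    return f"{prefix}-{day.isoformat()}"


def covered_by(pattern: str, names: List[str]) -> List[str]:
    """Return the index names that a wildcard pattern such as audit-dispose-* matches."""
    return [name for name in names if fnmatchcase(name, pattern)]
```

For example, given audit-dispose-2022-01-01 and a hypothetical non-disposal audit index named audit-export-2022-01-01, covered_by("audit-*", ...) returns both names, while covered_by("audit-dispose-*", ...) returns only the disposal index.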

  1. Create a query that isolates successfully disposed items. You can save the following query as a .json file and use it as is:

    JSON
    {
    	"bool": {
    		"must": [
    			{
    				"match_phrase": {
    					"dispose": "true"
    				}
    			}
    		]
    	}
    }
  2. Using Cognitive Toolkit, run the CopyItems tool

    CODE
    CognitiveToolkit.exe CopyItems --index-url http://localhost:9200 --index-name shiny --destination-index-url http://localhost:9200 --destination-index-name audit-dispose-2022-01-01 --query <path_to_query.json>

    💡 For automation, consider using PowerShell to append the current date to the index name automatically.
    Replace the <...> placeholders with your index name and paths.

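    POWERSHELL
    # Today's date in yyyy-MM-dd format, matching the audit index naming convention
    $CurrentDate = Get-Date -Format yyyy-MM-dd
    # CopyItems arguments; the destination index name ends with today's date
    $CognitiveToolkitArguments = @("CopyItems", "--index-url", "http://localhost:9200", "--index-name", "<source_index_name>", "--destination-index-url", "http://localhost:9200", "--destination-index-name", "audit-dispose-$CurrentDate", "--query", """<path_to_query.json>""")
    # Run Cognitive Toolkit with the argument array
    & "<path_to_CognitiveToolkit.exe>" $CognitiveToolkitArguments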
  3. When the process completes, you will have a new index with the name you specified. You can then create an aggregate index pattern (audit-dispose-*) in the Visualizer to browse all disposal audit indices.

It is good practice to delete audit indices once they exceed your organization's retention period. With the RemoveIndex tool in Cognitive Toolkit, you can use the same wildcard naming logic to delete all audit indices for a given year (for example, --index-name audit-dispose-2021*).
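Before running RemoveIndex, it can help to list which audit indices have actually passed your retention period. The sketch below is hypothetical Python, not a Cognitive Toolkit feature: it parses the date suffix of audit-dispose-yyyy-MM-dd names and returns those older than a given number of days. It does not contact the index server or call RemoveIndex, and the retention period is an assumption to replace with your organization's rule.

```python
from datetime import date, datetime, timedelta
from typing import List, Optional

PREFIX = "audit-dispose-"


def index_date(name: str) -> Optional[date]:
    """Return the date encoded in an audit-dispose-yyyy-MM-dd name, or None."""
    if not name.startswith(PREFIX):
        return None
    try:
        return datetime.strptime(name[len(PREFIX):], "%Y-%m-%d").date()
    except ValueError:
        # Names that do not follow the convention are never selected for deletion
        return None


def expired_indices(names: List[str], retention_days: int, today: date) -> List[str]:
    """Return audit indices dated more than retention_days before today."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in names:
        day = index_date(name)
        if day is not None and day < cutoff:
            expired.append(name)
    return sorted(expired)
```

The names it returns can then be removed one at a time, or, when a whole year has expired, with a wildcard as described above. Skipping names that do not parse keeps unrelated or misnamed indices out of the deletion list.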

Remove

You have disposed of the data and copied the disposal records to the audit index. Now you need to remove the disposed items from the source index.

  1. Use the same query as in the Copy section:

    JSON
    {
    	"bool": {
    		"must": [
    			{
    				"match_phrase": {
    					"dispose": "true"
    				}
    			}
    		]
    	}
    }
  2. Using Cognitive Toolkit, run the RemoveItems tool
    💡Use --force to suppress the confirmation message

    CODE
    CognitiveToolkit.exe RemoveItems --index-url http://localhost:9200 --index-name shiny --query <path_to_query.json>
  3. When the process completes, the disposed items will have been removed from the source index.
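When Copy and Remove are scheduled (for example, with Windows Task Scheduler, mentioned at the start of this article), their order matters: Remove should only run after Copy has succeeded, otherwise the disposal records would be deleted from the source index before they reach the audit index. The following Python sketch chains the CopyItems and RemoveItems commands shown above with that guard. It assumes that a non-zero exit code from CognitiveToolkit.exe indicates failure, which you should confirm for your version; the function names, executable path, index names, and query file are placeholders.

```python
import subprocess
from datetime import date
from typing import Callable, List

INDEX_URL = "http://localhost:9200"  # placeholder: your index server URL


def build_copy_args(exe: str, source_index: str, query_file: str, day: date) -> List[str]:
    """CopyItems command that copies disposed items into audit-dispose-yyyy-MM-dd."""
    return [exe, "CopyItems",
            "--index-url", INDEX_URL, "--index-name", source_index,
            "--destination-index-url", INDEX_URL,
            "--destination-index-name", f"audit-dispose-{day.isoformat()}",
            "--query", query_file]


def build_remove_args(exe: str, source_index: str, query_file: str) -> List[str]:
    """RemoveItems command; --force skips the prompt so the task can run unattended."""
    return [exe, "RemoveItems", "--index-url", INDEX_URL,
            "--index-name", source_index, "--query", query_file, "--force"]


def copy_then_remove(exe: str, source_index: str, query_file: str, day: date,
                     run: Callable[[List[str]], int]) -> List[List[str]]:
    """Run Copy, then run Remove only if Copy returned exit code 0.

    Returns the commands that were actually run, in order.
    """
    executed: List[List[str]] = []
    copy_args = build_copy_args(exe, source_index, query_file, day)
    executed.append(copy_args)
    if run(copy_args) != 0:
        # Stop: removing now would drop records that never reached the audit index
        return executed
    remove_args = build_remove_args(exe, source_index, query_file)
    executed.append(remove_args)
    run(remove_args)
    return executed


def run_command(args: List[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(args).returncode
```

A scheduled task could call copy_then_remove("<path_to_CognitiveToolkit.exe>", "shiny", "<path_to_query.json>", date.today(), run_command). Passing a runner that only prints its arguments and returns 0 gives a dry run that shows both commands without touching the index.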

Did you know…

There are two ways to run tools in the Shinydocs Cognitive Toolkit:

  1. Using the full command (as shown above).

  2. Using .bat files. Every installation of the Shinydocs Cognitive Toolkit comes with a folder of .bat files. Use these as a starting point for customizing commands to your specific environment. 

Example .bat file

COG-Query-Dispose.bat

