When the Analytics Engine is deployed as a cluster (more than one node), backing up the data folder is no longer sufficient. Shards may be mid-transfer between nodes when the copy is taken, so the backup may not capture all of the data. Shinydocs recommends using Elasticsearch's built-in Snapshot API instead.
While Shinydocs recommends this strategy, the Elasticsearch Snapshot API is developed by Elastic Co. We can assist with implementing this strategy in your organization, but Shinydocs does not take responsibility for the Snapshot technologies from Elastic. Issues with the Snapshot API itself should be raised with Elastic.
Considerations
Size of Snapshots
The size of your snapshot depends on the size of your index or indices. You can view the current size of your indices by running GET _cat/indices?v in the Visualizer Dev Tools Console. If compression is enabled for your snapshots, you may see a size reduction of approximately 40%. Ensure your destination has enough free disk space.
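To turn the _cat/indices output into a rough disk-space estimate, the primary shard sizes can be summed. The sketch below is in Python and assumes the rows were fetched with GET _cat/indices?format=json&bytes=b, so that sizes arrive as plain byte counts. The function name and the sample rows are hypothetical. Snapshots copy primary shards only, so it sums pri.store.size rather than store.size.

```python
def estimate_snapshot_bytes(cat_indices_rows):
    """Sum primary-shard sizes from `_cat/indices?format=json&bytes=b` rows."""
    total = 0
    for row in cat_indices_rows:
        size = row.get("pri.store.size")
        # Rows without a size (for example closed indices) are skipped.
        if size is not None:
            total += int(size)
    return total


# Hypothetical sample rows, trimmed to the fields used here.
sample_rows = [
    {"index": "index_1", "pri.store.size": "1048576"},
    {"index": "index_2", "pri.store.size": "524288"},
    {"index": "closed_index", "pri.store.size": None},
]
print(estimate_snapshot_bytes(sample_rows))
```

Treat the result as an upper bound for the first (full) snapshot and leave headroom for later incremental snapshots.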
Repository location
This guide will focus on using a Windows-based shared UNC path.
Official API Documentation
You can view Elastic’s official documentation on this API here: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/modules-snapshots.html
Instructions for using Elasticsearch Snapshot
Elasticsearch provides snapshots for this purpose. A snapshot repository can be hosted in the cloud or stored locally on a network share addressed by a UNC path.
Step 1
Add the "path.repo" line to the elasticsearch.yml file on every master and data node in the cluster. The share must be reachable from each of those nodes.
path.repo: ["\\\\ACME\\share\\ESBackups"]
Save the file and restart the Analytics Engine service.
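The doubled backslashes in path.repo come from string escaping: inside a double-quoted string, each backslash in the UNC path has to be written twice. As a minimal Python sketch (the helper name is hypothetical), json.dumps produces the escaped form from the path as Windows displays it:

```python
import json


def escape_unc_path(unc_path):
    """Return the UNC path as a double-quoted, backslash-escaped string."""
    return json.dumps(unc_path)


raw_path = r"\\ACME\share\ESBackups"  # the share as Windows displays it
print("path.repo: [" + escape_unc_path(raw_path) + "]")
```

The same escaped form is what the "location" value in the repository registration body needs, because that body is JSON.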
Step 2
Register the repository with Elasticsearch and give it a name. Submit the following request in the Visualizer’s Dev Tools console:
PUT /_snapshot/name_of_repository
{
  "type": "fs",
  "settings": {
    "location": "\\\\path\\noted\\in\\path.repo",
    "compress": true
  }
}
name_of_repository: The name you choose for the repository. It is used in later commands. Some good naming options are shinydocs_cluster_snapshot, backup, or companyname_snapshot.
location: The exact path you entered in path.repo. In this example, you would enter \\\\ACME\\share\\ESBackups
compress: Accepts true or false.
If true (recommended), the snapshot is compressed (~40% smaller on average). This means more CPU usage during the process, but the snapshots are smaller. Elastic's 6.8 documentation describes this setting as compressing metadata files (index mappings and settings) only, so the savings you see may differ.
If false, the snapshot is not compressed. Less CPU is used, but the snapshots are larger and network activity is slightly higher.
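If you would rather script the registration than paste it into Dev Tools, the request can be built with Python's standard library. This is a sketch under stated assumptions: the node address http://localhost:9200 and the function name are placeholders, not values taken from this guide, so adjust them to your environment.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: address of an Analytics Engine node


def build_register_repo_request(repo_name, unc_path, compress=True):
    """Build the PUT request that registers a shared file system repository."""
    body = {
        "type": "fs",
        "settings": {"location": unc_path, "compress": compress},
    }
    return urllib.request.Request(
        url=f"{ES_URL}/_snapshot/{repo_name}",
        # json.dumps escapes the backslashes in the UNC path for us.
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_register_repo_request(
    "shinydocs_cluster_snapshot", r"\\ACME\share\ESBackups"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it against a running cluster: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the body before anything reaches the cluster.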
Step 3
Once the repository has been registered with the cluster, you can initiate a snapshot.
The first snapshot is a full snapshot and takes the longest. Each subsequent snapshot in the same repository is incremental (sometimes referred to as a delta). Incremental snapshots run relatively quickly because only changes (additions and removals) are processed.
To initiate a snapshot, the following can be run in the Visualizer’s Dev Tools console:
PUT /_snapshot/name_of_repository/name_of_snapshot?wait_for_completion=true
name_of_repository: The same name you used in Step 2.
name_of_snapshot: The name of this snapshot. We recommend appending the date and time (e.g. snapshot_01-01-2020).
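For scheduled snapshots, the dated name can be generated rather than typed. The following Python sketch is one way to do it; the function name and the year-first timestamp layout are assumptions chosen so that names sort chronologically, not a Shinydocs convention. Elasticsearch requires snapshot names to be lowercase, so the prefix is lowercased.

```python
from datetime import datetime


def build_snapshot_path(repo_name, prefix, when):
    """Return the request path for a snapshot named <prefix>_<date>_<time>."""
    # Snapshot names must be lowercase, so the prefix is lowercased.
    snapshot_name = f"{prefix.lower()}_{when:%Y-%m-%d_%H-%M}"
    return f"/_snapshot/{repo_name}/{snapshot_name}?wait_for_completion=true"


print(build_snapshot_path("backup", "snapshot", datetime(2020, 1, 1, 2, 30)))
```

Pass datetime.now() as the last argument to name the snapshot after the moment it is taken.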
OR
If you would like to snapshot only individual indices, add a body to the command and specify the index or indices to include:
PUT /_snapshot/name_of_repository/name_of_snapshot?wait_for_completion=true
{
  "indices": "index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": false
}
indices: The name(s) of the index or indices to include, separated by commas (e.g. "index_1" | "index_1,index_2" | "index_1,index_2,index_3").
ignore_unavailable: If true, indices named in the request that do not exist are skipped instead of causing the snapshot request to fail. We recommend true.
include_global_state: If true, the cluster's global state (such as persistent cluster settings and index templates) is stored in the snapshot along with the indices. This is not the cluster health status (green, yellow, red). We recommend false.
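When the list of indices is held in a script, the request body can be assembled from it. The Python sketch below is illustrative only: the function name is hypothetical, and its defaults simply mirror the recommendations above.

```python
import json


def build_snapshot_body(index_names, ignore_unavailable=True,
                        include_global_state=False):
    """Build the JSON body for a snapshot limited to specific indices."""
    if not index_names:
        raise ValueError("at least one index name is required")
    return json.dumps({
        # Elasticsearch expects a single comma-separated string here.
        "indices": ",".join(index_names),
        "ignore_unavailable": ignore_unavailable,
        "include_global_state": include_global_state,
    }, indent=2)


print(build_snapshot_body(["index_1", "index_2"]))
```

Rejecting an empty list guards against accidentally sending a body with an empty "indices" value.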
Restore
You can restore the snapshot to the original cluster or to an entirely new environment. If restoring to a new environment, ensure it meets the hardware requirements for the restored data.
The following can be run in the Visualizer’s Dev Tools console:
POST /_snapshot/name_of_repository/name_of_snapshot/_restore
There are various options for advanced restoring. We encourage you to visit Elasticsearch’s website for the official API documentation for more advanced details.
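As one example of those options, Elastic's documentation issues the restore as a POST and allows a body with "indices", "rename_pattern" and "rename_replacement", so selected indices can be restored under new names. Per that documentation, an existing index can only be restored over if it is closed, which is why renaming is often convenient. The Python sketch below builds such a request; the function name and the "restored_" prefix are hypothetical.

```python
import json


def build_restore_request(repo_name, snapshot_name, index_names=None,
                          rename_prefix=None):
    """Return (method, path, body) for a snapshot restore request."""
    path = f"/_snapshot/{repo_name}/{snapshot_name}/_restore"
    body = {}
    if index_names:
        body["indices"] = ",".join(index_names)
    if rename_prefix:
        # Restore each index under a new name so that an open index
        # with the original name is left untouched.
        body["rename_pattern"] = "(.+)"
        body["rename_replacement"] = rename_prefix + "$1"
    return "POST", path, json.dumps(body)


method, path, body = build_restore_request(
    "backup", "snapshot_01-01-2020", ["index_1"], rename_prefix="restored_"
)
print(method, path)
print(body)
```

With these arguments, index_1 from the snapshot would be restored as restored_index_1. Omitting both optional arguments yields an empty body, which restores everything in the snapshot.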