(Archived) Backup recommendations for Analytics Engine and Visualizer

When a cluster (more than 1 node) is deployed in the Analytics Engine, you cannot just backup the data folder anymore. The shards could be mid-transfer and you won't be able to capture all data properly. Shinydocs recommends using the built-in Snapshot API to achieve this.

While Shinydocs recommends this strategy, the Elasticsearch Snapshot API is developed by Elastic Co. We are able to assist with implementing this strategy with your organization, but Shinydocs does not take responsibility for the Snapshot technologies from Elastic. Issues with the Snapshot API can be resolved with Elastic.

Considerations

Size of Snapshots

The size of your snapshot depends on the size of your index/indices. You can view the current size of your indices by running GET _cat/indices?v in the Visualizer Dev Tools Console. If using compression in your snapshots, you may see compression of approximately 40%. Ensure your destination has enough disk space available.

Repository location

This guide will focus on using a Windows-based shared UNC path.

Official API Documentation

You can view Elastic’s official documentation on this API here: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/modules-snapshots.html

Instructions for using Elasticsearch Snapshot

Luckily, they have a solution! Hellllloooooo snapshots! This can be done and stored in a repo in the cloud, but also locally via UNC.

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/modules-snapshots.html

Step 1

Add the "path.repo" line to your elasticsearch.yml file

path.repo: ["\\\\ACME\\share\\ESBackups"]

Save the file and restart Analytics Engine service

Step 2

Register the repo with Elasticsearch and name the "snapshots" repository. This information is submitted in the Visualizer’s Dev Tools console.

PUT /_snapshot/name_of_repository
{
    "type": "fs",
    "settings": {
        "location": "\\\\path\\noted\\in\\path.repo",
        "compress": true
    }
}

name_of_repository

location

compress

This will be called in a later command. This is how you can define what the name of the repository is. Some good naming options would be: shinydocs_cluster_snapshot, backup, companyname_snapshot

You need to put in the exact same path you put in path.repo. In this example, you would enter: \\\\ACME\\share\\ESBackups

Accepts true or false.
If true (recommended), the snapshot will be compressed (average ~40%). This will mean more CPU usage during the process but the snapshots will be smaller.
If false, the snapshot will not be compressed. Less CPU is used, but will result is larger snapshots and slightly more network activity.

Step 3

Once the repository has been registered with the cluster, you are now able to initiate a snapshot!

The first snapshot you do will be a full snapshot and will take the longest. Once the next snapshot is initiated in the same repository, an incremental snapshot is performed (sometimes referred to as delta). Incremental snapshots run relatively quickly, only changes (addition and removal) are processed.

To initiate a snapshot, the following can be run in the Visualizer’s Dev Tools console:

PUT /_snapshot/name_of_repository/name_of_snapshot?wait_for_completion=true

name_of_repository

name_of_snapshot

This is the same name you used in Step 2

The name of this snapshot. We recommend appending the date and time (ex. snapshot_01-01-2020)

OR

If you would like to just snapshot individual indices, you can add a body to the command and specify the index/indices to be included

PUT /_snapshot/name_of_repository/name_of_snapshot?wait_for_completion=true
{
  "indices": "index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": false
}

indices

ignore_unavilable

include_global_state

Provide the name(s) of the index(s) you wish to include.
(ex. “index_1” | “index_1,index2” | “index_1,index_2,index_3”)

Ignores shards that are not available at the time of the snapshot. If shards were not available, the resulting snapshot will be partial.
We recommend true

Includes the state of the cluster. If set to true, the state (green, yellow, red) will also be included.
We recommend false

Restore

You can restore the snapshot to the original cluster or in a whole new environment. If restoring to a new environment, ensure sufficient hardware requirements are met.

The following can be run in the Visualizer’s Dev Tools console:

PUT /_snapshot/name_of_repository/name_of_snapshot/_restore

There are various options for advanced restoring. We encourage you to visit Elasticsearch’s website for the official API documentation for more advanced details.