April 3, 2020

Back Up ElasticSearch 7 to Object Storage

Joseph Holsten
Solutions Architect

When you use Elasticsearch in production, backup and recovery are fundamental to keeping your business running. So after you've built out your Elasticsearch cluster on Oracle Cloud Infrastructure with Terraform, the very next thing you should do, before tuning index performance or building awesome visualization dashboards, is back up.

Elasticsearch supports backing up to a number of storage systems through its standardized snapshot system. As the Elasticsearch documentation says, “It is not possible to back up an Elasticsearch cluster simply by taking a copy of the data directories of all of its nodes.” To get a safe and consistent copy of the entire dataset in your cluster, use the snapshot API and one of its repository plugins rather than copying files yourself.

With Oracle Cloud Infrastructure, the easiest storage system to use is Object Storage.

And with Elasticsearch 7.0, Object Storage is supported through our S3 compatibility API. Here’s how to configure your existing cluster for snapshots, save your first snapshot, and then restore it!

Install

Following the Elasticsearch plugin installation steps, run the following commands on every node:

# Install the S3 repository plugin (-b accepts the extra permissions it requires)
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install -b repository-s3
# Restart the node so that the plugin is loaded
sudo systemctl restart elasticsearch
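
If you want to confirm that the plugin was loaded on each node after the restart, you can list the installed plugins through the cat API (an optional check):

curl "$(hostname -i):9200/_cat/plugins?v"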

Add Credentials

To connect the plugin to Object Storage, use the S3 compatibility API. Set up your account and collect your credentials by following our Enabling Application Access to Object Storage guide.

Then, follow the plugin client configuration steps to add the access key and secret key to the keystore on every node:

ACCESS_KEY=b1234567897f5f30db611dcc59fb3f6481ed41bc
SECRET_KEY=0123456789fJMDz9LINn+l3ych5dWhyKiM7EIkHOhzE=
echo $ACCESS_KEY | sudo /usr/share/elasticsearch/bin/elasticsearch-keystore add --stdin s3.client.default.access_key
echo $SECRET_KEY | sudo /usr/share/elasticsearch/bin/elasticsearch-keystore add --stdin s3.client.default.secret_key
curl -XPOST "$(hostname -i):9200/_nodes/reload_secure_settings"
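
You can confirm that both settings are present by listing the keystore contents on a node (only the setting names are shown, never the values):

sudo /usr/share/elasticsearch/bin/elasticsearch-keystore list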

Note: This example shows nonfunctional keys. Be careful to share your API keys only with authorized users, and never publish them publicly.

Configure

Set up the repository by following the plugin repository settings instructions.

Note: Unlike the earlier commands, which you needed to run on every machine, you need to run this command only once against any active member of the cluster.

OCI_REGION=us-phoenix-1
OCI_TENANCY=appleseeds-by-j
OCI_BUCKET=elasticsearch-repository

curl -XPUT "$(hostname -i):9200/_snapshot/s3_repository" -H 'Content-Type: application/json' -d"
{
  \"type\": \"s3\",
  \"settings\": {
    \"client\": \"default\",
    \"region\": \"${OCI_REGION}\",
    \"endpoint\": \"${OCI_TENANCY}.compat.objectstorage.${OCI_REGION}.oraclecloud.com\",
    \"bucket\": \"${OCI_BUCKET}\"
  }
}
"

Set Up Sample Data

To test the snapshots, load a sample dataset of the complete works of William Shakespeare:

curl -O https://download.elastic.co/demos/kibana/gettingstarted/7.x/shakespeare.json
curl -XPUT "$(hostname -i):9200/shakespeare" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
    "speaker": {"type": "keyword"},
    "play_name": {"type": "keyword"},
    "line_id": {"type": "integer"},
    "speech_number": {"type": "integer"}
    }
  }
}
'
curl -H 'Content-Type: application/x-ndjson' -XPOST "$(hostname -i):9200/shakespeare/_bulk?pretty" --data-binary @shakespeare.json
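
To confirm that the bulk load worked, check the document count on the new index:

curl "$(hostname -i):9200/_cat/indices/shakespeare?v"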

Back Up

With the sample dataset available, manually trigger a snapshot:

curl -XPUT "$(hostname -i):9200/_snapshot/s3_repository/snapshot_1?wait_for_completion=true"

Inspect the Snapshot

After the snapshot is complete, inspect it:

curl -XGET "$(hostname -i):9200/_snapshot/s3_repository/snapshot_1"
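
To see every snapshot stored in the repository at a glance, you can also use the cat snapshots API:

curl "$(hostname -i):9200/_cat/snapshots/s3_repository?v"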

Restore
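
A snapshot cannot be restored on top of an existing open index, so first delete (or close) the shakespeare index that you created earlier:

curl -XDELETE "$(hostname -i):9200/shakespeare"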

Finally, restore the snapshot:

curl -XPOST "$(hostname -i):9200/_snapshot/s3_repository/snapshot_1/_restore"
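
Once the restore finishes, the document count should match what you saw before taking the snapshot:

curl "$(hostname -i):9200/shakespeare/_count?pretty"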

And you're done! You've successfully configured the snapshot plugin, and created and restored individual snapshots. In the long term, you’ll want something to automatically manage these processes. Elasticsearch Curator is the ideal tool to automatically schedule snapshots. Finally, you'll want to use Oracle Cloud's Object Lifecycle Management to periodically archive and delete your records.

Try it for yourself. Start with our guide to deploying Elasticsearch on Oracle Cloud Infrastructure to quickly set up a cluster to test against, and then work through these steps. If you don’t already have an Oracle Cloud account, sign up for a free trial today.
