Automating EC2 instances for database backups

😱

A nightmare scenario for any business: needing a backup that isn't there!

Procedures for performing and verifying backups have always been necessary. When they run regularly and automatically, and are monitored for failures, they give peace of mind and a high degree of confidence in your disaster recovery plans.

Let's have a look at some techniques using AWS services that we've found useful at Chargify.

Scenario 1: Verify database EBS volume

Take a MySQL database stored on an EBS volume, with EBS snapshots taken regularly as backups. The snapshots are tagged.

Say, once per week, we want to spin up an EC2 instance with the latest snapshot attached, verify MySQL can read it, or perform a database dump. We want to use a spot instance to minimize the cost.

Launching the instance

Here's a sample Python Lambda function to:

  • find the most recent snapshot,
  • create a new temporary EBS volume based on the snapshot, and
  • launch an EC2 spot instance with it attached.

import boto.ec2
from boto.ec2.blockdevicemapping import BlockDeviceType, BlockDeviceMapping, EBSBlockDeviceType

def handle(event, context):
  conn = boto.ec2.connect_to_region('us-east-1')

  # Find the most recent EBS snapshot carrying the 'dbsnapshot' tag key
  snapshots = conn.get_all_snapshots(filters={'tag-key': 'dbsnapshot'})
  snapshots.sort(key=lambda s: s.start_time, reverse=True)
  latest_snapshot = snapshots[0]

  # Small root volume on /dev/sda1, plus a new volume restored from the snapshot on /dev/sde;
  # both are deleted automatically when the instance terminates
  bdm = BlockDeviceMapping()
  bdm['/dev/sda1'] = EBSBlockDeviceType(size=10, delete_on_termination=True)
  bdm['/dev/sde'] = BlockDeviceType(snapshot_id=latest_snapshot.id,
                                    delete_on_termination=True, volume_type="gp2")

  # Request a one-off spot instance; user_data_script (defined in the next section)
  # handles all of the instance configuration
  conn.request_spot_instances(price=0.10,
                              image_id='ami-d15a75c7',
                              instance_profile_arn='...',
                              instance_type='m3.medium',
                              security_group_ids=['...'],
                              subnet_id="...",
                              user_data=user_data_script,
                              block_device_map=bdm)
  return event

The security group and subnet IDs need to be filled in based on your EC2 network settings. The instance_profile_arn is optional; it points to an IAM instance profile (role) if you need to grant the instance any extra permissions (for example, to upload to S3). The tag-key filter needs to match whatever tag key your EBS snapshots carry.
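
For reference, here's a minimal sketch of how a nightly job might create and tag those snapshots so the filter above can find them. It isn't part of the Lambda function itself; the volume ID and the 'dbsnapshot' tag value are placeholders to adapt to your own setup.

import boto.ec2

def snapshot_db_volume():
  conn = boto.ec2.connect_to_region('us-east-1')
  # vol-xxxxxxxx is a placeholder for the EBS volume holding the MySQL data directory
  snap = conn.create_snapshot('vol-xxxxxxxx', description='Nightly MySQL data snapshot')
  # Tag it with the 'dbsnapshot' key so the Lambda filter above will pick it up
  conn.create_tags([snap.id], {'dbsnapshot': 'mysql'})
  return snap.id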

Instance configuration with EC2 user-data

EC2 user-data can preconfigure an otherwise blank Ubuntu server instance to run commands on startup, following the cloud-init specification. Here's an example:

user_data_script = """Content-Type: multipart/mixed; boundary="===169238765==="
MIME-Version: 1.0

--===169238765===
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.yml"

#cloud-config
package_update: true
power_state:
  mode: poweroff
cloud_final_modules:
 - rightscale_userdata
 - scripts-per-once
 - scripts-per-boot
 - scripts-per-instance
 - scripts-user
 - keys-to-console
 - phone-home
 - final-message
 - power-state-change

--===169238765===
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="run_on_load.sh"

#!/bin/bash
apt-get update
DEBIAN_FRONTEND=noninteractive apt-get install -y --force-yes mysql-server postfix
/etc/init.d/mysql stop

# Capture all further output into /tmp/mail.txt so it can be emailed at the end
exec > >(tee -a -i /tmp/mail.txt)
exec 2>&1
echo "Subject: MySQL Snapshot Report" ; echo

# Mount the volume restored from the snapshot over the MySQL data directory
mount /dev/xvde /var/lib/mysql
chown -Rf mysql:mysql /var/lib/mysql

# Start MySQL and give InnoDB time to recover the snapshot to a consistent state
/etc/init.d/mysql start &
sleep 300

# Sanity check: list the tables and the timestamp of the most recent "event" record
mysql --table -A DatabaseName -e 'show tables; select created_at as "latest event" from events order by id desc limit 1'

# Email the captured report, then power off (which terminates the spot instance)
sendmail example@recipient.com < /tmp/mail.txt
/sbin/poweroff &

--===169238765===--
"""

This cloud-init script has these parts:

  • Configures cloud-init to run a shell script on boot-up
  • Embeds that shell script in a multi-part MIME document
  • The shell script installs MySQL
  • Mounts the EBS volume at /var/lib/mysql
  • Starts MySQL and sleeps (to give InnoDB time to recover to a consistent state)
  • Outputs a list of database tables and the timestamp of the most recent "event" record
  • Emails all of this console output to someone who can verify that it all looks correct and complete

The benefit of using user-data is that no other instance configuration is necessary; this one file contains the lot. The instance is only going to live long enough to perform this one function and then shut down.

Automatically running the function with Lambda

Lambda gives us the ability to periodically run a single JavaScript or Python function without needing to configure and spin up a whole server.

It isn't suitable for very long-running things (like running a MySQL backup itself), but it does give us a neat way to spin up an EC2 instance to perform a single task and then shut down.

Putting it all together now, let's make it run by itself as an AWS Lambda function, scheduled to run weekly.
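
If you'd rather set the weekly schedule up programmatically instead of through the console, here's a rough sketch using boto3 and a CloudWatch Events rule; the function name, rule name and account ID below are placeholders, not part of the original setup.

import boto3

# One-off setup (run from your workstation, not inside the Lambda function):
# create a weekly CloudWatch Events rule and point it at the function.
events = boto3.client('events', region_name='us-east-1')
lambda_client = boto3.client('lambda', region_name='us-east-1')

rule = events.put_rule(Name='weekly-db-snapshot-check', ScheduleExpression='rate(7 days)')

# Allow CloudWatch Events to invoke the function, then add the function as the rule's target
lambda_client.add_permission(FunctionName='verify-db-snapshot',
                             StatementId='weekly-db-snapshot-check',
                             Action='lambda:InvokeFunction',
                             Principal='events.amazonaws.com',
                             SourceArn=rule['RuleArn'])
events.put_targets(Rule='weekly-db-snapshot-check',
                   Targets=[{'Id': 'verify-db-snapshot',
                             'Arn': 'arn:aws:lambda:us-east-1:123456789012:function:verify-db-snapshot'}])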

When configuring the function, its Role refers to an IAM role for the Lambda function itself, whose policy would, for example, contain this snippet:

{
  "Effect": "Allow",
  "Action": ["ec2:Describe*", "ec2:CreateVolume", "ec2:AttachVolume", "ec2:DetachVolume", "ec2:CreateTags", "ec2:ModifyInstanceAttribute"],
  "Resource": ["*"]
},
{
  "Effect": "Allow",
  "Action": ["ec2:RunInstances", "ec2:RequestSpotInstances"],
  "Resource": ["*"]
}

This gives the Lambda function permission to find the EBS snapshot within your AWS account and to create a spot instance with a new EBS volume attached.

Scenario 2: DynamoDB backup

Here's a sample snippet of user-data shell script to have the EC2 instance dump the contents of a DynamoDB database to files in S3, using the dynamo-backup-to-s3 utility.

apt-get update
DEBIAN_FRONTEND=noninteractive apt-get install -y --force-yes nodejs npm sendmail
# Install the backup utility, plus moment, which the inline script below uses directly
npm install dynamo-backup-to-s3 moment
nodejs <<EOF
var DynamoBackup = require('dynamo-backup-to-s3');
var moment = require('moment');
var now = moment.utc();
var path = ('dynamodb/backup-' + now.format('YYYY-MM-DD-HH-mm-ss'));
console.log('Saving to: ' + path);
var backup = new DynamoBackup({
  readPercentage: .75,
  bucket: 'dynamo-backups',
  saveDataPipelineFormat: false,
  backupPath: path,
  stopOnFailure: false,
  base64Binary: true,
  awsRegion: 'us-east-1'
});
backup.on('error', function(data) {
  console.log('Error backing up ' + data.table);
  console.log(data.err);
});
backup.on('start-backup', function(tableName, startTime) {
  console.log('Starting to copy table ' + tableName + ' at ' + startTime);
});
backup.on('end-backup', function(tableName, backupDuration) {
  console.log('Done copying table ' + tableName + ' duration: ' + backupDuration + 'ms');
});
backup.backupAllTables(function() {
  console.log('Finished backing up DynamoDB');
});
EOF

And here's an IAM role policy snippet to allow the instance to back up DynamoDB:

{
  "Effect": "Allow",
  "Action": ["dynamodb:ListTables","dynamodb:DescribeLimits","dynamodb:DescribeReservedCapacity","dynamodb:GetRecords","dynamodb:DescribeTable","dynamodb:Scan","dynamodb:Query","dynamodb:GetItem","dynamodb:BatchGetItem"],
  "Resource": ["*"]
},
{
  "Action": "s3:*",
  "Effect": "Allow",
  "Resource": ["arn:aws:s3:::dynamo-backups/dynamodb","arn:aws:s3:::dynamo-backups/dynamodb/*"]
}

Note that the role policy for the Lambda function needs something extra to allow it to pass the above instance role on to the EC2 instance it creates, replacing xxxxx with the ARN of that role:

{
  "Effect": "Allow",
  "Action": "iam:PassRole",
  "Resource": "arn:aws:iam::xxxxx"
}
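
Finally, in the spirit of not just taking backups but checking they exist, a separate scheduled Lambda function could confirm that the dump actually landed in S3. Here's a minimal sketch, assuming the dynamo-backups bucket and dynamodb/ prefix from the policy above, and an execution role that is allowed to list that bucket.

import boto

def check_latest_backup(event, context):
  conn = boto.connect_s3()
  bucket = conn.get_bucket('dynamo-backups')
  # Find the most recently written object under the dynamodb/ backup prefix
  keys = sorted(bucket.list(prefix='dynamodb/'), key=lambda k: k.last_modified, reverse=True)
  if not keys:
    raise Exception('No DynamoDB backups found in S3!')
  print('Most recent backup object: %s (%s)' % (keys[0].name, keys[0].last_modified))
  return keys[0].name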