As part of our build automation here at Riot, we've been looking for solid options to back up our servers (configs, logs, data, etc.) to an off-site location. Our provider does daily backups of our servers and restores data on demand, which is certainly nice, but it left us wanting more fine-grained control of the process. Cost, simplicity and security were our top concerns, and our search led us to duplicity combined with Amazon's S3. Here's how we use it.
You will need to have librsync installed on your system. On Ubuntu:
apt-get install librsync-dev
Since duplicity is a Python app, we chose to install it in a virtualenv. It's pip-installable but is not on PyPI, so you will have to point pip at the tarball.
virtualenv duplicity
cd duplicity
source bin/activate
pip install -E . http://code.launchpad.net/duplicity/0.6-series/0.6.11/+download/duplicity-0.6.11.tar.gz boto
Or, on Ubuntu:
apt-get install duplicity
If you want to encrypt your backups you will need to generate a GnuPG key, like so:
gpg --gen-key
You can accept the default options when generating the key, but make sure you set a passphrase on it, as duplicity will not work without one.
S3 is just one of the many backends duplicity supports. Their docs have more info.
Here's our backup script:
export AWS_ACCESS_KEY_ID='xxxxxx'
export AWS_SECRET_ACCESS_KEY='xxxxxx'
export PASSPHRASE='xxxxxx'
export NOW=`date +"%Y-%m-%d-%H-%M"`

duplicity --exclude ".*" --include "**" --full-if-older-than 30D \
    --log-file /var/log/duplicity/s3-$NOW.log --verbosity 6 \
    --s3-use-rrs --s3-use-new-style --asynchronous-upload \
    /var/www/backups s3+http://riot.xxxx.xxxx

export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export PASSPHRASE=
export NOW=
Restoring is a snap too. Though we haven't had the need to restore yet, this is how you would:
# Restore a file
duplicity --file-to-restore var/www/backups/code.tar s3+http://riot.xxxx.xxxx ~/tmp/restore

# Restore a directory
duplicity --file-to-restore var/www/backups/db s3+http://riot.xxxx.xxxx ~/tmp/restore

# Restore everything from a point in time
duplicity -t 2011-02-19T12:20:45 s3+http://riot.xxxx.xxxx ~/tmp/restore
The backup script runs hourly and does incremental backups to our S3 bucket.
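If you would rather drive duplicity from Python than from a shell script, here is a minimal sketch of the same invocation. The function names build_backup_command and run_backup are our own invention for illustration and are not part of duplicity; the flags are simply the ones from the backup script above. One possible advantage is that the secrets are handed only to the child process, so nothing has to be exported into the calling shell and cleared afterwards.

```python
import os
import subprocess
from datetime import datetime


def build_backup_command(source_dir, target_url, now=None):
    """Return the duplicity argument list mirroring the shell script's flags."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d-%H-%M")  # same format as the NOW variable
    return [
        "duplicity",
        "--exclude", ".*",
        "--include", "**",
        "--full-if-older-than", "30D",
        "--log-file", "/var/log/duplicity/s3-%s.log" % stamp,
        "--verbosity", "6",
        "--s3-use-rrs",
        "--s3-use-new-style",
        "--asynchronous-upload",
        source_dir,
        target_url,
    ]


def run_backup(source_dir, target_url, secrets):
    """Run duplicity; `secrets` is a dict of environment variables.

    Expected keys (an assumption based on the shell script):
    AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and PASSPHRASE.
    Returns duplicity's exit code.
    """
    env = dict(os.environ)
    env.update(secrets)  # visible to the child process only
    command = build_backup_command(source_dir, target_url)
    return subprocess.call(command, env=env)
```

An hourly scheduler could then call run_backup("/var/www/backups", "s3+http://riot.xxxx.xxxx", secrets), with secrets loaded from wherever you keep your credentials.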
I wrote an implementation of the Levenshtein algorithm in Python a few days back. Today, while noodling around, I came across another implementation of the same algorithm, written by Magnus Hetland, the author of Python Algorithms, and wanted to see which was the "faster" implementation.
So, enter Python's timeit module. Here's what I did:
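For anyone who hasn't met the algorithm: Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. Below is a minimal dynamic-programming sketch of the textbook approach. It is neither my code nor Magnus's, and the name edit_distance is just a placeholder for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between a and b, keeping only one table row."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the row short

    # previous[j] holds the distance between the current prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]
```

Keeping a single row instead of the full table is a common space optimisation; it does not change the result.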
>>> def levenshtein(a,b):
...     "Magnus's Code"
...
...     [ Code here ]
...
>>> def leven(a,b):
...     "Rohit's Code"
...
...     [ Code here ]
...
>>> import timeit
>>> t1 = timeit.Timer(setup='from __main__ import levenshtein', stmt='levenshtein("plumber","causes")').timeit()
>>> t1
50.655728101730347
>>> t2 = timeit.Timer(setup='from __main__ import leven', stmt='leven("plumber","causes")').timeit()
>>> t2
68.573153972625732
Seems like Magnus has me beat :(.
One point to note here is that timeit() temporarily turns off garbage collection while it is timing, so if garbage collection is an important part of what you are measuring, you will need to re-enable it in the setup string.
>>> import gc
>>> setup = """\
... from __main__ import levenshtein
... gc.enable()
... """
>>> t2 = timeit.Timer(setup=setup, stmt='levenshtein("plumber","causes")').timeit()
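Two more knobs are worth knowing about. By default timeit() runs the statement one million times, which is why the totals above are in the tens of seconds, and Timer.repeat() runs the whole measurement several times so you can take the minimum. Here is a rough sketch that uses both, with and without garbage collection. The count_vowels function is a made-up stand-in workload and best_time is a hypothetical helper name; swap in whatever you actually want to measure.

```python
import gc
import timeit


def count_vowels(text):
    """Made-up stand-in workload; replace with the function under test."""
    return sum(1 for ch in text if ch in "aeiou")


def best_time(func, number=10000, repeat=3, with_gc=False):
    """Fastest total time in seconds over `repeat` runs of `number` calls.

    `func` must be callable without arguments. When `with_gc` is true,
    the setup step re-enables the garbage collector that timeit disables.
    """
    setup = gc.enable if with_gc else "pass"
    timer = timeit.Timer(stmt=func, setup=setup)
    return min(timer.repeat(repeat=repeat, number=number))


def workload():
    return count_vowels("plumber causes")


gc_off = best_time(workload)
gc_on = best_time(workload, with_gc=True)
print("gc disabled: %.4fs, gc enabled: %.4fs" % (gc_off, gc_on))
```

Taking the minimum of the repeats is the usual choice, since the slower runs mostly reflect other activity on the machine rather than the code itself.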
There is also quite a nice collection of Python performance tips here.