As many already know, EC2 doesn't keep the data on an instance's local disk once the instance is shut down or crashes. In fact, as of right now my data isn't backed up. I have an image, but it's dated, meaning this post won't be "safe" if something were to happen.
While I don't expect my instance to crash and while I'm not doing anything that would make it crash, you never know. So everyone is figuring out backup and failover solutions. In fact, it's a requirement for any production site.
So I'm going to jot down a list of possible solutions and tips over time so I can go back to figure out which to employ.
First, Subversion is your friend. I already have this site on SVN. In fact, I don't use FTP (bleh) any longer. All changes to this site are deployed via SVN post-commit scripts, meaning that after I make a change and commit it to the repository for versioning, the hook updates this site with just the changes. This really saves a boatload of time.
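Here is a minimal sketch of what such a post-commit hook could look like, written in Python (hooks can be any executable). The working copy path is a made-up placeholder, not my actual setup, and the script assumes the svn client is on the PATH of the user the repository runs as. Subversion passes the repository path and the new revision number to the hook as its first two arguments.

```python
#!/usr/bin/env python3
"""post-commit hook sketch: refresh the live site's working copy after a commit."""
import subprocess
import sys

# Hypothetical location of the working copy that serves the site.
WORKING_COPY = "/var/www/site"


def build_update_command(working_copy, revision):
    """Return the svn command that brings the working copy to the given revision."""
    return ["svn", "update", "--non-interactive", "-r", str(revision), working_copy]


def main(argv):
    # argv[1] is the repository path, argv[2] is the revision that was just committed.
    _repos, revision = argv[1], argv[2]
    subprocess.run(build_update_command(WORKING_COPY, revision), check=True)


# Only act when Subversion supplied its arguments.
if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv)
```

Saved as hooks/post-commit inside the repository directory and made executable, something like this would run after every commit.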
We can use Subversion and its great hook scripts for more, though. In addition to running an update command on the working copy that runs this site, we can also mirror our SVN repository to an S3 account. Not only are we backing up to S3 now (on each change to the site), but we're getting it all under version control.
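One possible way to do the mirroring (a sketch, not necessarily the approach the SVN book describes) is to dump each new revision incrementally with svnadmin and copy the dump file to a bucket. The bucket name and key layout below are invented for illustration, and the upload assumes the AWS command line tool is installed and configured; any S3 client would do.

```python
#!/usr/bin/env python3
"""Sketch: dump one committed revision and copy it to S3."""
import os
import subprocess
import sys

BUCKET = "my-svn-backups"  # hypothetical bucket name


def dump_command(repos, revision):
    """svnadmin command that writes only this revision's changes to stdout."""
    return ["svnadmin", "dump", repos, "-r", str(revision), "--incremental", "--quiet"]


def s3_key(revision):
    """Zero-padded key so the dumps list in revision order (layout is an assumption)."""
    return "svn-dumps/rev-{:08d}.dump".format(int(revision))


def upload_command(local_path, bucket, key):
    """Upload via the AWS CLI (assumed to be installed and configured)."""
    return ["aws", "s3", "cp", local_path, "s3://{}/{}".format(bucket, key)]


def backup_revision(repos, revision, workdir="/tmp"):
    key = s3_key(revision)
    local_path = os.path.join(workdir, os.path.basename(key))
    with open(local_path, "wb") as out:
        subprocess.run(dump_command(repos, revision), stdout=out, check=True)
    subprocess.run(upload_command(local_path, BUCKET, key), check=True)
    os.remove(local_path)  # the copy in S3 is the backup; drop the temp file


# Called from post-commit with the repository path and revision number.
if __name__ == "__main__" and len(sys.argv) >= 3:
    backup_revision(sys.argv[1], sys.argv[2])
```

Each commit costs one upload request this way, and restoring means loading the dump files back in revision order with svnadmin load.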
The downside to this, in my mind, is that I make a bunch of changes (especially in development), so mirroring on every commit will increase the number of requests, which can add up. I really think requests are the silent killer when it comes to S3. So perhaps instead of running on the post-commit hook, we can simply set up a cron job that runs the sync command once a day, or at whatever interval makes sense.
Remember that SVN only sends what has changed, so the transfer in will be much smaller than if you sent all the files, or even the whole changed file. SVN is just sending the changes.
So that's two options.
1. Keeping the site's code completely backed up from the minute it changes.
2. Backing it up on a daily/hourly/etc. basis.
It will completely depend on the level of risk you can or wish to take. I'll most likely (I haven't set this up yet) run a cron job every day. Also important to note: If there are no changes, nothing happens. So this backup is efficient.
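For the cron route, a rough sketch could look like the following. It remembers the last revision it backed up in a small state file (the file location and the script path in the cron line are hypothetical), asks the repository for its youngest revision with svnlook, and dumps only the revisions in between. If nothing has been committed since the last run, it does nothing. The upload step is left out here; it would be whatever S3 client you prefer.

```python
#!/usr/bin/env python3
"""Sketch: dump only the revisions committed since the last backup run."""
import subprocess
from pathlib import Path


def pending_range(last_synced, youngest):
    """Return 'LOWER:UPPER' for revisions not yet backed up, or None if up to date."""
    if youngest <= last_synced:
        return None
    return "{}:{}".format(last_synced + 1, youngest)


def read_last_synced(state_file):
    """Last backed-up revision, or -1 if no backup has run yet."""
    path = Path(state_file)
    return int(path.read_text().strip()) if path.exists() else -1


def youngest_revision(repos):
    """Ask the repository for its newest revision number."""
    result = subprocess.run(["svnlook", "youngest", repos],
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


def sync(repos, state_file, dump_path):
    youngest = youngest_revision(repos)
    revisions = pending_range(read_last_synced(state_file), youngest)
    if revisions is None:
        return None  # no new commits, so no dump and no S3 requests
    with open(dump_path, "wb") as out:
        subprocess.run(["svnadmin", "dump", repos, "-r", revisions,
                        "--incremental", "--quiet"], stdout=out, check=True)
    # Upload dump_path to S3 here (not shown), then record how far we got.
    Path(state_file).write_text(str(youngest))
    return revisions
```

A crontab entry along the lines of `0 3 * * * /usr/local/bin/svn_s3_sync.py` (path made up) would run a wrapper around sync() every night at 3 a.m.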
Drawbacks: It won't copy over the database or any images/files that were uploaded outside of SVN. For example, I can attach an image to this post and SVN won't know about it, so it would be lost along with this post. A MySQL replication/backup solution would have to be figured out, along with some sort of rsync of the files (ignoring .svn directories) to catch the images. The rsync would back up the site's scripts as well, but it doesn't put them under version control, which is a very nice thing to have. With the SVN mirror, in the event of a crash on a larger site, not only would there be recovery, but all the changes that were made to the site would still be saved. Very handy if it was a scripting error that caused the instance to crash somehow: you could pinpoint it more easily.
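To make those two missing pieces a bit more concrete, here is a small sketch of the commands involved, wrapped in Python. The paths and database name are placeholders. It assumes rsync and mysqldump are installed and that MySQL credentials live in an option file such as ~/.my.cnf. Note that --single-transaction only gives a consistent snapshot for transactional tables such as InnoDB.

```python
#!/usr/bin/env python3
"""Sketch: back up uploaded files and the database, which SVN doesn't cover."""
import subprocess


def rsync_command(source_dir, destination):
    """Copy the site tree, skipping Subversion's .svn metadata directories."""
    source = source_dir.rstrip("/") + "/"  # trailing slash: copy contents, not the dir itself
    return ["rsync", "-a", "--exclude=.svn", source, destination]


def mysqldump_command(database):
    """Dump one database to stdout as SQL."""
    return ["mysqldump", "--single-transaction", database]


def backup_files_and_db(source_dir, destination, database, sql_path):
    subprocess.run(rsync_command(source_dir, destination), check=True)
    with open(sql_path, "wb") as out:
        subprocess.run(mysqldump_command(database), stdout=out, check=True)
```

For example, backup_files_and_db("/var/www/site", "/mnt/backup/site", "blog", "/mnt/backup/blog.sql") would leave a file copy and a SQL dump ready to be pushed to S3 by whatever means.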
Here's more info about how to do this, from the free SVN book.