Amazon Web Services: EC2 in Northern California

January is going crazy for me down here in Moscow: lots of stuff happening, loads of work. No time to tweet, no time to blog. As I mentioned in my earlier post, I quit my job at GSL and am now working at a new local startup. I'll make sure to announce it as soon as the website is ready, so stay tuned ;) Anyways, as I wrote back in December, I've been moving all my stuff to a new EC2 instance in the Northern California region, and I guess I can say that I'm finally done.

The process is not too different from moving to a new dedicated host or to a new EC2 instance in the same region, though there are a few nuances. I was surprised to note that the S3 Fox plugin for Firefox hasn't yet added the new region (Europe is there though). I thought things might not work for some reason (S3 and EC2 being in different regions), but so far they seem fine. I also went back to good old mod_php for Apache instead of running mod_suphp, which gave me a tiny boost in performance. The configuration was straightforward: copy from one EC2 instance, paste into the other, with a few changes along the way of course.

My Elastic IP address changed too, and that was the address whitelisted by Twitter, so I guess I'll have to write to them again for a new whitelisting. Oh well.. One more interesting thing is that I'm now running on an EBS-backed instance, which Amazon introduced not so long ago. I no longer have to worry about losing my stuff when the machine is terminated or rebooted, as the whole drive is backed by an EBS volume. Backups are now almost instant via the AWS Management Console: they're called Snapshots, and they take one click and a few minutes ;) Now, if I'd like to terminate one EC2 instance and start the whole thing over on another one, I'd just restore from the EBS volume or a Snapshot, unless of course I decide to move to another region. I believe EBS volumes and Snapshots are restricted to their region, and furthermore an EBS volume can only be attached to an EC2 instance in the same availability zone, which makes sense: I wouldn't want to run an EC2 instance in one data center backed by a hard drive located in a different one.

Another good question is Amazon CloudFront. Since the S3 buckets haven't changed, CloudFront should keep working the way it used to despite the move. Or at least I hope so ;)

W3 Total Cache with Amazon S3 and CloudFront

A few days ago Frederick Townes, author of W3 Total Cache for WordPress, released an update to this wonderful plugin, and yes, it now fully supports Amazon S3 and CloudFront as the Content Delivery Network! This is a major one for me, as I used to manually upload most of my static assets to CloudFront, which can take quite a lot of time. The W3 Total Cache plugin does that for you in seconds! Post attachments, images, JavaScript, CSS.. all of that can go to CloudFront in just four clicks. Frederick also mentioned that the next update will hold another surprise, which keeps me wondering.

I also tried out the other options for page and database caching. A few tests showed that memcached is faster than APC for me, so that's what I settled on for database caching. Page caching was switched to enhanced mode, which I believe is a new option. The Site Performance graph in Google Webmaster Tools shows pretty good numbers for November and December (very close to 1.5 seconds), although the overall average is still up at 3.5 seconds, which according to Google is slower than 59% of sites. That's probably down to the force majeure incidents in September and October, when page load time peaked at over 7 seconds.

One more funny fact about Google's Site Performance and Page Speed tools is the "Minimize DNS lookups" section, which most of the time shows a single entry:

The domains of the following URLs only serve one resource each. If possible, avoid the extra DNS lookups by serving these resources from existing domains: http://www.google-analytics.com/ga.js

Interesting. Perhaps I should copy that JavaScript file and serve it from my own CDN; I wonder if that would work. Then again, I'd be missing all the nifty updates to Google Analytics, like the most recent one called Asynchronous Tracking – very neat by the way!

Optimizing Your Amazon Web Services Costs

I’ve been with Amazon for quite a long time now, and you’ve probably heard that their web hosting services aren’t very cheap. My average monthly total for one instance (including EBS, S3 and all the rest) was around $120 at the start. That was back in July 2009, when I had no idea how all this stuff works. With a lot of experimenting I managed to cut my monthly costs by around 40%. Below are a few tips that can help you lower your Amazon Web Services charges:

  • Use reserved EC2 instances where possible. Amazon charges $0.085 per hour for an m1.small Linux instance in the US; that’s around $61 per month, or roughly $734 per year. A reserved instance costs me $227 for one year plus $0.03 per running hour, which comes to around $490 per year for an m1.small (see the quick calculation after this list). Use reserved instances only if you’re sure you’ll be running them for the whole year; you can save even more if you purchase a reserved instance for three years.
  • Storage: EBS vs EC2. Pick EC2! That’s right, EC2! EBS charges you for provisioned storage, I/O requests and snapshots, and those can add up pretty quickly if you’re running MySQL on an EBS volume – very risky! Run your MySQL on EC2 storage. The PHP files and everything else should preferably live on EC2 as well. You can use your EBS volume for small backups of core PHP files if you’re running more than one EC2 instance.
  • EBS is cheaper than S3. S3 should only be used in cases where you have to serve your static content from different servers (perhaps through CloudFront), and maybe store some backups there too (don’t forget to remove the old ones!), but EBS is cheaper, even with snapshots.
  • CloudFront is okay. It does speed up your website, but keep in mind that requests served from the edge locations in Japan and Hong Kong cost more.
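
Here’s that quick calculation, just a rough sketch in PHP using the hourly rates quoted in the first tip and assuming roughly 720 running hours per month (the instance runs 24/7):

<?php
// Rough yearly cost comparison for an m1.small, using the rates mentioned above.
$hours_per_year = 720 * 12;

$on_demand = 0.085 * $hours_per_year;        // ~ $734 per year
$reserved  = 227 + 0.03 * $hours_per_year;   // ~ $486 per year on a one-year term

printf("On-demand: \$%.2f/year, reserved: \$%.2f/year, savings: %.0f%%\n",
    $on_demand, $reserved, 100 * (1 - $reserved / $on_demand));

That’s roughly a third off, before even looking at the three-year terms.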

There you go. With these tips you should be able to get Amazon hosting for around $90/month, unless of course you’re running a website with 3 million visitors per day ;) Also, for those of you wondering: I haven’t used RackSpace, but I did compare their prices to Amazon’s, and they’re more expensive.

Cloud Tips: Rediscovering Amazon CloudFront

So, three months later I realized I wasn’t using CloudFront at all! Huh? I took a deeper look at my Amazon Web Services bill last month and found out that I wasn’t even being charged for CloudFront! But hey, I was delivering all my static content through CloudFront distributions from S3, I had a subdomain mapped to those distributions, and everything was working fine (or so I thought).. Let’s see:

Amazon CloudFront delivers your content using a global network of edge locations. Requests for your objects are automatically routed to the nearest edge location, so content is delivered with the best possible performance.

Right, and that’s probably what they charge for in the CloudFront section, so the fact is that I hadn’t been using it at all. Pulling all the static content from the so-called “origin server” is far from what CloudFront can do. What I’d been doing for the past few months was simply delivering content from S3, which is also good, but “good” is not enough. I browsed through the AWS Management Console for hours and couldn’t figure out what I was doing wrong; content kept being pulled from the origin. Then I finally realized that when I created a distribution I was given two addresses: one was the origin server, the other was the CloudFront server (a .cloudfront.net subdomain). So the settings I had gotten wrong were at the DNS level, not in the Management Console.

So I logged back in to my registrar, found the DNS management options, switched my CNAMEs to the CloudFront domain instead of the origin bucket, and hoped everything would work. The very next day I got my very first bill for Amazon CloudFront: three cents! Hurray! I’m not sure how well this is explained in the CloudFront and S3 documentation (I doubt people read it anyway), but I have a few friends who ran into the same problem. And why hand out the address of the origin bucket in the first place? Weird. S3 Firefox Organizer groups both fields into one, which is even weirder. Oh well, glad I sorted it out.

WordPress: The template_redirect Hook Returns 404

This is a real quick one. I started using the W3 Total Cache plugin a few days ago and I’m totally satisfied with it. Unlike the WP Super Cache plugin, W3TC can keep its caches in memory via memcached and serve static content through CDNs. The guys at W3 Edge have promised to add Amazon CloudFront compatibility, which I’m very excited about. As we all know (I guess), Mashable runs WordPress, and guess what: they’re using W3 Total Cache too, so if they trust those guys, I’m totally in. Their support over Twitter is awesome; you don’t even have to look for them! Just tweet your problem mentioning “W3 Total Cache” or “W3TC” and they’ll find you via search and help you solve whatever your issue is.

Anyways, that being said: after writing my post announcements to Twitter via Twitter Friendly Links, some peeps shouted out that I was returning 404 errors on all of my short links. Strange, thought I, as everything looked fine to me. Looking at the plugin settings I realized that I, as a logged-in user, wasn’t being served cached pages, so I turned that option off and guess what: I got those 404s. Creepy! After a little talk over Twitter and IM with the guys at W3 Edge, I started looking through the caching plugin’s code. It’s very well written and I actually learned a couple of techniques from it! Anyways, the reason I was getting 404s was that W3TC looks at the $wp_query object and its is_404 property, which for some reason (Frederick from W3 Edge mentioned a redirection issue in WordPress) was returning true on all of my Twitter friendly links!

I figured out that the template_redirect hook (which is not documented in the WordPress Codex at all) didn’t quite do what I wanted it to. The easiest way out was adding a couple of lines before the redirect actually happened:

global $wp_query;
$wp_query->is_404 = false;

And guess what, it worked! This may not be the smartest decision, but hey, it works! Another workaround would be adding a regular expression to the W3 Total Cache page cache settings to exclude “[0-9]+/?$”, which would stop caching any .com/1234 requests, but that’s not very reliable for a couple of reasons. First, why not cache when we can? And second, what if we’re using Twitter Friendly Links in alphanumeric mode? That would mean “[0-9a-zA-Z]+/?$”, which would match most other top-level pages too, and we’d lose all the benefits of caching. Meh!

I guess this is not the only place where the template_redirect action can end up with a 404. Plugins that define their own permalinks through the template_redirect hook (WP Polls for poll results, for instance, though I’m not sure) will also get 404s if they don’t set is_404 back to false in $wp_query. Many WordPress websites have profile pages which aren’t actual pages in the database but are handled through the template_redirect action for .com/profile_name style permalinks, which is indeed cool. I’m actually planning on using that approach in one of my upcoming projects (a rough sketch of the idea is below), and I’m really glad I sorted this out.
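
To illustrate the profile page idea, here’s a minimal hypothetical sketch (none of these names come from Twitter Friendly Links, W3TC or any other plugin mentioned here): a handler hooked to template_redirect catches a virtual URL, resets the false 404 and renders its own output.

<?php
// Hypothetical sketch of virtual /profile_name pages served via template_redirect.
function my_virtual_profile_page() {
    global $wp_query;

    // Assume the rewrite setup puts the requested name into a custom
    // query var called 'profile_name' (registration not shown here).
    $profile = get_query_var( 'profile_name' );
    if ( empty( $profile ) )
        return; // not one of our virtual pages, let WordPress carry on

    // Without this, WordPress (and W3 Total Cache) treats the request as a 404.
    $wp_query->is_404 = false;
    status_header( 200 );

    // Render the profile with a theme template of your choice, then stop.
    include( get_stylesheet_directory() . '/profile.php' );
    exit;
}
add_action( 'template_redirect', 'my_virtual_profile_page' );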

So the big shoutout goes to W3 Total Cache and the W3 Edge team!

Cloud Tips: Automatic Backups to S3

In a previous post about backing up EC2 MySQL to an Amazon S3 bucket we covered dumping MySQL databases, compressing them and uploading them to S3. After a few weeks of test-driving the shell script, I came up with a new version that checks, fixes and optimizes all tables before generating the dump. This is pretty important, as mysqldump will fail at whatever step causes an error (data corruption, crashed tables, etc.), leaving you with a more or less corrupt archive on S3. Here’s the script:

#!/bin/bash
filename=mysql.`date +%Y-%m-%d`.sql.gz
echo Checking, Fixing and Optimizing all tables
mysqlcheck -u username -ppassword --auto-repair --check --optimize --all-databases
echo Generating MySQL Dump: ${filename}
mysqldump -u username -ppassword --all-databases | gzip -c9 > /tmp/${filename}
echo Uploading ${filename} to S3 bucket
php /ebs/data/s3-php/upload.php ${filename}
echo Removing local ${filename}
rm -f /tmp/${filename}
echo Complete

There you go. If you remember, in my previous example I stored the temporary backup file on Amazon EBS (Elastic Block Storage), which is not really appropriate. Amazon charges for EBS storage, reads and writes, so why pay extra? Dump everything into your temp folder on EC2 and remove it afterwards. Don't forget to make the matching change in your upload.php settings ($local_dir; see the snippet after the second script below). Also, just as a personal note, and for the people who couldn't figure out how to upload file archives to S3, here's another version of the script which takes your public_html (www, htdocs, etc) directory, archives it, compresses it and uploads it to an Amazon S3 bucket:

#!/bin/bash
filename=data.`date +%Y-%m-%d`.tar.gz
echo Collecting data
tar -czf /tmp/${filename} /ebs/home/yourusername/www
echo Uploading ${filename} to S3 bucket
php /ebs/data/s3-php/upload.php ${filename}
echo Removing local ${filename}
rm -f /tmp/${filename}
echo Complete
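
And the change I mentioned above: since both of these scripts write their archives to /tmp instead of /ebs/backups/, the $local_dir setting in the S3auth.php settings file used by upload.php (from the previous backup post) should point there too:

<?php
// S3auth.php change: the temporary backups now live in /tmp
$local_dir = '/tmp/';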

Oh and have you noticed? Amazon has changed the design a little bit, and whoa, they’ve finally changed the way they display the Access Secret so it no longer comes with a trailing space character! Congrats Amazon, it only took you a few months.

Cloud Tips: Backing Up MySQL on Amazon EC2 to S3

Now that I’m all set up in the Amazon cloud, I’m starting to think about backups. Elastic Block Storage (EBS) on Amazon is great, and Snapshots (backups) can be generated with a few clicks from the Management Console, but for a few reasons I’d like to set up my own backup scripts. Here’s why:

  • Amazon EBS snapshots are cool, but there might be a situation where I’d like to restore only one file, or a few rows from a MySQL dump, or whatever. EBS works at the block level, meaning you store an image of the whole drive you’re working with, no matter how many files there are. It would be painful to set up an extra drive just for backups and juggle its snapshots
  • I don’t believe there’s a feature to schedule or automate EBS snapshots
  • I’d like a simple way to download the backed-up data onto my local PC
  • Some of my clients like to get their weekly backups by FTP

I don’t really care about my PHP files, images and other stuff located on the EBS volume, because I’m sure I have local copies of all that. The most important part of all my projects is the data stored in MySQL, so I’m going to show you how to set up a simple shell script to generate daily MySQL backups and a simple PHP script to upload them to a secure S3 bucket.

Take a look at this simple shell script:

#!/bin/bash
filename=mysql.`date +%d.%m.%Y`.sql.gz
echo Generating MySQL Dump: ${filename}
mysqldump -uyour_username -pyour_password --all-databases | gzip -c9 > /ebs/backups/${filename}
echo Uploading ${filename} to S3 bucket
php /ebs/data/s3-php/upload.php ${filename}
echo Removing local ${filename}
rm -f /ebs/backups/${filename}
echo Complete

I’m assuming you’re familiar with shell scripting, so there’s no need to explain the first few lines. Don’t forget to type in your own username and password for MySQL access; also, I used the path /ebs/backups/ for my daily MySQL backups, so choose your own there.

There are a few scripts located in /ebs/data/s3-php/, including upload.php, which takes a single parameter: the filename (don’t put a path there, let’s keep things simple). The script simply reads the given file and uploads it to a preset path in a preset S3 bucket. I’m working with the S3-php5-curl class by Donovan Schonknecht. It uses the Amazon S3 REST API to upload files and it’s just one PHP file called S3.php, which in my case lives in /ebs/data/s3-php, right next to my upload.php script.

Before going on to upload.php, take a look at this settings file, which I called S3auth.php:

<?php
$access_id = 'your_access_id';
$secret_key = 'your_secret_key';
$bucket_name = 'bucket';
$local_dir = '/ebs/backups/';
$remote_dir = 'backups/';

This is the settings file I use in upload.php. These particular settings assume your backups are located in /ebs/backups and will be uploaded to an Amazon S3 bucket called ‘bucket’, into a ‘backups’ directory within that bucket. Using single quotes is quite important, especially for the secret key, as Amazon secret keys often contain special characters. Here’s the upload.php script:

require("S3.php");
require("S3auth.php");

$s3 = new S3($access_id, $secret_key);
$s3->putBucket($bucket_name, S3::ACL_PRIVATE);
$s3->putObjectFile($local_dir.$argv[1], $bucket_name, $remote_dir.$argv[1], S3::ACL_PRIVATE);

It’s all quite simple and follows the S3.php class documentation. Notice $argv[1]: that’s the first command-line argument passed to upload.php, i.e. the filename of the backup file.
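
One small thing worth considering, just a suggestion and not part of the script above: putObjectFile() in this S3 class returns false when the upload fails (at least in the version I’m using), so you can make upload.php exit with a non-zero status and let the failure show up in cron’s email output instead of passing silently.

<?php
// Variation on upload.php with a basic success check (optional).
require("S3.php");
require("S3auth.php");

$s3 = new S3($access_id, $secret_key);
$s3->putBucket($bucket_name, S3::ACL_PRIVATE);

if (!$s3->putObjectFile($local_dir.$argv[1], $bucket_name, $remote_dir.$argv[1], S3::ACL_PRIVATE)) {
    fwrite(STDERR, "Upload of {$argv[1]} failed\n");
    exit(1);
}
echo "Uploaded {$argv[1]}\n";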

That’s about it. Try a few test runs with the shell script (remember to chmod +x it, otherwise you won’t be able to execute it) and finally set up a cron job to run the script daily. Mine works like a charm!