Tag Archives: performance

Get an Expired Transient in WordPress: Good Idea or Crazy Talk?

I wonder if there’s an easy way to get an expired transient in WordPress? Now, for this to make sense I guess I should provide a little more context, so here it goes :)

The Transients API is an extremely easy way to cache parts of your WordPress code, that may be CPU/memory intensive, or rely on a third-party server and so on. A great example is grabbing a tweet from Twitter and caching it for a few minutes, so that we don’t query the Twitter API on every page load. So in theory:

if ( false === ( $tweet = get_transient( 'my_latest_tweet' ) ) ) {
    $tweet = get_my_latest_tweet(); // Queries the Twitter API
    set_transient( 'my_latest_tweet', $tweet, 60*60 );
}

Is a great way to cache my latest tweet for an hour. Perfect, but there’s a problem. The code above will work well for 60 minutes. When the transient expires, it will have to spend time again to fetch the tweet, thus impacting your page load time. Once every 60 minutes.

Suppose Twitter is being slow today and taking 2 seconds to respond, and suppose I want to cache the tweet for 5 minutes, not an hour. This means that every 5 minutes, somebody will have to wait two extra seconds for my page to load. Suppose I’m also querying the Flickr API, the Facebook API and fetching a few RSS feeds: that all adds to the page load time for the unlucky person who visits my site when the cache has expired. Bummer!

Is there a solution? I don’t know, but I can think of one — always serve cached data, even if it’s expired. That way visitors will never have to wait extra when you’re speaking to third party servers and APIs. I can think of a way to accomplish this by fetching the new data in a different request, transparent to the user.

Like an asynchronous request with jQuery (aka AJAX) to a special action that would revalidate expired cache. Crazy talk? Here’s the scenario:

  1. You visit your page when caches are empty, you don’t see the tweet.
  2. An async jQuery request is fired to the server which grabs the latest tweet and caches it, this happens behind your back.
  3. You refresh the page and you see the tweet from cache. Yey!
  4. Ten minutes have passed, cache is expired, but not trashed. You visit your page, you see the old tweet from (expired) cache.
  5. An async request in the back fetches a new tweet from Twitter and replaces the one expired.
  6. You refresh the page and you see the (new) tweet from cache. Yey!

So in steps 1, 3, 4 and 6 you’re always serving what’s in cache, whether it expired or not, even if it’s empty (first step) so you’re serving as fast as possible. Steps 2 and 5 happen behind your back, they don’t impact page load time, and they never return anything back. You won’t see them happening unless you’re looking at the Network tab in Chrome’s developer tools or Firebug.

Now, suppose Twitter is down. The visible steps are not impacted, because they return stored data, they don’t have to fetch it from Twitter. The hidden steps will fail, but your visitors will never know, since they’ll be still seeing the old tweet fetched some time earlier, right?

This can easily be done with the Options API but hey, transients are already a great tool for caching. Wouldn’t it be even easier if transients never actually expired? And a function to fetch such an expired transient, get_expired_transient perhaps? :)
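To make the idea a bit more concrete, here is a minimal, language-agnostic sketch written in Python rather than PHP. It is not WordPress code and not an existing API: StaleCache, render_tweet and refresh_tweet are hypothetical names, and the 300-second lifetime is just the 5-minute example from above. The point is only that an expired value stays readable and gets replaced by a separate, hidden request.

```python
import time


class StaleCache:
    """Keeps values past their expiry so they can still be served."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._items = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._items[key] = (value, self._clock() + ttl)

    def get(self, key):
        """Return (value, is_stale); value is None if never cached."""
        if key not in self._items:
            return None, True
        value, expires_at = self._items[key]
        return value, self._clock() >= expires_at


def render_tweet(cache):
    """Visible request: always serve from cache, never call the API."""
    tweet, is_stale = cache.get("my_latest_tweet")
    # is_stale tells the page whether to fire the async refresh request.
    return tweet, is_stale


def refresh_tweet(cache, fetch, ttl=300):
    """Hidden request: refetch and replace; keep the old value on failure."""
    try:
        cache.set("my_latest_tweet", fetch(), ttl)
        return True
    except Exception:
        return False
```

If the fetch fails (Twitter is down), the old value is left untouched, so the visible request keeps serving the last tweet that was fetched successfully.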

Okay, that’s just off the top of my head, it’s quite late so, sorry if I’m totally on the wrong track. Let me know if there’s an easier solution, or maybe a caching plugin that already does this. Object caching can benefit too by the way. Share your thoughts in the comments section below!



Google Analytics Proxy with Nginx

Here’s a quick tip! If you need to serve a specific script, stylesheet or any other file from your own domain, you can easily proxy it with nginx. A good example is the ga.js file for Google Analytics. Here’s how I proxy it with nginx, in the server context:

# Google Analytics Proxy
rewrite ^/ga\.js$ /ga/ last;
location /ga/ {
        proxy_pass http://www.google-analytics.com/ga.js;
        break;
}

This rewrites the ga.js filename to the /ga/ pseudo-directory, in the context of which I can use the proxy_pass directive to fetch the file from Google. This way I have total control over the file that’s being served and especially the HTTP headers, which I was after in the first place.

You can repeat the trick with basically any file, but keep in mind that each one is a little extra load on your server, so add a caching layer where possible.



Varnish and Preview Posts in WordPress

I wrote earlier that I started playing around with Varnish here on my site and that post has a little snippet that strips all incoming and outgoing cookies (except the admin of course.) Today I stumbled on a problem where I had no permission to preview a post I was drafting in WordPress: every preview turned into a 404 when I tried.

I first thought Varnish was stripping out the preview query string but I was wrong. The problem was that WordPress knew I was logged in and editing the post when I was in the admin panel, but when I tried to preview it on the front end Varnish was stripping out my login cookies, hence it didn’t display my draft post.

Here’s a snippet for vcl_recv and vcl_fetch that should go before the unset cookies statement. It passes the request to the web server if preview=true is found in the request URL.

if (req.url ~ "preview=true") {
	return(pass);
}

Restart the Varnish service and voila! Your cookies aren’t stripped out anymore and you can now preview your posts and pages. Do note though that if somebody manually types in the preview query string in the browser, they’ll bypass Varnish as well.



Varnish and WordPress Comment Cookies

I wrote before that I’ve been running some experiments with Varnish lately. Not that I have huge traffic here but I’ve always used my own website to test things out. As I wrote earlier, the problem with WordPress and Varnish is that WordPress relies on cookies which create cache misses on Varnish and in that previous post I shared a snippet on how to strip all incoming and outgoing cookies, which hopefully solves one problem.

The second problem now is that cookies are disabled throughout the whole website (except the admin section of course) so if you’ve got commenters on your website, their browsers will no longer save their names and e-mails, so they’ll have to type them in every time they want to leave a comment. I agree it’s a pain, which is why I was searching for a solution, which turned out to be quite simple — handle it all on the client side using javascript.

So I wrote a little plugin which you can download and use. It simply enqueues a javascript file if the request is a singular page and comments are open. It passes in some cookie constants used by WordPress. The javascript itself hooks onto the submission of the comment form to create cookies and restores the form fields upon page load.

Note that the plugin can fail if you’re rendering your comment form differently (with different DOM IDs) from WordPress, and don’t forget to purge.url your Varnish cache after activating!



My website is now super-fast with Varnish – an open source HTTP accelerator which sits on top of your HTTP servers and serves your cached pages.

It didn’t take me longer than 20 minutes to get Varnish up and running on my Ubuntu VPS, and a few more minutes to set all my nginx configurations to port 8080, bringing Varnish to port 80. The problem with WordPress however, is that it heavily relies on cookies, resulting in almost no cache hits with Varnish. Here’s a little snippet for your Varnish configuration file that removes all cookies and still leaves your admin dashboard accessible:

# Remove all cookies sent to web server except for wp-login and admin
sub vcl_recv {
	if (!(req.url ~ "wp-(login|admin)")) {
		unset req.http.cookie;
	}
}

# Remove all cookies sent by web server except wp-login and admin
sub vcl_fetch {
	if (!(req.url ~ "wp-(login|admin)")) {
		unset beresp.http.set-cookie;
	}
}

Anyhow, even if you don’t look for the wp-login and wp-admin and clear out all cookies sent back and forth, you could still access your admin panel by asking for it directly from your webserver, i.e. from port 8080 (or however you have configured it) which will bypass Varnish.



Did you know that you can give MySQL a hint on which index to use? It’s called an index hint and can be part of a query where you feel MySQL is making the wrong choice (although that’s quite unlikely these days.)

SELECT * FROM table1 USE INDEX (your_index)
    WHERE col1 = 1 AND col2 = 2 AND col3 = 3;

Where your_index is the name of the index you’d like to use for this query. You can supply several comma-separated indexes and MySQL will pick the one it thinks is best. Alternatively you can tell MySQL to IGNORE INDEX too! Do benchmark though before making the final decision ;)



Pickle vs JSON — Which is Faster?

If you’re here for the short answer — JSON is 25 times faster in reading (loads) and 15 times faster in writing (dumps). I’ve been thinking about this since I wrote the ObjectProperty and JsonProperty classes for Google App Engine. They’re both easy to use and work as expected. I did have some trouble with ObjectProperty but I figured it out in the end.

As my previous posts mention, the ObjectProperty class uses Python’s pickle module, while JsonProperty works with simplejson (bundled with Python 2.6 and above, available through django.utils in Google App Engine). I decided to measure the performance of these two.

Unfortunately I couldn’t do much benchmarking on top of Google App Engine since there’s too much lag between the application server and Google’s Datastore so I decided to write simple benchmarks and find out which is faster — pickle or JSON. I started out by constructing a dataset which I’ll be pickling and “jsoning”, which resulted in some random lists, dictionaries and nested dictionaries containing lorem ipsum texts.

I then used Python’s timeit module to measure how long it took to “dumps” and “loads” the dataset using pickle and simplejson. I also measured the length of the resulting pickle/JSON strings to see which would be smaller in size, and guess what: JSON wins in all rounds. I ran the tests 10, 20, 50, 100, 500 and 1000 times for reading, writing and length comparison. Below are three charts illustrating the results:

As you see, dumps in JSON are much faster — by almost 1500%, and that is 15 times faster than Pickling! Now let’s see what happens with loads:

Loads shows even more goodies for JSON lovers — a massive 2500%, how’s that!? Of course some of you might be concerned with size, memory usage, etc. Since there’s no good method of measuring the actual bytes, I used Python’s len function to simply measure the number of characters in the resulting pickle/JSON string.

So yes, JSON is faster in all three aspects. If you’d like to experiment yourself, feel free to use the source code I wrote. Beware of running the 500/1000 tests, those can take hours ;)

The benchmark was done on an Ubuntu 10.10 64-bit machine with Python 2.6 installed, but I don’t think that results will be different on others. The conclusion to this is that if you need to store complex objects, such as functions, class instances, etc., you have to use pickle, while if you’re only looking for a way to store simple objects, lists and nested dictionaries, then you’re better off with JSON.
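For anyone who wants to repeat the experiment, here is a minimal sketch of that kind of benchmark. It is not the source code mentioned above: it assumes Python 3 and the standard json module instead of simplejson, and build_dataset, benchmark and the dataset shape are hypothetical. Expect different numbers from the ones reported here, since Python 3’s pickle uses a C implementation when one is available.

```python
import json
import pickle
import timeit


def build_dataset(size=100):
    """Nested lists and dictionaries of filler text (hypothetical shape)."""
    text = "lorem ipsum dolor sit amet " * 4
    return {
        "items": [text + str(i) for i in range(size)],
        "nested": {
            str(i): {"text": text, "numbers": list(range(10))}
            for i in range(size)
        },
    }


def benchmark(data, number=100):
    """Time dumps/loads for both modules and record the serialized length."""
    results = {}
    for name, module in (("pickle", pickle), ("json", json)):
        encoded = module.dumps(data)
        results[name] = {
            "dumps": timeit.timeit(lambda: module.dumps(data), number=number),
            "loads": timeit.timeit(lambda: module.loads(encoded), number=number),
            # len() counts bytes for pickle and characters for JSON.
            "length": len(encoded),
        }
    return results


if __name__ == "__main__":
    for name, stats in benchmark(build_dataset()).items():
        print(name, stats)
```

Raising number and the dataset size gives steadier timings at the cost of a longer run.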

Thank you for reading and retweeting ;)

Update: If you’re sticking to pickling objects and have the freedom to use C compiled libraries, then go ahead with cPickle instead of pickle, although that still lags behind JSON (by a factor of two in loading and dumping). As to App Engine, I tried running a short benchmark with cPickle vs simplejson from the django.utils package. Results were better for pickle, but still not enough to beat JSON, which is 30% faster. I know there are other libraries worth mentioning here, but my hands are tied since I’m running App Engine with Python 2.5 and not allowed to install extra modules ;) Cheers and thanks for the comments!



Driving the (ve) Server at Media Temple

It’s been a few weeks now since Media Temple launched their new (ve) Server and I’ve been testing it out for a few days now. I’m actually hosting my blog there to experience some real traffic load and my first impressions are awesome!

I started off with the simplest 512 MB server and transferred a few websites to the new platform. I’m not too used to the Ubuntu Linux operating system but I found my way around quickly. They do have other operating systems options, but Ubuntu is the one they recommend. First few tests showed that my load time decreased dramatically compared to my Amazon EC2 instance, which I was quite happy with. Next step was to run a few load tests using the Apache Benchmark tool (ab), and very soon I realized that I got quite a few failed requests, memory shortage and other strange stuff.

Media Temple’s (ve) servers are hosted on the Virtuozzo platform by Parallels, and after browsing their documentation I found out that there’s no swap space available for Virtuozzo containers. They do allow around 80% of burstable RAM (so you get around 1 GB when running 512 MB) but when that runs out, you’re left with nothing, not even some swap space on your hard drive. Some heavy load tests showed 30% request failure, which is quite horrible.

Media Temple don’t give much information on the new platform via the support system, and in answer to memory shortage questions in their user forums they advise you to upgrade, of course! Well, I wouldn’t like to upgrade to just run a couple of load tests, and what about Digg-traffic? Should I predict that and upgrade before the spike? Then downgrade again to save some cash? Of course not.

A good option I found here is to tune Apache a little bit and reduce its resource limits. This will not increase performance, but may guarantee a 100% fail-safe workflow. We wouldn’t like our users to see a blank page (or a memory shortage error) when a spike hits; we would rather have them wait a little longer and still load the requested page. The settings mostly depend on what software you’re running, which services and the RAM available in your container.

You might want to reduce the KeepAliveTimeout in your Apache settings (mine’s now set to 5), and the rest is up to the mpm prefork module. You’ll have to modify your settings and then run some tests until you’re comfortable with the results. Mine are the following:

<IfModule mpm_prefork_module>
    StartServers 3
    MinSpareServers 2
    MaxSpareServers 5
    MaxClients 10
    MaxRequestsPerChild 0
</IfModule>

This is on a 512 MB (~ 400 more burstable) container. An Apache Benchmark test showed that 100 concurrent (simultaneous) requests completed in 26 seconds with 0% failed requests. That makes 3.84 requests per second, which is quite good. To give a comparison, the same test run against the mashable.com website took 30 seconds at 3.32 requests per second, and of course a 0% failure rate. Also check out other MPMs for Apache, which could give good results too.
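The requests-per-second figure is simply the number of completed requests divided by the elapsed time. A tiny sketch of that arithmetic follows; requests_per_second is a hypothetical helper, not part of ab, and the figures quoted above presumably come from slightly longer measured times than the rounded 26 and 30 seconds.

```python
def requests_per_second(total_requests, elapsed_seconds):
    """Mean throughput: completed requests divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_requests / elapsed_seconds


if __name__ == "__main__":
    # The two runs described above, using the rounded durations.
    print(requests_per_second(100, 26))
    print(requests_per_second(100, 30))
```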

This definitely requires more fine-tuning and if the page load time becomes too high then yes, there is a reason to upgrade, but don’t forget about other performance tricks such as CDNs, gzip (deflate) and others. When you’re done with Apache, proceed to MySQL fine-tuning & php configuration, there are some tricks there too to give you some extra speed & performance.

I’ll keep playing around with this server, plus I’ve purchased a 1GB (ve) this morning, so there’s quite a lot of tests that have to be run. Anyways, if you’re looking for a good, high-performance VPS, then Media Temple is definitely a choice to consider. For only $30/mo you can get quite a good looking virtual server. It is more interesting than their old dedicated virtual servers (although still in beta). Cheers, and don’t forget to retweet this post ;)



W3 Total Cache with Amazon S3 and CloudFront

A few days ago Frederick Townes, author of the W3 Total Cache plugin for WordPress, released an update to this wonderful plugin, and yes, it now fully supports Amazon S3 and CloudFront as the Content Delivery Network! This is a major one for me as I manually upload most of the static assets to my CloudFront account, which may take quite a lot of time. The W3 Total Cache plugin does that for you in seconds! Post attachments, images, javascript, css.. All those could go to CloudFront in just 4 clicks. Frederick also mentioned that the upcoming update will also be surprising, which keeps me wondering.

I also tried out the other options for page and database caching. A few tests showed that memcache is faster than APC, so that’s where I stopped at database caching. Page caching was switched to enhanced, which I believe is a new option. The site performance graph at Google Webmaster Tools shows pretty good performance for November and December (very close to 1.5 seconds) although the overall average is still up at 3.5 seconds, which in terms of Google is slower than 59% of sites. This is probably caused by the force majeures in September and October. Page load time peaked at over 7 seconds there.

One more funny fact about Google’s Site performance and Page Speed tools is the “Minimize DNS lookups” section, which most of the time shows up a single entry:

The domains of the following URLs only serve one resource each. If possible, avoid the extra DNS lookups by serving these resources from existing domains: http://www.google-analytics.com/ga.js

Interesting. Perhaps I should copy that javascript file and serve it from my CDN, I wonder if that will work. Oh and then I’ll be missing all the nifty updates to Google Analytics, like the most recent one called Asynchronous Tracking – very neat by the way!



Loading jQuery from a CDN in WordPress

This may seem like an easy task to do but is quite tricky in WordPress. Using a CDN these days is very popular, cheap and helps speed up your website taking the load off your web server. I personally love Amazon CloudFront! The tips at Google Code suggest you serve all your static content from different domains, preferably ones without cookies, so CDNs are perfect.

The whole problem with WordPress is script dependencies, and this applies not only to jQuery but to all the other predefined javascript libraries (prototype, scriptaculous, thickbox, see wp_enqueue_script for more info). It’s all about the handles: plugins that use jQuery will probably use the jquery handle in their dependency lists, which will automatically make WordPress include the standard jQuery from its wp-includes directory. This means that using the code:

wp_enqueue_script("my-handle", "http://s.kovshenin.com/jquery.js");

You might end up including two instances of the jQuery library, one from your CDN (s.kovshenin.com) and another one from the WordPress wp-includes directory, which will end up in a total mess. Strange though, that you cannot redefine an already known handle, such as jquery like this:

wp_enqueue_script("jquery", "http://s.kovshenin.com/jquery.js");

The javascript library will still be loaded from the default location (wp-includes on your local web server). So the right way to do it is with a little hack in your functions.php file (in case you’re doing it within your theme) or any other plugin file (in case you’re doing it within your plugin):

add_filter('script_loader_src', 'my_script_loader_src', 10, 2);
function my_script_loader_src($src, $handle) {
	if ($handle == "jquery")
		return "http://s.kovshenin.com/js/jquery.1.3.2.min.js";

	return $src;
}

Then any call to wp_enqueue_script with the jquery handle will output the correct path to your CDN version of jQuery. Oh and please, try not to use generic function names like my_script_loader_src. I used that just as an example; we don’t want any function name conflicts and can’t expect other plugin/theme developers to use non-generic names ;)