Caching Gone Wrong

We all want to make Drupal run faster, but meddling with things you don’t understand can get you into big trouble. Here’s why.

By Greg Harvey, Thu, 2013-02-28 12:29

For the uninitiated, the term ‘caching’ in computing is the idea of saving a copy of something that usually needs to be generated, in its complete form, to avoid generating it again later. This happens on many levels in a typical computer system.

One good example is the popular APC ‘opcode’ cache available for PHP. PHP as a programming language is ‘interpreted’, which means every time you want to run an application written in PHP the computer needs to ‘compile’ that application (prepare it in a form that the computer can understand and run) before it can actually execute it. This is time-consuming, and that’s where APC comes in. APC caches (saves a copy of) the compiled PHP code so the next time someone requests that exact same piece of code, and provided it hasn’t changed, APC can intercept the request and serve the pre-compiled code without compiling it again. This is a classic demonstration of caching in computing – saving some work that was done earlier to avoid having to do that work again later.
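To make the idea concrete, here is a minimal sketch in Python rather than PHP. It is a toy stand-in for an opcode cache, not how APC is actually implemented: the class name CompileCache, the run() helper and the use of a content hash to detect changes are all assumptions made for illustration.

```python
import hashlib


class CompileCache:
    """Toy stand-in for an opcode cache: remembers compiled code per script."""

    def __init__(self):
        self._entries = {}      # script name -> (source digest, code object)
        self.compile_count = 0  # how many times we really had to compile

    def get(self, name, source):
        # Assumption: a content hash stands in for whatever change
        # detection a real opcode cache uses.
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        cached = self._entries.get(name)
        if cached is not None and cached[0] == digest:
            return cached[1]  # unchanged: reuse the work done earlier
        code = compile(source, name, "exec")  # the slow step we want to avoid
        self.compile_count += 1
        self._entries[name] = (digest, code)
        return code


def run(cache, name, source):
    """Execute a script through the cache and return its 'result' variable."""
    namespace = {}
    exec(cache.get(name, source), namespace)
    return namespace["result"]


cache = CompileCache()
script = "result = sum(range(10))"
first = run(cache, "page.py", script)
second = run(cache, "page.py", script)      # served from the cache
compiles_after_repeat = cache.compile_count
third = run(cache, "page.py", "result = sum(range(5))")  # source changed
compiles_after_change = cache.compile_count
```

The second request reuses the compiled code, and only the changed script triggers a fresh compile.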

Knowing broadly what caching means, let’s move on. At its heart Drupal has an API for caching ‘stuff’. For example, one of Drupal’s many caches is the page cache. If you have not logged into a Drupal site and you request a page, and if someone else has requested that page before you, in all likelihood the page you will see will come from Drupal’s page cache. When some previous person accessed this Drupal page, some time in the past, Drupal had to go through the motions of building the page, which is slow and time-consuming. So once it was finished Drupal saved a copy of the built page in the page cache. From that point forward, unless or until the page changes, everyone will get the pre-built page instead of making Drupal build it all over again.
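The page cache behaviour described above can be sketched roughly like this. It is an illustration in Python, not Drupal's actual code: the PageCache class, its method names and the explicit invalidate() call are hypothetical.

```python
class PageCache:
    """Toy page cache: anonymous visitors share one pre-built copy per path."""

    def __init__(self, build_page):
        self._build_page = build_page  # the slow "build the page" function
        self._pages = {}               # path -> previously built page
        self.builds = 0                # how many times we really built a page

    def serve(self, path, logged_in=False):
        if logged_in:
            # Logged-in users bypass the page cache in this sketch.
            return self._build(path)
        if path not in self._pages:
            self._pages[path] = self._build(path)
        return self._pages[path]

    def invalidate(self, path):
        # Called when the page changes, so the next visitor triggers a rebuild.
        self._pages.pop(path, None)

    def _build(self, path):
        self.builds += 1
        return self._build_page(path)


content = {"/about": "About us"}
cache = PageCache(lambda path: "<html>" + content[path] + "</html>")

cache.serve("/about")                 # first anonymous visitor: page is built
cache.serve("/about")                 # second visitor: served from the cache
builds_for_two_visitors = cache.builds

content["/about"] = "About the team"  # the page changes...
stale = cache.serve("/about")         # ...but the cached copy is still served
cache.invalidate("/about")            # ...until the cache entry is cleared
fresh = cache.serve("/about")
```

Note the stale copy in the middle: a cache only stays correct if something clears it when the underlying page changes.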

Drupal modules actually cache lots of different things – the page cache is the easiest one to explain, but there are block caches, variable caches, menu caches and so on, all designed to speed things up. And they do – massively. But there is another ‘problem’ with Drupal. It is not unique to Drupal by any means; every content management system with a database backend that needs to build pages in real time has the same problem:

The database is a bottleneck. There is only one database and there is no (easy) way to scale the database. (It is possible, but it’s expensive, introduces risk and requires specialist knowledge.) Drupal, by default, stores the caches it generates in the database. For a high performance site this is bad. The database has enough to do already, without serving the cache as well! If we can take any load off the database then we should.
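Here is a rough sketch of why moving the cache elsewhere takes load off the database. It uses Python with an in-memory SQLite database purely for illustration. The class names DatabaseCache and MemoryCache, the table layout and the cached_render() helper are assumptions, not Drupal's real cache API.

```python
import sqlite3


class DatabaseCache:
    """Cache stored in the database: every lookup costs a query."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE cache (cid TEXT PRIMARY KEY, data TEXT)")
        self.queries = 0  # queries the database had to answer for the cache

    def get(self, cid):
        self.queries += 1
        row = self._db.execute(
            "SELECT data FROM cache WHERE cid = ?", (cid,)
        ).fetchone()
        return row[0] if row else None

    def set(self, cid, data):
        self.queries += 1
        self._db.execute(
            "INSERT OR REPLACE INTO cache (cid, data) VALUES (?, ?)", (cid, data)
        )


class MemoryCache:
    """Same interface, but the data lives outside the database."""

    def __init__(self):
        self._items = {}

    def get(self, cid):
        return self._items.get(cid)

    def set(self, cid, data):
        self._items[cid] = data


def cached_render(cache, cid, render):
    """Return the cached value, building and storing it on a miss."""
    data = cache.get(cid)
    if data is None:
        data = render()
        cache.set(cid, data)
    return data


db_cache = DatabaseCache()
mem_cache = MemoryCache()
for _ in range(3):
    from_db = cached_render(db_cache, "page:/about", lambda: "<html>About</html>")
    from_mem = cached_render(mem_cache, "page:/about", lambda: "<html>About</html>")
```

Both backends return the same page, but only the database-backed one adds queries to a database that already has enough to do.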

In a high performance environment!!!

Don’t get me wrong: every single Drupal site benefits from not using the database as a cache. This is a fact. But a badly configured alternative cache is worse than just leaving things alone. If your site doesn’t need to run particularly quickly, if it doesn’t have high traffic and, crucially, if you don’t know how to configure a Linux server properly... leave it alone!! (Or at most, install the File Cache module; that’s harmless enough – http://drupal.org/project/filecache)

I’ll give you an example. We were recently called in to do a performance audit of a website that was choking at a relatively low number of concurrent users, especially given the size of server it was residing on. We battered the site with BlazeMeter to get up to traffic numbers that started to make the application sweat and then had a look at the stack traces New Relic was generating:

Oh my. The function responsible for serving the Drupal page cache is taking up almost 50% of the page load time and look at that time!! 60s, just to retrieve the cache! That is a big ouch. (That was with 50 concurrent users, but still, that’s a horrific response time.)

We dug further and here’s what we found. Using the APC module (http://drupal.org/project/apc), Drupal had been set up to use APC (the opcode caching application I mentioned at the start of the post) as the Drupal cache as well, not just for caching compiled PHP code.

However APC itself had been left with default settings, which meant only 32M of RAM was allocated to it. 32M of RAM is not enough for Drupal as an opcode cache, never mind if you want APC to replace the Drupal cache too! So APC could not possibly handle the amount of data it was being asked to cache and was continually stowing cache data back to disk, then trying to retrieve it, then stowing something else, then retrieving another thing, etc. This is commonly known as ‘thrashing the disk’ and it’s very, very bad for performance, because the disk is the slowest component of any computer.
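A cache that is too small for the data it is asked to hold can end up doing nothing but churn. The sketch below is a simplified model in Python, not a description of APC’s internals: the least-recently-used eviction policy, the hit_rate() function and the numbers are all assumptions chosen to show the effect.

```python
from collections import OrderedDict


def hit_rate(capacity, requests):
    """Replay requests against an LRU cache and return the fraction of hits."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True              # a miss: do the work and store it
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used item
    return hits / len(requests)


# Five items requested over and over, in order.
requests = [n % 5 for n in range(100)]

too_small = hit_rate(3, requests)   # cache holds 3 of the 5 items
big_enough = hit_rate(5, requests)  # cache holds all 5 items
```

With room for only three of the five items, every request evicts something that is about to be needed again, so the hit rate is zero. With room for all five, everything after the first pass is a hit.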

This wasn’t a site with particularly high volumes of traffic. It had no specific performance requirements, nor did it have specific performance issues per se (I mean, we found other stuff that wasn’t great, but nothing the server couldn’t have handled reasonably enough). Our first action was to disable the APC module for Drupal and immediately the site performed 50% more quickly. (We later swapped APC for memcached, but that’s another story.)

The point here is this: whoever set this site up thought “I know, I’ll use APC to make this Drupal website run faster, because everyone knows APC is faster than Drupal’s default cache, right?” WRONG!! APC is quicker than Drupal’s default cache if you know how to set it up properly. If not, you’ll probably do more harm than good.

The message is simple. If you have a ‘normal’ Drupal site with ‘normal’ performance requirements and you’re not prepared to learn how to configure Linux properly, leave Drupal’s caching alone. It is fine. If you have specific performance issues and you need to change the way Drupal handles its cache to alleviate them, either get researching or call in the Drupal performance experts. Don’t just switch stuff on and walk away.

This isn’t just an APC issue. So far this year we’ve seen a badly configured Boost implementation that was caching referring pages as well as Drupal and filling the disk. Daily! We’ve also seen memory limits tweaked for MySQL with disastrous effects. And we’re still in February. If you don’t know what you’re doing, either get educated or get help from our experts.

</rant>