Joomla Behind Amazon CloudFront

Amazon has this nifty caching service called CloudFront.  As the name implies, it puts the cloud in front of your web services.  Sounds awesome, no?  Well, it is.

One way of using CloudFront is to position it in front of an Apache HTTPD server hosted at your data center of choice.  Amazon calls this "download" mode; it functions quite simply as a caching HTTP proxy server.  This method is the one we're going to talk about here.  Amazon will download content from your "origin" server and make whatever version it got available for the specified length of time.
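
To make the "whatever version it got, for the specified length of time" behavior concrete, here is a toy model of a caching proxy in Python.  It is only a sketch of the idea, not how CloudFront is actually implemented; the class name TtlCache and its parameters are made up for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TtlCache:
    """Toy caching proxy: keep the first copy fetched until its TTL runs out."""

    def __init__(self, fetch_from_origin: Callable[[str], str],
                 ttl_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch_from_origin   # hypothetical stand-in for the origin server
        self._ttl = ttl_seconds
        self._clock = clock               # injectable so the behavior can be tested
        self._store: Dict[str, Tuple[float, str]] = {}  # path -> (expires_at, body)

    def get(self, path: str) -> str:
        now = self._clock()
        cached = self._store.get(path)
        if cached is not None and now < cached[0]:
            return cached[1]              # still fresh: the origin is not contacted
        body = self._fetch(path)          # missing or expired: go back to the origin
        self._store[path] = (now + self._ttl, body)
        return body
```

The point to notice is that an update on the origin is invisible to visitors until the cached copy expires, which is exactly why the expiration times chosen later matter so much.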

The other method pairs CloudFront with an Amazon S3 bucket.  Amazon calls this "streaming".  This allows you to secure your URLs with keys and whatnot.  This method is useful for serving videos and never-changing content.  There is no cache refresh here.  There is a cache-invalidation API available to kill cached content in an emergency, but it is by no means intended to be used regularly.  Content stored here cannot be overwritten.  So if you have a new version of a file, for example, you must give it a completely new filename.  This is similar to how Netflix has implemented their streaming video service.  (Netflix sits on S3 but it's actually too big to use CloudFront.)
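
One common way to get a "completely new filename" for every new version is to put a short hash of the file's content into the name.  The Python sketch below shows the idea; the function name versioned_name and the eight-character digest length are my own choices for illustration, not anything Amazon requires.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(filename: str, content: bytes, digest_chars: int = 8) -> str:
    """Return 'dir/name.<digest>.ext' so that changed content gets a new URL."""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    path = PurePosixPath(filename)
    # Insert the digest between the base name and the extension.
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))
```

Because the name only changes when the bytes change, unchanged files keep their cached URL and new versions never collide with an old cached copy.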

Problems Arise...

So, I just move my main website to another URL and point my www CNAME at the CloudFront server and I'm done, right?  No, no, not at all really.  In fact that's the very last thing you want to do.  Now we're getting into some greasy little details most people never think about even though they should!  

Joomla goes out of its way to disable and/or sabotage HTTP-level caches.  I mean, think about it: it's sitting there generating content seemingly on the fly.  So sure, caching should be disabled!  Disable it everywhere!  Off with its head!  But that's stupid.  If you think about it for more than a few seconds you realize that almost none of your content will ever change.  And even most of the bits that do change don't absolutely have to change the very instant you update them.  So what if there is a 1 hour or 12 hour or 24 hour delay?  The challenge then becomes deciding which content needs to be refreshed often and which content needs to be refreshed infrequently.

Joomla makes cache control a royal pain in the rear.  But we have some options.  I am going to assume you are using Apache HTTPD 2.x.  I have no idea how you would do any of this on another HTTP server and I don't much care to find out.  If you are on something else, hopefully this still serves you as a general guide.

Template Cache Changes

Joomla will allow us to modify pretty much any content from within the template itself.  So the very first thing we need is our own custom template.  I don't suggest you directly modify any of the stock templates for this purpose.  If nothing else, copy one of the stock templates and make your changes on the copy.  However you do that is entirely up to you.  Good luck on that portion.

Locate the index.php in your template's root folder.  Open it with a text editor.  At the top of the index.php just below the _JEXEC security check add something like the following code:

/* disable base href in Joomla; 
 * base href causes the browser to bypass CloudFront and go directly to the origin server
 */
$this->setBase(null);
 
/* Enable cache control in Joomla; 
 * Removes the no-cache directive from the headers
 */
jimport('joomla.environment.response');
JResponse::allowCache(true);
 
/* Set the date and last modified date to now;
 * This causes CloudFront to set the time-to-live based on the first request date of a resource
 */
$dt = gmdate('D, d M Y H:i:s', time()).' GMT';
JResponse::setHeader('Date', $dt, true );
JResponse::setHeader('Last-Modified', $dt, true );
 
/* Cache control directives tell it what to do when the cache expires;
 * You may want to add max-age to your cache control here.
 * I will be letting Apache set that by mime type below in server headers
 */
JResponse::setHeader('Cache-Control', 'must-revalidate', true );
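
If you want to check from outside what those Date and Last-Modified values should look like, the same HTTP date format can be produced and parsed with Python's standard library.  This is only a sketch for verification purposes; the helper names http_date and seconds_between are made up for this example.

```python
from email.utils import formatdate, parsedate_to_datetime


def http_date(timestamp: float) -> str:
    """Format a Unix timestamp like gmdate('D, d M Y H:i:s') . ' GMT' does in PHP."""
    return formatdate(timestamp, usegmt=True)


def seconds_between(later_header: str, earlier_header: str) -> float:
    """Return how many seconds separate two HTTP date header values."""
    delta = parsedate_to_datetime(later_header) - parsedate_to_datetime(earlier_header)
    return delta.total_seconds()


print(http_date(0))  # Thu, 01 Jan 1970 00:00:00 GMT
```

With the template change in place, a response fetched straight from the origin should show Date and Last-Modified values that are identical, so seconds_between on the two headers should come out as zero.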
 

Web Server Header Changes

Apache has a nifty module called Expires that gives you a number of ways to set expiration and cache headers.  If you are on a Debian-based system, such as Ubuntu, then you can enable the expires module by running the command "sudo a2enmod expires" from a terminal on the server.  Then reload or restart the Apache service.

Now that you have the expires module loaded you need to enable it and set up some expiration times.  For that, open the .htaccess file in the root of your Joomla installation in a text editor.  Somewhere toward the top of that file add the following lines:

# enable cache control headers
ExpiresActive on
# set default expiry time
ExpiresDefault "access plus 24 hours"
# set up expiration times for familiar mime types
# images, videos, documents, etc should rarely or never change
# (image/jpg is not a registered mime type; it is kept only as a harmless catch-all)
ExpiresByType image/jpg "access plus 1 month"
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
ExpiresByType application/pdf "access plus 1 month"
ExpiresByType application/x-shockwave-flash "access plus 1 month"
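
To get a feel for what these rules mean in seconds (which is what ends up in the max-age value), here is a small Python sketch that converts "access plus ..." expressions.  It handles only the "access"/"now" forms used above, and the unit factors (a month counted as 30 days, a year as 365 days) are my assumption about how mod_expires does its arithmetic, so check the Apache documentation before relying on the exact numbers.

```python
import re

# Assumption: a month is counted as 30 days and a year as 365 days.
UNIT_SECONDS = {
    "second": 1, "minute": 60, "hour": 3600, "day": 86400,
    "week": 7 * 86400, "month": 30 * 86400, "year": 365 * 86400,
}


def expires_to_max_age(rule: str) -> int:
    """Convert a rule such as 'access plus 24 hours' to a number of seconds."""
    match = re.fullmatch(r"(?:access|now)\s+(?:plus\s+)?(.+)", rule.strip().lower())
    if match is None:
        raise ValueError(f"unsupported rule: {rule!r}")
    tokens = match.group(1).split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected '<number> <unit>' pairs in: {rule!r}")
    total = 0
    for amount, unit in zip(tokens[0::2], tokens[1::2]):
        unit = unit.rstrip("s")  # accept both 'month' and 'months'
        if not amount.isdigit() or unit not in UNIT_SECONDS:
            raise ValueError(f"cannot parse '{amount} {unit}' in: {rule!r}")
        total += int(amount) * UNIT_SECONDS[unit]
    return total


print(expires_to_max_age("access plus 24 hours"))  # 86400
```

Seeing the default as 86400 seconds makes it easier to compare against the max-age values you observe in real responses.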

Conclusion

Now you are ready to point CloudFront to your content.  I'd suggest a test URL first...  It might save you some embarrassment.  You are bound to find little things here or there that need adjustment.  It's best to do that before you have the entire world hitting your CloudFront cache.
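
While testing, it helps to look at the response headers rather than trusting the browser.  The Python sketch below fetches the headers for a URL and flags the most obvious things that would keep a page from being cached.  The function names and the list of checks are my own and are certainly not exhaustive; treat it as a starting point.

```python
from typing import Dict, List
from urllib.request import Request, urlopen


def fetch_headers(url: str) -> Dict[str, str]:
    """Issue a HEAD request and return the response headers with lower-cased names."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as response:
        return {name.lower(): value for name, value in response.headers.items()}


def cache_problems(headers: Dict[str, str]) -> List[str]:
    """Return human-readable reasons why a response looks hostile to caching."""
    problems = []
    cache_control = headers.get("cache-control", "").lower()
    directives = [d.strip() for d in cache_control.split(",") if d.strip()]
    for blocker in ("no-cache", "no-store", "private"):
        if any(d == blocker or d.startswith(blocker + "=") for d in directives):
            problems.append(f"Cache-Control contains {blocker}")
    has_lifetime = any(d.startswith(("max-age=", "s-maxage=")) for d in directives)
    if not has_lifetime and "expires" not in headers:
        problems.append("no max-age and no Expires header")
    if "no-cache" in headers.get("pragma", "").lower():
        problems.append("Pragma contains no-cache")
    return problems
```

You would call it as cache_problems(fetch_headers(url)) with your own test URL, once against the origin and once against the CloudFront hostname.  On the CloudFront side, also look for the X-Cache header in the response, which reports whether the request was a hit or a miss.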

In summary, it's not terribly difficult to modify Joomla and your general way of thinking about the life cycle of resources to accommodate CloudFront.

I will be updating this article to include more cache optimizations as I discover them.  For example, I expect that I will eventually want to set category pages to update hourly and the content pages to update daily.  That will require a plugin for Joomla I suspect.  One which may already exist.  There are a few SEF/SEO plugins that look like they may be useful in this regard.

Check back soon...

---