Vhosts: Proxy Caching

This document briefly describes vhost caching.

Description

CAE's vhosts can optionally cache vhost content at the frontend proxy via mod_mem_cache, mod_disk_cache, or both (in that order). Caching is enabled using the vhost control panel tools at https://my.cae.wisc.edu/tools/account/vhosts/. Under some circumstances this can help reduce page load time. However, there are many details to consider when enabling this feature, so it is not currently enabled by default.

Much of this discussion actually pertains to setting appropriate headers in responses so that any cache, whether at the end client's browser, our frontend proxy servers, or anywhere along the path between them, can cache content as appropriate.

A full discussion of these matters is beyond the scope of this document, but we give a brief overview here.

Typical Scenario

In most browsing scenarios, such as following hyperlinks, a client's browser first tries to render a page from its local cache. If a resource is missing from the local cache, the browser requests it normally. If a cached copy exists but is deemed stale because its Expires response header is old or absent, the browser makes a conditional request using the If-Modified-Since and/or If-None-Match request headers, which carry the values of the resource's most recent Last-Modified and ETag response headers respectively. Any proxy that receives this request along the path performs a similar operation: it first checks its own cache for a fresh copy that can be delivered to the client, and otherwise passes the request along to the backend application server.

The application server has authoritative knowledge of whether the cached copy is still valid. If it determines that the copy held by the client or proxies is up to date, it returns a much smaller 304 Not Modified response; otherwise it returns the full, updated content. At this point each proxy and client along the return path can choose to cache the response based on the updated Expires and ETag information (if present) as well as a number of other Cache-Control directives (see Security below).

Refreshing the Cache

Usually, when a client's browser refreshes a page, reloads a page (e.g. Shift-reload), or loads a URL directly in the location bar, the browser sends an extra Cache-Control: max-age=0 and/or Cache-Control: no-cache request header to indicate to all proxies and the backend server along the path that it wants the most up-to-date copy of the resource. This gives clients and developers a simple, standard way to make sure they are not being served out-of-date cached content.

Caveats

For caching to occur at all, the application must provide appropriate Cache-Control, Expires, Last-Modified, ETag, etc. response headers; otherwise the client or proxy has no way to check whether the cached content is still valid on the next request.

Appropriate use of these headers allows clients, as well as any intermediary caches or proxies the client may be communicating through (including, but not limited to, our own), to cache the content locally.

In the case of an intermediary proxy cache, the cache is treated as shared, and the next request for the content, by a different client, may be able to be served by the cache on the proxy rather than contacting the backend application server directly. For busy sites serving lots of the same content this can have an important impact on the perceived responsiveness of the site.

Note that we say the request may be able to be served from the cache, since many factors, such as the content negotiation Vary response header, affect whether one client can receive the same response as another.
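
As a minimal sketch of the Vary point, assume an application that serves different language versions of the same URL depending on the Accept-Language request header. That scenario is an assumption made for illustration, not a description of any particular CAE vhost. The following .htaccess rule would tell shared caches to keep a separate copy per language. It requires the mod_headers Apache module to be enabled for the vhost.

```apache
<IfModule mod_headers.c>
	# Assumption: responses differ by the client's Accept-Language header.
	# "merge" appends the value to any existing Vary header unless it is
	# already listed there.
	Header merge Vary Accept-Language
</IfModule>
```

If the application already emits a correct Vary header itself, a rule like this is unnecessary.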

Additionally, these headers should not be overused or else dynamically generated content will not see updates on clients in a timely manner.

By default dynamic content produced by PHP includes response headers to instruct the client and any proxies not to cache anything, though many PHP application frameworks now set their headers via calls to header().
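
One way to avoid overusing these headers, sketched here under the assumption that dynamic pages are served by files ending in .php, is to apply a blanket expiry rule but switch mod_expires off for those files, so that whatever caching headers the application sends are left alone. This requires the mod_expires Apache module to be enabled for the vhost.

```apache
<IfModule mod_expires.c>
	ExpiresActive On

	# Blanket rule for everything else.
	ExpiresDefault "access plus 1 hour"

	# Assumption: dynamic content is served by .php files. Do not add
	# Expires headers to it; the application decides for itself.
	<FilesMatch "\.php$">
		ExpiresActive Off
	</FilesMatch>
</IfModule>
```

Like the other FilesMatch rules in this document, this matches files on the filesystem rather than URLs.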

Examples

Although deciding when to cache dynamic content is almost certainly best done in the application code itself (e.g. in PHP), the following are some simple ways of adding or modifying appropriate Expires response headers via Apache's .htaccess rules (NOTE: you must have enabled the mod_expires Apache module for your vhost to do this).

In this example we set Expires response headers for content based on its type.

<IfModule mod_expires.c>
	ExpiresActive On

	# By default cache everything for an hour.
	ExpiresDefault "access plus 1 hour"

	# Cache basic page style resources longer.
	ExpiresByType image/* "access plus 1 day"
	ExpiresByType text/css "access plus 1 day"
	ExpiresByType text/javascript "access plus 1 day"
	ExpiresByType application/javascript "access plus 1 day"

	# Assuming html is generated by PHP dynamically, don't cache it for
	# very long.
	ExpiresByType text/html "access plus 10 seconds"
	ExpiresByType text/xml "access plus 10 seconds"
</IfModule>

In this example we set Expires response headers for content based on its file extension. Note that these rules match files on the filesystem, not URLs mapped via something like mod_rewrite.

<IfModule mod_expires.c>
	# Cache image files and style sheets longer.
	<FilesMatch "\.(jpe?g|png|gif)$">
		ExpiresActive On
		ExpiresDefault "access plus 1 day"
	</FilesMatch>
	<FilesMatch "\.(css|js)$">
		ExpiresActive On
		ExpiresDefault "access plus 1 day"
	</FilesMatch>

	# Cache documents for a relatively short period.
	<FilesMatch "\.(pdf|odt)$">
		ExpiresActive On
		ExpiresDefault "access plus 1 hour"
	</FilesMatch>

	# Assume .html files are written statically and don't change very
	# often, compared with dynamic content such as .php files.
	<FilesMatch "\.(html|xml)$">
		ExpiresActive On
		ExpiresDefault "access plus 1 hour"
	</FilesMatch>

	# Leave all other files without any additional Expires headers.
</IfModule>

Arguments can be made for either scheme, so which one to use depends on your application.

Security

Since content can potentially be served from a cache without contacting the usual backend, the security restrictions that the backend would normally enforce are not applied to cached responses. This might allow subsequent requests to retrieve content from the cache, thereby bypassing the authentication/authorization restrictions in place on the backend. As such, it is the application's responsibility to set Cache-Control: private response headers as appropriate.

For instance, you could add the following in your .htaccess rules. NOTE: You must have enabled the mod_headers Apache module for your vhost to use this.

<IfModule mod_headers.c>
	Header merge Cache-Control private
</IfModule>

In our testing, this concern applies to Basic auth, Shibboleth, application-handled session cookies, etc.
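
A narrower variant, offered only as a sketch, marks a response private when the request carried credentials instead of marking every response private. The environment variable name CAE_PRIVATE_RESPONSE and the cookie name app_session are hypothetical. Substitute whatever your application actually uses, and note that this sketch does not attempt to detect Shibboleth sessions. It requires both mod_setenvif and mod_headers to be enabled for the vhost.

```apache
<IfModule mod_setenvif.c>
	# Flag requests that carry an Authorization header (e.g. Basic auth).
	SetEnvIf Authorization ".+" CAE_PRIVATE_RESPONSE

	# Flag requests that carry the application's session cookie.
	# "app_session" is a hypothetical cookie name.
	SetEnvIf Cookie "app_session=" CAE_PRIVATE_RESPONSE
</IfModule>

<IfModule mod_headers.c>
	# Only add "private" when one of the flags above was set.
	Header merge Cache-Control private env=CAE_PRIVATE_RESPONSE
</IfModule>
```

If there is any doubt about whether every protected request is covered, the unconditional rule shown earlier is the safer choice.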

Similar to the discussion regarding mod_expires directives above, this determination is usually best handled in the application and amended via .htaccess rules only as necessary or where updating the application is infeasible.

If you are concerned about not getting this detail right, then it's probably best not to enable the proxy caching feature for your vhost, though it should be noted that all of these concerns will still apply for any other shared proxy sitting in between the client and the backend application server.

Keywords: mod_disk_cache proxy cache caching headers expires mod_expires mod_headers Cache-Control vhost apache
Doc ID: 34539
Owner: Brian K.
Group: Computer-Aided Engineering
Created: 2013-10-15 15:35 CDT
Updated: 2016-03-24 11:31 CDT
Sites: Computer-Aided Engineering