Vhosts: Proxy Caching
This document briefly describes vhost caching.
CAE's vhosts now have the ability to optionally enable caching of vhost content
at the frontend proxy via
mod_disk_cache or both (in that order) using the vhost control
panel tools at https://my.cae.wisc.edu/tools/account/vhosts/.
Under some circumstances this can help reduce page load time. However, there
are many details to be concerned with when enabling this feature, so it is not
currently done by default.
It should be noted that much of this discussion actually pertains to setting appropriate headers in the responses so that any cache, whether that be the end client's browser, our frontend proxy servers, or anything in between along that path, can cache content as appropriate.
A full discussion of these matters is beyond the scope of this document; however, we will give a brief overview.
In most browsing scenarios, such as following hyperlinks, a client's browser
will first use its local cache to render a page. If it is lacking any
resources in its local cache, or they are deemed stale due to old or lacking
Expires response header information, then the client will make a
conditional request for the resource using
If-None-Match request headers which reference the resource's last
Etag response headers respectively.
Any proxies that receive this request along the path will perform a similar
operation, first checking their local cache to see if they have a more up to
date version that is capable of being delivered to the client, or else passing
it along to the backend application server. If the application server, which
has authoritative knowledge regarding the cache validity of the request, determines
that the resource the client or proxies have is up to date, then it will return a
304 Not Modified response, else it will return the
full and updated content. At this point each proxy and client along the return
path can choose to cache the response based on updated
Etag information (if present) as well as a number of other
Cache-Control directives (see Security below).
Refreshing the Cache
Usually, when a client's browser refreshes a page, reloads a page (e.g.,
Shift-reload), or loads a URL directly in the location bar, the
browser will send an extra
Cache-Control: max-age=0 and/or
Cache-Control: no-cache request header to indicate to all proxies
and the backend server along the path that it wants to receive the most up-to-date
copy of the resource. This gives all clients and developers a very simple and
standard way to make sure that they're not being burdened by out-of-date content.
For caching to occur at all, the application must provide appropriate
Etag, etc. response headers, or else the client or proxy has no way to
check the current validity of the cached content on the next request.
Appropriate use of these headers allows clients, as well as any intermediary caches or proxies the client may be communicating through (including, but not limited to, our own), to cache the content locally.
In the case of an intermediary proxy cache, the cache is treated as shared, and the next request for the content, by a different client, may be able to be served by the cache on the proxy rather than contacting the backend application server directly. For busy sites serving lots of the same content this can have an important impact on the perceived responsiveness of the site.
It should be noted that we say may be able to be served from the cache
since there are many factors such as the content negotiation
response headers that impact whether or not one client can receive the same
response as another.
Additionally, these headers should not be overused or else dynamically generated content will not see updates on clients in a timely manner.
By default dynamic content produced by PHP includes response headers to instruct the
client and any proxies not to cache anything, though many PHP
application frameworks now set their headers via calls to
Although the determination of when to cache dynamic content is almost certainly
best made in the application code itself (e.g., PHP), the following are some
simple ways of adding or modifying appropriate
headers via Apache's
.htaccess rules (NOTE: you must have enabled the
mod_expires Apache module for your vhost to do this).
In this example we set
Expires response headers for content based on its type.
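A minimal sketch of such type-based rules, assuming mod_expires is enabled for the vhost. The MIME types and lifetimes shown are illustrative assumptions, not recommended values:

```apache
<IfModule mod_expires.c>
    ExpiresActive On

    # Illustrative lifetimes only; tune them for your application.
    # Cache images and style sheets longer.
    ExpiresByType image/jpeg "access plus 1 day"
    ExpiresByType image/png "access plus 1 day"
    ExpiresByType image/gif "access plus 1 day"
    ExpiresByType text/css "access plus 1 day"

    # Cache documents for a relatively short period.
    ExpiresByType application/pdf "access plus 1 hour"

    # Types not listed here get no additional Expires headers.
</IfModule>
```

Type-based rules key on the Content-Type of the response rather than on a file on disk, so they can also apply to dynamically generated responses of a listed type. That is one reason text/html is left out of this sketch.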
In this example we set
Expires response headers for content based
on its file extension. Note that these rules match files on the filesystem,
not URLs mapped via something like
<IfModule mod_expires.c>
    # Cache image files and style sheets longer.
    <FilesMatch "\.(jpe?g|png|gif)$">
        ExpiresActive On
        ExpiresDefault "access plus 1 day"
    </FilesMatch>
    <FilesMatch "\.(css|js)$">
        ExpiresActive On
        ExpiresDefault "access plus 1 day"
    </FilesMatch>

    # Cache documents for a relatively short period.
    <FilesMatch "\.(pdf|odt)$">
        ExpiresActive On
        ExpiresDefault "access plus 1 hour"
    </FilesMatch>

    # Assume .html files are written statically and don't change very
    # often, compared with dynamic content such as .php files.
    <FilesMatch "\.(html|xml)$">
        ExpiresActive On
        ExpiresDefault "access plus 1 hour"
    </FilesMatch>

    # Leave all other files without any additional Expires headers.
</IfModule>
Arguments can be made for either scheme so it really depends on your application as to which one to use.
Security
Since content can potentially be served from a cache without contacting the
usual backend, all security restrictions that the backend would normally employ
are unavailable. This might allow subsequent requests to retrieve content from
the cache, thereby bypassing the authentication/authorization restrictions that
might have been in place on the backend. As such, it is the application's
responsibility to set
Cache-Control: private response headers as appropriate.
For instance, you could add the following in your
NOTE: You must have enabled the
mod_headers Apache module for your vhost to use this.
<IfModule mod_headers.c>
    Header merge Cache-Control private
</IfModule>
In our testing, this applies to Basic auth, Shibboleth, application-handled session cookies, etc.
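If only part of a site serves protected content, the header can be scoped instead of being applied to every response. A minimal sketch, assuming mod_headers is enabled for the vhost and, purely for illustration, that the protected content is whatever PHP scripts generate (the file pattern is a hypothetical choice):

```apache
<IfModule mod_headers.c>
    # Hypothetical scope: mark only responses from .php files as private,
    # so shared caches may still store static assets.
    <FilesMatch "\.php$">
        Header merge Cache-Control private
    </FilesMatch>
</IfModule>
```

As with the Expires rules, FilesMatch matches files on the filesystem, so the pattern must cover every file that can produce protected output.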
Similar to the discussion regarding
mod_expires directives above,
this determination is usually best handled in the application and amended via
.htaccess rules only as necessary or where updating the
application is infeasible.
If you are concerned about not getting this detail right, then it's probably best not to enable the proxy caching feature for your vhost. Note, however, that all of these concerns still apply to any other shared proxy sitting between the client and the backend application server.