The Low-level Cache API

Sometimes, caching an entire rendered page doesn’t gain you very much and is, in fact, inconvenient overkill. Perhaps, for instance, your site includes a view whose results depend on several expensive queries, the results of which change at different intervals. In this case, it would not be ideal to use the full-page caching that the per-site or per-view cache strategies offer, because you wouldn’t want to cache the entire result (since some of the data changes often), but you’d still want to cache the results that rarely change.

For cases like this, Django exposes a simple, low-level cache API. You can use this API to store objects in the cache with any level of granularity you like. You can cache any Python object that can be pickled safely: strings, dictionaries, lists of model objects, and so forth. (Most common Python objects can be pickled; refer to the Python documentation for more information about pickling.)

Accessing The Cache

You can access the caches configured in the CACHES setting through a dictionary-like object: django.core.cache.caches. Repeated requests for the same alias in the same thread will return the same object.

>>> from django.core.cache import caches
>>> cache1 = caches['myalias']
>>> cache2 = caches['myalias']
>>> cache1 is cache2
True 

If the named key does not exist, InvalidCacheBackendError will be raised. To provide thread-safety, a different instance of the cache backend will be returned for each thread.

As a shortcut, the default cache is available as django.core.cache.cache:

>>> from django.core.cache import cache 

This object is equivalent to caches['default'].

Basic Usage

The basic interface is set(key, value, timeout) and get(key):

>>> cache.set('my_key', 'hello, world!', 30)
>>> cache.get('my_key')
'hello, world!'

The timeout argument is optional and defaults to the timeout argument of the appropriate backend in the CACHES setting (explained above). It’s the number of seconds the value should be stored in the cache. Passing in None for timeout will cache the value forever. A timeout of 0 won’t cache the value.

If the object doesn’t exist in the cache, cache.get() returns None:

# Wait 30 seconds for 'my_key' to expire...

>>> cache.get('my_key')
None 

We advise against storing the literal value None in the cache, because you won’t be able to distinguish between your stored None value and a cache miss signified by a return value of None.

cache.get() can take a default argument. This specifies which value to return if the object doesn’t exist in the cache:

>>> cache.get('my_key', 'has expired')
'has expired'
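When None is a legitimate result, one common workaround is to pass a unique sentinel object as the default, so that a miss can be told apart from a stored None. The sketch below is an illustration rather than part of Django: get_or_compute and DictCache are hypothetical names, and DictCache is a dict-backed stand-in that only mimics the get() and set() signatures so the example runs without a configured project. In a real project you would pass django.core.cache.cache instead.

```python
_MISSING = object()  # unique sentinel: can never be equal to a cached value


class DictCache:
    """Hypothetical stand-in mimicking cache.get()/cache.set(); ignores timeouts."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        self._data[key] = value


def get_or_compute(cache, key, compute, timeout=300):
    """Return the cached value for key, computing and storing it on a miss."""
    value = cache.get(key, _MISSING)
    if value is _MISSING:  # a real miss, even when None is a valid result
        value = compute()
        cache.set(key, value, timeout)
    return value


calls = []


def expensive_query():
    calls.append(1)  # record each real computation
    return None      # a legitimate result that happens to be None


cache = DictCache()
first = get_or_compute(cache, 'report', expensive_query)
second = get_or_compute(cache, 'report', expensive_query)
```

The second call is served from the cache, so expensive_query() runs only once even though its result is None.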

To add a key only if it doesn’t already exist, use the add() method. It takes the same parameters as set(), but it will not attempt to update the cache if the key specified is already present:

>>> cache.set('add_key', 'Initial value')
>>> cache.add('add_key', 'New value')
>>> cache.get('add_key')
'Initial value'

If you need to know whether add() stored a value in the cache, you can check the return value. It will return True if the value was stored, False otherwise.

There’s also a get_many() interface that only hits the cache once. get_many() returns a dictionary with all the keys you asked for that actually exist in the cache (and haven’t expired):

>>> cache.set('a', 1)
>>> cache.set('b', 2)
>>> cache.set('c', 3)
>>> cache.get_many(['a', 'b', 'c'])
{'a': 1, 'b': 2, 'c': 3}

To set multiple values more efficiently, use set_many() to pass a dictionary of key-value pairs:

>>> cache.set_many({'a': 1, 'b': 2, 'c': 3})
>>> cache.get_many(['a', 'b', 'c'])
{'a': 1, 'b': 2, 'c': 3}

Like cache.set(), set_many() takes an optional timeout parameter.

You can delete keys explicitly with delete(). This is an easy way of clearing the cache for a particular object:

>>> cache.delete('a') 

If you want to clear a bunch of keys at once, delete_many() can take a list of keys to be cleared:

>>> cache.delete_many(['a', 'b', 'c'])

Finally, if you want to delete all the keys in the cache, use cache.clear(). Be careful with this; clear() will remove everything from the cache, not just the keys set by your application.

>>> cache.clear()

You can also increment or decrement a key that already exists using the incr() or decr() methods, respectively. By default, the existing cache value will be incremented or decremented by 1. Other increment/decrement values can be specified by providing an argument to the increment/decrement call.

A ValueError will be raised if you attempt to increment or decrement a non-existent cache key. The following example therefore sets the key first:

>>> cache.set('num', 1)
>>> cache.incr('num')
2
>>> cache.incr('num', 10)
12
>>> cache.decr('num')
11
>>> cache.decr('num', 5)
6 
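Because incr() raises ValueError for a missing key, a counter usually needs a fallback for the first hit. A minimal sketch, assuming cache is any object exposing the add() and incr() methods described above (for instance django.core.cache.cache); the helper name bump_counter is hypothetical:

```python
def bump_counter(cache, key, timeout=None):
    """Increment key, creating it with the value 1 if it does not exist yet.

    timeout=None stores a newly created counter without an expiry.
    """
    try:
        return cache.incr(key)
    except ValueError:
        # The key is missing. add() stores the value only if the key is
        # still absent, and returns True when it did so.
        if cache.add(key, 1, timeout):
            return 1
        # Another caller created the key in the meantime; count on top of it.
        return cache.incr(key)
```

Using add() rather than set() for the first value avoids overwriting a counter that another process created between the failed incr() and the fallback.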

You can close the connection to your cache with close() if implemented by the cache backend.

>>> cache.close()

Note that for caches that don’t implement a close method, close() is a no-op.

Cache Key Prefixing

If you are sharing a cache instance between servers, or between your production and development environments, it’s possible for data cached by one server to be used by another server. If the format of cached data is different between servers, this can lead to some very hard-to-diagnose problems.

To prevent this, Django provides the ability to prefix all cache keys used by a server. When a particular cache key is saved or retrieved, Django will automatically prefix the cache key with the value of the KEY_PREFIX cache setting. By ensuring each Django instance has a different KEY_PREFIX, you can ensure that there will be no collisions in cache values.
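As a rough sketch, a settings fragment along these lines gives each deployment its own prefix. The DEPLOYMENT_NAME environment variable and the memcached address are assumptions made for the example, and the backend shown is only a placeholder; use whichever backend path your project already configures.

```python
import os

# Hypothetical environment variable; each deployment sets its own value,
# for example "prod" or "dev".
DEPLOYMENT_NAME = os.environ.get('DEPLOYMENT_NAME', 'dev')

CACHES = {
    'default': {
        # Placeholder backend path; keep the one your project already uses.
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        # Assumed address of the shared memcached server.
        'LOCATION': '127.0.0.1:11211',
        # Every key this instance reads or writes is prefixed with this value.
        'KEY_PREFIX': DEPLOYMENT_NAME,
    }
}
```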

Cache Versioning

When you change running code that uses cached values, you may need to purge any existing cached values. The easiest way to do this is to flush the entire cache, but this can lead to the loss of cache values that are still valid and useful. Django provides a better way to target individual cache values.

Django’s cache framework has a system-wide version identifier, specified using the VERSION cache setting. The value of this setting is automatically combined with the cache prefix and the user-provided cache key to obtain the final cache key.

By default, any key request will automatically include the site default cache key version. However, the primitive cache functions all include a version argument, so you can specify a particular cache key version to set or get. For example:

# Set version 2 of a cache key
>>> cache.set('my_key', 'hello world!', version=2)
# Get the default version (assuming version=1)
>>> cache.get('my_key')
None
# Get version 2 of the same key
>>> cache.get('my_key', version=2)
'hello world!'

The version of a specific key can be incremented and decremented using the incr_version() and decr_version() methods. This enables specific keys to be bumped to a new version, leaving other keys unaffected. Continuing our previous example:

# Increment the version of 'my_key' (stored above as version 2)
>>> cache.incr_version('my_key', version=2)
3
# The default version still isn't available
>>> cache.get('my_key')
None
# Version 2 isn't available, either
>>> cache.get('my_key', version=2)
None
# But version 3 *is* available
>>> cache.get('my_key', version=3)
'hello world!'

Cache Key Transformation

As described in the previous two sections, the cache key provided by a user is not used verbatim – it is combined with the cache prefix and key version to provide a final cache key. By default, the three parts are joined using colons to produce a final string:

def make_key(key, key_prefix, version):
    return ':'.join([key_prefix, str(version), key])

If you want to combine the parts in different ways, or apply other processing to the final key (e.g. taking a hash digest of the key parts), you can provide a custom key function. The KEY_FUNCTION cache setting specifies a dotted-path to a function matching the prototype of make_key() above. If provided, this custom key function will be used instead of the default key combining function.
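A custom key function that takes a hash digest of the key parts might look like the sketch below. The function name hashed_key and the module path myproject.cache_keys are hypothetical, and the function assumes that keys are strings.

```python
import hashlib


def hashed_key(key, key_prefix, version):
    """Combine the three parts, then replace them with a fixed-length digest."""
    raw = ':'.join([key_prefix, str(version), key])
    return hashlib.sha256(raw.encode('utf-8')).hexdigest()


# settings.py fragment; 'myproject.cache_keys' is a hypothetical module path.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'KEY_FUNCTION': 'myproject.cache_keys.hashed_key',
    }
}
```

A SHA-256 hex digest is always 64 hexadecimal characters, so long keys or keys containing whitespace end up in a uniform shape, at the cost of final keys that are no longer human-readable.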

Cache Key Warnings

Memcached, the most commonly-used production cache backend, does not allow cache keys longer than 250 characters or containing whitespace or control characters, and using such keys will cause an exception. To encourage cache-portable code and minimize unpleasant surprises, the other built-in cache backends issue a warning (django.core.cache.backends.base.CacheKeyWarning) if a key is used that would cause an error on memcached.

If you are using a production backend that can accept a wider range of keys (a custom backend, or one of the non-memcached built-in backends), and want to use this wider range without warnings, you can silence CacheKeyWarning with this code in the management module of one of your INSTALLED_APPS:

import warnings

from django.core.cache import CacheKeyWarning

warnings.simplefilter("ignore", CacheKeyWarning)

If you want to instead provide custom key validation logic for one of the built-in backends, you can subclass it, override just the validate_key method, and follow the instructions for using a custom cache backend.

For instance, to do this for the locmem backend, put this code in a module:

from django.core.cache.backends.locmem import LocMemCache

class CustomLocMemCache(LocMemCache):
    def validate_key(self, key):
        # Custom validation, raising exceptions or warnings as needed.
        # ...
        pass  # placeholder so the method body is valid Python

… and use the dotted Python path to this class in the BACKEND portion of your CACHES setting.

Downstream Caches

So far, this chapter has focused on caching your own data. But another type of caching is relevant to Web development, too: caching performed by downstream caches. These are systems that cache pages for users even before the request reaches your Web site. Here are a few examples of downstream caches:

  • Your ISP may cache certain pages, so if you requested a page from http://example.com/, your ISP would send you the page without having to access example.com directly. The maintainers of example.com have no knowledge of this caching; the ISP sits between example.com and your Web browser, handling all of the caching transparently.
  • Your Django Web site may sit behind a proxy cache, such as Squid Web Proxy Cache, that caches pages for performance. In this case, each request first would be handled by the proxy, and it would be passed to your application only if needed.
  • Your Web browser caches pages, too. If a Web page sends out the appropriate headers, your browser will use the local cached copy for subsequent requests to that page, without even contacting the Web page again to see whether it has changed.

Downstream caching is a nice efficiency boost, but there’s a danger to it: Many Web pages’ contents differ based on authentication and a host of other variables, and cache systems that blindly save pages based purely on URLs could expose incorrect or sensitive data to subsequent visitors to those pages.

For example, say you operate a Web email system, and the contents of the inbox page obviously depend on which user is logged in. If an ISP blindly cached your site, then the first user who logged in through that ISP would have their user-specific inbox page cached for subsequent visitors to the site. That’s not cool.

Fortunately, HTTP provides a solution to this problem. A number of HTTP headers exist to instruct downstream caches to vary their cache contents depending on designated variables, and to tell caching mechanisms not to cache particular pages. We’ll look at some of these headers in the sections that follow.

Using Vary Headers

The Vary header defines which request headers a cache mechanism should take into account when building its cache key. For example, if the contents of a Web page depend on a user’s language preference, the page is said to vary on language. By default, Django’s cache system creates its cache keys using the requested fully-qualified URL – e.g., http://www.example.com/stories/2005/?order_by=author.

This means every request to that URL will use the same cached version, regardless of user-agent differences such as cookies or language preferences. However, if this page produces different content based on some difference in request headers – such as a cookie, or a language, or a user-agent – you’ll need to use the Vary header to tell caching mechanisms that the page output depends on those things.
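To make the idea concrete, the sketch below shows how a cache could fold the request headers named in Vary into its cache key. It illustrates the principle only; it is not the algorithm used by Django or by any particular proxy, and downstream_cache_key is a hypothetical name.

```python
import hashlib


def downstream_cache_key(url, request_headers, vary=()):
    """Build a cache key from the URL plus the request headers named in Vary."""
    # Header names are case-insensitive, so normalise them before lookup.
    headers = {name.lower(): value for name, value in request_headers.items()}
    parts = [url]
    for name in sorted(h.lower() for h in vary):
        parts.append('%s=%s' % (name, headers.get(name, '')))
    return hashlib.sha256('\n'.join(parts).encode('utf-8')).hexdigest()


url = 'http://www.example.com/stories/2005/?order_by=author'
english = {'Accept-Language': 'en'}
german = {'Accept-Language': 'de'}

# Without Vary, both visitors map to the same cached copy.
same = downstream_cache_key(url, english) == downstream_cache_key(url, german)

# With "Vary: Accept-Language", each language gets its own entry.
different = (downstream_cache_key(url, english, ['Accept-Language'])
             != downstream_cache_key(url, german, ['Accept-Language']))
```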

To do this in Django, use the convenient django.views.decorators.vary.vary_on_headers() view decorator, like so:

from django.views.decorators.vary import vary_on_headers

@vary_on_headers('User-Agent')
def my_view(request):
    ...

In this case, a caching mechanism (such as Django’s own cache middleware) will cache a separate version of the page for each unique user-agent. The advantage to using the vary_on_headers decorator rather than manually setting the Vary header (using something like response['Vary'] = 'user-agent') is that the decorator adds to the Vary header (which may already exist), rather than setting it from scratch and potentially overriding anything that was already in there.

You can pass multiple headers to vary_on_headers():

@vary_on_headers('User-Agent', 'Cookie')
def my_view(request):
    ...

This tells downstream caches to vary on both, which means each combination of user-agent and cookie will get its own cache value. For example, a request with the user-agent “Mozilla” and the cookie value “foo=bar” will be considered different from a request with the user-agent “Mozilla” and the cookie value “foo=ham”.

Because varying on cookie is so common, there’s a django.views.decorators.vary.vary_on_cookie() decorator. These two views are equivalent:

from django.views.decorators.vary import vary_on_cookie, vary_on_headers

@vary_on_cookie
def my_view(request):
    ...

@vary_on_headers('Cookie')
def my_view(request):
    ...

The headers you pass to vary_on_headers are not case sensitive; “User-Agent” is the same thing as “user-agent”.

You can also use a helper function, django.utils.cache.patch_vary_headers(), directly. This function sets, or adds to, the Vary header. For example:

from django.shortcuts import render
from django.utils.cache import patch_vary_headers

def my_view(request):
    # ... build the template context here ...
    context = {}
    response = render(request, 'template_name', context)
    patch_vary_headers(response, ['Cookie'])
    return response

patch_vary_headers takes an HttpResponse instance as its first argument and a list/tuple of case-insensitive header names as its second argument. For more on Vary headers, see the official Vary specification.
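The “sets, or adds to” behaviour can be pictured with a small stand-alone function. This is a simplified illustration of the merging idea, not Django’s implementation; merge_vary is a hypothetical name, and it works on a plain header string rather than on an HttpResponse.

```python
def merge_vary(existing_value, new_headers):
    """Return a Vary header value with new_headers appended, skipping duplicates."""
    current = [h.strip() for h in existing_value.split(',') if h.strip()]
    seen = {h.lower() for h in current}
    for header in new_headers:
        if header.lower() not in seen:  # header names compare case-insensitively
            current.append(header)
            seen.add(header.lower())
    return ', '.join(current)


# An existing "Vary: Accept-Language" is kept, and only the missing name is added.
merged = merge_vary('Accept-Language', ['Cookie', 'accept-language'])
```

Here merged keeps Accept-Language and gains Cookie, which is why adding to the header is safer than assigning it from scratch.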

Controlling Cache: Using Other Headers

Other problems with caching are the privacy of data and the question of where data should be stored in a cascade of caches. A user usually faces two kinds of caches: their own browser cache (a private cache) and their provider’s cache (a public cache).

A public cache is used by multiple users and controlled by someone else. This poses problems with sensitive data – you don’t want, say, your bank account number stored in a public cache. So Web applications need a way to tell caches which data is private and which is public.

The solution is to indicate a page’s cache should be private. To do this in Django, use the cache_control view decorator. Example:

from django.views.decorators.cache import cache_control

@cache_control(private=True)
def my_view(request):
    ...

This decorator takes care of sending out the appropriate HTTP header behind the scenes. Note that the cache control settings private and public are mutually exclusive. The decorator ensures that the public directive is removed if private should be set (and vice versa).

An example use of the two directives would be a blog site that offers both private and public entries. Public entries may be cached on any shared cache. The following code uses django.utils.cache.patch_cache_control(), the manual way to modify the cache control header (it is internally called by the cache_control decorator):

from django.utils.cache import patch_cache_control
from django.views.decorators.vary import vary_on_cookie

@vary_on_cookie
def list_blog_entries_view(request):
    if request.user.is_anonymous():
        response = render_only_public_entries()
        patch_cache_control(response, public=True)
    else:
        response = render_private_and_public_entries(request.user)
        patch_cache_control(response, private=True)

    return response 

There are a few other ways to control cache parameters. For example, HTTP allows applications to do the following:

  • Define the maximum time a page should be cached.
  • Specify whether a cache should always check for newer versions, only delivering the cached content when there are no changes. (Some caches might deliver cached content even if the server page changed, simply because the cache copy isn’t yet expired.)

In Django, use the cache_control view decorator to specify these cache parameters. In this example, cache_control tells caches to revalidate the cache on every access and to store cached versions for, at most, 3,600 seconds:

from django.views.decorators.cache import cache_control

@cache_control(must_revalidate=True, max_age=3600)
def my_view(request):
    ...

Any valid Cache-Control HTTP directive is valid in cache_control(). Here are some of the more common ones:

  • public=True
  • private=True
  • no_cache=True
  • no_transform=True
  • must_revalidate=True
  • proxy_revalidate=True
  • max_age=num_seconds
  • s_maxage=num_seconds

For an explanation of Cache-Control HTTP directives, see the Cache-Control specification. (Note that the caching middleware already sets the cache header’s max-age with the value of the CACHE_MIDDLEWARE_SECONDS setting. If you use a custom max_age in a cache_control decorator, the decorator will take precedence, and the header values will be merged correctly.)
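The keyword arguments accepted by cache_control() correspond to header directives in a straightforward way: underscores become hyphens, and a value of True produces a bare directive. The stand-alone sketch below illustrates that mapping only; build_cache_control is a hypothetical name, and the function neither reproduces Django’s code nor merges with a header that is already present.

```python
def build_cache_control(**directives):
    """Render keyword arguments as a Cache-Control header value."""
    parts = []
    for name, value in directives.items():
        directive = name.replace('_', '-')  # max_age -> max-age
        if value is True:
            parts.append(directive)  # flag directive, emitted without a value
        else:
            parts.append('%s=%s' % (directive, value))
    return ', '.join(parts)


# Mirrors @cache_control(must_revalidate=True, max_age=3600).
header = build_cache_control(must_revalidate=True, max_age=3600)
```

With these arguments, header holds the string must-revalidate, max-age=3600.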

If you want to use headers to disable caching altogether, django.views.decorators.cache.never_cache is a view decorator that adds headers to ensure the response won’t be cached by browsers or other caches. For example:

from django.views.decorators.cache import never_cache

@never_cache
def myview(request):
    ...

What’s Next?

In the next chapter we will be looking at Django’s middleware.