
The Cache Manager

Caching is a required part of any efficient Internet access application, as it saves bandwidth and significantly improves access performance for almost all types of access. The Library supports two different types of cache: the memory cache and the file cache. The two types differ in several ways, which reflect their two main purposes: the memory cache is for short-term storage of graphic objects, whereas the file cache is for intermediate-term storage of data objects. Often it is desirable to have both a memory and a file version of a cached document, so the two types do not exclude each other. The following paragraphs explain how the two caches can be maintained in the Library.

Memory Cache

The memory cache is largely managed by the application, as it simply consists of keeping the graphic objects described by the HyperDoc object in memory as the user keeps requesting new documents. The HyperDoc object is only declared in the Library; the real definition is left to the application, since it is the application that handles graphic objects. The Line Mode Browser has its own definition of the HyperDoc object, called HText. Before a request is processed over the net, the anchor object is searched for a HyperDoc object, and a new request is issued only if none is present or if the Library has explicitly been asked to reload the document, as described in the section Short Circuiting the Cache.
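
The lookup rule above can be sketched in a few lines of C. This is a minimal, self-contained illustration, not Library code: HyperDocSketch, AnchorSketch and needs_network_request() are hypothetical stand-ins for the application's HyperDoc definition and the Library's HTAnchor object, reduced to the one field the decision depends on.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: the real HyperDoc type is defined by the
 * application (HText in the Line Mode Browser) and the real anchor is
 * the Library's HTAnchor object. Only the field needed here is modelled. */
typedef struct {
    const char *title;              /* placeholder for the graphic object */
} HyperDocSketch;

typedef struct {
    const char *address;
    HyperDocSketch *document;       /* NULL when nothing is held in memory */
} AnchorSketch;

/* Returns true if a new request has to be issued over the net: either no
 * graphic object is attached to the anchor, or a reload was requested. */
static bool needs_network_request(const AnchorSketch *anchor,
                                  bool reload_requested)
{
    if (reload_requested)
        return true;
    return anchor->document == NULL;
}
```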

As the graphic objects are managed by the application, it is also up to the application to handle garbage collection of the memory cache. The Line Mode Browser uses a very simple policy for how long graphic objects stay in memory: the number of documents kept is determined by a constant in the GridText module and is set to 5 by default. Garbage collection could be much more advanced, for example taking into account the size of the graphic objects or when they expire, but the API is the same no matter how the garbage collector is implemented.
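
A count-based garbage collector of the kind described above might look like the following sketch. It assumes that the oldest document is discarded first when the limit is reached. The names MEMCACHE_MAX_DOCS, MemCacheSketch, memcache_add() and memcache_free() are hypothetical and are not taken from GridText. Only document addresses are stored, standing in for the graphic objects an application would actually free.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit mirroring the Line Mode Browser default of 5 documents;
 * the real constant lives in GridText and may have a different name. */
#define MEMCACHE_MAX_DOCS 5

/* Hypothetical memory cache: holds the addresses of the documents whose
 * graphic objects are kept, oldest first. */
typedef struct {
    char *docs[MEMCACHE_MAX_DOCS];
    int count;
} MemCacheSketch;

/* Portable string copy (strdup is not part of C99). */
static char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Keeps a new document in memory. When the cache is full, the oldest
 * document is discarded first. Returns 1 if a document was evicted. */
static int memcache_add(MemCacheSketch *cache, const char *address)
{
    int evicted = 0;
    if (cache->count == MEMCACHE_MAX_DOCS) {
        free(cache->docs[0]);
        memmove(&cache->docs[0], &cache->docs[1],
                (MEMCACHE_MAX_DOCS - 1) * sizeof cache->docs[0]);
        cache->count--;
        evicted = 1;
    }
    cache->docs[cache->count++] = copy_string(address);
    return evicted;
}

/* Releases every document still held in the cache. */
static void memcache_free(MemCacheSketch *cache)
{
    int i;
    for (i = 0; i < cache->count; i++)
        free(cache->docs[i]);
    cache->count = 0;
}
```

A size- or expiry-based collector would replace only the eviction test inside memcache_add(); the calling code would stay the same, which matches the point above that the API does not depend on the garbage collection policy.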

File Cache

The file cache is intended for intermediate-term storage of documents or data objects that cannot be represented by the HyperDoc object referenced by the HTAnchor object. As the HyperDoc object is defined by the application, there is no explicit rule for which objects cannot be described by it, but they are often binary objects such as images.

The file cache in the Library is a very simple implementation in the sense that no intelligent garbage collection has been defined. The goal has been to gain experience with the file cache in the W3C proxy server before implementing an intelligent garbage collector in the Library. Currently, the following functions can be used to control the cache, which is disabled by default:

HTCache_enable(), HTCache_disable(), and HTCache_isEnabled()
Use these functions to enable and disable the cache and to check whether it is enabled
HTCache_setRoot() and HTCache_getRoot()
Use these functions to set and get the value of the cache root

An important difference between the memory cache and the file cache is the format of the data. In the memory cache, the cached objects are graphic objects ready to be displayed to the user. In the file cache, the data objects are stored along with their metainformation, so that important header information such as Expires, Last-Modified, and Language is part of the stored object.
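
Because the metainformation travels with the data object in the file cache, a stored entry can be checked for staleness without contacting the server. The sketch below is illustrative only and is not the Library's on-disk format. FileCacheEntrySketch and entry_is_stale() are hypothetical names. It also assumes that entries without an Expires header are treated as fresh.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical in-memory view of one file cache entry: the data object
 * plus the metainformation stored with it. Field names are illustrative. */
typedef struct {
    const char *address;
    const char *body;
    size_t body_length;
    time_t expires;         /* 0 if the response had no Expires header */
    time_t last_modified;   /* 0 if unknown */
    const char *language;   /* e.g. "en", or NULL if not given */
} FileCacheEntrySketch;

/* An entry is stale once its Expires time has been reached. Entries with
 * no Expires header are treated as fresh here; that is an assumption of
 * this sketch, not a rule stated for the Library. */
static int entry_is_stale(const FileCacheEntrySketch *entry, time_t now)
{
    return entry->expires != 0 && now >= entry->expires;
}
```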

Mode for Cache Refresh

When a cached document is known to be stale, it is desirable to flush any existing version of the document from the memory cache or the file cache and reload it from the authoritative server. This can, for example, be the case if an Expires header was defined for the document when it was returned from the origin server. A refresh of the memory cache, the file cache, or both can be forced using the following functions:
void HTRequest_setReload (HTRequest *request, HTReload mode);
HTReload HTRequest_reload (HTRequest *request);
where HTReload can take one of the following values:
HT_ANY_VERSION
Use any version available, either from memory cache or from local file cache
HT_MEM_REFRESH
Non-authoritative update of any version stored in memory. The new version can come from the local file cache, a proxy cache, or the network. If the request falls through to the network, the Library issues a conditional GET using an If-Modified-Since header. There are two main purposes for this mode:
  1. If the disk cache is private to exactly one application, then a version stored in the local disk cache normally does not differ in time from a version in memory, as they were created at the same time. In a shared cache environment, however, the two versions can differ, and this flag can be used to force an update to the latest version in the file cache.
  2. If the application wants to see the metainformation as received from the network, the object in the file cache provides this information, whereas the version in memory does not.
HT_CACHE_REFRESH
Authoritative update of any version stored in the local file cache or a proxy cache. The Library issues a conditional GET using an If-Modified-Since header and a Pragma: no-cache directive to ensure that the response is authoritative.
HT_FORCE_RELOAD
Unconditional reload from the network using the Pragma: no-proxy directive, in order to ensure that the reload is passed through any proxy server on the way to the origin server
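
The four reload modes can be summarised as a small decision table. The sketch below encodes the descriptions above in C. ReloadModeSketch is a local mirror of HTReload, and its numeric values are not taken from the Library. ReloadPolicySketch and policy_for_mode() are hypothetical names. The plain GET assumed for HT_ANY_VERSION, when nothing is cached, is also an assumption.

```c
#include <stdbool.h>

/* Local mirror of the HTReload values listed above. The real enum is
 * declared by the Library and its numeric values may differ. */
typedef enum {
    SKETCH_ANY_VERSION,
    SKETCH_MEM_REFRESH,
    SKETCH_CACHE_REFRESH,
    SKETCH_FORCE_RELOAD
} ReloadModeSketch;

/* Hypothetical summary of how a request is served in a given mode. */
typedef struct {
    bool use_memory_copy;   /* a version in memory may be used as is */
    bool use_file_cache;    /* the local file cache may answer the request */
    bool conditional;       /* a network request carries If-Modified-Since */
    bool authoritative;     /* a Pragma directive asks proxies to pass the
                               request on to the origin server */
} ReloadPolicySketch;

static ReloadPolicySketch policy_for_mode(ReloadModeSketch mode)
{
    ReloadPolicySketch p = { true, true, false, false };
    switch (mode) {
    case SKETCH_ANY_VERSION:
        /* Any cached version will do; a plain GET is assumed otherwise. */
        break;
    case SKETCH_MEM_REFRESH:
        /* Skip the memory copy; file cache, proxy or network may answer. */
        p.use_memory_copy = false;
        p.conditional = true;
        break;
    case SKETCH_CACHE_REFRESH:
        /* Revalidate with a conditional GET plus Pragma: no-cache. */
        p.use_memory_copy = false;
        p.use_file_cache = false;
        p.conditional = true;
        p.authoritative = true;
        break;
    case SKETCH_FORCE_RELOAD:
        /* Fetch the whole document again, unconditionally. */
        p.use_memory_copy = false;
        p.use_file_cache = false;
        p.authoritative = true;
        break;
    }
    return p;
}
```

In an application built on the Library, the chosen mode would be attached to a request using the function shown above, for example HTRequest_setReload(request, HT_CACHE_REFRESH).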
If the Library receives either an authoritative or non-authoritative "304 Not Modified" response upon any of the requests above, it

Handling Expired Documents

There are various ways of handling an Expires header when it is encountered for a document in the history list: it can be ignored altogether, the user can be notified with a warning, or the document can be reloaded automatically. The Library supports all three, as it should be up to the user to decide. The default action is HT_EXPIRES_IGNORE; the other modes notify the user that a document is stale without reloading it, or reload the document automatically. The functions to use in this case are:
void HTAccess_setExpiresMode (HTExpiresMode mode, char *  notify);
HTExpiresMode HTAccess_expiresMode ();
where HTExpiresMode can take any of the values:
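    HT_EXPIRES_IGNORE
    Ignore the Expires header (the default)
    HT_EXPIRES_NOTIFY
    Notify the user that the document is stale without reloading it
    HT_EXPIRES_AUTO
    Reload the document automatically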
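
The three modes can be sketched as a single decision made when the user returns to a document in the history list. ExpiresModeSketch is a local mirror of HTExpiresMode. HistoryActionSketch and on_history_revisit() are hypothetical names. The description above does not say what the notify argument of HTAccess_setExpiresMode() carries, so this sketch assumes it is a message to show the user.

```c
#include <stdio.h>

/* Local mirror of HTExpiresMode; the real type is declared by the Library. */
typedef enum {
    SKETCH_EXPIRES_IGNORE,
    SKETCH_EXPIRES_NOTIFY,
    SKETCH_EXPIRES_AUTO
} ExpiresModeSketch;

/* Hypothetical outcomes of revisiting a history-list document. */
typedef enum {
    SHOW_CACHED,
    SHOW_CACHED_WITH_WARNING,
    RELOAD
} HistoryActionSketch;

/* Decides what to do when the user returns to a document in the history
 * list. Assumption: notify is a message for the user (may be NULL). */
static HistoryActionSketch on_history_revisit(ExpiresModeSketch mode,
                                              int is_stale,
                                              const char *notify)
{
    if (!is_stale)
        return SHOW_CACHED;
    switch (mode) {
    case SKETCH_EXPIRES_NOTIFY:
        fprintf(stderr, "%s\n", notify != NULL ? notify : "Document has expired");
        return SHOW_CACHED_WITH_WARNING;
    case SKETCH_EXPIRES_AUTO:
        return RELOAD;
    case SKETCH_EXPIRES_IGNORE:
    default:
        return SHOW_CACHED;
    }
}
```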


Henrik Frystyk, libwww@w3.org, December 1995