Page Caching

How to get the best from HTML Caching.

By default, Foswiki pages are delivered to the browser using HTML, which is regenerated from the data stored in Foswiki each time the page is requested. Page caching is the process of storing (caching) this HTML on the server, as and when it is generated, so that the next time someone asks for the same page it can be delivered from the cache instead of having to be re-rendered from scratch. Page caching can significantly improve the performance of a Foswiki site, especially where most requests are views (this is normal on most sites). Since version 1.1, Foswiki has built-in page caching.

For most Foswiki installations you can enable page caching by simply selecting the {Cache}{Enabled} option and connecting it to a database backend in the Tuning section of configure. However, to get the most out of the cache, you need to understand what it is doing, and may want to tune the cache options for your site.

When Foswiki generates a page, it may refer to a lot of content stored in the Foswiki database. For example, a topic 'MyTopic' may be rendered differently depending on a setting in the web's WebPreferences, which is itself a topic. If WebPreferences were to be edited, then the cache of MyTopic would have to be regenerated, because the setting it depends on might have been changed during the edit. For an average page there are a large number of these dependencies that have to be tracked. This tracking has to be fast, so the cache uses a database engine to record the dependencies and save the rendered pages.

Using the Cache

By its very nature, a cache should be invisible to the end user: editing and viewing a page should always deliver up-to-date content. To achieve this, Foswiki's page cache tracks the ingredients used to render each page and watches for any change to them.

For example, a cached topic that includes another topic will be refreshed automatically whenever the included topic changes.

Foswiki's page cache performs so-called deep dependency tracking of every page it caches and stores this dependency graph in its meta-data DB. Based on this information, cache entries of topics are invalidated automatically.

While this works out for most normal wiki use cases, cache maintenance is not able to fully track all required dependencies of a page because some of them are either out of scope for Foswiki or simply not available a priori.

Cache Expiry

Normally a page will be cached as long as it is valid, that is, as long as no newer version needs to be rendered. In addition to this basic strategy, an expiry date or timespan can be added to a page, specifying how long the cache entry is valid. When this timespan expires, a new page is created from scratch and replaces the old one in the cache.

The expiry time can be specified using the CACHEEXPIRE preference variable or the cache_expire URL parameter. As usual, this variable can be set per topic, per web, or per user as you like. By default the expiry time is infinite, so that pages are invalidated only by the normal dependency-based strategy. Note that the page is refreshed by whichever constraint - the expiry time or the normal invalidation strategy - is triggered first.

The CACHEEXPIRE value can be specified as either an absolute or a relative value. Most of the time a relative timespan makes the most sense, in which case the page is cached for that time and recomputed afterwards.

Examples for valid cache expiry values are:

  • +30s: 30 seconds from now
  • +10m: ten minutes from now
  • +1h: one hour from now
  • +1d: one day from now
  • +3M: three months from now
  • +10y: ten years from now
  • Thursday, 25-Apr-1999 00:40:33 GMT: at the indicated date and time
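
For example, to cache a page for at most one hour, the CACHEEXPIRE preference described above can be set in the topic (the value is illustrative):

    * Set CACHEEXPIRE = +1h

The same effect can be achieved for a single request by appending the URL parameter, e.g. ?cache_expire=+1h.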

Loading parts of a page asynchronously

When a page is made up of a lot of independently computed fragments, as on a dashboard, you might want to make use of JQueryLoader to load these fragments asynchronously. While this generally improves the load time of the page itself by delaying the computation of fragments until they are required, it also helps caching.

First, the main page will be cached with the asynchronously computed fragments taken out. As a consequence the main page has fewer dependencies on additional content and is less likely to be invalidated from the page cache.

Second, each fragment requested via an AJAX call and inserted into the main page dynamically is computed and cached separately, each with its own set of dependencies. So when one fragment's dependency fires (one of its ingredients has been updated), only that single fragment needs to be recomputed, not the complete page or the other fragments.

The general pattern for asynchronously loaded page fragments looks like this:

<verbatim class="jqLoader {section:'name_of_section'}">
%STARTSECTION{"name_of_section"}%
%SEARCH{
...
}%
%ENDSECTION{"name_of_section"}%
</verbatim>

See JQueryLoader for more information.

Dirty Areas

Sometimes caching complete pages is too coarse-grained. There may be parts of a page that change frequently, while the rest of the same page never changes. In this case the author of the topic can tell the cache not to save certain parts of it, called dirty areas. Dirty areas are marked in the topic using the <dirtyarea>...</dirtyarea> tags. Foswiki markup within a dirty area is stored unrendered in the cache, and only rendered when the topic is served from the cache. For example,

This page was cached at %SERVERTIME%.

<dirtyarea> The current time of your request is %SERVERTIME%. </dirtyarea>

ensures that the cache will never store the SERVERTIME expression inside the <dirtyarea> section, forcing it to be re-computed every time the page is served. So the two times will diverge the longer the page stays in the cache.

Controlling which pages to cache and which ones not

When the page cache is enabled in configure by switching on {Cache}{Enabled}, every page is cached. This might be too aggressive for some topics or webs that you want to exclude from page caching altogether.

The first choice is to try to make a page at least partially cacheable using dirty areas. If that doesn't work out, or is simply too complicated to get right, a page or a complete web can be flagged as non-cacheable altogether. That way it will never end up in the cache store.

This is achieved by setting the preference variable CACHEABLE in a topic, in SitePreferences or in any WebPreferences like this:

    * Set CACHEABLE = off

Adding it to a single topic will prevent that page from being cached. Adding it to WebPreferences will disable the page cache for the complete web. Adding it to SitePreferences will switch off the page cache for the complete site. The latter only makes sense if the page cache is then explicitly switched back on for a subset of pages and webs using

    * Set CACHEABLE = on

Refreshing the cache

Sometimes it is necessary to force a cache refresh manually. To support this, Foswiki provides the refresh URL parameter, which works with all scripts that produce cacheable output. You may force the current topic to be recomputed by adding refresh=on or refresh=cache to a URL.

<a href="%SCRIPTURLPATH{"view"}%/%WEB%/%TOPIC%?refresh=cache">Refresh this page.</a>

The complete cache for all topics can be cleared as well using refresh=all. This action is restricted to the admin super user, and to members of the AdminGroup.
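
A link to clear the entire cache can follow the same pattern as the refresh link above (admin access is required):

<a href="%SCRIPTURLPATH{"view"}%/%WEB%/%TOPIC%?refresh=all">Clear the page cache.</a>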

TIP The cache should be refreshed after installing a Foswiki upgrade, or after installing or upgrading any plugin that would change the output of macros.

Command line cache refresh

If you have topics that are updated externally to Foswiki, and which do not use the recommended Foswiki::Func::saveTopic API, you can (carefully) trigger a cache refresh from the command line:

e.g. sudo -u www ./view /Tasks/Item12035 -refresh=cache

NOTE it is very important that you run this as the same user that the web server uses to access Foswiki (and the cache).

See CommandAndCGIScripts for more information.

Caching SEARCH results

When you enable the Foswiki page cache, all SEARCH results will automatically be cached as part of the process. As a consequence a SEARCH on a page will not be performed again as long as this page is cached.

There are a few things to keep in mind when caching a page with a SEARCH:

  • If a new topic is created that the SEARCH should find, it will not be listed until after the page the SEARCH is on is recomputed.
  • If content in an existing topic changes so that it would now be found by the SEARCH, it will not be listed either.
  • If the content of an already found topic changes the cached page will be updated automatically.

To avoid these effects you can:

  • Specify a CACHEEXPIRE timespan after which the SEARCH is performed again; in the meantime the same cached results will be displayed
  • Put the SEARCH into an asynchronously loaded fragment (see JQueryLoader above)
  • Put the SEARCH inside a dirty area (see the sketch after this list)
  • List the topic with the SEARCH in WEBDEPENDENCIES. Topics in this list will be refreshed whenever a topic in this web is edited.
  • Add a refresh button to the page to allow users to manually refresh the page.
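
For example, a SEARCH that must always reflect the latest content can be wrapped in a dirty area, so that only this fragment is recomputed on every request while the rest of the page stays cached (the search parameters are placeholders):

<dirtyarea>
%SEARCH{
...
}%
</dirtyarea>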

Note that page caching doesn't only affect SEARCH. Any other information that changes without the dependency tracker noticing will not be rendered up-to-date on a cached page. Other candidates are WEBLIST, as well as any plugin reading external information, for instance the results of an SQL, SOAP or LDAP query being rendered on a Foswiki topic.

This class of SEARCH-like expressions all show the same caching artefact: results are not 100% up-to-date. At the same time these expressions are the candidates that benefit the most from caching, as their computation is the most expensive.

Caching REST results (Developers)

If a function called to service a REST script returns its result using return, then the result will be cached. If the function returns an empty result or undef, then the function is totally responsible for its own output, and there is no caching.
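
As a minimal sketch (the plugin, handler name and response text are illustrative, not part of the core distribution), a REST handler whose output can be cached might be registered like this inside a plugin:

<verbatim>
use strict;
use warnings;
use Foswiki::Func ();

sub initPlugin {
    my ( $topic, $web, $user, $installWeb ) = @_;

    # Register a REST handler (reachable as .../rest/<PluginName>/hello)
    Foswiki::Func::registerRESTHandler( 'hello', \&_restHello );

    return 1;
}

sub _restHello {
    my ( $session, $subject, $verb, $response ) = @_;

    # Returning the text (rather than printing it and returning undef)
    # allows the page cache to store the result.
    return "Hello from a cacheable REST handler\n";
}
</verbatim>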

Configuring the Cache

Choosing a database engine

The database used by the page cache is selected by the {Cache}{Implementation} setting in configure. It offers a set of standard connectors, using the Perl DBI interface, to store meta data about cached pages in a database. For larger sites in production use, you should choose either Foswiki::PageCache::DBI::MySQL or Foswiki::PageCache::DBI::PostgreSQL. For smaller sites and personal wikis, Foswiki::PageCache::DBI::SQLite is appropriate as well.
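
For example, a small site could enable the cache with the SQLite backend using settings like these (values are illustrative, using the option names described in this document):
  • {Cache}{Enabled} = 1;
  • {Cache}{Implementation} = 'Foswiki::PageCache::DBI::SQLite';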

SQLite

SQLite can be used with no special configuration. The page cache will be indexed in working/sqlite.db, and topics are cached in the working/cache directory. There is one configuration parameter that can be changed:
  • {Cache}{DBI}{SQLite}{Filename} = '$Foswiki::cfg{WorkingDir}/sqlite.db';

The SQLite database will grow over time and should be cleaned on occasion to reclaim space and defragment the data. Clearing the cache with the ?refresh=all URL parameter does not reclaim space.

Reclaim space by using the sqlite3 vacuum command. Be sure to run it as the web-server user:

 cd /path/to/foswiki/working
 sudo -u www-data sqlite3 sqlite.db "VACUUM;"

ALERT! Caution: This can run a long time and can be disruptive to the wiki.

You can also manually clear the cache by:
  • Removing the working/sqlite.db file
  • and removing all files in the working/cache directory

MySQL

Before Foswiki is able to use MySQL for caching, several conditions must be met:
  • The MySQL server and the Perl DBI / DBD modules for MySQL must be installed.
  • A database must be created.
  • A user needs to be created, and appropriate permissions granted.
       foswikihost: ~ $ mysql -u root -p
       Enter password:  ........
       Welcome to the MySQL monitor.  Commands end with ; or \g.
    
       mysql> create database foswiki_db;
       Query OK, 1 row affected (0.00 sec)
    
       mysql>  grant usage on *.* to foswiki@localhost identified by 'foswikipass';
       Query OK, 0 rows affected (0.00 sec)
    
       mysql> grant all privileges on foswiki_db.* to foswiki@localhost ;
       Query OK, 0 rows affected (0.00 sec)
    
       mysql> FLUSH PRIVILEGES;
       Query OK, 0 rows affected (0.00 sec)
    
       mysql> exit
       Bye
    
       
  • Foswiki must be configured with the database, user and password information. For example:
    • {Cache}{DBI}{MySQL}{Database} = 'foswiki_db';
    • {Cache}{DBI}{MySQL}{Host} = 'localhost';
    • {Cache}{DBI}{MySQL}{Password} = 'foswikipass';
    • {Cache}{DBI}{MySQL}{Port} = '';
    • {Cache}{DBI}{MySQL}{Username} = 'foswiki';

PostgreSQL

Before Foswiki is able to use PostgreSQL for caching, several conditions must be met:
  • The PostgreSQL server and the Perl DBI / DBD modules for PostgreSQL must be installed.
  • A database must be created.
  • A user needs to be created, and appropriate permissions granted.
       user@server: sudo su -- postgres
       postgres@server:/root$ createuser foswiki --pwprompt
    
       postgres@server:/root$ psql
       psql (9.4.4)
       Type "help" for help.
    
       postgres=# alter user foswiki with password 'foswikipass';
       ALTER ROLE
       postgres=# create database foswiki;
       CREATE DATABASE
    
       \q
       
  • Foswiki must be configured with the database, user and password information. For example:
    • {Cache}{DBI}{PostgreSQL}{Database} = 'foswiki';
    • {Cache}{DBI}{PostgreSQL}{Host} = 'localhost';
    • {Cache}{DBI}{PostgreSQL}{Password} = 'foswikipass';
    • {Cache}{DBI}{PostgreSQL}{Port} = '';
    • {Cache}{DBI}{PostgreSQL}{Username} = 'foswiki';

See the Foswiki::PageCache documentation for more detailed information.

Tuning

A number of options exist to help you tune the cache so that it works better for your specific content.

Excluding topics from the dependency check

Some topics change so infrequently that it's worth telling the cache about them so they can be ignored when computing dependencies. The {Cache}{DependencyFilter} EXPERT setting in configure defines a regular expression that, when matched by the name of a topic, excludes that topic from the dependencies recorded in the cache. This helps to reduce the number of topics a page depends on, but at a higher risk of unwanted caching effects. These effects may be seen where a cached page is delivered that does not reflect changes made in a topic that matched the filter.

The default value for {Cache}{DependencyFilter} is set to
System\..*|Trash\..*|TWiki\..*

So dependencies on any topic matching this expression are not tracked.
ALERT! Caution: by default dependency tracking is switched off for all of the System web. Changing topics in the System web will not invalidate any cache entries. This is because the System web is considered read-only content.
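
For example, to also exclude a (hypothetical) read-only Archive web from dependency tracking, the expression could be extended like this:
  • {Cache}{DependencyFilter} = 'System\..*|Trash\..*|TWiki\..*|Archive\..*';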

Add topics to the dependency check

Some other topics change automatically whenever any topic in the same web is changed. These are usually topics that are generated by searches over the content. You can list the names of these topics in the {Cache}{WebDependencies} EXPERT setting in configure. Whenever any topic in the same web is saved, the cached versions of the listed topics are removed from the cache as well.

Web dependencies can also be specified using the WEBDEPENDENCIES web preference setting. If WEBDEPENDENCIES is set, it overrides the setting of {Cache}{WebDependencies} for that web. To use the preference, set it to a comma-separated list of topics (no wildcards are allowed); see the example below.

The default value for {Cache}{WebDependencies} is set to WebRss, WebAtom, WebTopicList, WebIndex, WebSearch and WebSearchAdvanced.

This means that, for instance, WebRss is a page that is cached the first time it is accessed, so every further request is sped up. When one of the topics in its web changes, the cache entry is invalidated and a new RSS feed is provided containing the most recent update, as expected.

Example:
    * Set WEBDEPENDENCIES = WebRss, WebAtom, WebTopicList, WebIndex,
      WebSearch, WebSearchAdvanced, MyOwnProtectedTopic

This will add MyOwnProtectedTopic to the set of default topics to be refreshed on every change in the web it is in.

Controlling growth of topic dependencies table

By default, Foswiki will track every linked topic mentioned on a page as a dependency of that page. This can result in very rapid growth of the cache dependencies table, especially when the WebLeftBar uses the WEBLIST macro, or links to a large number of pages. These are tracked so that when a topic is added or removed, the referencing pages will be refreshed to get the correct link to the topic.

An optional (expert) configuration parameter {Cache}{TrackInternalLinks} can be set to disable tracking these links for guests or for everyone. For sites exposed to the Internet and search bot traffic, change this setting from on to authenticated to ignore links in topics viewed by guests. The setting off will disable all topic reference tracking, and is not recommended.
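
For example, to ignore link tracking only for guest views on a public site, the setting would use the authenticated value named above:
  • {Cache}{TrackInternalLinks} = 'authenticated';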

HTTP Compression

Modern browsers can decompress content encoded using gzip compression. You can save a lot of network bandwidth by compressing pages, at the expense of some server performance spent on compressing the pages. Foswiki supports content compression when the {HttpCompress} option is enabled in configure, even when the cache is not enabled.

Content compression makes even more sense when used with the cache, as the cache by default stores pages on disk already compressed, so serving a compressed page is very fast. However if you don't have Compress::Zlib, or you're not using {HttpCompress} and do not care about disk space, you can disable this compression by disabling the {Cache}{Compress} EXPERT option in configure.

Note that only pages without any dirty areas will be compressed; any other page will be transmitted uncompressed. Note also that compressing a page requires computation on both sides: the server compresses the page and the browser decompresses the received HTML before rendering it. So only use page compression when network bandwidth is a bottleneck. Do not use compression on fast network connections that can easily deliver the same content uncompressed.
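
For a site where network bandwidth is the bottleneck, a typical combination of the options discussed in this section would be:
  • {HttpCompress} = 1;
  • {Cache}{Compress} = 1;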

Technical notes

Working with browsers

The Foswiki cache works closely with the browser to optimise the use of network bandwidth. Whenever a page is cached, Foswiki also stores its ETag, computed based on the time it was added to the server cache. The ETag and Last-Modified headers are added to the response. Whenever the page is requested again using a request that includes the If-None-Match and/or If-Modified-Since request headers, Foswiki answers with a 304 Not Modified response and an empty body. This tells the browser to reuse the page stored in its own client-side cache.
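
Schematically, a revisit of an already cached page looks like this (headers abbreviated; the topic and ETag value shown here are purely illustrative):

 GET /bin/view/Sandbox/TestTopic HTTP/1.1
 If-None-Match: "1583150400"
 If-Modified-Since: Mon, 02 Mar 2020 12:00:00 GMT

 HTTP/1.1 304 Not Modified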

Identifying cache entries

Depending on the values of a number of different parameters, a generated page may have very different output. For example, depending on the user who is logged in, a page might be displayed very differently.

The cache thus has to consider the context a page has been rendered within. The context holds all sorts of environment information. This information is captured when pages are stored and is used to identify them again later on.

The context consists of:

  • The server serving the request (HTTP_HOST)
  • The port number of the server serving the request (HTTP_PORT)
  • The language of the current session, if any
  • All session parameters EXCEPT:
    • All those starting with an underscore
    • VALIDATION
    • REMEMBER
    • FOSWIKISTRIKEONE.*
    • VALID_ACTIONS.*
    • BREADCRUMB_TRAIL
    • DGP_hash
  • All the request parameters EXCEPT:
    • All those starting with an underscore
    • refresh
    • foswiki_redirect_cache
    • logout
    • topic
    • cache_ignore
    • cache_expire
    • request parameters listed in the cache_ignore request parameter (see the example after this list)

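For example, if a (hypothetical) tracking parameter such as utm_source should not lead to a separate cache entry, it can be excluded from the cache identifier via cache_ignore:

 %SCRIPTURLPATH{"view"}%/%WEB%/%TOPIC%?utm_source=newsletter;cache_ignore=utm_source
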
Note that this also means that users will never share cached pages among each other. This separation is required to prevent users from seeing pages that have been rendered with different access rights to the content than their own. For instance, a SEARCH will show rather different results depending on the user's access rights to the content. When such a page is cached, it must only be retrieved for the identical user it was first generated for.

Software requirements

  • {HttpCompress} depends on Compress::Zlib,
  • DBD::Pg to connect to a PostgreSQL database or
  • DBD::mysql to connect to a MySQL database or
  • DBD::SQLite to connect to an SQLite database.

You will need one of the DBD:: drivers to make use of page caching.
