Keeping the Cache Hot
November 15th, 2007
Problem
The expiry of content within caching architectures is only detected when a user requests the expired data. As a result, a percentage of the site's visitors will not be able to take advantage of caching. A typical dynamic site uses many different caching architectures, so the solution needs to be cache agnostic.
Architecture
Emmao Bot is the name given to the Python program used to keep the cache hot.
Figure 1: Emmaobot Server UML Model

Solution
Emmao Bot has been built to act as a user agent and request pages. mod_python is used to make the Apache child processes log their requests in a special format. Emmao Bot runs in the background as a daemon process, either on the webhead or on a separate server. It examines the special Apache log files and schedules an event for the time each page expires; libevent is used to manage these events. As Emmao Bot runs, it ranks pages based on its analysis of the logs, and uses this ranking to ensure that the most important, popular, or heavy pages never expire. If there is a limit on the number of pages to focus on, rank can also be used to decide which pages to ignore.
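The scheduling and ranking described above can be sketched in Python. This is a minimal sketch under stated assumptions, not Emmao Bot's actual code: it assumes a hypothetical log format of `<unix_time> <path> <max_age_seconds> <generation_seconds>` per line, uses the standard library's `heapq` as a stand-in for libevent's timer events, and ranks pages by total generation time (hits multiplied by average cost), which is only one plausible reading of "important, popular, or heavy". The `EmmaoBot` user-agent string is also illustrative.

```python
import heapq
from dataclasses import dataclass
from urllib.request import Request, urlopen


@dataclass
class PageStats:
    hits: int = 0
    total_gen_time: float = 0.0  # seconds Apache spent generating this page
    expires_at: float = 0.0      # unix time at which the cached copy expires

    def rank(self) -> float:
        # Hypothetical ranking: hits x average generation time, i.e. the total
        # generation cost that keeping this page cached saves.
        return self.total_gen_time


def parse_log_line(line: str) -> tuple[float, str, float, float]:
    """Parse the hypothetical format '<unix_time> <path> <max_age> <gen_time>'."""
    ts, path, max_age, gen_time = line.split()
    return float(ts), path, float(max_age), float(gen_time)


class RefreshScheduler:
    """Min-heap of expiry times; heapq stands in for libevent's timer events."""

    def __init__(self, max_pages: int):
        self.max_pages = max_pages
        self.stats: dict[str, PageStats] = {}
        self.queue: list[tuple[float, str]] = []  # (expires_at, path)

    def record(self, line: str) -> None:
        """Update stats from one log line and schedule a refresh at expiry."""
        ts, path, max_age, gen_time = parse_log_line(line)
        st = self.stats.setdefault(path, PageStats())
        st.hits += 1
        st.total_gen_time += gen_time
        st.expires_at = ts + max_age
        heapq.heappush(self.queue, (st.expires_at, path))

    def tracked(self) -> set[str]:
        """The max_pages highest-ranked paths; all others are ignored."""
        ranked = sorted(self.stats, key=lambda p: self.stats[p].rank(), reverse=True)
        return set(ranked[: self.max_pages])

    def due(self, now: float) -> list[str]:
        """Pop every refresh that is due and return the paths to re-fetch."""
        keep = self.tracked()
        out: list[str] = []
        while self.queue and self.queue[0][0] <= now:
            when, path = heapq.heappop(self.queue)
            # Skip entries superseded by a later log line for the same path.
            if when != self.stats[path].expires_at:
                continue
            if path in keep and path not in out:
                out.append(path)
        return out


def fetch(base_url: str, path: str) -> int:
    """Request a page like a normal user agent so the cache is repopulated."""
    req = Request(base_url + path, headers={"User-Agent": "EmmaoBot"})  # hypothetical UA
    with urlopen(req, timeout=30) as resp:
        resp.read()
        return resp.status
```

In a daemon, a loop would tail the special log and call `record()` for each new line, then periodically call `due(time.time())` and pass each returned path to `fetch()`. Because that fetch arrives just as the cached copy expires, it reaches Apache and is logged like any other request, which schedules the page's next refresh.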
The Cost
Although the number of pages that Emmao Bot manages can be capped to limit load on the webserver, Emmao Bot still adds some traffic.
In live production environments with Emmao Bot managing 10,000 pages, I have not found the performance cost to outweigh the benefit of reducing the maximum user fetch time.
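To put a rough number on that extra traffic: if each managed page is re-fetched about once per cache lifetime, the added load is the sum of 1/TTL over the managed pages. The 5-minute TTL below is an assumption for illustration only; the actual TTLs are not stated here.

```python
def refresh_requests_per_second(ttls_seconds):
    """Approximate extra requests/s: each managed page is re-fetched once per TTL."""
    return sum(1.0 / ttl for ttl in ttls_seconds)


# Hypothetical example: 10,000 managed pages, each cached for 300 seconds.
extra = refresh_requests_per_second([300] * 10_000)
print(f"{extra:.1f} extra requests/s")  # prints: 33.3 extra requests/s
```

Fewer managed pages or longer TTLs reduce this figure accordingly, which is why capping the number of pages is an effective way to limit the load.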
Links
LibEvent http://monkey.org/~provos/libevent
ModPython http://www.modpython.org/
Squid and members
November 15th, 2007
Task
Use Squid to manage a cache for a website that has member users (logged in to the site) and public users. Squid must cache both the member view and the public view of a page.
Squid needs to check the user's authentication and decide whether to redirect them to the cache for members or the cache for public users. There are only two discrete sets of users, and any user-specific content is handled via AJAX.
- Squid will be operating as a transparent proxy.
- Usernames/passwords are stored in an MSSQL database.
- Squid is hosted on a Unix box along with Apache.
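One possible way to make the member/public decision is a Squid external ACL helper: Squid passes it a value taken from each request, the helper answers `OK` or `ERR` on one line, and an `acl ... external` rule can then use that answer to route the request. The sketch below is an assumption, not a tested configuration: it supposes the site sets a login cookie (the name `member_session` is hypothetical), that the helper is configured to receive the `Cookie` request header with Squid's default URL-escaping and without concurrency channel IDs, and that `is_member` stands in for the query against the MSSQL user database.

```python
import sys
from typing import Callable, Optional
from urllib.parse import unquote

# Hypothetical cookie name; the real site may use a different one.
SESSION_COOKIE = "member_session"


def session_id_from_cookie_header(raw: str) -> Optional[str]:
    """Extract the session ID from a URL-escaped Cookie header value."""
    header = unquote(raw.strip())
    if header in ("", "-"):  # Squid typically sends "-" when the value is absent
        return None
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name == SESSION_COOKIE and value:
            return value
    return None


def handle_line(line: str, is_member: Callable[[str], bool]) -> str:
    """Answer one lookup: OK routes to the member cache, ERR to the public one."""
    session_id = session_id_from_cookie_header(line)
    if session_id is not None and is_member(session_id):
        return "OK"
    return "ERR"


def main(is_member: Callable[[str], bool]) -> None:
    # Squid writes one lookup per line and waits for one reply line per lookup.
    for line in sys.stdin:
        sys.stdout.write(handle_line(line, is_member) + "\n")
        sys.stdout.flush()  # unflushed replies would stall Squid
```

Saved as a script, the helper would finish with a call such as `main(check_session)`, where `check_session` is a hypothetical function that looks the session up in MSSQL. Because Squid caches helper answers for the `ttl=` period set on `external_acl_type`, the database is not queried on every request.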