=Paper=
{{Paper
|id=Vol-485/paper-8
|storemode=property
|title=Visualising web server logs for a Web 1.0 audience using Web 2.0 technologies
|pdfUrl=https://ceur-ws.org/Vol-485/paper8-F.pdf
|volume=Vol-485
|dblpUrl=https://dblp.org/rec/conf/um/QuinceyKF09
}}
==Visualising web server logs for a Web 1.0 audience using Web 2.0 technologies: eliciting attributes for recommendation and profiling systems==
Workshop on Adaptation and Personalization for Web 2.0, UMAP'09, June 22-26, 2009

Ed de Quincey¹, Patty Kostkova¹ and David Farrell¹

¹ City eHealth Research Centre (CeRC), City University, London, UK
{Ed.de.Quincey}@city.ac.uk
Abstract. Web server logs have been used, via techniques such as user profiling and recommendation systems, to improve the user experience on websites. The data contained within server logs, however, has generally been inaccessible to non-technical stakeholders on website development projects, due to the terminology and presentation used. We describe a process that uses visualisation to enable these stakeholders to identify questions about site usage, including user profiling and behaviour. The development of this tool utilising Web 2.0 technologies is described, as well as feedback from the first stage of user evaluation on a real-world multi-national web development project called e-Bug. The potential for this process to elicit user attributes and behaviour that can be incorporated into automated user profiling systems is also discussed.
Keywords: Visualisation, Web Server Logs, User Profiling, Web 2.0
Technologies
1 Introduction
Research into online user behaviour has been aided by the relative ease of collecting
feedback data using implicit methods such as web server logs [1, 2, 3], compared to
explicit methods such as usability testing [4, 5], tagging [6] and ratings [7]. The data
stored in server logs has been used to create a number of recommendation [8, 9, 10]
[11, 12] and profiling systems [13, 14].
This has had a dramatic impact on the user experience, e.g. at Amazon [15], but apart from deliberate or accidental releases of server log data (e.g. the Netflix Prize1, AOL), the information contained within the logs has generally been hidden from the users of a website and, more importantly, from the non-technical stakeholders of a web development project. This means that few people outside of the server log analysis or web development communities fully understand the information that is stored in web logs and the user behaviour that it can explain.
1 http://www.netflixprize.com/
Several commercial applications (Google Analytics2, Sawmill3, WebTrans4) have tried to make server logs, and therefore user behaviour, more accessible to site owners. However, these applications analyse generic features of sites that do not answer the specific questions that certain stakeholders will have, and they do not help those stakeholders identify trends in user behaviour, due to the sheer volume and technical nature of the information presented [16].
A potential solution to this problem is to use techniques from the field of Software Visualisation (SV) to make the data contained within server logs more accessible to non-technical stakeholders in website development projects. These methods exploit the innate pattern-matching ability [17] of the human cognitive system to identify trends in user behaviour which might be missed by current automated profiling and recommendation systems. Once such trends are identified, non-technical stakeholders, such as content providers, can adapt content and the site design to fit user behaviour [16]. This human expertise could then potentially be integrated into current automated recommendation and profiling systems.
This paper describes the process of developing and using visualisation techniques
to disseminate site usage information to non-technical stakeholders, in order to
identify potential attributes for user profiling and recommendation systems. An
ongoing multinational project in e-Health, called e-Bug (www.e-bug.eu), has been
used as a test-bed and feedback from project stakeholders is detailed. The future
possibilities of this technique are discussed as well as general implementation issues
from using Web 2.0 technologies.
2 Background Information
2.1 Visualisation and Metaphors
Visualisation is concerned with making large amounts of information more
comprehensible for the user by using a visual representation. Software Visualisation
has been successfully used by software engineers to “make software more visible”
[18] by representing the significant features of code using a visual metaphor. A well-known example of a visualisation is the London Underground Tube Map5, a representation of a complex, real-world artifact that can be understood immediately and navigated simply. A detailed taxonomy of SV has been produced by Price et al. [19], and the related fields of Information Visualisation and Visual Analytics [20][21], and the metaphors used in interface design [22], contain a number of relevant techniques.
2 http://www.google.com/analytics/
3 http://www.sawmill.net/
4 http://www.webtrans.co.uk/
5 http://www.tfl.gov.uk/gettingaround/1106.aspx
2.2 The e-Bug Project
e-Bug is a European Commission funded project that aims to reduce inappropriate antibiotic use and improve hygiene by improving the education of young people in seventeen participating countries. e-Bug combines traditional methods of classroom delivery with online, browser-based (Flash) games to teach pupils in junior and senior schools about microbes, hand and respiratory hygiene, and antibiotics. Example lessons and media are available on the e-Bug website6, alongside games that can be used either with the pack or standalone [23].
Currently the server logs from the e-Bug project are analysed using a proprietary
application called Sawmill. This produces standard reports that cover information
such as visits, hits, content viewed, visitor demographics and systems and referrers.
These reports are produced monthly and uploaded onto the e-Bug website7. It was found, however, that although the project partners expressed a high degree of interest in the website statistics during meetings, the format of the reports was not easily accessible to non-technical users. This was mainly due to the terminology used, and to the statistics presented not answering the specific questions that the project partners had regarding the users of the site [D. Farrell 2009, pers. comm.]. It was therefore decided that the server logs from the e-Bug project website would make a suitable test-bed for using visualisation techniques to analyse and present the statistics in a way that reduced this confusion and elicited potential attributes for user profiling.
3 Method for server log visualisation
A User Centred Design (UCD) methodology [24] was used to develop a prototype application to visualise the statistics currently calculated by the Sawmill application, e.g. visits during particular months/years and the geolocations of visits.
Sketching has been used previously to create code visualisation software [25] and
so the same approach was used initially to explore potential metaphors and
representations that could be used. An example sketch is shown below in Figure 1.
6 http://www.e-bug.eu
7 http://www.e-bug.eu/ebug_secret.nsf/England-Project-General/eng_eng_p_wp_gn_stats
Fig. 1. Example sketch illustrating the weather map metaphor and bar charts
At this stage two potential metaphors were identified: a weather map metaphor and
a timeline metaphor. After discussion with members of the project team it was
decided to begin by developing the weather map metaphor as this would support one
of the main features that was missing from the current reports: accurate geographical
distribution of the users of the site.
3.1 Web 2.0 Technologies for Visualisation
Having identified possible interface designs for the application, an online prototype system was developed and suitable technologies were explored for creating the map metaphor. The following figure shows the first version of the prototype.
Fig. 2. Visualisation of visitors in September 2008 with each red icon representing a visitor
The interface incorporates two main visualisations. An area on the left-hand side of the screen shows the number of visitors and page views in a particular year and month (and their daily distribution) using simple bar charts. The area on the right contains a map on which individual visitors are denoted at their location by a marker (in this case the e-Bug logo). Users can select particular months from the drop-down menu on the left and navigate the map using the navigation icons and the mouse pointer.
The map was created using the Google Maps API, which uses JavaScript to make
asynchronous calls (AJAX) to display the map and the markers. The data for the
markers is stored in an XML file that is generated by a PHP page parsing a CSV file
that is created using Sawmill8. The CSV file contains paired values of a user's hostname and the number of page views that came from that IP address. PHP is then used, along with the GeoIP Lite open-source reverse-geolocation database9, to calculate a longitude and latitude for each hostname. These are then saved in an XML file in the following format:
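As an illustration of this step, the sketch below converts hostname/page-view CSV rows into a marker XML file of the kind described. It is written in Python rather than the project's PHP, and the element and attribute names, as well as the tiny in-memory geolocation table standing in for the GeoIP Lite lookup, are our assumptions rather than the exact schema used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the reverse-geolocation database:
# maps a hostname to (latitude, longitude).
GEO_DB = {
    "host1.example.org": (51.5074, -0.1278),  # London
    "host2.example.fr": (48.8566, 2.3522),    # Paris
}

def csv_to_marker_xml(csv_text):
    """Convert 'hostname,page_views' rows into a marker XML document."""
    root = ET.Element("markers")
    for hostname, views in csv.reader(io.StringIO(csv_text)):
        coords = GEO_DB.get(hostname)
        if coords is None:
            continue  # hostnames that cannot be geolocated are skipped
        lat, lng = coords
        ET.SubElement(root, "marker",
                      lat=f"{lat}", lng=f"{lng}", views=views)
    return ET.tostring(root, encoding="unicode")

sample = "host1.example.org,12\nhost2.example.fr,3\n"
print(csv_to_marker_xml(sample))
```

The Google Maps JavaScript on the page would then fetch this XML asynchronously and place one marker per element.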
The bar charts were created using the Google Charts API, which creates dynamic
images based on parameters passed in the querystring, for example:
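A request of this general shape would produce such a chart. The parameter values below are invented for illustration, though cht (chart type), chs (size), chd (data) and chxt (visible axes) are standard Google Image Charts parameters; the sketch builds the querystring in Python rather than PHP.

```python
from urllib.parse import urlencode

# Illustrative Google Image Charts parameters for a vertical bar chart.
params = {
    "cht": "bvs",            # chart type: vertical (stacked) bars
    "chs": "300x150",        # image size in pixels
    "chd": "t:120,95,143",   # e.g. visits for three months (made-up data)
    "chxt": "x,y",           # show both axes
}
url = "https://chart.googleapis.com/chart?" + urlencode(params)
print(url)
```

Requesting the resulting URL returns a rendered PNG, so the bar charts can be embedded as ordinary images.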
The parameters were determined using PHP pages and CSV files that contain
monthly and daily totals of visits and page views.
4 Evaluation
This prototype was then uploaded to the e-Bug website and feedback was elicited
from members of the e-Bug project team from seventeen European countries, as well
as researchers involved in similar projects at UK Universities as part of the UCD
process. The evaluation was in the form of an email with a set of open-ended
questions that respondents were asked to answer regarding the interface. The main
focus of this exercise was to ascertain whether the information that was being
represented was clear enough, whether appropriate metaphors were being used and
also whether there were any other statistics that users would be interested in. As this
is an ongoing project, feedback has so far been received from nine respondents.
The majority of respondents reacted positively to the interface and the visualisation, and a number of them were able to give detailed feedback, indicating that they understood both what the page was showing and what it was not. The main recurring points from this feedback are detailed below:
* Add a representation that shows the "magnitude of visitors", as it is difficult to gauge repeat visitors, the number of pages viewed, and markers that overlap.
* Add specific place markers to the map that currently do not appear (unless at a higher zoom level).
* Add specific evaluation areas/overlays onto the map10.
* Show the density of visitors in each area, i.e. visitors per 100,000 population, to allow more meaningful comparisons.
* Add a view of popular page downloads and where they originate from.
* Highlight returning visitors.
* Add a view that shows the times of day at which various pages are being accessed, e.g. if the games are being viewed outside of school hours, this could indicate that students are playing them at home.
* The ability to compare months and countries.
8 The data from Sawmill was used rather than the raw server logs because Sawmill filters out certain web crawlers, as well as applying custom filters created to remove certain IP addresses.
9 http://www.maxmind.com/app/geolitecity
One of the most interesting points noted by the stakeholders, however, was that the data being represented is itself a potential source of confusion. For example, a number of users gave the general impression that they did not know the difference between a visitor and a hit. It became clear that the target users of this application do not possess the same knowledge that experts in the field take for granted, and further investigation into this area is being conducted.
Following on from this, a second version is currently being developed to take into account the feedback and to tackle some of the issues that have been raised with regard to the interface and the information that users would like displayed. A screenshot from the second iteration of the software is shown below:
Fig. 3. Version 2 of the software visualises different types of file downloads, represented with
two different colours
As well as markers and statistics for visitors, information regarding pack
downloads (educational resources for teachers in Word and PowerPoint files) has
been included and split into “Junior” and “Senior” versions.
This version of the application also uses an updated visitors' visualisation that takes into account the number of page views from a particular user. The well-known temperature scale visualisation used on weather maps has been utilised to differentiate between the levels of activity in various regions.
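Such a temperature scale can be sketched as a simple threshold mapping from page-view counts to marker colours. The thresholds and colour names below are illustrative, not those used by the tool.

```python
# Map a visitor's page-view count to a "temperature" colour, coldest to
# hottest, in the style of a weather-map legend. Thresholds are made up.
SCALE = [
    (5, "blue"),     # up to 5 page views
    (20, "green"),
    (50, "yellow"),
    (100, "orange"),
]

def marker_colour(page_views):
    """Return the colour band for a given number of page views."""
    for upper, colour in SCALE:
        if page_views <= upper:
            return colour
    return "red"  # hottest: more than 100 page views

for views in (3, 30, 250):
    print(views, marker_colour(views))
```

Each marker icon (or cluster) is then drawn in the colour for its band, so relative activity levels can be compared at a glance.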
10 This can potentially be achieved using the polyline/overlay support in the Google Maps API.
5 Discussion
5.1 Potential for use in User Profiling
Initial feedback has already indicated that visual representations of the data have allowed the non-technical stakeholders in the project to start to identify user types and user behaviour. One particular interest is whether pupils are accessing the games pages at home or at school, and whether the tool can identify whether it is a student or a teacher viewing the website. Geographically representing visitors in relation to the locations of target schools, along with the times at which they access the site, can potentially achieve this simple user profiling task.
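This heuristic can be sketched as follows; the school-hours window and the "school"/"home" labels are our assumptions for illustration, not a rule defined by the project.

```python
from datetime import datetime

def visit_context(timestamp):
    """Hypothetical heuristic: label a visit 'school' if it falls within
    assumed school hours (09:00-15:30, Monday-Friday), else 'home'."""
    t = datetime.fromisoformat(timestamp)
    in_school_hours = (t.weekday() < 5
                       and (9, 0) <= (t.hour, t.minute) <= (15, 30))
    return "school" if in_school_hours else "home"

print(visit_context("2009-03-02T10:15"))  # Monday morning
print(visit_context("2009-03-02T19:40"))  # Monday evening
```

In practice this signal would be combined with the visitor's proximity to a target school before drawing any conclusion.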
This example, and the others detailed in Section 4, indicate that providing non-technical stakeholders with a visual representation of the server logs has allowed them to communicate requirements for further analysis, which can be integrated either into the filters used in the Sawmill application or into the visualisation tool. Without the use of visualisation techniques, it is doubtful that these questions regarding the users of the site and their behaviour would have been raised.
Further investigation of user profiles and understanding of national profiling
differences is a subject of our ongoing research.
5.2 Strengths of Web 2.0 Technologies for Visualisation
There are a number of advantages to using Web 2.0 technologies such as the various Google APIs and AJAX, including the ability to create richer and more interactive online interfaces, but the main advantage is being able to utilise users' pre-existing skills and experience. The majority of users have prior experience with interfaces such as Google Maps, and in the same way that the desktop has become the standard metaphor for operating systems, the maps, markers and methods of interaction that Google has developed have become a standard in this area. Being able to "piggy-back" on that frees users from the interface and allows them to focus on the visualisation, even though this application is a bespoke solution.
An associated advantage is that Google is a global organisation and so is its
software. The potential users of this software are from a diverse set of countries with
a number of different languages and levels of expertise. Google is even more popular in Europe than in the US [26], and Google Maps is projected to take the number one market-share position from MapQuest by the end of the year [27], so the chances of a user having had previous exposure to the Google Maps interface, and therefore to the interface of this application, are quite high. This also has follow-on advantages for issues such as localisation and internationalisation.
The other advantage is increased speed of development. Being able to harness pre-existing APIs allows for rapid prototyping and the ability to demonstrate a working concept to users almost immediately in order to elicit feedback, and it also allows for faster changes and incremental versions.
Finally, the fact that Web 2.0 technologies are designed to be accessible via a number of different browsers and platforms allows for speedier access to, and dissemination of, the information, which is vital for cross-nation projects such as e-Bug.
5.3 Limitations of Web 2.0 Technologies for Visualisation
One of the main problems with the Google Maps marker metaphor is occlusion, a problem that is also common when using 3-D visualisations. If a user visits the site numerous times, or downloads numerous packs or pages, it is difficult to represent this with numerous markers on the map, as they will overlap with one another. This can be partially solved with the colour coding of markers, but the accuracy of the geolocation database, and the fact that numerous visitors can originate from the same area, mean that the markers often overlap. To improve this, methods are being investigated for clustering the markers so that close "neighbours" are represented by one marker, with the underlying information presented textually once a user clicks on a clustered marker.
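One simple form of such clustering is grid-based: snap each marker to a grid cell and merge the cell's markers into a single representative marker. The sketch below is illustrative Python (the tool itself would do this in PHP/JavaScript), and the one-degree cell size is an arbitrary choice.

```python
from collections import defaultdict

def cluster_markers(markers, cell_deg=1.0):
    """Group (lat, lng, views) markers into grid cells of cell_deg degrees.
    Each cluster is represented by its centroid, the summed page views,
    and the number of markers it replaces."""
    cells = defaultdict(list)
    for lat, lng, views in markers:
        key = (int(lat // cell_deg), int(lng // cell_deg))
        cells[key].append((lat, lng, views))
    clusters = []
    for members in cells.values():
        n = len(members)
        clusters.append((
            sum(m[0] for m in members) / n,  # centroid latitude
            sum(m[1] for m in members) / n,  # centroid longitude
            sum(m[2] for m in members),      # total page views in the cell
            n,                               # markers merged into this cluster
        ))
    return clusters

# Two nearby (London-area) visitors collapse into one cluster; Paris stays separate.
markers = [(51.50, -0.12, 4), (51.52, -0.10, 2), (48.85, 2.35, 7)]
print(cluster_markers(markers))
```

Rendering one marker per cluster keeps the marker count (and hence browser load) roughly proportional to the number of occupied cells rather than the number of visitors.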
A related problem is the amount of data that can be represented using these tools, given the limitations of the browser. During testing of the application it was found that once around five thousand markers were placed on the screen using the standard Google method, the browser would slow down and become unusable. For this prototype the problem was solved by filtering out duplicate markers and also non-European hits (as these were not required at this stage of the site's development).
However, once the site is launched and publicised further this year, there will be an increase in visitors and therefore an increase in markers. Clustering methods are therefore currently being investigated.
One final problem, highlighted by user feedback, is that relying on users having prior experience with Google Maps means that those who have not, or who do not realise that this is a Google Maps interface, have initial problems with it. Adding extra methods of navigation, or instructional videos/instructions, is currently being piloted.
6 Conclusion
The process of identifying appropriate visualisations to allow non-technical users to
start to identify site usage from server logs is important for successful web site
development and evaluation. The process presented in this paper has provided a
number of insights into the potential of using Web 2.0 tools and metaphors for
visualisation and dissemination of information. Although at an early stage, the tool is already providing insights into a number of usage patterns on the site, which are enabling non-technical stakeholders of the e-Bug project to start to identify distinct user profiles and, most importantly, to utilise the data stored in server logs more readily.
Future work will include an investigation into pre-existing taxonomies of software visualisation [19] to see which might be relevant for representing web server log data, and which can be supported by Web 2.0 technologies. Current visualisation techniques from the biological sciences will also be studied to see if any of these are appropriate, e.g. the spread of user activity being represented in a similar way to disease spread.
Following on from this, the tool will be used in an investigation into user behaviour on the e-Bug website, in order to see whether researchers can identify usage trends visually and what the attributes of those trends are, e.g. the time of day at which a user visits, plus their geographical location, might indicate whether they are a pupil or a teacher. This will then feed directly into the development and tailoring of content for the site, and the potential for incorporating it into an automated profiling system will be investigated.
References
1. Dupret, G. E. and Piwowarski, B.: A user browsing model to predict search engine click
data from past observations. In Proceedings of the 31st Annual international ACM SIGIR
Conference on Research and Development in information Retrieval (Singapore, Singapore,
July 20 - 24, 2008). SIGIR '08. ACM, New York, NY, 331-338 (2008)
2. Joachims, T.: Optimizing search engines using clickthrough data. In Proceedings of the
Eighth ACM SIGKDD international Conference on Knowledge Discovery and Data
Mining (Edmonton, Alberta, Canada, July 23 - 26, 2002). KDD '02. ACM, New York, NY,
133-142 (2002)
3. Joachims, T., Granka, L., Pan, B., Hembrooke, H., Radlinski, F., and Gay, G.: Evaluating
the accuracy of implicit feedback from clicks and query reformulations in Web
search. ACM Trans. Inf. Syst. 25, 2 (Apr. 2007), 7 (2007)
4. Nielsen, J.: Usability inspection methods. In Conference Companion on Human Factors in Computing Systems (Boston, Massachusetts, United States, April 24 - 28, 1994). C. Plaisant, Ed. CHI '94. ACM, New York, NY, 413-414 (1994)
5. Spool, J. and Schroeder, W.: Testing web sites: five users is nowhere near enough. In CHI '01 Extended Abstracts on Human Factors in Computing Systems (Seattle, Washington, March 31 - April 05, 2001). CHI '01. ACM, New York, NY, 285-286 (2001)
6. Kipp, M. E. and Campbell, D. G.: Patterns and inconsistencies in collaborative tagging systems: An examination of tagging practices. Available from http://dlist.sir.arizona.edu/1704/01/KippCampbellASIST.pdf (2006)
7. Anand, S. S., Kearney, P., and Shapcott, M.: Generating semantically enriched user profiles for Web personalization. ACM Trans. Internet Technol. 7, 4 (Oct. 2007), 22 (2007)
8. Schafer, J. B., Konstan, J., and Riedl, J.: Recommender systems in e-commerce. In Proceedings of the 1st ACM Conference on Electronic Commerce (Denver, Colorado, United States, November 03 - 05, 1999). EC '99. ACM, New York, NY, 158-166 (1999)
9. Schafer, J. B., Konstan, J. A. and Riedl, J.: E-commerce recommendation applications, Data Mining and Knowledge Discovery, 5(1/2):115-153 (2001)
10. Sarwar, B. M., Karypis, G., Konstan, J. A. and Riedl, J. T.: Item-based collaborative filtering recommendation algorithms, Proc. of the Tenth Int. WWW Conf., pp. 285-295 (2001)
11. Ali, K. and van Stam, W.: TiVo: making show recommendations using a distributed collaborative filtering architecture. In Proceedings of the Tenth ACM SIGKDD international Conference on Knowledge Discovery and Data Mining (Seattle, WA, USA, August 22 - 25, 2004). KDD '04. ACM Press, New York, NY, 394-401 (2004)
12. Herlocker, J. L., Konstan, J. A., Terveen, L. G., and Riedl, J. T.: Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst. 22, 1 (Jan. 2004), 5-53 (2004)
13. Kostkova, P., Diallo, G. and Jawaheer, G.: User Profiling for Semantic Browsing in Medical Digital Libraries, Proceedings of ESWC 2008, The Semantic Web: Research and Applications, 5th European Semantic Web Conference, Tenerife, Canary Islands, Spain, June 1-5 (2008)
14. Danilowicz, C. and Indyka-Piasecka, A.: Dynamic User Profiles Based on Boolean Formulas, Lecture Notes in Computer Science, Volume 3029/2004, pages 779-787 (2004)
15. Linden, G., Smith, B. and York, J.: Amazon.com Recommendations: Item-to-Item Collaborative Filtering, IEEE Internet Computing, January-February (2003)
16. Drott, M. C.: Using Web server logs to improve site design. In Proceedings of the 16th Annual international Conference on Computer Documentation (Quebec, Quebec, Canada, September 24 - 26, 1998). SIGDOC '98. ACM, New York, NY, 43-50 (1998)
17. Sinha, P.: Recognizing complex patterns. Nat Neurosci 5:1093-1097 (2002)
18. Petre, M. and de Quincey, E.: A gentle overview of software visualisation. Computer Society of India Communications. August, 6-11. ISSN 0970-647X (2006)
19. Price, B. A., Baecker, R. M. and Small, I. S.: A Principled Taxonomy of Software Visualization, Journal of Visual Languages and Computing, 4, 211-266 (1993)
20. Thomas, J.J. and Cook, K.A.: Illuminating the Path: The R&D Agenda for Visual Analytics. National Visualization and Analytics Center. p.3-33 (2005)
21. Tufte, E.R.: The Visual Display of Quantitative Information. Graphics Press (1983)
22. Erickson, T.: Working With Interface Metaphors, 65-73, edited by B. Laurel, The Art of Human-Computer Interface Design, Addison-Wesley Publishing Company Inc, Reading, Massachusetts, USA (1990)
23. Kostkova, P. and Farrell, D.: e-Bug online Games for Children: Educational Games Teaching Microbes, Hand and Respiratory Hygiene and Prudent Antibiotics Use in Schools across Europe, In the Proceedings of the ESCAIDE 2008 conference, Berlin, November 2008 (2008)
24. U.S. Department of Health & Human Services: Step-by-Step Usability Guide. Available: http://www.usability.gov/. Last accessed 4th February 2009. (2009)
25. Craft, B. & Cairns, P.: Using Sketching to Aid the Collaborative Design of Information Visualisation Software - A Case Study, In Human Work Interaction Design: Designing for Human Work, Springer Boston, Volume (221), p. 103-122 (2006)
26. Lipsman, A.: Google Holds Top Spot in European Site Rankings, According to comScore World Metrix. Available: http://www.comscore.com/press/release.asp?press=988. Last accessed 1st March 2009. (2006)
27. ABI Research: The Google Maps vs MapQuest Online Mapping Portal War Is Driving Map 2.0 Innovations. Available: http://tinyurl.com/dlwj7d. Last accessed 5th March 2009. (2009)