Putting Linked Data to Use in a Large
           Higher-Education Organisation

                                  Mathieu d’Aquin

       Knowledge Media Institute, The Open University, Milton Keynes, UK
                            {m.daquin}@open.ac.uk


      Abstract. In this paper, we describe applications built on top of the
      Open University’s linked data platform (data.open.ac.uk), from the
      point of view of the way they implement particular forms of interactions
      with linked data. We especially focus on the common advantages and
      pitfalls in interacting with linked data that these applications illustrate,
      from both the end-users’ and the developers’ perspectives. We conclude
      with suggested steps forward on ways to facilitate the realisation and
      adoption of applications interacting with linked data.


1   Introduction

Building and deploying linked data in a large organisation represents a challenge
at many different levels. Much past research and development work has
focused on the publication process for linked data: how to obtain data from
legacy information systems; how to model these data according to the linked data
principles; how to link the organisation’s data with external sources; how to
expose the data online for wide accessibility. However, it is becoming more and
more clear that there is a need to investigate another, possibly more important
aspect, the other side of the coin: what are the issues related to interacting
with linked data, from an end-user as well as a developer perspective?
    In this paper, we rely on our experience in building, deploying and apply-
ing the Open University’s linked data platform (http://data.open.ac.uk) to
investigate this perspective. Data.open.ac.uk, developed through the LUCERO
project (http://lucero-project.info), was the first initiative to expose the
public information of a university as linked open data, collecting and providing
access to data from across the institution’s departments¹. One of the challenges
related to pioneering the use of linked data in a particular sector is the need
to demonstrate the advantages that it brings to the users of the organisation.
Several applications have been developed that are at different stages of their
lifecycle, and have been deployed for different audiences (in size, technological
awareness, etc.). Through these applications, common benefits as well as the
challenges of providing linked data-based functionalities to ’real users’ are emerging.
    In this paper, we discuss some of these applications from the perspective of
the way they provide means to interact with existing linked data sources. We
discuss the lessons learnt from our experience regarding the issues, challenges and
pitfalls of interacting with linked data. We not only consider here the perspective
of the end-users in using the developed applications, but also that of the
developers, who interact more directly with linked data in order to provide
usable functionalities.

¹ It has since then been followed by a number of other initiatives applying linked
  data in the higher-education sector, with high potential impact regarding the reuse
  and interoperability of educational resources from across universities (see
  http://linkeduniversities.org).

  Copyright © 2012 by the paper’s authors. Copying permitted only for private and
  academic purposes. This volume is published and copyrighted by its editors.
  In: C. Unger, P. Cimiano, V. Lopez, E. Motta, P. Buitelaar, R. Cyganiak (eds.):
  Proceedings of Interacting with Linked Data (ILD 2012), Workshop co-located with the
  9th Extended Semantic Web Conference, Heraklion, Greece, 28-05-2012, published
  at http://ceur-ws.org


2      The Open University’s Linked Data Platform
Data.open.ac.uk is a linked data endpoint that collects data from many different
sources within the organisation, using a variety of different vocabularies and
linking to external sources such as dbpedia.org or geonames.org (see [4]). Data
collection is based on identifying streams of data inside the organisation and,
in collaboration with the data owners, re-modelling the data to fit exposure as
linked data. The architecture of the platform is based on a triple store providing
a SPARQL endpoint, on ad-hoc mechanisms to extract and update data from
the considered streams, and on a basic URI delivery mechanism. The process
is continuous, with more data being exposed whenever new resources are made
available. The current sets of data include:

Course information: This includes information about courses that are
   currently on offer at the Open University. The information includes a short
   description of the course, information about the levels and number of credits
   associated with it, the topics, and the conditions of enrolment (the
   countries in which it is available, the dates for registration and the student fees).
   Example: http://data.open.ac.uk/course/m366
Research publications: This includes metadata for the research articles and
   other publications authored by Open University researchers, as available on
   the publication repository of the university (ORO, see
   http://oro.open.ac.uk). An article typically includes information about the
   authors, dates, abstract and venue of the publication.
   Example: http://data.open.ac.uk/oro/29916
Podcasts: This includes the metadata for audio and video podcasts produced
   and made openly available by the Open University as open educational
   resources (see http://podcast.open.ac.uk). A typical podcast entity
   includes a short description, the topics, a link to a representative image and to
   a transcript if available, as well as information about the course the podcast
   might relate to and license information regarding the content of the podcast.
   Example: http://data.open.ac.uk/podcast/218dce44a4ed17b36ada50d18b866b03
Open Educational Resources: This includes metadata about units of Open
   Educational Resources made available by the Open University through its
   OpenLearn system (see http://openlearn.open.ac.uk). A typical ’Open-
   LearnUnit’ includes a short description of the units, the topics, tags used to
   annotate the resource, its language, as well as the course it might relate to,
   and the license that applies to the content.
   Example: http://data.open.ac.uk/openlearn/m366_2
Youtube videos: This includes metadata about videos published by the Open
   University on Youtube, as promotional videos or open educational resources.
   Such metadata include a short description of the video, the tags that were
   used to annotate the video, the collection it might be part of and a link to
   the related course if relevant.
   Example: http://data.open.ac.uk/page/youtube/A7BA7C1155BE887E/1E5D9A1BA21BDC51
University buildings: This includes information about the buildings owned by
   the University. As the Open University is a distance-learning institution,
   these include, besides the main campus located in Milton Keynes, regional
   centres located across the UK. Building descriptions include their address
   (including links to the corresponding administrative areas in
   http://data.ordnancesurvey.co.uk/), a picture of the building and the
   sub-divisions of the building into floors and spaces.
   Example: http://data.open.ac.uk/location/building/rbedrb
Library catalogue: This includes metadata about items available at the Open
   University’s library that relate to Open University courses (textbooks and
   setbooks). The description of each item includes information about the top-
   ics, the authors, the publisher and ISBN, as well as the course it relates to.
   Example: http://data.open.ac.uk/library/406973
Other specific data: Other datasets are also included that concern specific
   research projects (e.g., the Open Arts Archive, http://openartsarchive.org/)
   or specific departments of the Open University (e.g., the FOAF profiles
   of people from the Knowledge Media Institute, http://people.kmi.open.ac.uk).
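
As a rough illustration of the two access mechanisms mentioned in this section (the SPARQL endpoint and the URI delivery mechanism), the following minimal Python sketch builds, without sending them, the two kinds of HTTP requests a client might issue. The endpoint address is a placeholder of ours (the location of the endpoint is not given here), and we assume, without having verified it, that the URI delivery mechanism honours content negotiation.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

# ASSUMPTION: placeholder address. The text only states that the platform
# provides a SPARQL endpoint, not where it is located.
SPARQL_ENDPOINT = "http://data.open.ac.uk/sparql"


def describe_request(resource_uri: str, endpoint: str = SPARQL_ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL protocol GET request that lists the
    properties and values attached to one resource."""
    query = f"SELECT ?property ?value WHERE {{ <{resource_uri}> ?property ?value }}"
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def dereference_request(resource_uri: str) -> Request:
    """Build (but do not send) a request for the RDF description of a
    resource through its own URI.

    ASSUMPTION: the server supports content negotiation on the Accept header.
    """
    return Request(resource_uri, headers={"Accept": "application/rdf+xml"})


if __name__ == "__main__":
    request = describe_request("http://data.open.ac.uk/course/m366")
    print(request.full_url)
    print(request.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would actually contact the server; the sketch stops short of that so that it does not depend on the availability of the platform.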


3   Applying Linked Data at the Open University

Amongst the many applications developed on top of the data.open.ac.uk platform
(see e.g. [3]), we choose to describe here the ones that had a concrete
deployment and impact on different categories of users of the Open University,
focusing on our experience regarding benefits and issues at the level of
interacting with the data, both from an end-user perspective, and from a
developer’s perspective.

3.1     The ’Study at the OU’ Mobile Application
“Study at the OU” is the website of the Open University that contains the
descriptions of the courses and qualifications that can be obtained from the
University (see http://www3.open.ac.uk/study/). A mobile application was recently
developed by the communication services of the University so that this course
catalogue and additional information about the topics covered can be accessed
from various types of tablets and smartphones (see Figure 1). As part of this
application, it is possible to select a topic and obtain information both about the
courses available on this topic, and about the related resources such as
podcasts, Youtube videos and OpenLearn units. This last feature is implemented
using data.open.ac.uk, simply by querying for resources that are directly related
to the topic being considered, or for resources attached to courses that are
related to this topic.
    Since its launch in January 2012, this part of the application has been ac-
cessed more than 25,000 times.
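
The two ways of reaching a resource from a topic described above (directly, or through a course on that topic) can be sketched over a small in-memory set of triples. This is only an illustration in Python: the property and class names (`ex:hasTopic`, `ex:relatesToCourse`, `ex:Course`) and the sample data are hypothetical, as the vocabulary actually used by the application is not detailed here.

```python
# Hypothetical vocabulary, for illustration only.
HAS_TOPIC = "ex:hasTopic"
RELATES_TO_COURSE = "ex:relatesToCourse"
TYPE = "rdf:type"
COURSE = "ex:Course"

# Invented sample data in (subject, predicate, object) form.
TRIPLES = {
    ("course/m366", TYPE, COURSE),
    ("course/m366", HAS_TOPIC, "topic/computing"),
    ("podcast/p1", HAS_TOPIC, "topic/computing"),
    ("openlearn/m366_2", RELATES_TO_COURSE, "course/m366"),
    ("youtube/v9", HAS_TOPIC, "topic/history"),
}


def subjects(triples, predicate, obj):
    """Return all subjects linked to `obj` through `predicate`."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


def related_resources(triples, topic):
    """Resources directly about `topic`, plus resources attached to a course
    that is itself about `topic` (courses themselves are left out)."""
    direct = subjects(triples, HAS_TOPIC, topic)
    courses = {s for s in direct if (s, TYPE, COURSE) in triples}
    via_course = set()
    for course in courses:
        via_course |= subjects(triples, RELATES_TO_COURSE, course)
    return sorted((direct - courses) | via_course)


if __name__ == "__main__":
    print(related_resources(TRIPLES, "topic/computing"))
```

In a SPARQL formulation, the two branches would typically correspond to the two sides of a UNION pattern.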


     Fig. 1. Screenshots of the ’Study at the OU’ application on an Android phone.


    From the users’ perspective, while connecting to relevant resources could
be a very valuable feature, especially for prospective students, there is no indi-
cation that this has been realised with linked data. In other terms, without
knowing the underlying information architecture, there is no reason not to be-
lieve that this functionality was realised using more common technologies. The
added value here is however that such a simple and straightforward feature
requires the combination of information coming from different, and mostly isolated
systems (the course database, the podcast system, etc.), which are seamlessly
integrated through linked data.
    It is from the developers’ perspective, however, that the benefits of
interacting with linked data to build such an application become obvious.
Indeed, providing the same feature using the usual information infrastructure of
the Open University would have required accessing and connecting to many
different systems that use different platforms, technologies, formats, conventions,
etc. It would also have required an ad-hoc integration of the data, with an
additional level of complexity.
    It is still a problem, however, for developers used to more common technologies
to make use of linked data technologies such as SPARQL
(http://www.w3.org/TR/rdf-sparql-query/), and to integrate them with their usual
development environment. The solution adopted here was to create an intermediary
view generating ad-hoc XML descriptions of the relevant information from the
results of pre-established SPARQL queries. This also allowed the inclusion of
caching mechanisms, so that the SPARQL query engine does not have to process
identical queries from potentially thousands of users.
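
A minimal sketch of such an intermediary view is given below, in Python. It is not the code of the deployed application: the class name, the XML element names and the time-based expiry policy are assumptions made for the example, and the function that actually runs a query against the endpoint is passed in as a parameter so that the sketch stays independent of any particular SPARQL client.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]  # one query solution: variable name -> value


def rows_to_xml(rows: List[Row]) -> str:
    """Serialise query solutions as a flat, ad-hoc XML document.
    The element names 'resources' and 'resource' are illustrative."""
    root = ET.Element("resources")
    for row in rows:
        item = ET.SubElement(root, "resource")
        for name, value in sorted(row.items()):
            ET.SubElement(item, name).text = value
    return ET.tostring(root, encoding="unicode")


class CachedXmlView:
    """Serves XML built from pre-established queries, re-running a query
    only when its cached result is older than `ttl_seconds`."""

    def __init__(self, run_query: Callable[[str], List[Row]], ttl_seconds: float = 3600.0):
        self._run_query = run_query
        self._ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, query: str) -> str:
        now = time.monotonic()
        hit = self._cache.get(query)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # identical query: answered without hitting the store
        xml_text = rows_to_xml(self._run_query(query))
        self._cache[query] = (now, xml_text)
        return xml_text
```

With this arrangement, the mobile application only ever sees XML over HTTP, while the SPARQL queries and the caching policy remain on the server side.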


3.2   Supporting the ‘Research Excellence Framework’ Activities

The ‘Research Excellence Framework’ (REF) is the process applied to evaluate
and assess the quality of research in UK universities (see
http://www.hefce.ac.uk/research/ref/). As part of this process, each university is
required to submit to its corresponding funding body a report summarising the
research carried out at the university in various disciplines. In order to achieve
this, 18 different panels have been formed at the Open University (to cover the
disciplines in which the Open University is carrying out research), in charge of
identifying individual researchers with a selection of their publications to be
part of the submission.
    To support this work, an application was developed (see Figure 2) to be
used by individual researchers to select what they considered to be their ‘best’
publications (since 2008), to indicate to which discipline they are associated,
and to annotate their selected publications to include supporting statements for
their selection (describing their significance, originality, etc.). This application
has already been accessed by about 600 researchers concerned with the REF at
the Open University, and a similar application is being developed at the moment
to support the work of the 18 panels.
    The application uses linked data to obtain, for each individual researcher,
the list and description of their publications in recent years, as well as to
connect them with information regarding their role in the organisation and the
faculty/department they relate to. The captured information (selection of
publications and their annotations) is also processed according to linked data
principles and technologies, creating another (private) triple store.
    From the users’ perspective, the main advantage of using linked data in
this application is that information about their publications is directly obtained
and integrated with other information, without them having to provide any
additional input. Their linked data identifier (i.e., their URI) is directly derived
from the login name they use to access the application (as well as all other
systems on the Open University’s intranet), meaning that the relevant publications
are displayed directly as they access the application. The use of linked data is in
principle, as in the previous section, hidden from the user. However, some
elements can sometimes create confusion due to the different modelling of the data
from the original sources. In the original source for example, while each author is
associated with a unique ID, their name is recorded separately for each
publication. In the linked data version, the names are all aggregated under the
person’s ID, meaning that each publication might appear slightly differently than
in the original source (ORO). This also means that errors that can be very
localised in the original source (a typo in the name of an author in one of their
publications) might have a larger impact in the linked data version. Similarly, the
application relies on the fact that each person and each publication is associated
with a stable and unique identifier. This is however a strong assumption which,
even considering well curated sources, is often hard to achieve (e.g., people
changing user IDs, publications entered multiple times, etc.).


Fig. 2. Partial screenshot of the interface for researchers to annotate their publications
for the REF.


    Here again, it is from the developers’ perspective that the advantages
of relying on linked data technologies and principles are the most obvious.
Indeed, this makes it possible to build on top of existing data and to process
various sources in a homogeneous way. Also, in producing new data (selections and
annotations of publications), linked data technologies allow more flexibility and
agility than traditional, relational database systems: ‘adding a field’ in the data
is trivial and does not require any database administration task, just adding data
according to the newly considered ontological property. This is further
facilitated by the appearance of robust implementations of the SPARQL Update
language (http://www.w3.org/TR/sparql11-update/), on which this application
relies, allowing the homogeneous use of the HTTP protocol both for querying
and updating a triple store. Adding and integrating new sources of data is
also made easier, as long as this information is provided using URIs consistent
with the ones already in use, and the produced data is naturally reusable to
build further applications.
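
To make the point about ‘adding a field’ more concrete, the following Python sketch builds a SPARQL Update request that attaches a new annotation to a publication. The property URI and the update endpoint address used in the example are hypothetical; only the `INSERT DATA` form and the `application/sparql-update` media type come from the SPARQL 1.1 specifications. The request is built but not sent.

```python
from urllib.request import Request


def escape_literal(text: str) -> str:
    """Escape backslashes, double quotes and newlines for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def insert_annotation(publication_uri: str, property_uri: str, text: str) -> str:
    """Build an INSERT DATA operation adding one annotation triple.
    Using a property that was never used before needs no schema change."""
    return (
        "INSERT DATA { "
        f'<{publication_uri}> <{property_uri}> "{escape_literal(text)}" . '
        "}"
    )


def update_request(endpoint: str, update: str) -> Request:
    """Wrap the update in an HTTP POST, as allowed by the SPARQL 1.1 Protocol."""
    return Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical property URI, for illustration only.
    print(insert_annotation(
        "http://data.open.ac.uk/oro/29916",
        "http://example.org/ref#originality",
        "Introduces a new evaluation method.",
    ))
```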
    Many issues appear here however that are not usually present when using
more traditional technologies. One of them concerns the way to deal with incom-
plete data. Indeed, in applications like the one considered here that rely on a set
of established queries, assumptions are made regarding certain properties of the
data, which are never made explicit. For example, it is expected that every
publication is associated with a list of authors. If this assumption does not hold,
and some publications do not have a list of authors, these publications would
simply not appear in any result, making it difficult to recognise that a potentially
problematic issue has emerged. This adds to the difficulty of knowing whether
a problem that is identified at the level of the application originates from the
application, the linked data representation or the original sources. Dealing with
such issues generates more complexity in the querying process, as well as in the
development/maintenance of the application.
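
The effect of such an implicit assumption can be reproduced with a few lines of Python over invented data. The first function mimics a query in which the author pattern is mandatory; the second mimics the same query with the author pattern made optional (as with OPTIONAL in SPARQL); the third is the kind of additional check that has to be written to detect the problem. Identifiers and titles are made up for the example.

```python
# Invented sample data: the second publication has no recorded authors.
TITLES = {"oro/1": "Paper A", "oro/2": "Paper B"}
AUTHORS = {"oro/1": ["person/x"]}


def strict_listing(titles, authors):
    """Mandatory author pattern: publications without authors silently vanish."""
    return [
        (pub, title, author)
        for pub, title in sorted(titles.items())
        for author in authors.get(pub, [])
    ]


def tolerant_listing(titles, authors):
    """Optional author pattern: every publication is kept, with None
    standing for an unbound author."""
    return [
        (pub, title, author)
        for pub, title in sorted(titles.items())
        for author in (authors.get(pub) or [None])
    ]


def missing_authors(titles, authors):
    """Explicit check for publications violating the implicit assumption."""
    return sorted(pub for pub in titles if not authors.get(pub))


if __name__ == "__main__":
    print(strict_listing(TITLES, AUTHORS))
    print(tolerant_listing(TITLES, AUTHORS))
    print(missing_authors(TITLES, AUTHORS))
```

The tolerant variant keeps the incomplete publication visible, at the price of every consumer of the results having to handle the unbound value.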


3.3   Understanding Research Communities at the Open University

Following on from the application described in the previous section, another
application was developed to help research managers within the university
understand and monitor their research communities. Called RADAR (Research
Analysis with DAta and Reasoning), this application makes use of information
about research publications from ORO (as above), as well as other sources
regarding the positions of researchers, their projects, funding, supervision
history, etc., to visualise different indicators of research activities for
individuals and groups.
    Two parts of the application were developed. The first one relies on a generic
framework for the visualisation and exploration of linked data sets, which is
parametrised by an ontology of the particular domain of the application. It uses
basic ontological reasoning to classify individuals into different classes, and uses
automatically generated charts and tag clouds to visualise the distribution of
values of the properties in individual classes (see Figure 3). In our case, the classes
correspond for example to different categories of academic staff (senior/junior
researchers, lecturers, professors, etc.) or different types of projects (internal,
national, European, etc.). The indicators being visualised here correspond for
example to the amount of funding received, the number of publications, the number
of projects or the number of supervised students.
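
The kind of processing performed by the generic part can be approximated by the following Python sketch, which is not the RADAR implementation: it restricts ‘basic ontological reasoning’ to following subclass links in a single-parent hierarchy, and computes for each class the distribution of the values of one property. The class names, individuals and values are invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical single-parent class hierarchy (child -> parent).
SUBCLASS_OF = {
    "ex:Professor": "ex:AcademicStaff",
    "ex:Lecturer": "ex:AcademicStaff",
    "ex:AcademicStaff": "ex:Staff",
}
# Invented individuals with their asserted class and one indicator.
ASSERTED_TYPE = {"person/a": "ex:Professor", "person/b": "ex:Lecturer", "person/c": "ex:Professor"}
SUPERVISED_STUDENTS = {"person/a": 2, "person/b": 0, "person/c": 2}


def superclasses(cls, subclass_of):
    """Return `cls` followed by all its ancestors (stops on cycles)."""
    chain, seen = [], set()
    while cls is not None and cls not in seen:
        chain.append(cls)
        seen.add(cls)
        cls = subclass_of.get(cls)
    return chain


def classify(asserted_type, subclass_of):
    """Map each class to the individuals that belong to it, directly or by inheritance."""
    members = defaultdict(set)
    for individual, cls in asserted_type.items():
        for inferred in superclasses(cls, subclass_of):
            members[inferred].add(individual)
    return dict(members)


def distribution(members, values):
    """For each class, count how many of its members carry each value."""
    return {
        cls: Counter(values[i] for i in individuals if i in values)
        for cls, individuals in members.items()
    }


if __name__ == "__main__":
    members = classify(ASSERTED_TYPE, SUBCLASS_OF)
    print(distribution(members, SUPERVISED_STUDENTS)["ex:AcademicStaff"])
```

Each resulting counter is the data behind one automatically generated chart; because nothing in the code is specific to staff or supervision, changing the ontology or the data is enough to obtain different charts.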
    The second part of the application uses the same data and indicators, but
displays graphs specifically conceived for the application in research community
analysis, considering for example the distribution of the number of publications
per year, or the overlap between the publications of a group of co-authors (see
Figure 4).
    From the users’ perspective, the advantages and issues related to the
generic part of the RADAR application appear very clearly. On the one hand,
the application is driven by an ontology, which means that only integrating more
knowledge into it is sufficient to make it better structured and more
comprehensive. There is an advantage also in having a homogeneous representation
for different types of objects, and in having a comprehensive view of all the
different indicators and types of data available. On the other hand, it is a lot more
demanding for end-users, as the interface would often include irrelevant elements.
Its generic aspect (meaning that it can be applied similarly to other datasets, in
other domains) also means that the organisation and navigational structure of the
application is guided by the modelling in the data, which is often not natural to
end-users. Finally, it makes the visualisations presented harder to interpret, as
they do not necessarily refer to the notions considered in the domain.


Fig. 3. Screenshot of the ‘generic’ part of the RADAR application, showing
distributions of indicators for sub-classes of members of KMi.


Fig. 4. Screenshot of the ‘specific’ part of the RADAR application, showing
distributions of indicators for a particular group of researchers.


    The specific part of the application was developed to counter the deficiencies
of the generic view, by showing visualisations that are more directly relevant,
more understandable and often more complex than what the generic part could
offer. While this has clearly been welcomed by users, the main issue with this part
is that it is necessarily limited not only to the specific data and domain being
considered, but also to the views that the developers of the application implemented.
    From the developers’ perspective, it is natural to think that the generic
version of the application is more difficult to build, as it requires abstracting from
the domain- and data-specific assumptions that can be made with the specific
version. It is however also more customisable, as many changes can be brought
into it by modifying the data or the ontology on which it relies. This however
introduces a number of issues at the level of the usability of the application, as
mentioned above: without such assumptions being explicitly made, many of
the results being shown to the user cannot be properly interpreted.
    The specific part of the application naturally suffers from the inverse issues:
while its interface design is guided by the requirements of the specific task, it
requires significant efforts to be extended to support other visualisations or tasks.
It is in this sense closer to the application presented in Section 3.2.

3.4   Investigating the Presence of the Open University in the Media
This last application is based on data collected in addition to what is available
from data.open.ac.uk. It relies on systems used by the Media Relation ser-
vices to collect clippings from news items mentioning the Open University and
its members. The data collected concern the publication/channel where a news
item has been issued, general metadata about the news item and possible
additional information regarding researchers, lecturers or other members of staff of
the Open University cited or who contributed to the news item. Links are also
created to data.open.ac.uk (regarding people) and to dbpedia.org (regarding
the publications and channels that provided the news items).
    An application has been created that allows members of the Open University
to create charts and reports on top of this data. A ‘chart generation’ interface
is provided that allows users to create filters and identify categories to be
visualised based on properties and values in the data. Once configured, the
interface creates a linkable and embeddable chart that can be customised, and is
dynamically updated based on changes in the data (see the example in Figure 5).
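
One plausible way for such a configuration to be turned into a chart is to translate it into an aggregate SPARQL query that is re-run whenever the chart is displayed. The Python sketch below illustrates this idea only; it is an assumption of ours and not a description of how the deployed interface works. Filters are simplified to property/value pairs where the value is a URI, and the property URIs used in the example are hypothetical.

```python
from typing import Mapping


def build_chart_query(filters: Mapping[str, str], category_property: str) -> str:
    """Translate a chart configuration into a SPARQL 1.1 aggregate query.

    filters: property URI -> required value URI (all filters must hold).
    category_property: property whose values define the chart categories.
    """
    lines = ["SELECT ?category (COUNT(DISTINCT ?item) AS ?count)", "WHERE {"]
    for prop, value in filters.items():
        lines.append(f"  ?item <{prop}> <{value}> .")
    lines.append(f"  ?item <{category_property}> ?category .")
    lines += ["}", "GROUP BY ?category", "ORDER BY DESC(?count)"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical property and value URIs, for illustration only.
    print(build_chart_query(
        {"http://example.org/publishedBy": "http://example.org/channel/1"},
        "http://example.org/topic",
    ))
```

Since the query is generated from the configuration alone, the resulting chart follows the data as it changes, but nothing in the generated query checks that counting items per category is meaningful for the chosen properties.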


Fig. 5. Charts showing the distribution of topics of news items mentioning the Open
University and published through channels owned by the BBC.


     Here, two different types of users need to be considered. For the users of the
chart, the results are reasonably straightforward, as the charts can be embedded
into dedicated interfaces that implement dashboards and navigation mechanisms
that are understandable in the particular context. This application also clearly
demonstrates the benefits of linked data, especially through exploiting the links
with external sources of data (in the example of Figure 5, the information about
topics and number of corresponding news items is provided by the local data,
while the information regarding the channels owned by the BBC is provided by
dbpedia.org).
     The other type of users are the ones creating the charts using the
configuration interface. This interface requires them to understand and follow the
properties used in the data for different types of objects. In other words, while
it is generic and can be used to generate charts from any linked data endpoint,
it needs the user to be familiar with the schema used in the data, the
particular properties of the data, and the specificities of their linked data-based
modelling.
     From the developers’ perspective, this application represents an example
of a generic application, but one which is used to generate specific, customisable
visualisations. In this sense, it completely abstracts from any specificities of the
data being considered but, on the other hand, cannot provide much guidance to
the users with respect to the use of the data, and to the interpretation of the
results.

4   Conclusions: Challenges and Pitfalls of Interacting
    with Linked Data, and Steps Forward
The four applications presented above represent concrete experiences of devel-
opments relying on linked data that have been actually deployed and used in
an environment more accustomed to proprietary, corporate software relying on
more common information systems and data management solutions. As such,
they illustrate common challenges and pitfalls that interacting with linked data
can generate, from both the end-users’ and the developers’ perspectives.
In this section, we summarise the general notions that appear from such
experiences to require attention, which we believe help identify important
directions for research into interacting with linked data.
    First, it appears clearly that linked data should be hidden from the
end-users. While this might appear trivial, it is not an easy goal to achieve: from
our experience, we can see that most of the advantages of linked data should
appear obvious to the application developers, but should as much as possible
not need to be understood by the end-users. This of course concerns purely
technical elements such as URIs, RDF and SPARQL, but also more conceptual
considerations, such as the integration of multiple sources of data or the use of
reasoning. This is the case of our “Study at the OU” app and of the REF support
application, where few issues appear on the end-user side. In RADAR and
the media relation applications on the contrary, whenever the technology is too
present, even if it is there to provide advanced features, it introduces confusion for
the users. In other words, while it is often still necessary to convince stakeholders
of the value of linked data, applications that are technological demonstrators
tend to have little value to the users, and the technology should essentially
be considered from the point of view of the developers.
    Second, there is an elusive trade-off to be found between developing generic,
reusable frameworks that can be applied to a large variety of datasets and
domains, and specific applications that are meant to work only with certain
datasets. Indeed, while the value of reusable components is quite obvious, and
making this possible is one of the strong benefits of linked data, the RADAR
application clearly shows that achieving an intuitively usable application that
relies on a generic template for navigation and presentation of the data is
close to impossible. Most applications of linked data nowadays are closer to
the “Study at the OU” app or to the REF application: applications working
in a closed environment with a clearly defined and understood set of datasets
and queries. These applications tend to be disappointing as they cannot benefit
from the openness of linked data and the possibility to integrate data which
might originate from other organisations, possibly at run-time. In conclusion,
rather than a complete, generic application framework like what was
attempted with the generic part of RADAR, we believe that what is needed are
libraries of reusable and highly customisable interface components that rely on
generic linked data resources, but can be flexibly integrated to create specific
application interfaces in specific scenarios. The chart creation feature
represented by the media relation application can be seen as an example of such
a library. Other initiatives exist that provide initial building blocks, such as the
SPARK (http://km.aifb.kit.edu/sites/spark/) JavaScript library, the SIMILE
timeline and map widgets (http://simile-widgets.org/), as well as more
general visualisation components, such as Fusion [2] or the Linked Data API².
    Third, it appears clearly that one of the obstacles to building reusable
interface components based on linked data is the openness and flexibility of the
data model on which linked data relies. At a higher level, integrating data from
external sources represents a major challenge, as interaction needs to implement
a trade-off between control and the potentially infinite possibilities that opening
the interface to unknown data can bring. More concretely, many tasks require
assumptions related to the data, which are rarely made explicit and formalised.
For example, most of the applications we have considered make the assumption
that each entity appearing in the interface is associated with a human-readable
label. It is often the case that, if it does not require that there is only one
human-readable label for each entity, the application will make a choice between
the ones available, either randomly or using domain- and data-specific criteria.
Similarly, while the charts generated by the media relation application are
operational and functioning in most situations, it is clear that they can only make
sense under certain conditions. For example, if entities can be associated with more
than one identifier, this would generally lead to these entities being counted as if
they were multiple entities. Also, charts showing distributions of data would be
misleading if not all the entities considered have the same number of values for the
visualised property or if some of them do not provide values. In other words, while
the fact that the formalisms underlying linked data do not make the closed-world
assumption or the unique-name assumption is at the basis of their openness
and flexibility, it makes them less exploitable as part of generic data processes.
Our suggestion regarding this problem would be to create a way to annotate
datasets (including such annotations for example in the voiD [1] descriptions of
the datasets) with the particular ‘characteristics’ of the datasets that can
be exploited by data processing/visualisation mechanisms, including for
example “local unique-name assumptions” indicating that the instances of a certain
class are non-redundant, as well as expressions similar to integrity constraints
(e.g., that there is necessarily a value of a given property for the instances of a
given class). Being able to rely on such characteristics would make it more
feasible for generic interface components, as previously suggested, to provide some
level of guarantee regarding their interpretability and usability when applied
to particular data.
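
To give an idea of what exploiting such annotations could look like, the Python sketch below checks a small set of triples against declared characteristics before it is handed to a generic component. The way the characteristics are written down here (a dictionary with `required` and `unique_names` keys) is purely hypothetical: voiD does not currently define such annotations, and the use of owl:sameAs links as evidence of redundancy is a simplification made for the example.

```python
# Invented sample data in (subject, predicate, object) form.
TRIPLES = [
    ("person/1", "rdf:type", "ex:Person"),
    ("person/1", "rdfs:label", "A. Smith"),
    ("person/2", "rdf:type", "ex:Person"),
    ("person/2", "owl:sameAs", "person/1"),
]

# Hypothetical dataset annotation: every ex:Person has a label, and
# instances of ex:Person are declared non-redundant.
CHARACTERISTICS = {"ex:Person": {"required": ["rdfs:label"], "unique_names": True}}


def violations(triples, characteristics):
    """Return (instance, problem) pairs where the data contradicts the
    characteristics declared for its class."""
    problems = []
    for cls, declared in sorted(characteristics.items()):
        instances = sorted(s for s, p, o in triples if p == "rdf:type" and o == cls)
        # Integrity-constraint-like check: a value must exist for each instance.
        for prop in declared.get("required", []):
            for instance in instances:
                if not any(s == instance and p == prop for s, p, o in triples):
                    problems.append((instance, f"missing {prop}"))
        # Local unique-name assumption: no two instances denote the same entity.
        if declared.get("unique_names"):
            for s, p, o in triples:
                if p == "owl:sameAs" and s in instances and o in instances and s != o:
                    problems.append((s, f"duplicate of {o}"))
    return problems


if __name__ == "__main__":
    for instance, problem in violations(TRIPLES, CHARACTERISTICS):
        print(instance, problem)
```

A chart component could refuse to count instances of a class, or at least warn the user, when such a check reports problems for the properties being visualised.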


² http://code.google.com/p/linked-data-api/


References

 1. Alexander, K., Cyganiak, R., Hausenblas, M., Zhao, J.: Describing Linked Datasets
    – On the Design and Usage of voiD, the ’Vocabulary of Interlinked Datasets’. In:
    Proc. of Linked Data on the Web, WWW 2009
 2. Araújo, S., Houben, G.-J., Schwabe, D., Hidders, J.: Building Linked Data
    Applications with Fusion: A Visual Interface for Exploration and Mapping. In: ISWC
    Posters and Demos (2010)
 3. Zablith, F., d’Aquin, M., Brown, S., Green-Hughes, L.: Consuming Linked Data
    Within a Large Educational Organization. In: Proc. of the Second International
    Workshop on Consuming Linked Data (COLD), ISWC 2011
 4. Zablith, F., Fernández, M., Rowe, M.: The OU Linked Open Data: Production and
    Consumption. In: ESWC Workshops Proceedings, Linked Learning 2011