Proceedings of the
1st International Workshop on
eLearning Approaches for the
       Linked Data Age
    (Linked Learning 2011)

            collocated with the
  8th Extended Semantic Web Conference
               (ESWC2011)


      29 May 2011, Heraklion, Greece
Preface
While sharing of educational resources on the Web became common practice throughout
the last years, a large amount of research was dedicated to interoperable eLearning
repositories based on semantic technologies. Data interoperability is even more crucial,
since sharing of online resources at Web-scale is widely facilitated by established APIs,
such as OAI-PMH or SQI. Moreover, adoption of social computing aspects within
personal learning environments has become a dominant paradigm building on principles
such as user-centred identity management, service-orientation, and social participation.
However, it remains an unresolved challenge to provide a meaningful, automated and
personalized integration of diverse learning resources, e.g., formal and informal ones as
found on the Web.
Though the eLearning area has brought up a number of comprehensive metadata
standards (e.g., ADL SCORM, IEEE LOM, IMS LD) aiming at interoperability across
eLearning environments, actual take-up is still fragmented. This can be attributed to their
merely XML-driven approaches, the lack of established controlled vocabularies and the
incompatibility of individual schemas. Several research efforts tried to address these
issues by using Semantic Web technologies and ontology-based approaches. However,
these efforts often failed to attract a critical mass of adopters. This is due to reasons such
as inherent complexity, the lack of scalable and high-performance tool support when
following complex reasoning-based approaches and the unavailability of vocabularies
when following proprietary representation schemes. In the meantime, the Semantic Web
has redefined itself throughout the last years as a Web of “Linked Data” by establishing
principles which support sharing of large datasets on the Web together with a technology
stack (use of URIs, RDF, and SPARQL) aimed at their realisation. The huge success and
widespread adoption of the Linked Data approach have led to the availability of vast
amounts of public data such as DBpedia, WordNet RDF or the data.gov.uk initiative.
Although the Linked Data approach is not yet adopted widely within the eLearning
domain, this workshop has emerged on the fundamental belief that the Linked Data
approach has the potential to fulfill the eLearning vision of Web-scale interoperability of
eLearning resources and highly personalised and adaptive eLearning applications. The
workshop has been established to become a highly interactive research forum for
exploring the promises of the Web of Linked Data in technology-enhanced learning by
gathering researchers from the areas of the Semantic Web and technology-enhanced
learning.
The response to the call for papers was overwhelming for the first edition of this
workshop. Therefore, after extensive peer review (each submission was reviewed by at
least two independent reviewers) we were able to select 13 papers for presentation in the
program of the workshop. In addition, the workshop program also had an excellent
keynote speaker – Vania Dimitrova of the University of Leeds – a well-known researcher
in the areas of technology enhanced learning and Semantic Web.
The workshop would not have been possible without the contributions of many people and
institutions. We are very thankful to the organisers of the ESWC 2011 conference for
providing us with an opportunity to organise the workshop, for their excellent
collaboration, and for looking after many important logistical issues. We are also very
grateful to the members of the program committee for their commitment in reviewing the
papers and assuring the good quality of the workshop program. We also thank the authors
for their invaluable contributions to the workshop by writing, revising and presenting their
papers. Of course, great appreciation of her time and expertise goes to our keynote
speaker Vania Dimitrova. We also want to express our strong gratitude to the publishers
of CEUR for publishing the Linked Learning 2011 workshop proceedings, to the
European Commission (EC) and the EC-funded research project mEducator for
sponsoring the best paper award and to the EasyChair developers for supporting the
submission and review process.
May 2011,

Stefan Dietze, Mathieu d'Aquin, Dragan Gasevic, Miguel-Angel Sicilia
Organisers
      Stefan Dietze, The Open University, UK
      Mathieu d'Aquin, The Open University, UK
      Dragan Gasevic, Athabasca University, Canada
      Miguel-Angel Sicilia, University of Alcalá, Spain
Program Committee
      Lora Aroyo, Free University of Amsterdam, The Netherlands
      Soeren Auer, University of Leipzig, Germany
      Panagiotis Bamidis, Aristotle University of Thessaloniki, Greece
      Charalampos Bratsas, Aristotle University of Thessaloniki, Greece
      Dan Brickley, W3C & Free University of Amsterdam, The Netherlands
      Vania Dimitrova, University of Leeds, UK
      John Domingue, The Open University, UK & Semantic Technologies Institute
       International, Austria.
      Nikolas Dovrolis, Democritus University of Thrace, Greece
      Marek Hatala, Simon Fraser University, Canada
      Jelena Jovanovic, University of Belgrade, Serbia
      Eleni Kaldoudi, Democritus University of Thrace, Greece
      Tomi Kauppinen, University of Münster, Germany
      Carsten Keßler, University of Münster, Germany
      Effie Lai-Chong Law, Leicester University, UK & ETH, Zurich, Switzerland
      Nikos Manouselis, Greek Research and Technology Network, Greece
      Dave Millard, University of Southampton, UK
      Evangelia Mitsopoulou, St George's University London, UK
      Wolfgang Nejdl, L3S Research Center, Germany
      Mikael Nilsson, Royal Institute of Technology, Sweden
      Carlos Pedrinaci, The Open University, UK
      Davide Taibi, Institute for Educational Technologies, Italian National Research
       Council, Italy.
      Vlad Tanasescu, University of Edinburgh, UK
      Fridolin Wild, The Open University, UK
      Martin Wolpers, Fraunhofer FIT.ICON, Germany
      Hong Qing Yu, The Open University, UK

Reviewers
      Dhaval Thakker, University of Leeds, UK
                 The OU Linked Open Data:
                Production and Consumption

            Fouad Zablith, Miriam Fernandez and Matthew Rowe

             Knowledge Media Institute (KMi), The Open University
             Walton Hall, Milton Keynes, MK7 6AA, United Kingdom
                  {f.zablith, m.fernandez, m.c.rowe}@open.ac.uk


      Abstract. The aim of this paper is to introduce the current efforts to-
      ward the release and exploitation of The Open University’s (OU) Linked
      Open Data (LOD). We introduce the work that has been done within
      the LUCERO project in order to select, extract and structure subsets
      of information contained within the OU data sources and migrate and
      expose this information as part of the LOD cloud. To show the potential
      of such exposure we also introduce three different prototypes that exploit
      this new educational resource: (1) the OU expert search system, a tool
      focused on finding the best experts for a certain topic within the OU
      staff; (2) the Buddy Study system, a tool that relies on Facebook information
      to identify common interests among friends and recommend potential courses
      within the OU that ‘buddies’ can study together; and (3) Linked OpenLearn,
      an application that enables the exploration of courses, Podcasts and tags
      linked to OpenLearn units. Its aim is to enhance the browsing experience for
      students by detecting relevant educational resources on the fly while they
      read an OpenLearn unit.


      Keywords: Linked Open Data, education, expert search, social net-
      works.


1   Introduction

The explosion of the Linked Open Data (LOD) movement in the last few years
has produced a large number of interconnected datasets containing information
about a large variety of topics, including geography, music and research
publications, among others [2].
    The movement is receiving worldwide support from public and private sectors
like the UK1 and US2 governments, international media outlets, such as the
BBC [5] or the New York Times [1], and companies with a social base like
Facebook.3 Such organisations are supporting the movement either by releasing
large datasets of information or by generating applications that exploit it to
connect data across different locations.

1 http://data.gov.uk
2 http://www.data.gov/semantic/index
3 http://developers.facebook.com/docs/opengraph

    Despite its relevance and the support received in the last few years, very few
pieces of work have either released or exploited LOD in the context of education.
One of these few examples is the DBLP Bibliography Server Berlin,4 which
provides bibliographic information about scientific papers. However, education is
in principle one of the main sectors where the application of LOD technologies
can have a high impact.
    When performing learning and investigation tasks, students and academics
have to go through the tedious and laborious task of browsing different infor-
mation resources, analysing them, extracting their key concepts and mentally
linking data across resources to generate their own conceptual schema about the
topic. Educational resources are generally duplicated and dispersed among dif-
ferent systems and databases, and the key concepts within these resources as well
as their inter and intra connections are not explicitly shown to users. We believe
that the application of LOD technologies within and across educational insti-
tutions can explicitly generate the necessary structure and connections among
educational resources, providing better support to users in their learning and
investigation tasks.
    In this context, the paper presents the work that has been done within The
Open University (OU) towards the release and exploitation of several educational
and institutional resources as part of the LOD cloud. First, we introduce the
work that has been done within the LUCERO project to select, extract and
structure subsets of OU information as LOD. Second, we present the potential
of this data exposure and interlinking by presenting three different prototypes:
(1) the OU expert search system, a tool focused on finding the best experts for a
certain topic within the OU staff; (2) the Buddy Study system, a tool focused on
exploiting Facebook information to identify common interests among friends and
recommend potential courses within the OU that ‘buddies’ can study together;
and (3) Linked OpenLearn, an application that enables the exploration of courses,
Podcasts and tags linked to OpenLearn units.
    The rest of the paper is organised as follows: Section 2 presents the state of the
art in the areas of LOD within the education context. Section 3 presents the work
that has been done within the LUCERO project to expose OU data as part of
the LOD cloud. Sections 4, 5 and 6 present example prototype applications that
consume the OU’s LOD for Expert Search, Buddy Study and Linked OpenLearn
respectively. Section 7 describes the conclusions that we have drawn from this
work, and section 8 presents our plans for future work.


2     Related Work

While LOD is being embraced in various sectors, as mentioned in the previous
section, we are currently witnessing a substantial increase in universities adopting
the Linked Data initiative. For example, the University of Sheffield’s Department
of Computer Science5 provides a Linked Data service describing research
groups, staff and publications, all semantically linked together [6]. Similarly, the
University of Southampton has recently announced the release of its LOD portal
(http://data.southampton.ac.uk), where more data will become available in
the near future. Furthermore, the University of Manchester’s library catalogue
records can now be accessed in RDF format.6 In addition, other universities are
currently working on transforming and linking their data: the Universities of
Bristol,7 Edinburgh (e.g., the university’s buildings information is now generated
in LOD8) and Oxford.9 Furthermore, the University of Muenster announced
a funded project, LODUM, the aim of which is to release the university’s
research information as Linked Data. This includes information related to people,
projects, publications, prizes and patents.10

4 http://www4.wiwiss.fu-berlin.de/dblp/
    As LOD publishing standards are adopted more widely, the exchange of data will
become much easier, not only within one university but also across LOD-ready
universities. This enables, for example, the comparison of specific qualifications
offered by different universities in terms of courses required, pricing and
availability.


3     The Open University Linked Open Data

The Open University is the first UK University to expose and publish its
organizational information in LOD.11 This is accomplished as part of the LUCERO
project (Linking University Content for Education and Research Online),12 where
the data extraction, transformation and maintenance are performed. As a result,
multiple hybrid datasets are openly accessible through the online access point
http://data.open.ac.uk.
    The main purpose of releasing all this data as part of the LOD cloud is that
members of the public, students, researchers and organisations will be able to
easily search, extract and, more importantly, reuse the OU’s information and
data.


3.1   Creating the OU LOD

Detailed information about the process of LOD generation within the OU is
available at the LUCERO project website.12 We briefly discuss in this section
the steps involved in the creation of Linked Data. To achieve that, the main
requirement is to have a set of tools that generate RDF data from existing data
sources, load such RDF into a triple store, and make it accessible through a web
access point.

5 http://data.dcs.shef.ac.uk
6 http://prism.talis.com/manchester-ac
7 https://mmb.ilrt.bris.ac.uk/display/ldw2011/University+of+Bristol+data
8 http://ldfocus.blogs.edina.ac.uk/2011/03/03/university-buildings-as-linked-data-with-scraperwiki
9 http://data.ox.ac.uk
10 http://www.lodum.de
11 http://www3.open.ac.uk/media/fullstory.aspx?id=20073
12 http://lucero-project.info

    Given that the OU’s data repositories are scattered across many
departments, use different platforms, and are subject to constant update, a
well-defined workflow needs to be put in place. The initial workflow is depicted in
Figure 1, and is designed to be efficient in terms of time, flexibility and
reusability. The workflow is component based, and the characteristics of the
datasets played a major role in the implementation and setup of the components.
For example, when the data sources are available in XML format, the XML updater
handles the process of identifying new XML entities and passes them to the RDF
extractor, where the RDF data is generated, ready to be added to (or removed
from) the triple store. Finally, the data is exposed to the web, and can be
queried through a SPARQL endpoint.13
    The scheduler component takes care of initiating the extraction/update
process at specific time intervals. This update process is responsible for checking
what was added, modified, or removed from the dataset, and applies the
appropriate action to the triple store accordingly. Having such a process in place
is important in the OU scenario, where the data sources are continuously changing.
Another point worth mentioning is the linking process, which links entities
coming from different OU datasets (e.g., courses mentioned in Podcast data and
library records), in addition to linking external entities (e.g., course offerings in
a GeoNames-defined location14). To interlink OU entities independently of the
dataset from which they are extracted, we rely on an Entity Name
System, which generates a unique URI (e.g., based on a course code) depending
on the specified entity (this idea was inspired by the Okkam project15).
Such unique URIs enable a seamless integration and extraction of linked entities
within common objects that exist in the triple store and beyond, one of the core
Linked Data requirements [3].
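
    The update and URI-generation steps described above can be sketched roughly
as follows. This is only an illustrative sketch, not the LUCERO implementation:
the URI pattern, the function names mint_uri and diff_triples, and the course
code used in the usage note are all assumptions made for the example.

```python
from urllib.parse import quote

# Access point named in the text; the path layout below is a hypothetical pattern.
BASE = "http://data.open.ac.uk"


def mint_uri(entity_type: str, key: str) -> str:
    """Return one stable URI per (type, key), whichever dataset the entity came from."""
    return f"{BASE}/{quote(entity_type.strip().lower())}/{quote(key.strip().lower())}"


def diff_triples(stored: set, extracted: set):
    """Return (to_add, to_remove): the changes that bring the store up to date."""
    return extracted - stored, stored - extracted
```

Because the URI depends only on the entity type and its key, a course code found
in Podcast data and the same code found in a library record (for instance a
hypothetical code "M366") would resolve to the same URI, which is what allows
the two datasets to be interlinked.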


3.2   The Data


Data about the OU courses, Podcasts and academic publications is already
available to be queried and explored, and the team is now working to bring
together educational and research content from the university’s campus infor-
mation, OpenLearn (already available for testing purposes) and library mate-
rial. More concretely, data.open.ac.uk offers a simple browsing mechanism, and
a SPARQL endpoint to access the following data:

13 http://data.open.ac.uk/query
14 http://www.geonames.org
15 http://www.okkam.org
                         Fig. 1. The LUCERO Workflow


 – The Open Research Online (ORO) system16 , which contains information
   about academic publications of OU research. For that, the Bibliographic
   Ontology (bibo)17 is mainly used to model the data.
 – The OU Podcasts,18 which contain Podcast material related to courses and
   research interests. A variety of ontologies are used to model this data, in-
   cluding the W3C Media Ontology,19 in addition to a specialised SKOS20
   representation of the iTunesU topic categories.
 – A subset of the courses from the Study at the OU website,21 which provides
   course information and registration details for students. We model this data
   by relying on the Courseware,22 AIISO23 and GoodRelations ontologies [4],
   in addition to extensions that reflect OU specific information (e.g., course
   assessment types).
    Furthermore, there are other sources of data that are currently being
processed. This includes for example the OU list of provided publications, the
library catalogue, and public information about locations on the OU campus
(e.g., buildings) and university staff.

16 http://oro.open.ac.uk
17 http://bibliontology.com/specification
18 http://podcast.open.ac.uk
19 http://www.w3.org/TR/mediaont-10
20 http://www.w3.org/2004/02/skos
21 http://www3.open.ac.uk/study
22 http://courseware.rkbexplorer.com/ontologies/courseware
23 http://vocab.org/aiiso/schema
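
    As an illustration of how the SPARQL endpoint mentioned in this section might
be queried over HTTP, the sketch below builds a GET request URL. The endpoint
address is the one given in the text; the query itself is an assumption, since
the exact classes and properties used for publications (here bibo:Article and
dcterms:title) are not spelled out in the paper and may differ in the live data.

```python
from urllib.parse import urlencode

ENDPOINT = "http://data.open.ac.uk/query"  # SPARQL endpoint given in the text

# Assumed modelling: publications typed with a bibo class and titled with dcterms:title.
QUERY = """
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?pub ?title WHERE {
  ?pub a bibo:Article ;
       dcterms:title ?title .
} LIMIT 10
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET URL using the protocol's 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


if __name__ == "__main__":
    # Only prints the URL; fetching it is left to the reader's HTTP client of choice.
    print(build_request_url(ENDPOINT, QUERY))
```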


4      The OU Expert Search
Expert search can be defined as the task of identifying people who have relevant
expertise in a topic of interest. This task is key for every enterprise, but especially
for universities, where interdisciplinary collaboration among research areas is
considered a key success factor. Typical user scenarios in which expert search is
needed within the university context include: a) finding colleagues from whom
to learn, or with whom to discuss ideas about a particular subject; b) assembling
a consortium with the necessary range of skills for a project proposal; and c)
finding the most suitable reviewers to establish a program committee.
    As discussed by Yimam-Seid and Kobsa [7], developing and manually updat-
ing an expert system database is time consuming and hard to maintain. How-
ever, valuable information can be identified from documents generated within
an organisation [8]. Automating expert finding from such documents provides
an efficient and sustainable approach to expertise discovery.
    OU researchers, students and lecturers constantly produce a plethora of
documents, including for example conference articles, journal papers, theses, books,
reports and project proposals. As part of the LUCERO project, these documents
have been pre-processed and made accessible as LOD. The purpose of
this application is therefore to exploit such information so that OU students
and researchers can find the most appropriate experts starting from a topic of
interest.24

4.1     Consumed Data
This application is based on two main sources of information: (a) LOD from the
Open Research Online system, and (b) additional information extracted from
the OU staff directory. The first information source is exploited in order to
extract the most suitable experts about a certain topic. The second information
source complements the previous recommended set of experts by providing their
corresponding contact information within the OU. Note that former OU members
and external collaborators of OU researchers may sometimes appear in the ranking
of recommended experts. However, for those individuals, no contact information
is provided, indicating that those experts are not part of the OU staff.
    As previously mentioned, the information provided by Open Research On-
line contains data that describe publications originating from OU researchers.
In particular, among the properties provided for each publication, this system
exploits the following ones: a) the title, b) the abstract, c) the date, d) the au-
thors and, e) the type of publication, i.e., conference paper, book, thesis, journal
paper, etc.
24 The OU Expert Search is accessible to OU staff at: http://kmi-web15.open.ac.uk:8080/ExpertSearchClient
    To exploit this information the system performs two main steps. Firstly when
the system receives the user’s query, i.e., the area of expertise where a set of
experts need to be found (e.g., “semantic search”), the system uses the title and
abstract of the publications to find the top-n documents related to that area of
expertise. At the moment n has been empirically set to 10.
    Secondly, once the top-n documents have been selected, the authors of these
documents are extracted and ranked according to five different criteria: (a) orig-
inal score of their publications, (b) number of publications, (c) type of publica-
tions, (d) date of the publications and, (e) other authors of the publication.
    The initial score of the publications is obtained by matching the user’s key-
word query against the title and the abstract of the OU publications. Publica-
tions that provide a better match within their title and abstract against the key-
words of the query are ranked higher. This matching is performed and computed
using the Lucene25 text search engine. Regarding the number of publications,
authors with a higher number of publications (among the top-n previously re-
trieved) are ranked higher. Regarding the type of publication, theses are ranked
first, then books, then journal papers, and finally conference articles. The
rationale behind this is that an author writing a thesis or a book holds a higher level
of expertise than an author who has only written conference papers. Regarding
the date of the publication, we consider the ‘freshness’ of the publications and
continuity of an author’s publications within the same area. More recent publi-
cations are ranked higher than older ones, and authors publishing in consecutive
years about a certain topic are also ranked higher than authors that have spo-
radic publications about the topic. Regarding other authors, experts sharing a
publication with fewer colleagues are ranked higher. The rationale behind this
is that the total knowledge of a publication should be divided among the exper-
tise brought into it, i.e., the number of authors. Additionally we also consider
the order of authors in the publication. Main authors are considered to have a
higher level of expertise and are therefore ranked higher.
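
    One possible way to combine these criteria is sketched below. The paper does
not give a formula or any weights, so everything numeric here is an assumption:
the TYPE_WEIGHT values, the recency and author-order decay, and the choice to
multiply the factors are illustrative only, and the continuity-of-publication
criterion is left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical weights; the text only gives the order thesis > book > journal > conference.
TYPE_WEIGHT = {"thesis": 4.0, "book": 3.0, "journal": 2.0, "conference": 1.0}


@dataclass
class Publication:
    score: float   # text-match score returned by the search engine
    kind: str      # one of the keys of TYPE_WEIGHT
    year: int
    authors: list  # in publication order, main author first


def rank_authors(pubs, current_year):
    """Rank the authors of the top-n publications by an aggregated expertise score."""
    totals = defaultdict(float)
    for pub in pubs:
        recency = 1.0 / (1 + max(0, current_year - pub.year))  # newer counts more
        share = 1.0 / len(pub.authors)                         # fewer co-authors counts more
        for position, author in enumerate(pub.authors):
            order = 1.0 / (1 + position)                       # main author counts more
            totals[author] += pub.score * TYPE_WEIGHT[pub.kind] * recency * share * order
    # Summing over publications rewards authors with more publications in the top-n.
    return sorted(totals, key=totals.get, reverse=True)
```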
    To perform the first step (i.e., retrieving the top-n documents related to
the user’s query) we could have used the SPARQL endpoint and, at run-time,
searched for those keywords within the title and abstract properties of the pub-
lications. However, to speed the search process up, and to enhance the query-
document matching process, we have decided to pre-process and index the title
and abstract information of the publications using the popular Lucene search
engine. In this way, the fuzzy-matching and spell-checking query processing and
ranking capabilities of the Lucene search engine are exploited to optimise the initial
document search process.
    To perform the second step, once the top-n documents have been selected,
the rest of the properties of the document (authors, type, and date) are obtained
at run-time using the SPARQL endpoint.
    Finally, once the set of authors has been ranked, we look for them in the OU
staff directory (using their first name and last name). If the
author is included in the directory, the system provides related information about
the job title, department within the OU, e-mail address and phone number.
By exploiting the OU staff directory we are able to identify which experts are
members of the OU and which of them are external collaborators, or former members
no longer working for the institution.

25 http://lucene.apache.org/java/docs/index.html

    Without the structure and conceptual information provided by the OU LOD,
the implementation of the previously described ranking criteria, as well as the
interlinking of data with the OU staff directory, would have required a huge
data pre-processing effort. The OU LOD provides the information with a fine-
grained structure that facilitates the design of ranking criteria based on multiple
concepts, as well as the interlinking of information with other repositories.


4.2     System Implementation

The system is based on a lightweight client-server architecture. The back end
(or server side) is implemented as a Java Servlet, and accesses the OU LOD
information by means of HTTP requests to the SPARQL endpoint. Some of
the properties provided by the LOD information (more particularly the title
and the abstract of the publications) are periodically indexed using Lucene to
speed up and enhance the search process by exploiting its fuzzy-matching and
spell-checking query processing and ranking capabilities. The rest of
the properties (authors, date, and type of publications) are accessed at run time,
once the top-n publications have been selected.
    The front end is a thin client implemented as a web application using only
HTML, CSS and JavaScript (jQuery).26 The client does not process the data
itself; it only takes care of the search input and the visualisation of the search
results. It communicates with the back end by means of an HTTP request
that passes the user’s query as a parameter and retrieves the ranking of authors
and their associated information as a JSON object.


4.3     Example and Screenshots

In this section, we provide an example of how to use the OU expert search
system. As shown in Figure 2, the system receives the keyword query
“semantic search”, the topic for which the user aims to find an expert. As
a result, the system provides a list of authors (“Enrico Motta”, “Vanessa Lopez”,
etc.), who are considered to be the top OU experts in the topic. For each expert,
if available, the system provides the contact details (department, e-mail, phone
extension) and the top publications about the topic. For each publication, the
system shows its title, the type of document, and its date. If the user hovers the
cursor over the title of the publication, the summary is also displayed
(see the example in Figure 2 for the publication “Reflections of five years of
evaluating semantic search systems”). In addition, the title of the publication
also links to its information in the open.ac.uk domain.
26 http://www.jquery.com
                      Fig. 2. The OU Expert Search system


5      Buddy Study

The Open University is a well-established institution in the United Kingdom, of-
fering distance-learning courses covering a plethora of subject areas. A key factor
in enabling learning and understanding of course materials is support for stu-
dents, provided in the form of an on-hand tutor for each studied module, where
interactions with the tutor are facilitated via the Web and/or email exchanges.
An alternative method of support could be provided through peers, in a similar
manner to a classroom environment, where working together and explaining
problems from disparate viewpoints enhance understanding.
    Based on this thesis, Buddy Study27 combines the popular social networking
platform Facebook with the OU Linked Data service, the goal being to suggest
learning partners – so-called ‘Study Buddies’ – from a person’s social network
on the site together with possible courses that could be pursued together.


5.1     Consumed Data

Buddy Study combines information extracted from Facebook with Linked Data
offered by The Open University, where the former contains ‘wall posts’ – mes-
sages posted publicly on a person’s profile page – and comments on such wall
posts, while the latter contains structured, machine-readable information de-
scribing courses offered by The Open University.
27 http://www.matthew-rowe.com/BuddyStudy
    Combining the two information sources, in the form of a ‘mashup’, is per-
formed using the following approach. First the user logs into the application –
using Facebook Connect – and grants access to their information. The appli-
cation then extracts the most recent n wall posts and the comments on those
posts – n can be varied, thereby affecting the later recommendations. Given the
extracted content, cleaning is then performed by removing all the stop words,
thus reducing the wall posts and comments to their basic terms.
    A bag of words model is compiled for each person in the user’s social network
as follows: for each wall post or comment posted by a given person all the terms
are placed in the bag, maintaining duplicates and therefore frequencies. This
model maintains information of the association between a user and his/her social
network members in the form of shared terms. A bag of words model is then
compiled for each OU course in a similar manner: first we query the SPARQL
endpoint of the OU’s Linked Data asking for the title and description for each
course. For the returned information, stop words are removed and the title and
description – containing the remaining terms – are then used to build the bag
of words model for the course.
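
    The bag of words construction described above can be sketched as follows.
The stop-word list and the tokenisation rule are assumptions made for the
example; the paper does not say which ones the application uses.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; the list actually used is not given in the paper.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "is", "on", "with"}


def bag_of_words(texts):
    """Build a term-frequency bag from a list of posts, comments or descriptions."""
    bag = Counter()
    for text in texts:
        terms = re.findall(r"[a-z0-9]+", text.lower())  # naive tokenisation
        bag.update(t for t in terms if t not in STOP_WORDS)  # duplicates are kept
    return bag
```

The same function can serve both sides of the comparison: it can be given the
wall posts and comments of one social network member, or the title and
description of one course.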
    The goal of Buddy Study is to recommend study partners to support course
learning. Therefore we compare the bag of words model of each person with
the bag of words model of each course, recording the frequency and terms that
overlap. The user’s social network members are then ranked based on the number
of overlapping terms, the intuition being that the greater the number of common
terms with courses, the greater the likelihood of a course being correlated with
the user. Varying n will therefore affect this ranking, given that the inclusion
of a greater number of posts will increase the number of possible study partners,
while smaller values for n will yield the social network members with whom the
user has interacted most recently. The application allows this parameter to be
varied.
    The process does not end there: we still need to recommend possible
courses that could be studied with each possible study buddy. This is performed
in a similar fashion, by comparing the bag of words model of the social network
member with the model of each course, counting the frequencies of overlapping
terms for each course, and then ranking accordingly. Due to space restrictions,
and to avoid information overload, we only show the top-10 courses. For each
social network user, and for each course that is suggested, Buddy Study displays
the common terms, thereby providing the reasons for the course suggestion.
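
    The two ranking steps described above (ranking social network members, then
ranking courses for each member) could look roughly like the sketch below. It
assumes the bags are collections.Counter objects and reads "number of
overlapping terms" as the count of distinct shared terms for members and as
summed shared frequencies for courses; both readings, and the function names,
are assumptions rather than the application's actual logic.

```python
from collections import Counter


def overlap(bag_a, bag_b):
    """Shared terms with the smaller of the two frequencies (multiset intersection)."""
    return bag_a & bag_b


def rank_members(member_bags, course_bags):
    """Rank members by how many distinct terms they share with any course."""
    scores = {}
    for member, bag in member_bags.items():
        shared = set()
        for course_bag in course_bags.values():
            shared |= set(overlap(bag, course_bag))
        scores[member] = len(shared)
    return sorted(scores, key=scores.get, reverse=True)


def top_courses(member_bag, course_bags, k=10):
    """Return up to k courses for one member, each with the terms that explain it."""
    scored = []
    for course, bag in course_bags.items():
        shared = overlap(member_bag, bag)
        if shared:
            scored.append((sum(shared.values()), course, sorted(shared)))
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, then by name
    return [(course, terms) for _, course, terms in scored[:k]]
```

Returning the shared terms alongside each course mirrors the way the
application displays the common terms as the reason for a suggestion.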
    If for a moment we assume a scenario where Linked Data is not provided by
the OU, then Buddy Study could, in theory, continue to function by
consuming information provided in an alternative form. However, this application
forms the prototype upon which future work, explained in greater detail
in the conclusions of this paper, is to be based. Such advancements will
utilise concepts for study partner recommendation rather than merely terms;
the reasoning behind this extension is to alleviate the noisy form that terms
take. By leveraging concepts from collections of terms, recommendations would
be generated that are more accurate and better suited to the user in question.
Without Linked Data, this is not possible.
5.2   System Implementation

The application is live and available online at the previously cited URL. It is built
using PHP, and uses the Facebook PHP Software Development Kit (SDK).28
Authentication is provided via Facebook Connect,29 enabling access to Facebook
information via the Graph API. The ARC2 framework30 is used to query
the remote SPARQL endpoint containing The Open University’s Linked Data,
and to parse the returned information accordingly.


5.3   Example and Screenshots

To ground the use of Buddy Study, Figure 3 shows an example screenshot from
the application when recommending study partners for Matthew Rowe – one
of the authors of this paper. At this rank position in the results, the possible
study mate is shown together with the courses that could be studied together.
The courses are hyperlinked to their resource within the OU Linked Open Data
service, and the brackets that follow each course show the terms that correlate
with it. In this instance the top-ranked course is identified by the common
terms ‘API’ and ‘Info’.


        Fig. 3. Buddy Study showing the 7th ranked social network member


6     Linked OpenLearn

The Open University offers a set of free learning material through the OpenLearn
website.31 Such material covers various topics ranging from Arts32 to Sciences
and Engineering.33 In addition, the OU has other learning resources published
in the form of Podcasts, along with courses offered at specific presentations
during the year. While all these resources are accessible online, connections are
not always explicitly available, making it hard for students to easily exploit all
the available resources. For example, while there exists a link between specific
Podcasts and related courses, such links do not exist between OpenLearn units
and Podcasts. This leaves it to the user to infer and find the appropriate
material relevant to the topic of interest.

28
   https://github.com/facebook/php-sdk
29
   http://developers.facebook.com/docs/authentication
30
   http://arc.semsol.org
31
   http://openlearn.open.ac.uk
32
   OpenLearn unit example in Arts: http://data.open.ac.uk/page/openlearn/a216 1
33
   A list of units and topics is available at: http://openlearn.open.ac.uk/course

    Linked OpenLearn34 is an application for exploring the courses, Podcasts
and tags linked to OpenLearn units. It aims to ease the browsing experience
for students, who can identify relevant material on the spot without
leaving the OpenLearn page. With this in place, students are able, for example,
to easily find a linked Podcast and play it directly without having to go through
the Podcast website.


6.1     Consumed Data


Linked OpenLearn relies on The Open University’s Linked Data to achieve what
was previously considered very costly to do. Within large organizations, it is very
common to have systems developed by different departments, creating a set of
disconnected data silos. This was the case for Podcasts and OpenLearn units at
the OU. While courses were initially linked to both Podcasts and OpenLearn in
their original repositories, it was hard in practice to generate the links between
Podcasts and OpenLearn material. However, with the deployment of Linked
Data, such links are made possible through the use of coherent and common
URIs of represented entities.
    To achieve our goal of retrieving relevant learning material, we make use
of the courses, Podcasts, and OpenLearn datasets in data.open.ac.uk. As a first
step, while the user is browsing an OpenLearn unit, the system identifies the
unique reference number of the unit from the URL. This unique number is then
used in the query passed to the OU Linked Data SPARQL endpoint
(http://data.open.ac.uk/query) to generate the list of related courses, including
their titles and links to the Study at the OU pages.
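
This first step could look roughly like the Python sketch below. Only the endpoint address and the standard `query` request parameter are taken as given; the URL layout of a unit page (a `name` query parameter), the `ex:relatesToCourse` predicate and the `example.org` resource URIs are placeholders invented for illustration, since the actual vocabulary is not given in this section.

```python
import re
from urllib.parse import parse_qs, urlencode, urlparse

ENDPOINT = "http://data.open.ac.uk/query"  # endpoint named in the text


def unit_id_from_url(page_url, param="name"):
    """Read the unit reference from a query parameter (hypothetical URL layout)."""
    values = parse_qs(urlparse(page_url).query).get(param)
    if not values:
        raise ValueError("no unit reference found in URL")
    unit_id = values[0].lower()
    # Accept only simple identifiers so nothing unexpected reaches the query.
    if not re.fullmatch(r"[a-z0-9_]+", unit_id):
        raise ValueError("unexpected unit reference: " + unit_id)
    return unit_id


def related_courses_query(unit_id):
    """Build a SPARQL query; the ex: predicate and unit URI are placeholders."""
    return (
        "PREFIX ex: <http://example.org/vocab#>\n"
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "SELECT ?course ?title WHERE {\n"
        f"  <http://example.org/openlearn/{unit_id}> ex:relatesToCourse ?course .\n"
        "  ?course dcterms:title ?title .\n"
        "}"
    )


def request_url(query):
    """Encode the query as a GET request to the SPARQL endpoint."""
    return ENDPOINT + "?" + urlencode({"query": query})
```

The second step, fetching the Podcasts of the returned courses, would follow the same pattern with a different query.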
    In the second step, another query is sent to retrieve the list of Podcasts related
to the courses fetched above. At this level we get the Podcasts’ titles, as well
as their corresponding downloadable media material (e.g., video or audio files),
which enable users to play the content directly within the application. Finally,
the list of related tags is fetched, along with an embedded query that generates
the set of related OpenLearn units, displayed in a separate window. At this point
the user has the option to explore a new unit, and the corresponding related
entities will be updated accordingly. The application is still a prototype, and
further data could be extracted. For example, once the library
catalogue is made available, students could explore a much richer interface
with related books, recordings, computer files, etc.

34
     http://fouad.zablith.org/apps/openlearnlinkeddata
6.2     System Implementation

We implemented the Linked OpenLearn application in PHP, and used the ARC2
library to query the OU Linked Data endpoint. To visualise the data on top of
the web page, we relied on the jQuery User Interface library,35 and used the
dialog windows for displaying the parsed SPARQL results. The application is
operational at present, and is launched through a JavaScript bookmarklet, which
detects the OpenLearn unit that the user is currently browsing, and opens it in
a new iframe, along with the linked entities visualised in the jQuery boxes.


6.3     Example and Screenshot

To install the application, the user has to drag the application’s bookmarklet36
to the browser’s toolbar. Then, whenever viewing an OpenLearn unit, the user
clicks on the bookmarklet to have the related entities displayed on top of the unit
page. Figure 4 illustrates one arts-related OpenLearn unit, with the connected
entities displayed on the right, and a running Podcast selected from the “Linked
Podcasts” window. The user has the option to click on the related course to
go directly to the course described in the Study at the OU webpage, or click
on linked tags to see the list of other related OpenLearn units, which can be
browsed within the same window.


                          Fig. 4. Linked OpenLearn Screenshot


35
     http://www.jqueryui.com
36
     The bookmarklet is available at: http://fouad.zablith.org/apps/openlearnlinkeddata,
     and has been tested in Firefox, Safari and Google Chrome
7   Conclusions
In this section we report on our experiences when generating and exploiting LOD
within the context of an educational institution. Regarding our experience on
transforming information distributed in several OU repositories and exposing it
as LOD, the process complexity was mainly dependent on the datasets in terms
of type, structure and cleanliness. Before any data transformation could
be done, we had to decide on the vocabulary to use. This is where the
type of data to model plays a major role. With the goal of reusing existing
ontologies as much as possible, it was challenging to find adequate ones
for all our data. While some vocabularies are already available, for example to
represent courses, it required more effort to model OU-specific terminologies
(e.g., at the qualifications level). To ensure maximum interoperability, we chose
to use multiple terminologies (when available) to represent the same entities.
For example, courses are represented as modules from the AIISO ontology, and
at the same time as courses from the Courseware ontology. Other factors that
affected the transformation of the data are the structure and cleanliness of the
data sources. During the transformation process, we faced many cases where
duplication, and information not abiding by the imposed data structure, hampered
the transformation stage. However, this highlighted the need to generate the data
following well-defined patterns and standards, in order to get easily processable
data to add to the LOD.
    Regarding our experiences exploiting the data, we have identified three main
advantages of relying on the LOD platform within the context of education.
Firstly, the exposure of all this material as free Web resources has opened
opportunities for the development of novel and interesting applications like the three
presented in this paper. The second main advantage is the structure provided by
the data. This is apparent in the OU Expert Search system, where the different
properties of articles are exploited to generate different ranking criteria, which
when combined, provide much stronger support when finding the appropriate
expertise. Finally, the links generated across the different educational resources
have provided a new dimension to the way users can access, browse and use the
provided educational resources. A clear example of this is the exploitation of
LOD technology within the OpenLearn system, where OpenLearn units are now
linked to courses and Podcasts, allowing students to easily find in a single site,
all the information they are looking for.
    We believe that universities need to evolve the way they expose knowledge,
share content and engage with learners. We see LOD as an exciting opportunity
that can be exploited within the education community, especially by interlinking
people and educational resources within and across institutions. This interlink-
ing of information will facilitate the learning and investigation process of stu-
dents and research staff, enhancing the global productivity and satisfaction of
the academic community. We hope that, in the near future, more researchers
and developers will embrace the LOD approach, by creating new applications and
learning from previous experiences to expose more and more educational data
in a way that is directly linkable and reusable.
8      Future Work
The application of Linked Data within the OU has opened multiple research
paths. Regarding the production of Linked Data, in addition to transforming
the library records to LOD, the LUCERO team is currently working on connecting
the OU’s Reading Experience Database (RED)37 to the Web of Data.
This database aims to provide access to information about reading experiences
around the world. It helps track the readership of books issued in new editions for
new audiences in different countries. Its publication as LOD is an
interesting example of how the integration of Linked Data technology can
open new investigation paths in different research areas, in this case the humanities.
    Regarding the consumption of LOD, we plan, on the one hand, to enhance
the three previously mentioned applications and, on the other hand, to
generate new applications as soon as more information is available and
interconnected. As an example of the former, for the Buddy Study application we plan to
replace the current approach, which identifies common terms between social network
members and courses, with one that utilises common concepts. At present the
use of online messages results in the inclusion of abbreviated and slang terms,
which leads to recommendations that are generated from noise. By instead using
concepts, we believe that the suggested courses would be more accurate and
suitable for studying. As an example of the latter, we aim to generate a search
application over the RED database, able to display search results on an interactive
map and link them not just to relevant records within the RED database,
but also to relevant objects of the LOD cloud.


References
1. C. Bizer. The emerging web of linked data. IEEE Int. Systems, pages 87–92, 2009.
2. C. Bizer, T. Heath, and T. Berners-Lee. Linked data-the story so far. Int. J.
   Semantic Web Inf. Syst., 5(3):1–22, 2009.
3. T. Heath and C. Bizer. Linked Data: Evolving the Web into a Global Data Space.
   2011.
4. M. Hepp. GoodRelations: an ontology for describing products and services offers
   on the web. Knowledge Engineering: Practice and Patterns, pages 329–346, 2008.
5. G. Kobilarov, T. Scott, Y. Raimond, S. Oliver, C. Sizemore, M. Smethurst, C. Bizer,
   and R. Lee. Media meets semantic webhow the bbc uses dbpedia and linked data
   to make connections. pages 723–737, 2009.
6. M. Rowe. Data.dcs: Converting legacy data into linked data. In Linked Data on the
   Web Workshop, WWW2010, 2010.
7. D. Yimam-Seid and A. Kobsa. Expert-finding systems for organizations: Problem
   and domain analysis and the DEMOIR approach. Journal of Organizational Com-
   puting and Electronic Commerce, 13(1):1–24, 2003.
8. J. Zhu, X. Huang, D. Song, and S. Rüger. Integrating multiple document features in
   language models for expert finding. Knowledge and Information Systems, 23(1):29–
   54, 2010.

37
     http://www.open.ac.uk/Arts/reading
Using Linked Data to Reduce Learning Latency
             for e-Book Readers

              Julien Robinson, Johann Stan, and Myriam Ribière

               Alcatel-Lucent Bell Labs France, 91620 Nozay, France,
                       Julien.Robinson@alcatel-lucent.com


      Abstract. Web 2.0 technologies provide an opportunity to transform
      learning into a social experience: social learning can directly benefit from
      user-generated content analysis. The e-book device is an object central to
      the learning process; it becomes an ideal medium to promote innovative
      learning tools. In this paper, we describe how we leverage user generated
      content in e-books, generally in the form of unstructured annotations,
      to reduce the latency between professor knowledge, book content and
      student assimilation.

      Keywords: Social Learning, Semantic Web, Linked Data, e-Book


1   Introduction

The set of people learning with a given e-book forms a de-facto community, which
is even more relevant if they are reading at the same time, with the same purpose
or during the same activity, e.g. a class. Currently, very few platforms allow social
interactions between such people that are dedicated to their reading activity;
readers can connect but at a very high level and generally outside the platform
(e.g. forums in LibraryThing1, chatting in Bookglutton2 or more general social
cataloging features in GoodReads3 and Shelfari4).
    The main problem with these platforms is that interactions cannot be directly
linked to the context of reading and thus, do not allow fine-grained interactions
about a specific part of the book. Bookglutton goes a step further by letting
users chat while reading, but does not exploit the contents and context of annotations
in the e-book. This issue has already been explored [1, 2], but no solution
based on the semantic analysis of the annotations was proposed.
    An important observation for our work is the fact that in most cases, annotations
in e-books are in the form of short, unstructured textual messages,
associated with images, paragraphs or chapters. These can include transcriptions,
reformulations, additional comments, questions or explanations (such as student
annotations in BookGlutton5). Clearly, services that leverage these annotations
can increase the social dimension of reading. They can help readers establish
a common environment in which to learn socially by seeing others’ annotations, or
generate interactions between students during the same activity.

1
  http://www.librarything.com/ - visited September 2010
2
  http://www.bookglutton.com/ - visited January 2011
3
  http://www.goodreads.com/ - visited October 2010
4
  http://www.shelfari.com/ - visited March 2011

   In this paper we present our ongoing work on services that leverage semantic
technologies and especially Linked Data for providing a social dimension for
reading and implicitly, learning.


2     Management of Shared Annotations using Linked Data

Linked Data is an evolving component of the Semantic Web. It can be defined as
a community effort to extract or publish data from various sources, interlink it
and store it in a semantic web format, such as RDF. The content of these online
linked datasets is diverse. It consists mostly of data about people, companies,
books, geographic locations, films, music, television and radio programs, infor-
mation about medicine (genes, proteins, clinical trials) and online communities
(statistical data). The most used online dataset is certainly DBPedia[3].
    An important source of social knowledge is the set of annotations produced
by the user. In our framework, our primary hypothesis is to consider these anno-
tations on a conceptual level. More concretely, this means that we associate each
annotation to one or several concepts in a Linked Data source (e.g. DBPedia).
There are several reasons to make this association: (i) it provides a rich approximation
of their meaning, (ii) it makes it possible to retrieve related concepts and category
terms, an operation that we call semantic expansion6, and (iii) annotations
are generally short and contextual cues are not always available.
Therefore, Linked Data sources can enrich the description of such annotations
with additional metadata and related concepts.
    The association of annotations to concepts in Linked Data is performed by a
disambiguation process that leverages the context of the annotation to match it
with the most relevant concept in the knowledge base. Each annotation provided
by the user is associated to a context, which is constructed from different sources,
such as previous annotations shared by the user in the e-book and the set of
annotations shared by others that are related to the given paragraph. In parallel,
a set of candidate concepts whose labels contain the given annotation is
retrieved from DBPedia. The candidate concepts are ranked using
cosine similarity between the vector constructed from their abstracts and the
contextual cue vector. Once this disambiguation has been performed, the second
use of Linked Data is the expansion of the concept by exploring both its semantic
neighborhood and hierarchical links.
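
The ranking of candidate concepts can be illustrated with the following Python sketch. It assumes plain term-frequency vectors and a simple tokeniser; the paper does not state how the vectors are weighted, and the function names and sample abstracts used to exercise it are made up for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Term-frequency vector of a text (assumed weighting scheme)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def disambiguate(context_text, candidates):
    """Order candidate concept labels by similarity of their abstract to the context.

    candidates: dict mapping a concept label to its abstract text.
    """
    context = term_vector(context_text)
    scored = [
        (cosine(context, term_vector(abstract)), label)
        for label, abstract in candidates.items()
    ]
    # Most similar first; ties are broken by label.
    return [label for score, label in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

The first label in the returned list is the concept that would be attached to the annotation; its neighborhood and hierarchy can then be explored for the expansion step.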

5
    e.g. http://www.bookglutton.com/glutton/melissthangx0x0/14459
6
    As an example, here is the semantic expansion of the annotation “Facebook”: [Global
    Internet Community, Student Culture, Social Information Processing, Social Net-
    working Service, Social Media, Web 2.0, Internet Technologies]

3   Easy Social Navigation with Heatmaps
Key to the exploitation of Linked Data results in social learning is the way the
user will perceive the analysis results and whether they will find them useful. In [1],
Jae K. Kim et al. describe how they let students place symbolic evaluations as
annotations, and then graphically represent the synthesis of these annotations
as colors in the e-book, thus enabling what is called Social Navigation. We
follow the same investigation track, but instead of focusing on design choices
and symbolic evaluations, we analyze free text annotations using Semantic Web
techniques.
     The user, a learning reader (our target is post-graduate students), opens
an e-book and states their interest using freely chosen keywords; the keywords
express their goal for the reading session. Each keyword is semantically analyzed
and a global similarity computation is run on the annotations of the book. Each
annotation is attributed a heat value based on its relevance to the user’s goal in
reading the book. The appearance of the book is then modified to highlight in
color the zones where the annotations are the most heated.
     Figure 1(a) presents an example of the heatmap feature. This was produced
by our prototype of the sBook [4] using data gathered from annotations from
BookGlutton (229 annotations by 39 different users on the e-book “Frankenstein”
by Mary Shelley).
     We chose the paragraph as the finest granularity of annotation
heat. The heat of a paragraph is the sum of the heat of each annotation
on the paragraph. Note that the heatmap is a navigation aid for the user;
it is not meant to be the final step for a learning reader. It is complemented by the
display of authors of annotations (on the right-hand panel) and the use of what
we call annotation boards: these are separate windows that browse through all
the annotations of a paragraph and the comments inspired by these annotations.
The annotation board is where the user may find valuable information
and participate in discussions.
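
The aggregation of annotation heat at paragraph level amounts to a grouped sum, as in this short Python sketch. The input format (pairs of paragraph identifier and heat value) and the function name are assumptions made for the example.

```python
from collections import defaultdict


def paragraph_heat(annotations):
    """Sum annotation heat per paragraph.

    annotations: iterable of (paragraph_id, heat) pairs, one per annotation.
    """
    heat = defaultdict(float)
    for paragraph_id, value in annotations:
        heat[paragraph_id] += value
    return dict(heat)
```

Paragraphs without annotations simply do not appear in the result and would be displayed without color.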
     The choice of the algorithm for heating an annotation is the focus of our
current research. Figure 1(b) presents the administration interface, which makes it
possible to choose between different implementations for the user to experiment
with. Annotations are heated based on their proximity to the vector of words obtained
by expanding the user’s keyword. We currently implement three expansion methods:
 – no expansion, we simply search for the keyword
 – a dictionary-based heatmap, that may take into account synonyms, antonyms
   and derived words (e.g. adjectives for a noun)
 – a Linked-Data-based heatmap, that takes into account the expansion of the
   search keyword as described in the present paper.
    Our first experiments show that named entities (e.g. “Paris”) are best expanded
using Linked Data, while literary theory terms (e.g. “poetry”, “characterization”)
are best expanded using dictionary techniques. This is why we keep
both options open and also separate, so as to validate their usefulness with user
experimentation.
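
To make the three expansion methods concrete, the Python sketch below heats an annotation against an expanded keyword. Everything specific in it is an assumption: the two lookup tables stand in for a dictionary service and a Linked Data source, and the proximity measure (the fraction of expanded terms found in the annotation) is only one simple choice among many.

```python
import re

# Hypothetical lookup tables standing in for a dictionary and a Linked Data source.
DICTIONARY = {"poetry": ["poem", "verse", "poetic"]}
LINKED_DATA = {"paris": ["france", "seine", "capital"]}


def expand(keyword, method="none"):
    """Return the vector of words for a keyword under one expansion method."""
    keyword = keyword.lower()
    if method == "none":
        return [keyword]
    if method == "dictionary":
        return [keyword] + DICTIONARY.get(keyword, [])
    if method == "linked_data":
        return [keyword] + LINKED_DATA.get(keyword, [])
    raise ValueError("unknown expansion method: " + method)


def annotation_heat(annotation, keyword, method="none"):
    """Fraction of expanded terms that occur in the annotation (assumed measure)."""
    words = set(re.findall(r"[a-z0-9]+", annotation.lower()))
    terms = expand(keyword, method)
    return sum(1 for term in terms if term in words) / len(terms)
```

Keeping the expansion method as a parameter mirrors the administration interface, where the implementation can be switched for experimentation.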


                  (a) sBook with Heatmap activated (overview)


                   (b) Administration Heatmap Configuration

                          Fig. 1. Heatmap screenshots


    In a first step the annotations are processed as plain text. In a step currently
being implemented, each annotation is first associated with a cluster of
tags, obtained by semantic evaluation techniques. The similarity measure will
then be computed between the cluster of tags of the annotation and the vector of
words expanded from the user keyword.
    It may be noted that, while we explore keyword-based heatmaps, we also intend
to explore other options: time-based heatmaps, where the heat of an annotation
is related to its creation date; location-based heatmaps, taking into account
the location of the annotation’s author; or more generally context-based and
activity-based heatmaps (which overlap). These will be used for
further user experiments, first to compare their effectiveness with keyword-based
heatmaps, second because they are relevant to our general research on the
link between the digital and physical worlds.

4   User Interaction Profiles

Each user in the collaborative learning environment associated to the e-book is
represented by an interaction profile, constructed from their shared annotations
in the system. The objective of the interaction profile is to provide a digital
representation of the user’s expertise. Each concept in the profile can be manually
annotated by users. More specifically, we propose an interactive tool for users
to declare their level of expertise and to choose how much of this expertise to
share in different social spheres and situations (Figure 2).
    Thus, a user can for example say that they have high expertise in “European
Union Member States” but that they wish to share this expertise only with
classmates and teachers and that they are ready to interact on this topic only
when they are at the library or at school. With the help of DBPedia, we also
retrieve the hierarchical tree associated to the concept, which allows the user
to share the same concept with other social spheres, but with less granularity.
In this way, the user can share different levels of granularity about the same
knowledge with multiple social spheres (e.g. detailed knowledge for the class,
less detailed for friends).
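
One way to picture this granularity-based sharing is the Python sketch below. The profile structure, the `shared_concept` helper and the example hierarchy are all hypothetical; in particular the hierarchy is hand-written for illustration and is not taken from DBPedia.

```python
def shared_concept(profile, sphere):
    """Return the concept visible to a social sphere, or None if nothing is shared.

    profile["hierarchy"] lists concepts from most specific to most general;
    profile["sharing"] maps a sphere to the number of levels to move up.
    """
    levels_up = profile["sharing"].get(sphere)
    if levels_up is None:
        return None
    hierarchy = profile["hierarchy"]
    # Never move past the most general concept in the hierarchy.
    return hierarchy[min(levels_up, len(hierarchy) - 1)]


# Illustrative profile entry (made-up hierarchy and sharing preferences).
profile = {
    "hierarchy": ["European Union Member States", "European Union", "Europe"],
    "sharing": {"classmates": 0, "teachers": 0, "friends": 1},
}
```

With this profile, classmates would see the detailed concept while friends would see only its parent.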
    Linked Data makes it possible to retrieve the hierarchical tree and semantic
neighborhood associated with a given concept. In the case of collaborative learning, we
leverage this feature by offering users a more granular way of sharing their interaction
profile, constructed from the shared annotations. Such interaction profiles
make it possible for the system to select the right person to ask a question on
a specific topic. In order to overcome the problem of data sparsity in case of
new members in such learning communities, we are currently investigating the
possibility to connect this service to other Social Platforms and to build a user
interaction profile from annotations shared in them. In this way, people to interact
with can be recommended to the user even if the user has not shared sufficient
annotations. More specifically, each time the user opens the e-book, they will
be presented with the most relevant readers to interact with, based on complementary
expertise in specific topics of interest and declared sharing preferences
related to that topic. This provides students with a seamless way of linking to
co-learners and thus forming spontaneous learning communities on a specific part
of a lecture.


                     Fig. 2. Interaction Profile Declaration Tool


5   Conclusion

In this paper we have presented our research, which focuses on integrating Linked
Data results into e-book navigation tools for students in order to enhance social
learning. We base our work on user annotations: we process these annotations
with Linked Data technologies and use the synthesized results to modify
our e-book display. We hope to maximize reading and learning efficiency and to create
common environments and, implicitly, communication opportunities. Underlying
our innovation is the hypothesis that the huge amount of shared content in
different Social Platforms offers exciting opportunities to enrich the learning
experience and reduce the knowledge latency in the class, which we define as the
delay between the encoding of the knowledge of a professor on a specific subject
into information (e.g. a lecture or slides) and the assimilation of this information
as knowledge by students.

6     Acknowledgment
We would like to thank BookGlutton for providing open data that we were able
to exploit, and in particular Aaron Miller (CTO). The set of ideas about the
future of e-books we have presented in this paper is currently being addressed
in the scope of a collaborative project between Alcatel-Lucent, Abilene Christian
University and Cambridge University Press.7

References
1. Jae K. Kim, Rosta Farzan, and Peter Brusilovsky. Social navigation and annotation
   for electronic books. In BooksOnline ’08: Proceeding of the 2008 ACM workshop on
   Research advances in large digital book repositories, pages 25–28, New York, NY,
   USA, 2008. ACM.
2. Jill Freyne, Rosta Farzan, Peter Brusilovsky, Barry Smyth, and Maurice Coyle.
   Collecting community wisdom: integrating social search & social navigation. In IUI
   ’07: Proceedings of the 12th international conference on Intelligent user interfaces,
   pages 52–61, New York, NY, USA, 2007. ACM Press.
3. Jens Lehmann, Chris Bizer, Georgi Kobilarov, Sören Auer, Christian Becker, Richard
   Cyganiak, and Sebastian Hellmann. DBpedia - a crystallization point for the web
   of data. Journal of Web Semantics, 7(3):154–165, 2009.
4. Myriam Ribière, Jérôme Picault, and Sylvain Squedin. The sbook: towards social
   and personalized learning experiences. In Proceedings of the third workshop
   on Research advances in large digital book repositories and complementary media,
   BooksOnline ’10, pages 3–8, New York, NY, USA, 2010. ACM.

7
    Cambridge University - http://www.cambridge.org/press/article.asp?artid=154961
                A Linked Knowledge Base for
                    Simulation Learning

                      Irene Celino and Daniele Dell’Aglio

        CEFRIEL – Politecnico of Milano, Via Fucini 2, 20133 Milano, Italy
              {irene.celino,daniele.dellaglio}@cefriel.it


      Abstract. Simulation Learning is a frequent practice to conduct near-
      real, immersive and engaging training sessions. AI Planning and Schedul-
      ing systems are used to automatically create and supervise learning ses-
      sions; to this end, they need to manage a large amount of knowledge
      about the simulated situation, the learning objectives, the participants’
      behaviour, etc.
      In this paper, we explain how Linked Data and Semantic Web tech-
      nologies can help the creation and management of knowledge bases for
      Simulation Learning. We also present our experience in building such a
      knowledge base in the context of Crisis Management Training.


1   Introduction
Traditional research on Semantic Web in e-learning [1, 2] is aimed at promoting
interoperability between training systems; thus the core investigation
targets are usually standards and schemata to describe learning objects [3, 4].
    Our research is focused on a different kind of e-learning system, i.e. Simulation
Training to improve soft skills [5]. In this context, it is necessary not only
to describe learning objects, but also to fully plan simulation sessions; those
sessions should be interactive and engaging to challenge the trainees to improve
their skills. Simulation Learning systems generally re-create a near-real environment
for training sessions, in which learners are subject to stimuli: they have
to learn how to deal with the simulated situation and how to react to it. Such
simulations need to be effective and engaging, so that the learners do not simply
memorise notions about the specific matter, question or theme, but actively
and permanently acquire skills, practice and knowledge.
    The scenario production is therefore the core and critical activity when building
a Simulation Learning system. Knowledge technologies are needed to model
and manage all the required information, often generated and managed by different
and independent sources: scenario descriptions, events and stimuli for the
trainees, storyboards for the learning sessions, multimedia assets, supporting
documents and guidelines, trainees’ descriptions and behaviour/decisions, learning
session monitoring, etc. Such a wealth of information makes Simulation
Learning a knowledge-intensive context, which requires smart solutions.
    We decided to adopt Linked Data and Semantic Web technologies to address
the requirements of Simulation Learning. The knowledge diversity and scale
call for a solution which provides interlinking between different datasets while
preserving possibly independent information sources; moreover, knowledge
coherence and consistency must be assured to guarantee the significance,
meaningfulness and correctness of simulation scenarios and storyboards presented to
trainees.
     In this paper, we present our current effort in exploiting Linked Data and
Semantic Web technologies to build a Knowledge Base for a Simulation Learning
environment. We explain why we believe that the selected technologies not only
offer a suitable means of knowledge representation and management, but
are specifically required to address the challenges of such an environment.
     Section 2 introduces the basic concepts of Simulation Learning systems and
a concrete scenario in Crisis Management Training; Section 3 details our ex-
ploration in the use of Linked Data and Semantic Web to build a Simulation
Learning Knowledge Base illustrating the gained benefits; Section 4 specifies
our modelling choices, while Section 5 suggests that such modelling could bene-
fit from provenance tracking; finally, Section 6 concludes the paper.


2     Simulation Learning

Learning should be relevant to people’s workplace and lives: learning content
should be truly understood, remembered and applied to actual practices. Only
by actively engaging participants in experiential training can learners
apply their knowledge and learn the best practices [5]; more and more often,
indeed, it is not enough to read information and listen to a frontal lecture.
    In this section, we introduce the theme of Simulation Learning for Decision-
making, we draw a generic architecture of a system to support Simulation Learn-
ing and we describe a concrete scenario that we will use throughout the paper
to exemplify our approach.


2.1   Simulation for Decision-making

Training plays an important function in the preparation of professional prac-
titioners. Currently, there are two main modalities for such training: table-top
exercises and real-world simulations. Table-top exercises are low cost and can
be easily and frequently organised. However, they cannot create a believable at-
mosphere of stress and confusion, which is prevailing in real-life situations and
is crucial to the training of timely and effective decision making. On the other
hand, training through simulation exercises on the field can be very effective [6],
but it is considerably more expensive, it can require specialist equipment and it
can be difficult to organise.
    Simulation exercises require an Exercise Director (or trainer) who plays a
key role in every form of exercise: the trainer has access to the whole exercise
programme, ensures that it proceeds according to plan, and often feeds information
to the “players” (the trainees) to let them make informed decisions in response
(verbally or by written messages). Sometimes information fed to the trainees
                          A Linked Knowledge Base for Simulation Learning            3

is timed in advance at pre-set intervals, regardless of the previous responses.
However, flexibility allows a trainer to use judgement and experience in timing
the inputs: his/her role should be to facilitate rather than orchestrate the
exercise, thus intervention should be minimal and trainees should be given time
to recognise and correct problems. Nevertheless, it is usually up to the trainer
to decide, for example, how much advice to give to trainees.


2.2   Architecture of a Simulation Learning System

The architecture of a Simulation Learning System is depicted in Figure 1. In the
picture, we can identify the involved actors, which are the trainees – the learning
participants engaged in the simulation – and the trainer – who activates the
exercise and monitors the progress of actions during the training session.
    The figure also shows the four main modules of such an architecture, the first
three following the usual AI sense-think-act cycle:

 – Behaviour Sensing: this module aims to create and update a model of
   each trainee from sensor information (e.g. heart rate, blood pressure,
   respiration); the model represents the trainee’s future and actual behaviour
   and provides indications on how to personalise the training path.
 – Simulation Planning: this module aims to create and simulate a training
   scenario and its evolution, by combining the information in the behavioural
   model with knowledge about the learning scenarios; the output of this
   module is the actual simulation storyboard presented to the trainees.
 – Learning Delivery: this module aims to effectively represent the
   simulation storyboard in the learning environment, including the rendering of
   audio-video inputs or Non-Player Characters (NPC, cf. Section 4.3).
 – Simulation Learning Environment: this is the “place” where the training is
   conducted; the location can be a physical room or a virtual environment
   where the trainees interact and receive stimuli during a learning session.

The core of such a system is therefore the Simulation Planning module, which
contains the basic engine for creating active exercises for classes of trainees. The
module is responsible for deciding which stimuli are sent to trainees and how
they should be coordinated to create a meaningful and effective lesson plan. In
broad terms, it is responsible for allocating over time the set of lesson stimuli in-
dexed according to differences in presentation media, emotional characterization,
personalization needs, etc.


2.3   Crisis Management Training Scenario

There is increasing recognition of the need to train non-technical skills like
control and decision making for Crisis Management in national emergencies,
high-reliability industries, as well as in industrial workplaces [7, 8]. In the event of
a catastrophe, it is human behaviour – and often human behaviour alone
– that determines the speed and efficacy of the crisis management effects [9].


Fig. 1. High-level architecture of a Simulation Learning System (from the classical
sense-think-act cycle of AI)


     The Pandora project1 aims to provide a framework to bridge the gap between
table-top exercises and real-world simulation exercises for Crisis Management,
providing a near-real training environment at affordable cost. Its training sys-
tem captures the good practice tenets of experiential learning but with greater
efficiency and focuses on real, rather than abstract learning environments. The
effective use of integrated ICT reduces the high dependence upon the trainer
that is currently required to deliver exercises. Moreover, the Pandora framework
supports the measurement and performance assessment of Crisis Managers, the
key decision makers participating in a training exercise event as trainees.
     As such, Pandora is developing an enabling technology to simulate believable
dynamic elements of an entire disaster environment by emulating a crisis room
(the Simulation Learning Environment). In this context, we are developing a
Knowledge Base that makes use of Linked Data and Semantic Web technologies
to model and interlink the pieces of data needed in the training simulation
sessions. In the rest of the paper, we will use the Crisis Management scenario to
exemplify our approach.


1
    Cf. http://www.pandoraproject.eu/.

3     Our Simulation Learning Linked Knowledge Base

Within a Simulation Learning system, knowledge exchange plays a central role.
In this section we give some details about the Simulation Planning module, focus-
ing on the requirements, design and implementation principles of its Knowledge
Base. All the technical details are related to the choices made in the Pandora
framework.


3.1   Knowledge required to Plan a Simulation

To formalize the lesson plan, it is natural to choose a basic representation from
timeline-based planning [10]. A plan is represented as a set of events having a
temporal duration, distributed over a time horizon and indexed according to
distinct features which should be planned for. This set of events is organized
inside a data structure called Event Network, very common in current
state-of-the-art planning technology. The Event Network is a temporal plan of
multimedia communicative acts toward trainees (e.g., e-mail messages, video news
from an emergency location, etc.).
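
    As a rough illustration of the data structure described above, an Event
Network could be sketched as a collection of timed events indexed by planning
feature. This is a minimal sketch, not the Pandora implementation: the class
names (Event, EventNetwork), the field names and the integer "tick" time unit
are all hypothetical choices made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """A communicative act with a temporal extent (hypothetical fields)."""
    feature: str   # planning feature the event is indexed under, e.g. "email"
    start: int     # start time, in arbitrary ticks from the session start
    end: int       # end time; must not precede start
    payload: str   # what is delivered to the trainees

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("event ends before it starts")


@dataclass
class EventNetwork:
    """Events distributed over a time horizon and indexed by feature."""
    horizon: int
    _by_feature: dict = field(default_factory=lambda: defaultdict(list))

    def add(self, event: Event) -> None:
        if event.end > self.horizon:
            raise ValueError("event exceeds the time horizon")
        self._by_feature[event.feature].append(event)
        # Keep each feature's timeline ordered by start time.
        self._by_feature[event.feature].sort(key=lambda e: e.start)

    def timeline(self, feature: str) -> list:
        """Events on one feature, ordered by start time."""
        return list(self._by_feature.get(feature, []))

    def active_at(self, t: int) -> list:
        """All events, on any feature, whose interval [start, end) contains t."""
        return [e for events in self._by_feature.values()
                for e in events if e.start <= t < e.end]
```

Indexing by feature keeps one ordered timeline per planning feature, which is
the view a timeline-based planner typically reasons over.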
    The Event Network can be generated by a Simulation Planner. This planner
compiles static information into the Event Network, and then adapts the events
configuration according to the actions of the trainees, thus simulating different
courses of action of the world. The planner can be adapted from a generic AI
Timeline-based Planning and Scheduling module [10].
    The core information item elaborated by a Simulation Planner is the so-
called synchronization. Synchronizations are the causal rules that regulate the
transitions between values on the same planning feature and the synchronization
of values among different planning features. In the Crisis Management scenario,
synchronizations are used to influence the Crisis Managers’ decisions, e.g. to
generate changes in the emergency conditions.
    When adopting Planning and Scheduling technologies to simulate a scenario,
it is worth highlighting that a great deal of effort and time is necessary
to understand the problem, capture all its specificity, and create a model
of the relevant aspects of the domains and the problem [11]. This consideration
suggests, on the one hand, the need for identifying commonalities and similarities
among the different domains and problems to operate in a more systematic way
and, on the other hand, the opportunity to exploit Semantic Web technologies
to ease and support the knowledge modelling task.
    For those reasons, we have built a Knowledge Base with Linked Data and
Semantic Web technologies. This KB is a central component in the Simulation
Learning system, responsible for collecting and maintaining the “knowledge”
about scenarios and training sessions. As such, the KB is the core information
source for the simulation: it contains all the knowledge required by the Sim-
ulation Planner to “orchestrate” the events during the training sessions. All
the causality in a simulation domain is modelled and stored in the KB; this
knowledge is then converted by the Simulation Planner into the suitable data
structures to synthesize the Event Network configurations for the lesson plan
goals.


3.2     Requirements for the Knowledge Base

The Knowledge Base [12] was carefully designed to fulfil a pressing requirement:
containing and managing all the knowledge needed to model and run the simu-
lation scenarios, the training events, the trainees’ behaviour, the time sequence,
and so on.
    To fulfil such a requirement, the KB must reuse pre-existing information (e.g.,
in the Crisis Management scenario, training procedures, emergency management
guidelines) and, at the same time, it must allow for customization and
diversification of training knowledge (e.g., emergency policies and legislation change from
country to country). Furthermore, since most of the related information can be
pre-existing in a variety of formats, the KB must be able to gather information from
heterogeneous sources (e.g., location data from geographic datasets, audio and
video inputs from multimedia archives, participants’ profiles) and to synthesize
and interlink such knowledge into a coherent base.


       Fig. 2. Role of the Knowledge Base in a Simulation Learning Environment


    The role of the KB in the Simulation Learning Environment and its interac-
tions with other components is depicted in Figure 2:

    – The KB is “initialized” by the trainer who models the simulation scenarios
      and the training path alternative options;
    – It is accessed by the Simulation Planner that needs to understand what
      “events” should be triggered and presented to the trainees during the learning
      sessions;
 – It is also accessed by other system components that need to get/give infor-
   mation about the training session and the knowledge exchanged during or
   after its delivery (cf. Section 4);
 – It is used to record the events and decisions taken during each training
   session, in order to enable the semi-automatic creation of an individual
   trainee debriefing report at the end of the training session.
To cope with such challenges, we adopted Linked Data and Semantic Web tech-
nologies for the design and development of our Knowledge Base.

3.3   Benefits from the adoption of Linked Data
The choice of Linked Data and Semantic Web technologies in our KB is motivated
by the need for easy access to, (re)use and integration of data and
knowledge [13].
    The ease of access to the KB is implicit in the use of Web technologies, which
represent a mature and established technology stack. Following the Linked Data
principles [14], we provide a standard access means to the data and knowledge
stored in the KB. Moreover, Linked Data and Semantic Web facilitate and enable
an entity-centric design of Web APIs: in our implementation, on top of the KB,
we have developed a RESTful service2 with specific methods to get details about
certain entities on the basis of the concepts (entity types) defined in the KB
ontologies and models (cf. Section 4). The RESTful service is also employed to
abstract from the physical location of data, as explained further on.
    The reuse of pre-existing datasets is also enabled by our technological choice.
Several useful data sources are already present on the Web of Data and, thus,
immediately exploitable by the KB. For example, in the Crisis Management
scenario, environment characteristics of crisis settings are retrieved from
GeoNames3, the geographical database containing over 10 million geographical names,
7.5 million unique features, 2.8 million populated places and 5.5 million alternate
names. For example, a scenario about a river flood or an earthquake benefits from
the retrieval of localized information from GeoNames. As a pragmatic solution,
we are “caching” the relevant features from GeoNames locally to the KB. How-
ever, the reuse of GeoNames URIs constitutes a link to the remote dataset and
allows for further knowledge retrieval. In the same way, we can connect the KB
to other knowledge bases like Freebase4 or DBpedia5 [15] to get information on a
number of general-purpose topics and entities. The linkage to the latter sources
is still in progress.
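
    The combination of local caching and URI reuse described above can be
illustrated with a small sketch that emits N-Triples lines. It is only an
illustration under stated assumptions: the example.org namespace, the
takesPlaceIn property, the numeric feature identifier and the URI pattern used
for GeoNames resources are placeholders chosen for the example, not values
taken from the Pandora KB.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/pandora/"            # hypothetical local namespace
GEONAMES = "https://sws.geonames.org/{id}/"   # assumed GeoNames URI pattern


def geonames_uri(feature_id: int) -> str:
    """Build the remote URI for a GeoNames feature (pattern is an assumption)."""
    return GEONAMES.format(id=feature_id)


def triple(s: str, p: str, o: str, literal: bool = False) -> str:
    """Serialise one statement as an N-Triples line."""
    if literal:
        # Escape backslashes first, then double quotes.
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{o}>"
    return f"<{s}> <{p}> {obj} ."


def link_scenario(scenario: str, feature_id: int, cached_name: str) -> list:
    """Link a local scenario to a GeoNames place and cache its label locally."""
    place = geonames_uri(feature_id)
    return [
        # The scenario points at the remote GeoNames URI, not at a local copy.
        triple(EX + scenario, EX + "takesPlaceIn", place),
        # The cached label is stored locally but describes the same URI.
        triple(place, RDFS_LABEL, cached_name, literal=True),
    ]
```

Because the cached statements keep the remote URI as their subject, a client
can still dereference it later to retrieve further knowledge.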
    But this re-usability benefit also applies to the knowledge explicitly
modelled for domain-specific learning scenarios: the choice of RDF to encode the
data and of RDFS/OWL to model their structure pays off, since those data are
partially published on the open Web, thus enriching the Web of Linked Data
2
  Cf. http://pandoratest01.xlab.si:8080/pandora-ckb/.
3
  Cf. http://www.geonames.org/.
4
  Cf. http://freebase.com/.
5
  Cf. http://dbpedia.org/.

and becoming available for other Simulation Learning systems or for different
tools. To this end, in our Crisis Management scenario, we decided to store the
schemata and data generated by Pandora components natively as RDF triples
in the KB; the knowledge coming from pre-existing sources in different formats
(e.g., taxonomies, spreadsheets, guidelines) has been converted – manually or,
whenever possible, semi-automatically – to a structured RDF format. The benefits
of this approach are: the general Crisis Management knowledge is available
to the whole community; the simulation scenarios can be reused by any installa-
tion of the training system; the further enhancements and extensions of the core
knowledge are immediately “reflected” in all systems that make use of our KB.
    The ease of integration comes from the native interlinking capability of
Linked Data technologies. RDF provides the basic mechanism to specify the
existence and meaning of connections between items through RDF links [16]. In
other words, through the adoption of RDF, we not only give a structure to the
data stored in the KB, but we also interlink the entities described by such data.
Moreover, the links drawn between knowledge items are typed, thus conveying
the “semantics” of such relationships and enabling the inference of additional
knowledge. The information sources of the KB can be maintained and evolve
over time independently but, at the same time, can be connected via
the lightweight integration means of Linked Data.
    The KB contains different (although interlinked) datasets, which also require
diverse confidentiality/security levels for management and access. To this end,
the KB is designed as a set of federated RDF stores6 : the shared knowledge (e.g.
general Crisis Management information, basic scenarios) should be “centralised”,
to let all training system instances access and use it, while the installation-
specific knowledge (e.g., detailed or customized scenarios, trainees information,
personalizations) is managed in a local triple store, not accessible from outside
the system (see Figure 3). The RESTful service on top of the KB, as explained
earlier, provides uniform access to the KB and hides from the other Pandora
components the existence of the various “realms” of distinct Linked Data sources.
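
    The idea of a single access layer over a shared and a local store can be
sketched as follows. This is a simplified illustration, not the Pandora
RESTful service: TripleStore is a tiny in-memory stand-in for a real RDF
store, and the names KnowledgeBaseFacade, describe and record are
hypothetical.

```python
class TripleStore:
    """Tiny in-memory stand-in for an RDF store (not a real triple store)."""

    def __init__(self, name):
        self.name = name
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return the triples matching the pattern; None acts as a wildcard."""
        return {t for t in self._triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)}


class KnowledgeBaseFacade:
    """Single entry point that hides which store holds which data."""

    def __init__(self, shared, local):
        self._shared = shared
        self._local = local

    def describe(self, entity):
        """Merge what both realms state about an entity."""
        return self._shared.match(s=entity) | self._local.match(s=entity)

    def record(self, s, p, o, confidential=True):
        """Write to the local store unless explicitly marked shareable."""
        target = self._local if confidential else self._shared
        target.add(s, p, o)
```

Defaulting to the local store is a deliberately conservative choice in this
sketch: data becomes shared only when the caller asks for it explicitly.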
    Finally, the adoption of Semantic Web technologies in the form of ontologies
and rules provides a further gain, since we can exploit reasoning and inference
for knowledge creation and consistency checking, as explained in the next section.


4     Modelling and Retrieval in our Knowledge Base

As previously mentioned, our Knowledge Base manages several different and
interlinked types of information. In this section, we introduce three “families”
of data included in the KB and explain their modelling choices. We also illus-
trate their use in the Crisis Management training scenario within the Pandora
Integrated Environment.
6
    In the Pandora project, since the work is still in progress and for now we have one
    single system installation, the current initial release of the KB consists of a unique
    triple store with all the integrated knowledge.


Fig. 3. The KB as federation of different triple stores to preserve security and confi-
dentiality while benefitting from interlinking.


4.1   User Modelling

As introduced in Section 2.2, a Behaviour Sensing module is devoted to the
“detection” of trainees’ performance in order to create individual models that
help in tailoring the learning strategy of each participant in the simulation.
Prior to the training session, dedicated psychological tests and physiological
assessment at rest (e.g., through a Holter that measures the heart rate activity
at rest) are used to measure some relevant variables (like personality traits,
leadership style, background experience, self-efficacy, stress and anxiety). Those
variables are then updated during the training session, through self-assessment
measurements (i.e., asking the trainee about his/her performance) or through the
elaboration of the raw data recorded by the sensors.
    Those data about trainees’ behaviour are stored and updated in our KB, as
instances of ontology concepts that represent the “affective factors” that influ-
ence the decision-making of the trainees. Due to the sensitivity of such infor-
mation, the individual performances of the trainees are modelled in RDF and
stored in the “local” triple store (cf. Figure 3) for obvious privacy reasons. We
are also investigating the possibility to exploit Named Graphs [17] for access
control: if the training session recordings are “stored” in the KB as separate
named graphs, a named graph-aware access control component could grant
access to the allowed users (e.g., the trainer) and deny access to
malicious or occasional users (e.g., the other trainees).
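
    A named graph-aware access control component of this kind could look
roughly like the sketch below. It is only an illustration of the idea under
investigation: the NamedGraphStore class, its method names and the graph and
user identifiers are hypothetical, and a real deployment would rely on the
access control features of the chosen triple store.

```python
class NamedGraphStore:
    """Triples grouped by graph name, with a per-graph list of readers."""

    def __init__(self):
        self._graphs = {}    # graph name -> set of (s, p, o) triples
        self._readers = {}   # graph name -> set of user ids allowed to read

    def add(self, graph, s, p, o):
        self._graphs.setdefault(graph, set()).add((s, p, o))

    def grant(self, graph, user):
        self._readers.setdefault(graph, set()).add(user)

    def read(self, graph, user):
        """Return the graph's triples, or raise if the user lacks access."""
        if user not in self._readers.get(graph, set()):
            raise PermissionError(f"{user} may not read {graph}")
        return set(self._graphs.get(graph, set()))

    def visible_graphs(self, user):
        """Names of the graphs the user is allowed to read, sorted."""
        return sorted(g for g in self._graphs
                      if user in self._readers.get(g, set()))
```

Storing each session recording in its own graph makes the graph name the unit
of authorisation, so no per-triple checks are needed.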
    In the specific scenario of the Pandora Integrated Environment, the learning
sessions are targeted to the training of Crisis Managers. Therefore, the KB also
stores and manages a set of specific information about them.
    The Crisis Managers are the so-called Gold Commanders, who are responsible
for the strategic development of responses to crisis situations. The trainee group
is usually composed of the representatives of the “command team”, i.e. the core
agencies involved in the strategic Crisis Management (e.g., police, local authority,
fire brigade, ambulance); sometimes, other trainees can come from other utility
companies (e.g. electricity, road transportation, environmental agency).
    In our KB, therefore, we modelled the basic knowledge about those Gold
Commanders by creating classes to represent the different trainee typologies.
Those classes are “instantiated” for each training session, by adding the
individual trainees to the KB. This lets the system record the training of each
participant in relation to his/her role in the simulation; this knowledge is very
valuable both for the debriefing phase – when the trainer summarizes the
performance results of each trainee (see also below) – and for a general analysis and
mining of the achieved objectives and learning needs of the different agencies.
    The initial version of the user modelling is part of the Pandora Ontology7 .


4.2     Training Simulation Modelling

The core module of the simulation learning system is the Simulation Planning
module (cf. Section 2.2). Our KB therefore must be able to manage the knowledge required
for the planning, in terms of the basic entities used by AI Planning Applications
based on Timeline Representations.
    In the literature, several attempts have been made to formalize the semantics of planners
[18, 19]. However, those approaches, on the one hand, tried to specify a generic
planning ontology and, on the other hand, were specifically tailored to some
application domains.
    Building on their experience, we decided to make our own formalization to
encompass the family of techniques known under the name of Timeline-based
Planning and Scheduling. In fact, current AI planning literature shows that
timeline-based planning can be an effective alternative to classical planning for
complex domains which require the use of both temporal reasoning and schedul-
ing features [10]. Moreover, our modelling aims to become the foundation for the
investigation on the interplay between Semantic Web Technologies and Planning
and Scheduling research [12]; Semantic Web knowledge bases, in fact, can rep-
resent a good alternative to the current domain modelling in the planning area,
which encompasses a multitude of custom and non-interoperable languages.
    Our modelling is formalized in a Timeline-based Planning Ontology8 . As
in classical Control Theory, the planning problem is modelled by identifying
a set of relevant features (called components) which are the primitive entities
for knowledge modelling. Components represent logical or physical subsystems
whose properties may vary in time; in simulation learning, components are
either trainees’ behavioural traits or learning scenario variables. Their temporal
evolution is controlled by the planner to obtain a desired behaviour. Therefore,
our ontology includes a set of time functions that describe the evolution over
temporal intervals. The evolution is modelled by events happening on modelled
7
     Cf. http://swa.cefriel.it/ontologies/pandora.
8
     Cf. http://swa.cefriel.it/ontologies/tplanning.

components. To this end, a set of planning rules (or synchronizations) specifies
what events can be triggered to modify these evolutions. The task of the Simu-
lation Planner is to find a sequence of events that brings the system entities into
a desired final state.
    The core concept of the Timeline-based Planning Ontology is therefore the
planning rule: each rule puts in relation a “reference” event – which is the poten-
tial cause of some phenomena in the simulation – with a “target” event – which
is the possible consequence –, under a set of conditions called rule relations.
We modelled such conditions as SPARQL FILTER or LET clauses9; therefore,
we reused the modelling of such clauses and functions included in the SPIN
Modeling Vocabulary [20] and extended it with regard to temporal conditions.
    At learning design time – i.e. prior to the simulation sessions –, the trainer
has to model the possible training scenarios, by instantiating in the KB the
ontology concepts, in particular the planning rules and the related events. The
choice of Linked Data and Semantic Web technologies for our modelling is not
only useful for reusing and exploiting pre-existing knowledge. In this case, we
can also exploit the semantics of such ontology for the consistency checking of
the simulation scenarios: by automatic means, we can check if all the planning
rules are satisfiable, if they represent possible “states” of the world simulated
during the sessions, if all the events can happen under appropriate conditions,
and so on.
    At run-time – i.e. during the simulation learning sessions –, all the events
and decisions taken by the trainees during their learning are recorded in the
KB. The KB is therefore used by the Simulation Planner to create and update
the simulation plan. SPARQL-based querying is used to perform the knowledge
retrieval required in this step: based on the actual recorded events, only the
admissible planning rules are returned to let the planner decide what events to
trigger.
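
    The run-time retrieval step can be illustrated with a plain Python
stand-in for the SPARQL-based querying. The sketch below is an assumption-laden
simplification: the RecordedEvent and PlanningRule classes are hypothetical,
and each rule relation is represented by a Python predicate rather than by a
SPARQL FILTER or LET clause.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RecordedEvent:
    kind: str   # event type, e.g. "flood-warning-sent"
    time: int   # when the event was recorded, in arbitrary ticks


@dataclass(frozen=True)
class PlanningRule:
    name: str
    reference: str   # kind of the potential cause
    target: str      # kind of the possible consequence
    # Stand-in for a rule relation: takes the matched event and the current time.
    condition: Callable[[RecordedEvent, int], bool]


def admissible_rules(rules, recorded, now):
    """Rules whose reference event was recorded and whose condition holds."""
    result = []
    for rule in rules:
        if any(ev.kind == rule.reference and rule.condition(ev, now)
               for ev in recorded):
            result.append(rule)
    return result
```

The planner would then choose among the target events of the returned rules;
that selection step is outside the scope of this sketch.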
    After the learning session, at debriefing time, the recording of trainees’ be-
haviour and decision-taking is exploited to summarize the session progress. Also
in this case, SPARQL-based querying on the KB is exploited to retrieve all the
events and situations that involved each trainee; this knowledge is immediately
at disposal of the trainer to produce a debriefing report for each participant
and can be used to highlight personal performance, achieved training goals and
attention points for improvement or further training.


4.3    Asset Modelling

The Learning Delivery module (cf. Figure 1) takes as input the simulation plan
and “executes” it by sending the appropriate stimuli to the trainees. To do this,
it needs to recreate the actual simulation conditions, by simulating a near-real
situation. For example, in the Crisis Management training scenario, the
participants must be prompted by phone calls, mail, news, videos, etc. that give them
9
    The SPARQL LET clause is defined in some implementations, like the Jena Semantic
    Web Framework http://openjena.org/

updates on the evolution of the emergency. To this end, the Learning Delivery
module manages two types of “learning objects” that are described in the KB.
    The first type of simulation objects consists of audio and video assets, which
give information to the trainees about what happens outside the simulation
room. In the Pandora scenario, those assets are pre-recorded simulated
video news or audio inputs – like phone calls from the crisis setting – which
are used to put pressure on the trainees and, at the same time, to give them
further inputs on which they must base their decisions. To model such assets,
it is possible to re-use existing learning objects modelling, such as [4, 21]. In
the Pandora project, we are still in the process of selecting the most suitable
modelling for our purpose.
    There is a second type of stimuli for the simulation trainees. Since the sensing
system records the “performance” of each participant also in terms of stress and
anxiety, the simulation can be adapted to the specific conditions and deliver
tailored inputs for the individual trainees. For example, if the purpose is to
augment the pressure on a participant, the input could be made more dramatic.
To this end, the Learning Delivery module makes use of Non-Player Characters
(NPC): in games terminology, elements that act as fictional agents and that are
animated and controlled by the system. Those NPCs simulate additional actors
from outside the learning environment and are used to deliver information to
the trainees.
    Our KB, therefore, also includes the modelling of NPC descriptions, in terms
of their role in the simulation, their basic characteristics (e.g. gender, ethnicity,
disability), their profiles (expertise, experience, emotional type, communication
skills, etc.), their multimedia rendering mode (from the simplest text represen-
tation to fully rendered 3D avatar), etc. For this modelling, Linked Data are
exploited for the reuse of pre-existing descriptions and Semantic Web technolo-
gies are leveraged to retrieve and select the most suitable NPC to simulate a
desired stress or anxiety situation.


5    Towards Provenance Tracking

As detailed in the previous section, our Linked Knowledge Base is used to manage
the knowledge required to produce simulation-based learning sessions. We think
that Simulation Learning can be seen as a special case of the Open Provenance
Model (OPM) [22]. The sessions are our main process, the trainees, as well as the
simulated external characters, are our agents and the events and the decisions
taken by the trainees are the artifacts of the learning sessions.
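
    The correspondence just outlined can be made concrete with a small
sketch that maps one session record to OPM-style edges. The edge names
wasControlledBy and wasGeneratedBy are OPM edge types; everything else (the
function names, the tuple layout and the identifiers) is a hypothetical
simplification and not the mapping to the OPM Vocabulary, which is still to be
defined.

```python
def session_to_opm(session_id, participants, artifacts):
    """Map one session record to OPM-style (edge, effect, cause) tuples."""
    edges = []
    for agent in participants:
        # The session (process) was controlled by each trainee or NPC (agent).
        edges.append(("wasControlledBy", session_id, agent))
    for artifact in artifacts:
        # Each recorded event or decision (artifact) was generated by the session.
        edges.append(("wasGeneratedBy", artifact, session_id))
    return edges


def artifacts_of(edges, session_id):
    """Artifacts generated by a given session, in recorded order."""
    return [effect for kind, effect, cause in edges
            if kind == "wasGeneratedBy" and cause == session_id]
```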
    Our future investigation will focus on the definition of the suitable OPM
Profiles for Simulation Learning systems; specifically, we aim at mapping our
Timeline-based Planning Ontology to the Open Provenance Model Vocabulary
Specification [23]. While this is still work in progress, hereafter we give some
hints on how we can build on the Open Provenance Model and why it is useful.
    The provenance tracking in simulation learning can be done at two levels:
at design time – when the learning scenarios are modelled in the KB with their
possible planning rules –, and after the learning sessions – when the results of
the simulations are analysed.
    At design time, provenance can be used to trace the cause-consequence chains
between the possible simulation events. As explained in Section 4.2, planning
rules are used to model the admissible transitions between events in the simula-
tion; the completion and inference rules defined in OPM [22] can be exploited for
the consistency checking of the simulation modelling. On the one hand, those
rules can help in refining the modelling, by eliminating useless entities,
combining possible repetitions and introducing missing entities; on the other hand,
OPM rules can help in examining the possible decision-trees (i.e., the possible
alternative planning options) to identify unreachable states or decision bottle-
necks.
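
    As a simple illustration of the second kind of check, unreachable events
can be detected with an ordinary graph search over the planning rules. Note
that this sketch uses plain breadth-first reachability rather than the OPM
completion and inference rules, and it reduces each planning rule to a
hypothetical (reference, target) pair, ignoring rule relations.

```python
from collections import deque


def reachable_events(rules, initial):
    """Breadth-first search over (reference, target) rule pairs."""
    successors = {}
    for reference, target in rules:
        successors.setdefault(reference, set()).add(target)
    seen = set(initial)
    queue = deque(initial)
    while queue:
        current = queue.popleft()
        for nxt in successors.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def unreachable_events(rules, initial):
    """Events mentioned in some rule that no chain from `initial` can trigger."""
    mentioned = {event for pair in rules for event in pair}
    return mentioned - reachable_events(rules, initial)
```

An event reported as unreachable suggests either a missing rule or a scenario
fragment that can never be presented to the trainees.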
    After the learning sessions, the simulation records can be analysed to
understand and synthesise the learning outcomes. Tracking the provenance of trainees’
decisions and mining the most popular causal chains across several session
deliveries can be of great help for identifying learning needs, common behaviours
(as well as common trainees’ mistakes), wide-spread procedures, etc. This
information can become of considerable importance: on the one hand, to improve the
learning simulations and better address learners’ requirements and, on the other
hand, to better study and interpret learning outcomes for individual participants
or for entire classes of trainees.
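
    Mining popular chains across sessions can be approximated with a simple
frequency count, as in the sketch below. It rests on a strong simplifying
assumption: consecutive events in a session record are treated as a causal
chain, whereas real provenance data would link causes and effects explicitly.
The function names and the event labels are hypothetical.

```python
from collections import Counter


def causal_chains(session, length=2):
    """All contiguous runs of `length` events in one session's recorded order."""
    return [tuple(session[i:i + length])
            for i in range(len(session) - length + 1)]


def most_common_chains(sessions, length=2, top=3):
    """The `top` chains that occur in the largest number of sessions."""
    counts = Counter()
    for session in sessions:
        # Count each chain at most once per session so that long sessions
        # do not dominate the ranking.
        counts.update(set(causal_chains(session, length)))
    return counts.most_common(top)
```

Chains with equal counts are returned in no particular order, so only the
counts themselves should be relied upon when comparing results.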


6   Conclusions

In this paper, we presented our approach and experience in building a Linked
Knowledge Base to support Simulation Learning systems. We introduced the
general architecture of such a system together with a concrete scenario in Crisis
Management training; we illustrated the benefits of the use of Linked Data and
Semantic Web technologies and we summarised our modelling choices. We also
suggested the introduction of provenance tracking, to further enrich and better
analyse the contents of a Knowledge Base for Simulation Learning.
   Our approach is being integrated in the Pandora Environment, which, in the
second half of 2011, will be tested at the UK Emergency Planning College in
their “Emergency Response and Recovery” training courses.


Acknowledgments

This research is partially funded by the EU PANDORA project (FP7-ICT-2007-
1-225387). We would like to thank the project partners for their collaboration.


References
 1. Stojanovic, L., Staab, S., Studer, R.: elearning based on the semantic web. In:
    WebNet2001 - World Conference on the WWW and Internet. (2001) 23–27
 2. Tiropanis, T., Davis, H.C., Millard, D.E., Weal, M.J.: Semantic technologies for
    learning and teaching in the web 2.0 era. IEEE Intelligent Systems 24(6) (2009)
    49–53
 3. Learning Technology Standards Committee of the IEEE: Draft standard for learn-
    ing technology - learning object metadata. Technical report, IEEE Standards
    Department, New York (2002)
 4. Brase, J., Nejdl, W.: Ontologies and Metadata for eLearning. In Staab, S., Studer,
    R., eds.: Handbook on Ontologies. International Handbooks on Information Sys-
    tems. Springer (2004) 555–574
 5. Aldrich, C.: Simulations and the Future of Learning: An Innovative (and Perhaps
    Revolutionary) Approach to e-Learning. Pfeiffer (2003)
 6. Caird-Daley, A., Harris, D., Bessell, K., Lowe, M.: Training Decision Making using
    Serious Games. Technical report, Human Factors Integration Defence Technology
    Centre (2007)
 7. Sniezek, J., Wilkins, D., Wadlington, P., Baumann, M.: Training for Crisis
    Decision-Making: Psychological Issues and Computer-Based Solutions. Journal
    of Management Information Systems 18(4) (2002) 147–168
 8. Stern, E., Sundelius, B.: Crisis Management Europe: An Integrated Regional Re-
    search and Training Program. International Studies Perspective 3(1) (2002) 71–88
 9. Lehto, M., Nah, F.: Decision-making Models and Decision Support. In: Handbook
    of Human Factors and Ergonomics. John Wiley & Sons, Inc., NY (2006)
10. Cesta, A., Fratini, S.: The timeline representation framework as a planning and
    scheduling software development environment. In: 27th Workshop of the UK Plan-
    ning and Scheduling SIG. (2008)
11. Cesta, A., Cortellessa, G., Fratini, S., Oddi, A.: Developing an end-to-end planning
    application from a timeline representation framework. In: 21st Applications of
    Artificial Intelligence Conference. (2009)
12. Celino, I., Dell’Aglio, D., De Benedictis, R., Grilli, S., Cesta, A.: Ontologies, rules
    and linked data to support crisis managers training. IEEE Learning Technology
    Newsletter, Special Issue “Semantic Web Technologies for Technology Enhanced
    Learning” 13 (2011) Issue 1
13. Heath, T., Bizer, C.: Linked Data: Evolving the Web into a Global Data Space. 1st
    edn. Volume 1 of Synthesis Lectures on the Semantic Web: Theory and Technology.
    Morgan & Claypool (2011)
14. Berners-Lee, T.: Linked Data – W3C Design Issues, Architectural and philosophical
    points, Online at http://www.w3.org/DesignIssues/LinkedData.html. (2006)
15. Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R.,
    Hellmann, S.: DBpedia – a crystallization point for the web of data. Journal of Web
    Semantics: Science, Services and Agents on the World Wide Web 7 (2009) 154–165
16. Bizer, C., Heath, T., Berners-Lee, T.: Linked Data – The Story So Far. Interna-
    tional Journal on Semantic Web and Information Systems 5 (2009) 1–22
17. Carroll, J.J., Bizer, C., Hayes, P., Stickler, P.: Named graphs, provenance and
    trust. In: WWW ’05: Proceedings of the 14th international conference on World
    Wide Web, ACM (2005) 613–622
18. Rajpathak, D., Motta, E.: An ontological formalization of the planning task. In:
    International Conference on Formal Ontology in Information Systems (FOIS’04).
    (2004) 305–316
19. Gil, Y., Blythe, J.: Planet: A sharable and reusable ontology for representing plans.
    In: the AAAI - Workshop on Representational Issues for Real-World Planning
    Systems. (2000) 28–33
20. Knublauch, H.: SPIN Modeling Vocabulary. http://spinrdf.org/spin.html
    (October 20, 2009)
21. Meta-information for Multimedia eLearning. In: Computer Science in Perspective.
    (2003)
22. Moreau, L., Clifford, B., Freire, J., Futrelle, J., Gil, Y., Groth, P., Kwasnikowska,
    N., Miles, S., Missier, P., Myers, J., Plale, B., Simmhan, Y., Stephan, E., den
    Bussche, J.V.: The Open Provenance Model core specification (v1.1). Future
    Generation Computer Systems (2010)
23. Zhao, J.: Open Provenance Model Vocabulary Specification. Online at
    http://purl.org/net/opmv/ns (2010)


Connecting medical educational resources to the Linked Data cloud:
the mEducator RDF Schema, store and API

Evangelia Mitsopoulou¹, Davide Taibi², Daniela Giordano³, Stefan Dietze⁴,
Hong Qing Yu⁴, Panagiotis Bamidis⁵, Charalampos Bratsas⁵, Luke Woodham¹

¹ St George’s University of London, Center for Medical and Healthcare Education,
  Cranmer Terrace, SW17 0RE London, United Kingdom
  {emitsopo, lwoodham}@sgul.ac.uk
² Consiglio Nazionale delle Ricerche, Istituto per le Tecnologie Didattiche,
  Via Ugo La Malfa 153, 90146, Palermo, Italy
  davide.taibi@itd.cnr.it
³ Università di Catania, Dipartimento di Ingegneria Elettrica, Elettronica e Informatica,
  Viale A.Doria 6, 95125, Catania, Italy
  dgiordan@dieei.unict.it
⁴ The Open University, Knowledge Media Institute, MK7 6AA, Milton Keynes, UK
  {s.dietze, h.q.yu}@open.ac.uk
⁵ Aristotle University of Thessaloniki, PO Box 323, 54124, Thessaloniki, Greece
  {bamidis, mpampis}@med.auth.gr


    Abstract. Non-interoperable metadata schemas and the limited use of shared
    vocabularies make retrieving and processing educational resources across the
    Web a challenge. The emerging Linked Data paradigm provides tools and
    methods to share and expose metadata in a more unified and well-interlinked
    manner, permitting both humans and machines to process Web data. The
    availability of vast amounts of RDF-based Linked Data offers a worthwhile
    alternative to the isolated and heterogeneous data silos which previously
    dominated the Web. The EC-funded project mEducator proposes a standardised
    approach to describing and exposing medical educational resources. This paper
    first describes the design considerations and the conceptual model on which
    mEducator’s metadata scheme is based. It then describes the serialization of
    the scheme in RDF/XML, gives an example showing how medical educational
    resources are exposed on the Web using an RDF endpoint, and discusses the
    potential advantages of this approach.


    Keywords: Metadata, Linked Data, RDF vocabularies, medical learning
    resources, mEducator, Conceptual Model, RDF Store


1 Introduction

Medical and healthcare education has recently been modernised by means of
Information and Communication Technologies (ICT). Many European academic
institutions use their own Learning Content Management Systems (LCMS) to
deliver educational resources in their curricula. These resources come in different
formats, such as images, text, videos, podcasts, virtual patients, serious games about
health, and diagnostic tests such as electrocardiograms (ECGs). Currently, it is
difficult to exchange resources between different platforms due to the lack of
standardised content sharing mechanisms. An attempt to solve this problem is
being made by the mEducator Best Practice Network (BPN) [1], funded by the EC in
the eContentplus programme, which aims to analyze the use of existing standards and
reference models in the e-learning field to discover, retrieve, share and re-use medical
educational resources. The goal of the current paper is to present this effort and to
show how it could be further examined, evaluated and reused by a wider
community.
   The diffusion of learning experiences based on the Web has increased over the
years, and the application of Semantic Web technologies to e-learning has been
widely studied by researchers. In 2001 Tim Berners-Lee presented the Semantic Web
as a Web interpretable by machines [2]. Many research studies propose the
application of Semantic Web technologies in different e-learning situations. In [3] and
[4] ontologies are used to describe the contents of learning resources. The importance
of the Semantic Web approach to learning object metadata representation is
discussed in [5], [6]. The use of a Semantic Web services architecture for e-learning
was proposed by [7] and [8]. The modeling of an e-learning environment by means of
a multi-agent system was studied by [9]. Virtual learning environments based on
Semantic Web technologies were discussed in [10]. Recent developments in the social
Semantic Web have focused on informal learning experiences [11], while the
diffusion of Linked Data has opened up new opportunities for improving
e-learning experiences.
   Linked Data can be useful to support interoperability, accessibility and reusability
of learning resources. For this reason, in the mEducator project, a standardised
metadata description scheme has been proposed, and an RDF triple-store repository
compliant with Linked Data requirements has been created to store educational
resources. Describing resources in a standardised format provides a basis for
interoperability and a framework for discovery [12].
   The rest of this paper is structured as follows: Section 2 describes the development
and structure of the conceptual model and the schema, along with its RDF
serialization. Section 3 discusses the mEducator implementation and its application
of Linked Data principles, including the RDF instance file. It also details the
REST API and RDF repository, and describes the approach taken by the mEducator
RDF schema and RDF endpoint to interlink educational resources with data sets in
the Linked Data cloud.


2 A general schema for educational resources

2.1 In General

The first step when developing a metadata scheme is to specify what kind of metadata
is needed in order to solve a problem in a specific domain of interest. In this case, the
domain of interest is medical and healthcare education, and the problem is how to
describe different types of medical resources in order to facilitate content sharing and
re-use across institutions. In particular, the description must allow resources to be
efficiently repurposed, a process in which all or part of a resource is modified
and re-used to suit an audience or purpose different from the one originally intended.
Repurposing is a term first used in [13] to refer to the adaptation of existing online
learning resources to different educational levels. Later, in [14], the term repurposing
was used in a wider sense to refer to the conversion of a Virtual Patient (VP) created
for one purpose into a VP fit for another purpose, i.e. a different subject, discipline or
healthcare profession. A Virtual Patient, as defined in [13], is a specific type of medical
educational resource: an interactive computer simulation of real-life
clinical scenarios for the purpose of medical training, education, or assessment.
   In order to best determine the metadata requirements for the project, a conceptual
model for the description of medical educational resources was formulated. The needs
of mEducator’s target group were identified by conducting a survey of mEducator’s
partners. The results of the survey indicated current perceptions of
the concept of educational content, educational content sharing and repurposing.
Based on the survey’s results and following an investigation of existing e-learning
standards, an initial framework for the metadata concepts was set that allowed
educational material to be annotated semantically and described in a
machine-readable format. A conceptual model of the schema was then developed based
on this initial framework. Taking into consideration the fact that the intended users
are not professional indexers and would tend to fill in the minimum required
information, the generally agreed approach was to keep the schema as lean as possible
and to maximise the re-use of existing e-learning standards.


2.2 Research on existing Standards

Existing standards for describing learning objects, in particular IEEE Learning Object
Metadata (LOM) [15] and its extension Healthcare LOM [16], designed by
MedBiquitous to meet the specific demands of medical and healthcare
education, were considered as the starting point for the development of the schema.
This starting point also built on previous research [17], whose input to
the current work was fundamental. The XML binding of IEEE LOM was
initially reused by mEducator in order to implement the scheme in XML. However,
as the need to ensure that the metadata was compliant with the principles of Linked
Data became clear, RDF [18] was identified as the most appropriate framework to
use. The notion of linking between resources and ontologies through RDF URIs
offered a way of benefitting from the many advantages of Linked Data.
   Work towards an implementation of IEEE LOM in RDF was started by [19],
but was discontinued around 2004. For this reason Dublin Core [20] was
considered, and Dublin Core’s expression in RDF/XML [21], along with a further
investigation of existing RDF vocabularies, made up the final phase of the process of
structuring mEducator’s conceptual model. Examples of these RDF vocabularies
include SKOS [22], which was used to provide a standard way of representing in RDF
the controlled vocabularies defined within mEducator, and FOAF [23], which
was used to describe people in RDF.
   Those properties of Dublin Core that met the needs of mEducator’s conceptual model
were reused in the scheme as super-properties. They were not used in the same
manner as they are defined in the Dublin Core element namespace [24], because this
approach was considered to be too general. Instead, the approach of the Dublin
Core Terms namespace [25], in which properties are sub-properties of the ones
defined in [24], was adopted: mdc:creator, mdc:identifier, mdc:title, mdc:rights,
mdc:subject and mdc:language are sub-properties of dc:creator, dc:identifier, etc.,
respectively, where mdc is the prefix for the mEducator namespace [26]. From the FOAF
specification, some basic terms used to describe a person were reused:
foaf:name, foaf:Organisation, foaf:mbox_sha1sum.
   Where existing standards did not meet mEducator requirements, additional
properties were proposed. For example, a property was needed in the schema to
describe the organisation to which a person belongs. The lack of such a property in
FOAF necessitated the proposal of a new property within mEducator’s namespace, i.e.
mdc:memberOf.
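
   The sub-property relations described in this section mean that a consumer which
only understands Dublin Core can still read mEducator descriptions. The sketch below
illustrates this with a single application of the rdfs:subPropertyOf entailment rule.
It is not part of the mEducator implementation; the triples, the ex: prefix and the
function name are assumptions made for the example, and compact prefixed names are
used instead of full URIs.

```python
# Sub-property relations named in the text (mdc -> dc).
SUPER_PROPERTY = {
    "mdc:creator": "dc:creator",
    "mdc:identifier": "dc:identifier",
    "mdc:title": "dc:title",
    "mdc:rights": "dc:rights",
    "mdc:subject": "dc:subject",
    "mdc:language": "dc:language",
}

def infer_super_properties(triples):
    """If (s p o) holds and p is a sub-property of q, also assert (s q o)."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in SUPER_PROPERTY:
            inferred.add((s, SUPER_PROPERTY[p], o))
    return inferred

# Hypothetical input triples.
triples = {
    ("ex:resource1", "mdc:title", "ECG Patient case 1004"),
    ("ex:person1", "mdc:memberOf", "ex:org1"),  # mEducator-specific, no DC parent
}
result = infer_super_properties(triples)
```

The mdc:title statement yields an additional dc:title statement, while mdc:memberOf,
which has no Dublin Core parent, is left as it is.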


2.3 Conceptual Model

The conceptual model for the Learning Resource that was developed to be the basis of
the mEducator schema is intended to:
     1) Capture the semantics of the relationships (if any) among the metadata fields
         and the Learning Resource
     2) Make evident the rationale of the design decisions in the schema
     3) Identify potentially overlapping semantics in the metadata fields, and remove
         them in order to support the modular construction of metadata
     4) Clearly locate the appropriate place for the development of mEducator-
         specific controlled vocabularies and taxonomies.
   As depicted below (Figure 1), the basic metadata of a medical learning resource
consists of: its title, its identifier (i.e. URL), the language in which it is created, the
language of its metadata, the date of resource creation, the date of metadata creation,
the resource author, the author of the metadata, a brief description, a technical
description, its IPR license, its citation and any further information that certifies its
quality. In addition, further metadata has been proposed that provides pedagogical
value, such as: educational objectives, assessment methods, teaching instructions,
educational level, educational prerequisites and educational outcomes. Further
metadata will be used to provide keywords describing the resource, and to describe
the discipline and discipline specialty to which a resource relates.
   Another category of metadata fields included in the proposed scheme captures the
repurposing history of the resource. The definition of the repurposing process is given
in [14], and the metadata provided may include the title and identifier of the parent
resource from which the current resource has been created, any further resource(s)
that the current resource has been repurposed to, the type of repurposing that took
place (e.g. change of language, change of target etc.) and a brief description of the
repurposing process that took place.
   There are different types of medical educational resources, and this information
should be captured and described. Within mEducator two controlled vocabularies
have been created to enable the categorisation of the resources based on two different
criteria: the informational quality (including the nature of the information provided,
how it is organized, presented or collected) and the media format of the resource (or
the media formats included in the resource if this is complex and includes multiple
media types). These use the resourceType metadata field and the mediaType metadata
field.
   Metadata fields for describing accompanying resources are also included. The
rationale is that sometimes an educational resource is
accompanied by other resources, such as PDFs, videos or images, so that it can be fully
understood and better taught. In order to describe this information the
isAccompaniedResource metadata field was proposed.

Fig. 1. The conceptual model used as a basis for developing the mEducator metadata
application profile, summarising the metadata information proposed to describe a medical
educational resource. Mandatory fields are in bold, fields using controlled vocabularies are in
yellow, fields capturing user-generated information are in pink.

    In order to make the schema as simple and flexible as possible, the majority of the
fields are optional. The mandatory fields are the title of the resource,
the unique identifier (e.g. URL, URN, OkkamID etc.), the IPR license, the language of
its content, the language of its metadata, a description and relevant keywords. In
addition to defining a metadata field as mandatory or optional, the multiplicity of
each field has been specified. For instance, each educational resource should have one
title only and its metadata should be written in a single language. Other fields are
permitted to have multiple instances.
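
    The mandatory-field and multiplicity rules above can be checked mechanically. The
following is a minimal sketch, not the validation code used in mEducator: the
dictionary representation, the function name and the mapping of "keywords" to the
subject field and of "IPR license" to the rights field are assumptions made for
illustration.

```python
# Field names follow the property list of the schema; the grouping is an assumption.
MANDATORY = ("title", "identifier", "rights", "language",
             "metadataLanguage", "description", "subject")
SINGLE_VALUED = ("title", "metadataLanguage")

def validate(description):
    """Return a list of problems; description maps a field name to a list of values."""
    problems = []
    for field in MANDATORY:
        if not description.get(field):
            problems.append(f"missing mandatory field: {field}")
    for field in SINGLE_VALUED:
        if len(description.get(field, [])) > 1:
            problems.append(f"field must have a single value: {field}")
    return problems

# Hypothetical draft description with two titles and no keywords.
draft = {
    "title": ["ECG case", "EKG case"],
    "identifier": ["http://example.org/resources/1"],
    "rights": ["http://purl.org/meducator/licenses#Attribution"],
    "language": ["en"],
    "metadataLanguage": ["en"],
    "description": ["A short ECG teaching case."],
}
problems = validate(draft)
```

For this draft the check reports the missing keywords and the duplicated title.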


2.4 RDF Serialisation

Linked Open Data refers to data published on the Web in such a way that it is
machine-readable, its meaning is explicitly defined, it is linked to other external data
sets, and it can in turn be linked to from external data sets [27]. At the
heart of the Linked Data approach are its core technologies, such as RDF for data
representation and SPARQL as a standardised means to query RDF stores via HTTP.
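
   To illustrate what querying an RDF store via HTTP means in practice, the sketch
below builds the URL of a SPARQL Protocol GET request, in which the query text is
passed in the query parameter. The endpoint address and the query are placeholders
and do not refer to an actual mEducator service; the request is only constructed, not
sent.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET URL: the query travels in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint
QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"

url = sparql_get_url(ENDPOINT, QUERY)

# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
```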
   The first step towards implementing mEducator’s metadata schema in RDF was to
define its RDF Model [28]. This RDF model is derived from the conceptual model
described in the previous section. More specifically, all the metadata fields included
in the conceptual model become properties in the RDF Model. Each property should
have a value, e.g. the property title can have the value “cardiology”. Generally
speaking, the type of value a property can take varies from a plain string (datatype
properties) to a more complex object composed of a group of properties and values
(object properties). In order to describe more complicated values, or to describe a
value whose type is not described by an existing class (e.g. String), it is necessary to
define classes. Thus the need to define a new class for a property depends on the
type of the value that property can take.
   Another component that was defined within the RDF Model is controlled
vocabularies. A controlled vocabulary is a vocabulary consisting of a “prescribed list
of terms or headings, each one having an assigned meaning”, which provides “a way to
organize knowledge for subsequent retrieval” [29]. Controlled vocabularies are, in
other words, another type of value that a property can take.
   Below are the classes, properties and controlled vocabularies that were defined as
part of the RDF Model for mEducator.

   Classes
       Resource, RepurposingResource, IPRType, RepurposingContext,
       EducationalLevel, Discipline, DisciplineSpeciality, Subject,
       ExternalTerm

   Properties
       creator, identifier, repurposingIdentifier, description, language,
       rights, created, metadataCreated, memberOf, profileURI,
       assessmentMethods, educationalOutcomes, educationalObjectives,
       teachingLearningInstructions, citation, discipline, disciplineSpeciality,
       externalTerm, externalSource, conceptID, educationalPrerequisites,
       educationalContext, educationalLevel, title, repurposingTitle,
       isAccompaniedBy, toRepurposingContext, fromRepurposingContext,
       isRepurposedTo, isRepurposedFrom, resourceType, mediaType,
       technicalDescription, metadataCreator, metadataLanguage, quality,
       repurposingDescription

   Controlled Vocabularies
       ResourceType, MediaType, EducationalOutcome, IPRType,
       EducationalLevel, RepurposingContext


3   Deployment: an RDF store and API compliant with the mEducator schema

The Linked Data cloud is based on publishing structured data on the Web using RDF,
SPARQL and interconnected URIs to reference data entities. This approach has led to
a large number of well-connected and interlinked data sets and has emerged as an
extremely useful way of exposing data.
   In the mEducator project, a sharing mechanism which exposes educational
resource descriptions as rich and well-interlinked Linked Data has been proposed.
This approach requires the implementation and application of the mEducator
metadata scheme and the provision of an API that enables access to a repository
containing this data. This is part of a more general semantic solution based on Linked
Services [37] that allows distributed searches and queries across the Web. The
detailed description of this solution is outside the scope of the present paper (a more
complete description can be found in [30]). While external resource metadata is
discovered and processed on the fly, an RDF store complying with the schema
proposed in this paper is used to expose retrieved metadata as Linked Open Data (LOD)
and to enrich it with links to structured LOD data sets. The store’s API can be
accessed directly by a third-party application, allowing users to insert new resources
or search for resources using their metadata. Likewise, software agents can collect
resource metadata using the web service endpoints of compliant LCMSs, and use the
API to store it in the repository.
   The solution proposed respects the four principles of Linked Data [31]:
   1. The first principle requires the use of URIs as names to identify resources.
   2. The second principle proposes the use of HTTP URIs so that information about
        the resources can be looked up.
   3. The third principle recommends providing useful information, using RDF,
        when someone looks up a URI.
   4. The fourth principle consists in creating links to other resources on the Linked
        Data cloud.
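
   The second and third principles can be illustrated by the HTTP request a client
would issue to look up a resource URI and ask for RDF. The sketch below only prepares
such a request and does not send it. The resource URI is the one used in the example
of Section 3.3; whether the mEducator store answers this kind of content negotiation
is an assumption.

```python
from urllib.request import Request

def rdf_lookup_request(uri):
    """Prepare (but do not send) an HTTP GET asking for RDF/XML about a URI."""
    return Request(uri, headers={"Accept": "application/rdf+xml"}, method="GET")

req = rdf_lookup_request(
    "http://purl.org/meducator/resources/45b5ab54-4a09-4100-88bb-93057f256da2"
)
```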


3.1 RDF Schema Level

As shown in the previous section, mEducator’s metadata description scheme uses an
RDF model to represent its data. Generally speaking, the usage of URIs and RDF in
the schema demonstrates the implementation of the basic principles of Linked Data.
PURLs [32] along with targeted locations were used in order to develop mEducator’s
namespaces, and the domain http://www.purl.org/meducator has been registered for
mEducator. In turn, further subdomains have been registered to define namespaces for
both the schema and the controlled vocabularies defined within mEducator. The
namespace of the schema is http://purl.org/meducator/ns/ and the namespaces for the
controlled vocabularies are named similarly.
   RDF entities (classes and instances) are identified using URIs and, more
specifically, HTTP URIs, conforming to the first two design principles of Linked
Data. Because classes are referenced using HTTP URIs and further information
regarding their definition is provided, their meaning can be understood when they
are dereferenced.
   Properties are defined in a similar fashion, with HTTP URIs and
additional information regarding their meaning also being provided. In addition,
property definitions include links to other properties or to classes of other
standards. This is one of the strengths of RDF and Linked Data. By connecting
existing standards, and linking to other widely used standards such as Dublin Core,
the interoperability of the mEducator scheme is maintained.
    The use of controlled vocabularies also complies with Linked Data principles, and
they are implemented in RDF/XML. Some vocabularies were implemented using SKOS, i.e.
mediaType, ResourceType and educationalOutcomes, while others were implemented
using just RDF, i.e. IPR Type and Repurposing Context. In both cases, HTTP URIs
have been used to refer to each concept of a controlled vocabulary, e.g.
http://purl.org/meducator/licenses#Attribution for an IPR license and
http://purl.org/meducator/mediaType#image for a media type. All of them include the
<rdfs:label> element, which provides useful information when they are
dereferenced.
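
    As an illustration of how one term of a SKOS-based controlled vocabulary could be
serialised, the sketch below builds a skos:Concept element carrying an rdfs:label. The
concept URI is the media type example given above; the label text and the helper
function are assumptions, and the actual mEducator vocabulary files may be structured
differently.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Use readable prefixes when the element is serialised.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("skos", SKOS)):
    ET.register_namespace(prefix, uri)

def skos_concept(uri, label):
    """Return an RDF/XML element describing one controlled-vocabulary term."""
    concept = ET.Element(f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": uri})
    ET.SubElement(concept, f"{{{RDFS}}}label").text = label
    return concept

# The label "image" is an assumed value for illustration.
image = skos_concept("http://purl.org/meducator/mediaType#image", "image")
xml_text = ET.tostring(image, encoding="unicode")
```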


3.2 RDF Instance Level

The incorporation of Linked Data principles at the RDF instance level is similar to
that at the schema level. Firstly, each RDF instance is identified by an HTTP URI.
Secondly, all the properties whose value is taken from a controlled vocabulary
use HTTP URIs. For example, the resourceType property may have the value
http://purl.org/meducator/resourceType#virtualPatient.
   A critical advantage of Linked Data is the interlinking of resources by drawing on
available data sets (BioPortal ontologies, mEducator controlled vocabularies) and
reusing schemas (FOAF, Dublin Core). In mEducator’s case, this is done via the
owl:sameAs [33] property, which is used along with Subject, Discipline and
DisciplineSpeciality. For example, the Subject property is used to provide keywords
that describe the medical resource. These keywords are chosen by the user but are
defined in an external source. Using the owl:sameAs property, links are established
to other URIs where the same keyword is defined in other taxonomies,
enabling other resources to be discovered. In this way mEducator’s RDF Schema
applies the fourth design principle of Linked Data, linking to other, related URIs to
improve the chances of discovering related information on the Web.
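
   The discovery benefit of such links can be sketched as follows: treating owl:sameAs
as symmetric and transitive, a client can collect every URI known to denote the same
term. The link data below is illustrative only. The first two URIs appear in the
example of Section 3.3, the third is a made-up placeholder, and the function is not
part of the mEducator API.

```python
def same_as_closure(start, links):
    """Collect every URI reachable from start via owl:sameAs links,
    treating the relation as symmetric and transitive."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Illustrative owl:sameAs statements as (subject, object) pairs.
links = [
    ("http://meducator.open.ac.uk/ontology/SNOMEDCT/51308001",
     "http://purl.bioontology.org/ontology/SNOMEDCT/51308001"),
    ("http://purl.bioontology.org/ontology/SNOMEDCT/51308001",
     "http://example.org/taxonomy/cardiology"),  # hypothetical third taxonomy
]
equivalents = same_as_closure(
    "http://meducator.open.ac.uk/ontology/SNOMEDCT/51308001", links
)
```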


3.3 The mEducator API

The mEducator instances described above are stored in an RDF repository that
provides a Representational State Transfer (REST) API, enabling access to the
repository by third parties with no knowledge of the underlying schema used to store
the data. During the insertion phase the mEducator REST API validates the RDF
instances to be inserted against the mEducator RDF schema, and creates a universally
unique identifier (UUID) for each mEducator resource inserted in the store, from
which the resource’s mEducator HTTP URI is formed (see footnote 1).
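
   A minimal sketch of this identifier creation step is given below. The base URI
follows the pattern stated in the paper (http://purl.org/meducator/resources/UUID);
the use of random, version 4 UUIDs and the function name are assumptions, since the
paper does not say how the UUIDs are generated.

```python
import uuid

RESOURCE_BASE = "http://purl.org/meducator/resources/"  # pattern stated in the paper

def mint_resource_uri():
    """Create a fresh resource URI by appending a random (version 4) UUID to the base."""
    return RESOURCE_BASE + str(uuid.uuid4())

uri = mint_resource_uri()
```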
   The mEducator educational resource store provides simplified search mechanisms
in order to retrieve information related to the resources in RDF format. The API aims
to simplify the search for resources by using properties and features defined in the
mEducator RDF schema. The general purpose SPARQL interface provided by the
Sesame triple store is replaced with a simpler endpoint designed specifically to
handle mEducator resources. In order to create links with other resources on the
Linked Data cloud, the mEducator resources have been connected to the BioPortal
vocabularies (see footnote 2) with regard to disciplines and subjects. These
search methods are described in detail in Section 3.4.

1 http://purl.org/meducator/resources/UUID

  <rdf:Description rdf:about="http://purl.org/meducator/resources/45b5ab54-4a09-4100-88bb-93057f256da2">
       <rdf:type rdf:resource="http://www.purl.org/meducator/ns/Resource" />
       <rdfs:seeAlso rdf:resource="http://metamorphosis.med.duth.gr/uid#8332" />
       <mdc:identifier>http://www.med.helsinki.fi/tuke/meducator/Kansio_1004/Patient%201004.pdf</mdc:identifier>
       <mdc:title><![CDATA[ ECG Patient case 1004 limb and chest leads & thrombolysis ]]></mdc:title>
       <mdc:rights rdf:resource="http://purl.org/meducator/licenses/Attribution-Non-Commercial-Share-Alike" />
       <mdc:creator>
            <foaf:Person>
                 <foaf:name>Timo Kuusi</foaf:name>
            </foaf:Person>
       </mdc:creator>
       <mdc:metadataCreator>
            <foaf:Person>
                 <foaf:name>Jarkko Mylläri</foaf:name>
                 <mdc:profileURI>http://metamorphosis.med.duth.gr/pg/profile/Jarkko?view=foaf</mdc:profileURI>
            </foaf:Person>
       </mdc:metadataCreator>
       <mdc:created>2010-04-19</mdc:created>
       <mdc:metadataCreated>2010-04-19</mdc:metadataCreated>
       <mdc:subject>
            <mdc:Subject rdf:about="http://meducator.open.ac.uk/ontology/SNOMEDCT/51308006">
                 <rdfs:seeAlso>http://purl.bioontology.org/ontology/SNOMEDCT/51308001</rdfs:seeAlso>
                 <rdfs:label>Thrombolysis</rdfs:label>
                 <mdc:externalSource>SNOMED Clinical Terms</mdc:externalSource>
            </mdc:Subject>
       </mdc:subject>
       <mdc:resourceType rdf:resource="http://purl.org/meducator/resourceType/IMAGE" />
       <mdc:discipline>
            <mdc:Discipline rdf:about="http://meducator.open.ac.uk/ontology/SNOMEDCT/51308001">
                 <rdfs:seeAlso>http://purl.bioontology.org/ontology/SNOMEDCT/51308001</rdfs:seeAlso>
                 <rdfs:label>Cardiology</rdfs:label>
                 <mdc:externalSource>SNOMED Clinical Terms</mdc:externalSource>
            </mdc:Discipline>
       </mdc:discipline>
       <mdc:educationalContext>Teaching of ECG</mdc:educationalContext>
       <mdc:educationalObjectives>Reading and interpreting ECG</mdc:educationalObjectives>
  </rdf:Description>


Fig. 2. Example of a mEducator resource description

Figure 2 shows an example of a mEducator resource, in which:
   - the mEducator namespace (http://www.purl.org/meducator/ns/) is used to
       identify the type of the resource;
   - subject and discipline are linked to a BioPortal ontology;
   - educational context and objectives are properties of the resource as defined in
       the mEducator schema;
   - the digital rights of the resource are linked to a Creative Commons license;
   - the FOAF ontology is used to describe the creator and the metadata creator.
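
A client can read such a description with any RDF or XML library. The sketch below
uses the Python standard library on a shortened copy of the description in Figure 2.
The enclosing rdf:RDF element and the namespace declarations are assumptions added so
that the fragment parses on its own (the mdc namespace is taken to be
http://purl.org/meducator/ns/, as stated in Section 3.1), and the title is abbreviated.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "mdc": "http://purl.org/meducator/ns/",  # assumed binding of the mdc prefix
}

# Shortened copy of the Figure 2 description, wrapped so it is well-formed XML.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:mdc="http://purl.org/meducator/ns/">
  <rdf:Description rdf:about="http://purl.org/meducator/resources/45b5ab54-4a09-4100-88bb-93057f256da2">
    <mdc:title>ECG Patient case 1004</mdc:title>
    <mdc:discipline>
      <mdc:Discipline rdf:about="http://meducator.open.ac.uk/ontology/SNOMEDCT/51308001">
        <rdfs:label>Cardiology</rdfs:label>
      </mdc:Discipline>
    </mdc:discipline>
  </rdf:Description>
</rdf:RDF>"""

def summarise(rdf_xml):
    """Extract the resource URI, title and discipline label from a description."""
    root = ET.fromstring(rdf_xml)
    desc = root.find("rdf:Description", NS)
    return {
        "uri": desc.get(f"{{{NS['rdf']}}}about"),
        "title": desc.findtext("mdc:title", namespaces=NS).strip(),
        "discipline": desc.findtext(
            "mdc:discipline/mdc:Discipline/rdfs:label", namespaces=NS
        ),
    }

summary = summarise(SAMPLE)
```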

2 http://bioportal.bioontology.org/


3.4 The mEducator repository architecture

The mEducator repository stores medical learning resources in RDF format using a
Sesame RDF triple store. In the project, a REST API has been created with the aim
of making resources compliant with the four Linked Data principles and simplifying
the repository’s interoperability with external environments (data stores, software
applications, web agents, and so on). At present Metamorphosis, a social collaborative
educational network [34] based on the social environment Elgg, uses this API to
provide a web user interface to the mEducator resources. However, as mentioned
above, the API is available to any third-party application, for instance to query and
retrieve resource metadata. In essence, the API is an interface that external software
agents (clients, web services, and so on) can use to handle mEducator resources.
Moreover, future developments envisage access to the API functionalities
being governed by different authentication levels.
   It is important to note that a SESAME triple store already offers a complete set of
REST methods to work with the resources and the entities belonging to the store.
However, these methods are designed to work with general RDF resources; for this
reason it was necessary to develop functionalities specifically designed to handle and
process mEducator RDF schema compliant data.
   Taking into consideration the characteristics of mEducator resources, the API
implements methods to insert resources into the store, to retrieve resources using
different search criteria, to update the description of a resource and to delete
resources. Following the REST architectural style, the API uses the HTTP methods
POST, GET, PUT and DELETE to implement the typical database CRUD (create,
retrieve, update, delete) operations. The mEducator resources REST API is written
in Java; specifically, the project uses Jersey, the reference implementation of
JSR-311 [35], to create a RESTful endpoint for the mEducator repository.
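
As a rough illustration of the mapping between HTTP methods and CRUD operations described above, the following Python sketch dispatches requests against an in-memory store. It is not the project's Java/Jersey code: the names InMemoryStore and handle, the status codes chosen and the use of a dictionary in place of the Sesame store are assumptions made for illustration only.

```python
import uuid


class InMemoryStore:
    """Hypothetical stand-in for the RDF store: maps a resource ID to its description."""

    def __init__(self):
        self.resources = {}


def handle(store, method, resource_id=None, body=None):
    """Dispatch an HTTP method to the matching CRUD operation.

    Returns a (status_code, payload) pair.
    """
    if method == "POST":  # create
        new_id = str(uuid.uuid4())
        store.resources[new_id] = body
        return 201, new_id
    if method == "GET":  # retrieve
        if resource_id in store.resources:
            return 200, store.resources[resource_id]
        return 404, None
    if method == "PUT":  # update
        if resource_id in store.resources:
            store.resources[resource_id] = body
            return 200, resource_id
        return 404, None
    if method == "DELETE":  # delete
        if resource_id in store.resources:
            del store.resources[resource_id]
            return 204, None
        return 404, None
    return 405, None  # any other method is not allowed
```

In this sketch a POST returns the newly generated identifier, which mirrors the idea that the repository assigns an identifier to each inserted resource.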
   In terms of software architecture, the main part of the API consists of:
     - Rest Interface: this module implements the endpoint for the services defined
          in the mEducator project. It manages the HTTP requests, mapping each
          HTTP URL to the Java method that implements the required
          behaviour.
     - Controller: this module dispatches the requests coming from the Rest
          Interface to the right handler. In particular, during insertion it uses
          the Schema Validator to check the validity of a new resource with respect to
          the mEducator schema, and for search it delegates to the Query Manager
          module, which sends the query to the Sesame store.
     - Schema Validator: this module uses the Jena Validity Report to
          validate the RDF file representing a resource against the mEducator
          RDF schema. If the model is not valid, the resource is not inserted into the
          store and the conflicts that caused the invalid condition are reported.
     - Repository Manager: this module manages the connection with the Sesame
          store.
     - Query Manager: this part of the architecture is responsible for managing the
          search requests and obtaining the results from the Sesame store.
Fig. 3. Main architectural components of the mEducator resources REST API.
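
The division of labour between the modules listed above can be sketched in a few lines of Python. This is an illustrative outline, not the project's code: the validation rule (a set of required properties) merely stands in for Jena-based validation against the mEducator RDF schema, a list stands in for the Sesame store, and all class names, method names and the REQUIRED set are hypothetical.

```python
# Hypothetical required properties; the real mEducator schema is not reproduced here.
REQUIRED = {"mdc:title", "mdc:resourceType"}


class SchemaValidator:
    def validate(self, resource):
        """Return a list of conflicts; an empty list means the resource is valid."""
        missing = REQUIRED - set(resource)
        return [f"missing property: {p}" for p in sorted(missing)]


class RepositoryManager:
    """Stands in for the connection to the triple store."""

    def __init__(self):
        self.resources = []

    def add(self, resource):
        self.resources.append(resource)


class QueryManager:
    def __init__(self, repository):
        self.repository = repository

    def search_by_property_value(self, prop, value):
        return [r for r in self.repository.resources if r.get(prop) == value]


class Controller:
    def __init__(self):
        self.validator = SchemaValidator()
        self.repository = RepositoryManager()
        self.queries = QueryManager(self.repository)

    def insert(self, resource):
        # Validate first; an invalid resource never reaches the store.
        conflicts = self.validator.validate(resource)
        if conflicts:
            return {"inserted": False, "conflicts": conflicts}
        self.repository.add(resource)
        return {"inserted": True, "conflicts": []}

    def search(self, prop, value):
        # Searching is delegated to the Query Manager.
        return self.queries.search_by_property_value(prop, value)
```

The point of the sketch is the ordering in insert: the conflicts are reported back to the caller and the store is left untouched when validation fails.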

The following query types have been developed in this version of the API:
     1. searchByPropertyValue: this search uses a property-value pair to
          search the resources stored in the Sesame store.
     2. searchByKeyword: this query creates an RDF graph of the resources that
          have a specified keyword in one of their values.
     3. searchByID: this query uses the DESCRIBE statement to return an RDF
          graph related to the resource with a specified UUID.
     4. searchBySeeAlso: this query searches the repository for the
          resources with a specified seeAlso property.
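
The kind of SPARQL text that two of these query types might send to the store can be sketched as follows. The paper states only that searchByID uses DESCRIBE; the exact query strings, the Python function names and the example URIs used with them are assumptions.

```python
def escape_literal(value):
    """Escape backslashes, double quotes and newlines for a SPARQL string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def search_by_id_query(resource_uri):
    """DESCRIBE query for one resource, identified by its full URI."""
    return f"DESCRIBE <{resource_uri}>"


def search_by_property_value_query(property_uri, value):
    """SELECT the resources whose given property equals the given plain literal."""
    return (
        "SELECT DISTINCT ?resource WHERE { "
        f'?resource <{property_uri}> "{escape_literal(value)}" . '
        "}"
    )
```

The second query matches plain literals only; values carrying a language tag or datatype would need a different pattern.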
   In retrieving the resources from the SESAME store, particular attention was given
to managing the enclosures and blank nodes of the resources. In SPARQL, the
DESCRIBE queries return all the RDF triples of a resource but not the RDF triples of
the blank nodes related to it. A similar problem arises when deleting resources with
the DELETE statement: in this case only the main resource is deleted, and
not the related blank nodes.
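
One way to address both problems is to follow blank nodes explicitly when describing or deleting a resource. The Python sketch below does this over an in-memory list of triples. It is not the project's implementation: the function names, the tuple representation of triples and the convention that blank node identifiers start with "_:" are assumptions.

```python
def describe_with_blank_nodes(triples, subject):
    """Return the triples about `subject` plus those of blank nodes reachable from it."""
    result = []
    pending = [subject]
    seen = {subject}
    while pending:
        node = pending.pop()
        for s, p, o in triples:
            if s != node:
                continue
            result.append((s, p, o))
            # Follow the object only when it is a blank node not yet visited.
            if isinstance(o, str) and o.startswith("_:") and o not in seen:
                seen.add(o)
                pending.append(o)
    return result


def delete_with_blank_nodes(triples, subject):
    """Return a new triple list without `subject` and its reachable blank nodes."""
    doomed = set(describe_with_blank_nodes(triples, subject))
    return [t for t in triples if t not in doomed]
```

The deletion helper assumes that a blank node belongs to a single resource; if blank nodes were shared between resources, removing them this way would also affect the other resource.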


4 Conclusions

The work presented in this paper has been designed to facilitate the connection of the
mEducator repository to the Linked Data cloud. Bizer et al. [36] report the basic steps
needed to publish data as Linked Data on the Web: assigning URIs to the objects
described by the data, creating links to other data on the Web, and
providing metadata about the published data.
   All the work that has been done so far is in line with the approaches being taken by
big initiatives such as the Dublin Core Metadata Initiative. Taking this approach
ensures alignment with current versions of controlled vocabularies, allowing for a rich
choice of terminology (from external sources) to be made available as suggestions
both during the search and the metadata filling process. The investigation of existing
e-learning standards and their adoption, along with the reuse of existing RDF
vocabularies within the schema, ensured that the work of mEducator is as
interoperable as possible. This interoperability allows for connections to be made with
other communities using strict Healthcare LOM or other schemas.
   The schema will initially be used by mEducator partners to describe their medical
educational resources in two sharing mechanisms developed as part of the mEducator
Best Practice Network, and recommendations and best practices will be produced. At
present, the mEducator RDF store REST API assigns a valid URI to each new
resource inserted into the repository, and metadata about resources is also provided
through the search methods. Interlinking of mEducator resources to the Linked Data cloud
is currently being supported by enriching data sets with references to biomedical data
available via the BioPortal vocabularies. That way, for instance, free text descriptions
of subjects, keywords or disciplines are (semi-)automatically extended with
references to related terms in established vocabularies.
   While recent work has established the infrastructure to expose medical educational
resources in mEducator, ongoing and future work is dealing with populating the data
store, enhancing the enrichment process and integrating the data/store via its API into
third-party educational applications. The intention is also to permit third parties to use
the schema for their own resource description, while the submission of the schema to
standardisation groups would also be of importance.
   Further information about the schema, best practice guidelines,
related tools and the created API is available at the mEducator project website
http://www.meducator.net.


Acknowledgements

This work was supported by the project mEducator (Multi-type Content Sharing and
Repurposing in Medical Education), funded by the eContentplus Programme, a
multiannual Community programme to make digital content in Europe more
accessible, usable and exploitable (Grant ECP 2008 EDU 418006).


References

1. mEducator | Multi-Type Content Repurposing and Sharing in Medical Education
   http://www.meducator.net/
2. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American, 284(5),
   34-43, (2001).
3. Gasevic, D., Jovanovic, J., Devedzic, V.: Ontology-based Annotation of Learning Object
   Content, Interactive Learning Environments, 15(1), 1-26, (2007).
4. Gašević, D., Jovanović, J., Devedžić, V., Bošković, M.: Ontologies for Reusing Learning
   Object Content, In Proc. of the Fifth IEEE International Conference on Advanced Learning
   Technologies (ICALT'05), pp.944-945, (2005).
5. Brase, J., Nejdl, W.: Ontologies and Metadata for eLearning, In S. Staab & R. Studer (Eds.)
   Handbook on Ontologies, Springer-Verlag, pp. 555-574, (2004).
6. Nilsson, M., Palmér, M. and Naeve, A.: Semantic Web metadata for e-learning - some
   architectural guidelines, Proc. of the 11th World Wide Web Conference (WWW2002),
   Hawaii, USA, May 7-11, (2002).
7. Dietze, S., Gugliotta, A., Domingue, J.: A Semantic Web Service oriented Framework for
   adaptive Learning Environments. European Semantic Web Conference (ESWC), Innsbruck,
   Austria, (2007).
8. López Cobo, J.M., Sicilia, M.A., Arroyo, S.: Making Learning Objects useful with Semantic
   Web Services. In Proceedings of the e-Challenges Conference, (2006).
9. Dunkel, J., Bruns, R., Ossowski, S.: Semantic E-Learning Agents - Supporting E-Learning
   by Semantic Web and agent technologies, in: Seruca, I.; Cordeiro, J.; Hammoudi, S.; Filipe,
   J. (Eds.), Enterprise Information Systems VI, Springer Verlag, pp. 237-244, (2006).
10.Simic, G., Gasevic, D., Devedzic, V.: Semantic Web and intelligent learning management
   systems. Proceedings of the 2nd International Workshop on Applications of Semantic Web
   Technologies for ELearning. Macéió-Alagoas, Brazil, (2004).
11.Jovanović, J, Gašević, D, Torniai, C, Bateman, S., Hatala, M. : The Social Semantic Web in
   Intelligent Learning Environments: state of the art and future challenges, Interactive
   Learning Environments, 17(4), 273-309, (2009)
12.Starr, J., & Gastl, A.: IsCitedBy: A Metadata Scheme for DataCite. D-Lib Magazine, 17,
   http://www.dlib.org/dlib/january11/starr/01starr.html
13.REHASH Project Final Report, Repurposing, http://www.elu.sgul.ac.uk/rehash/index.htm
14.eViP Electronic Virtual Patients http://www.virtualpatients.eu/about/about-virtual-
   patients/faqs/
15.IEEE Learning Technology Standards Committee (LTSC) — IEEE Learning Technologies
   Standards Committee 2011, http://www.ieeeltsc.org:8080/Plone
16.Healthcare LOM Overview | MedBiquitous Consortium 2011,
   http://www.medbiq.org/working_groups/learning_objects/Healthcare_LOM_Overview.html
17.Konstantinidis S, Kaldoudi E, Bamidis P, Enabling Content Sharing in Contemporary
   Medical Education: A Review of Technical Standards, The Journal on Information
   Technology in Healthcare 2009; 7(6): 363–375
18.RDF - Semantic Web Standards 2011, http://www.w3.org/RDF/
19.IEEE Learning Object Metadata RDF binding LOM, http://kmr.nada.kth.se/static/ims/md-
   lomrdf.html
20.Dublin Core Metadata Element Set, Version 1.1 2011,
   http://dublincore.org/documents/dces/
21.Langford, D.: Internet ethics. Macmillan, Basingstoke (2000),
   http://dublincore.org/documents/dcmes-xml/
22.SKOS Simple Knowledge Organization System - Home Page 2011,
   http://www.w3.org/2004/02/skos/
23.The Friend of a Friend (FOAF) Project | FOAF Project 2011, http://www.foaf-project.org/
24.Dublin Core Metadata Element Set http://dublincore.org/documents/dces/
25.DCMI Metadata Terms http://dublincore.org/documents/dcmi-terms/
26.mEducator Namespace, http://www.meducator.net/mdc/schema.rdf
27.The Semantic Web: Scientific American 2011 Bizer 2009
28.RDF/XML Syntax Specification (Revised) 2011, http://www.w3.org/TR/REC-rdf-syntax/
29.Reitz, J. M.: Dictionary for library and information science . Libraries Unlimited, Westport,
   Conn. (2004), http://edutechwiki.unige.ch/en/Controlled_vocabulary
30.Yu, H.Q., Dietze, S. , Li, N., Pedrinaci, C., Taibi, D., Dovrolis, N., Stefanut, T., Kaldoudi,
   E., Domingue, J.: A Linked Data-driven & Service-oriented Architecture for Sharing
   Educational Resources; Linked Learning 2011, 1st International Workshop on eLearning
   Approaches for the Linked Data Age, Heraklion, Greece (2011)
31.Linked Data - Design Issues 2011, http://www.w3.org/DesignIssues/LinkedData.html
32.PURL Home Page 2011, http://purl.oclc.org/docs/index.html
33.OWL Web Ontology Language Reference 2011, http://www.w3.org/TR/owl-ref/
34.E. Kaldoudi, N. Dovrolis, S. Konstantinidis, P. Bamidis, “Social Networking for Learning
   Object Repurposing in Medical Education”, The Journal on Information Technology in
   Healthcare, vol. 7(4), pp. 233–243, 2009
35.JSR-311: JAX-RS: The Java API for RESTful Web Services.
36.Bizer, C., Heath, T., Berners-Lee, T.: Linked Data - The Story So Far. International Journal
   on Semantic Web and Information Systems (IJSWIS), 5(3), 1-22, (2009).
37.Dietze, S., Yu, H.Q., Pedrinaci, C., Liu, D. and Domingue, J. (2011) SmartLink: a Web-
   based editor and search environment for Linked Services, 8th Extended Semantic Web
   Conference (ESWC), Heraklion, Greece
        Educational Semantic Wikis in the Linked Data Age:
         the case of MSc Web Science Program at Aristotle
                     University of Thessaloniki

    Charalampos Bratsas 1,2, Anastasia Dimou1, Georgios Alexiadis1, Despoina-Eleana
     Chrysou1, Konstantinos Kavargyris1, Ioannis Parapontis1, Panagiotis Bamidis1,2,
                                   Ioannis Antoniou1
        1 Web Science, Mathematical Department, Aristotle University of Thessaloniki, Greece
        2 Lab of Medical Informatics, Medical School, Aristotle University of Thessaloniki, Greece
    cbratsas@auth.gr, andimou@math.auth.gr, galexiad@math.auth.gr, chrysoue@math.auth.gr,
             kostaskav@math.auth.gr, paraponi@math.auth.gr, bamidis@med.auth.gr,
                                    iantonio@math.auth.gr


             Abstract. Wikis are nowadays a mature technology and well established
             as a successful eLearning approach: they promote collaboration, fulfill the
             requirements of new trends in education and follow the theory of
             constructivism. Semantic Wikis, on the other hand, are not yet thoroughly
             explored, but they stand out by offering added value to the
             educational procedure and to course management. Their recent integration
             with the Linked Data cloud has the potential to go beyond their usual
             contribution and to turn them into powerful eLearning tools, as they extend
             their capabilities to the newly created educational LOD. The Web Science
             Semantic Wiki is a prime attempt to evaluate this potential and the
             benefits that the Semantic Web and Linked Data bring to the field of education.
             Keywords: Wiki Engines, Semantic Wiki, e-learning, Linked Data


1           Introduction

The Semantic Web is the current state of the art; its intention is to
unlock the value of content, even though knowledge modeling is
difficult due to its complexity. Nowadays, this is mainly accomplished using the two
prominent semantic technologies of ontologies and Linked Data [1]. Although their
diffusion is not wide, the education field is well advanced [2]. There are several
educational projects which have already started to use Linked Data. Among them are
mEducator [3] 1, LUCERO JISC Project2, dotAC3, RKB Explorer4 and SemTech5.

1 http://www.meducator.net/
2 http://lucero.open.ac.uk/
3 http://www.dotac.info/
4 http://www.rkbexplorer.com/explorer/
5 http://www.semtech.ecs.soton.ac.uk/
   This paper briefly outlines the integration of
Advanced Semantic Wikis with the Semantic Web and Linked Data in the form of
Semantic Learning Management Systems (SLMS). The Web Science Semantic Wiki
(WSSW) is used as a reference for this purpose, to show how
Semantic Wikis might become a rich alternative eLearning approach in the
Linked Data Age.


2     Web Science Semantic Wiki Presentation

WSSW was initially implemented as a project within the framework of the Master
Program “Web Science” of the Aristotle University of Thessaloniki, Greece. It has
since evolved into an SLMS that supports the adoption of Linked Data in
education, closely following the trend towards Linked Data in Education
set by the University of Southampton and the Open University in the UK.
   WSSW successfully combines the two predominant semantic
technologies. On the one hand, it uses a classical ontological structure to model its
knowledge base; on the other hand, it makes that knowledge base available for
querying by the Semantic Web through its Triple Store connector. The purpose was to
go beyond the self-contained perspective of Semantic Wikis and to expose the wiki's
content to the Linked Data Cloud as a first step towards full integration with it.

2.1     The platform

SMW+6 (community option) was the platform’s core software base, together with a set
of accompanying extensions to Semantic MediaWiki7, which is powered by MediaWiki.
Among the extensions, the Halo Extension8 was specifically selected to
make the WSSW easier to use for less experienced users. It provided the WSSW with
a semantic annotation bar and an autocompletion feature, but its true value emerged in
conjunction with the rest of the available extensions. Another essential extension
was the Semantic Gardening Extension9. It was installed to upload and maintain the
WSSW ontologies and to continuously check their consistency and referential
integrity. It ensured that any anomalies, pages without annotations and undefined
entities would be detected, and it exported the ontology, as modified by
the users, so that it could be reused. The Semantic Forms extension10 was
also widely used to allow users to enter semantically annotated data without
needing to learn the Semantic MediaWiki syntax. Instead, the semantic annotations
were added indirectly through the corresponding templates.
TreeView, (Treeview511 and Javascript dtree) was yet another extension used that

6 http://wiki.ontoprise.de/
7 http://semantic-mediawiki.org/wiki/Semantic_MediaWiki/
8 http://wiki.ontoprise.de/smwforum/index.php/Help:Halo_extension
9 http://smwforum.ontoprise.com/smwforum/index.php/Help:Semantic_Gardening_extension
10 http://www.mediawiki.org/wiki/Extension:Semantic_Forms
11 http://smwforum.ontoprise.com/smwforum/index.php/Help:TreeView_extension
requested subtrees automatically and displayed the WSSW tree as
an alternative way to navigate through the wiki. The TripleStoreConnector12 Basic
Extension was installed to enhance WSSW’s querying capabilities. It contained the
Jena Triplestore connector and connected the Triplestore with SMW+ and a
tolerant SPARQL endpoint. It allowed queries on the semantic data directly from
within the wiki or remotely via a SPARQL endpoint. This led to better search
results, since inverse, equal and transitive properties were reflected in the queries. An
SMW User Manual Extension13 enabled immediate access to help and context-sensitive
selection of help articles, and provided direct feedback to the online community
(submitting bugs and asking questions). The Semantic Result Formats Extension14 was
installed because it bundled a number of result formats for the inline queries. Timelines,
eventlines, googlebars, googlepies, semantic maps and calendars were applied to
present several queries in a more comprehensive and representative way.
   Additionally, several other extensions were installed to improve the operation
of the WSSW and the presentation of its content.


2.2       Structure Description

The structure of the actual master program was represented by
an ontology, which provided the essential building blocks to best suit the
WSSW needs. It consisted of a primary master’s program ontology in combination
with the imported FOAF15 (Friend Of A Friend) ontology (Figure 1). The master’s program
key ontology was initially implemented in Protégé; it was then modified through the
WSSW Ontology Browser and obtained its instances dynamically from the users’
entries. FOAF, in turn, was imported to meet the specific descriptive
needs of the people involved in the master’s program (instructors and students). What
follows is a brief description of the WSSW structure, presenting some of its elements.
The FOAF class “Person” is used to describe all the persons involved. The Master’s
ontology divides them into the classes “AcademicStaff” and “Student”, and
each one has its own subclasses. Apart from the class “Person”, the Master’s ontology
also has the “Announcement” class, whose instances are all the announcements of the
Master, and the class “Assignment”, which includes all the assignments announced at
the lectures. An assignment might be the same for all students, or individual ones
might be assigned to specific students each time.
In addition, the class “Course” includes all the information concerning each separate
course, and the “Degree” class is related to the class “Person” so as to define the
higher educational level of a Person. The class “Lecture” describes the specific
lectures of a course, and the “Material” class refers to all books, presentations and notes
available for each lecture. The “News” class includes all news related to the Master,
and the class “UniversityOrganisation” is a subclass of the FOAF class “Organization”.
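
The class structure described above can be summarised as a small subclass table. The Python sketch below is illustrative only: the prefix "msc:" for the master's program ontology, the class "msc:Instructor" and the function name are assumptions, and "AcademicStaff" and "Student" are read here as subclasses of the FOAF class "Person".

```python
# Direct subclass links; "msc:" is a hypothetical prefix for the master's ontology.
SUBCLASS_OF = {
    "msc:AcademicStaff": "foaf:Person",
    "msc:Student": "foaf:Person",
    "msc:UniversityOrganisation": "foaf:Organization",
    # Hypothetical subclass, added only to show a two-step chain.
    "msc:Instructor": "msc:AcademicStaff",
}


def is_subclass_of(cls, ancestor):
    """Follow subclass links upwards (transitively) from `cls` to `ancestor`."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False
```

A query for all instances of "foaf:Person" that takes such links into account would also return instructors and students, which is the kind of inference a triple store can apply on behalf of the wiki.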

12 http://smwforum.ontoprise.com/smwforum/index.php/Help:TripleStoreConnector_Basic
13 http://smwforum.ontoprise.com/smwforum/index.php/Help:User_Manual_extension
14 http://smwforum.ontoprise.com/smwforum/index.php/Help:Semantic_Result_Formats
15 http://xmlns.com/foaf/spec/
   Several forms were also implemented, both to make editing the WSSW easier for
users and to ensure that the content is automatically annotated and that a
consistent data structure is maintained. Each form creates a new instance of the class
it defines. The fields that the users are asked to complete are the values of the
properties of the instance. Each form is accompanied by its template, which further
contributes to the content presentation. Forms are used to create new instances
and/or edit existing ones.


                     Fig. 1. The key ontology at the ontology browser

   Each registered WSSW user owns a unique profile created via the “PeopleForm”
and, depending on the category the user belongs to, is assigned different access
rights to the available forms. For instance, only the Master’s and course Coordinators
have access to the “NewsAnnouncementsForm” form and are allowed to add new
entries to the associated sections. The Master’s Coordinator can also add new, or
renew existing, Courses via the “CourseForm” and can additionally assign a Thesis to
an eligible student using the “ThesisForm”. The Course Coordinators, on the other
hand, may add new Lectures for their Courses, announce the Project topics
and assign them to one or more Students, always through the associated forms. In
general, each time a registered user starts a new entry, a table produced by
an inline query lists any existing entries and instances of that
category, and while the user fills in the form’s fields, a
concept-based autocompletion feature facilitates the work.
   A course example is presented in Figure 2. The course’s overall timetable is
gathered on the course’s main page by combining the properties of its separate
lectures. Its material is also collected via an inline semantic query, which sums
up all the materials added by individual lecturers. The course’s calendar follows as
a result of an inline query on the lecture timetable (not illustrated in Figure 2).
Moreover, users may browse the WSSW via the Semantic TreeView on the right
of the context pane.


                         Fig. 2. A course as it is presented at WSSW


2.3    Highlights

The WSSW provides the usual benefits that most Semantic Wikis exhibit, as
summarized in the literature. It differs, however, in terms of
content presentation and availability. Most Semantic Wikis seem to lack
flexibility in the ways that content becomes available to their users. The
WSSW approach addresses this need through a more thorough use of semantic
annotations and attains a greater level of flexibility. It offers consistent and
well presented content in multiple and rich ways. WSSW users may choose the
presentation that best suits their needs and experience enhanced
browsing and navigation capabilities.
Apart from the inline links found in regular semantic wikis, the WSSW offers alternative
ways to navigate through its content. For instance, users may browse the WSSW via
its compact Factbox, which sums up the semantic annotations on each
page, or choose the Semantic TreeView on the right of the context pane. In
addition, one may use the more conventional menu, which is built from inline
semantic queries. Besides regular navigation, an enhanced Semantic Search
option and a user-friendly Query Interface are provided.
   A well-adapted Semantic Calendar accompanied by Semantic Maps allows users
to be informed of the timetables and the exact places associated with each lecture.
This and all other information dispersed across the WSSW is readily available via
inline semantic queries.
   Another strong point of the WSSW is that it exploits a Jena
Triple Store Connector which, besides storing data, also exposes it to the Semantic Web
and allows queries to be performed on it from remote endpoints. This feature
distinguishes the WSSW from common Semantic Wikis, which are
isolated from the rest of the Linked Data Cloud. The ultimate goal is for the WSSW
also to be able to incorporate data from the Linked Data Cloud.
   At the design stage of WSSW, the SMW+ set of extensions was the only option for a
wiki powered by MediaWiki to expose its knowledge base to the Semantic Web and
to allow queries via SPARQL. Additional options are now available when
choosing a triple store connector/extension, each providing different functions
and advantages. A comparison of their features is available at [4].


3       Conclusions and Future Work

    Through WSSW we demonstrated that Advanced Semantic Wikis, integrated with
the Semantic Web and Linked Data in the form of SLMS, have real potential to
become rich alternative eLearning approaches in the Linked Data Age. For this
purpose, a brief summary of current work in the field of education and Linked
Data was presented, focusing mainly on the WSSW. The platform used, its ontologies,
its structure and its semantic functions were concisely reported.
    As far as future milestones are concerned, it is important to overcome the barrier
that predefined ontologies impose and to broaden the presented content by importing
other sources from the Linked Data Cloud without changing our ontology. It
is also of critical importance for the MSc in Web Science to incorporate an extension
to this Semantic Wiki which will allow instructors to query multiple repositories
and aggregate selected resources to recommend to students and, in general, to
incorporate data from the Linked Data cloud. Another milestone is to use SKOS
representations of scientific classifications to characterize the available learning
objects of the WSSW, so as to clearly define their content and to make it easily
searchable. The ultimate goal is a complete solution which will make broad use of
Linked Data.


4       References
    1. Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked data – the story so far.
       International Journal on Semantic Web and Information Systems (IJSWIS), 2009,
       pp.1-22
    2. Bamidis P.D., Kaldoudi E., Pattichis C., "mEducator: A BEST PRACTICE NETWORK
       FOR REPURPOSING AND SHARING MEDICAL EDUCATIONAL MULTI-TYPE
       CONTENT", In proceedings of PRO-VE 2009, Springer Verlag 2009, IFIP Advances in
       Information and Communication Technology 307, pp. 769-776
    3. Bratsas C., Kapsas G., Konstantinidis S., Koutsouridis G., Bamidis P.D., "A Semantic
       Wiki within Moodle for Greek Medical Education", in Proc. of 22nd IEEE International
       Symposium on Computer-Based Medical Systems, New Mexico, USA, 2009, pp.1-4.
    4. Comparison of Semantic MediaWiki triplestore connectors,
       http://www.mediawiki.org/wiki/Extension:SparqlExtension#Comparison_of_Semantic_MediaWiki_triplestore_connectors
       A Linked Data-driven & Service-oriented Architecture
               for Sharing Educational Resources

Hong Qing Yu1, Stefan Dietze1, Ning Li1, Carlos Pedrinaci1, Davide Taibi2, Nikolas Dovrolis3,
                      Teodor Stefanut4, Eleni Kaldoudi3 and John Domingue1
          1 Knowledge Media Institute, The Open University, MK7 6AA, Milton Keynes, UK
          2 Italian National Research Council, Institute for Educational Technologies, Italy
          3 School of Medicine, Democritus University of Thrace, Greece
          4 Technical University of Cluj-Napoca, Romania


          Abstract. The two fundamental aims of managing educational resources are to
          enable resources to be reusable and interoperable and to enable Web-scale
          sharing of resources across learning communities. Currently, a variety of
          approaches have been proposed to expose and manage educational resources
          and their metadata on the Web. These are usually based on heterogeneous
          metadata standards and schemas, such as IEEE LOM or ADL SCORM, and
          diverse repository interfaces such as OAI-PMH or SQI. Also, there is still a
          lack of usage of controlled vocabularies and available data sets that could
          replace the widespread use of unstructured text for describing resources. On the
          other hand, the Linked Data approach has proven that it offers a set of
          successful principles that have the potential to alleviate the aforementioned
          issues. In this paper, we introduce an architecture and prototype which is
          fundamentally based on (a) Linked Data principles and (b) Service-orientation
          to resolve the integration issues for sharing educational resources.

          Keywords: Linked Services, Semantic Web, Linked Data, SOA, eLearning,
          Technology-enhanced Learning.


   1 Introduction

   Recently, learning resource repositories have been widely used in higher education to
   enhance the learning process for both students and course creators or teachers. One
   important feature of Web-based learning repositories is that they make educational
   resources reusable and shareable. Although most learning repositories provide open
   access to their educational resources, integration is still costly, as
   different learning repositories are isolated from each other and based on different
   implementation standards [18]. Therefore, an intuitive approach is to federate
   learning repositories from different institutes, which can foster the
   reuse and sharing of educational resources without costly duplication into local
   learning repositories. Following such an approach, four major research challenges
   need to be taken into consideration to ensure Web-scale interoperability:
       1.   Retrieving distributed data from heterogeneous learning repositories:
            distributed heterogeneous data and services are neither widely reused nor
            sufficiently integrated into learning application environments. Standardized
            methodologies to resolve heterogeneities between the terminologies used by
            distinct data or service providers are not available. Therefore, the
            interoperability and scalability of current E-Learning applications are limited
            [17].
       2.   Dynamic metadata mediation: since learning contents and their metadata are
            stored in heterogeneous repositories, the metadata and the services
            consuming the content are usually described by using distinct schemas such
            as Dublin Core1 and IEEE Learning Object Metadata (LOM) [11], and a
            variety of vocabularies accordingly. Therefore, to achieve interoperability,
            distinct schemas need to be mapped and aligned.
       3.   Extensible learning environment: in highly distributed Web-based
            environments, frequent changes occur to available interfaces. That is,
            services as well as repositories are regularly added, modified or removed
            in response to new requests and new considerations.
       4.   Enriching existing educational resource metadata: much of the educational
            resource metadata stored in different Web repositories is non-aligned,
            incomplete, and poorly structured, i.e. free text is still widely used for
            describing educational resources. Therefore, to allow machine-processing
            and Web-scale processing, distributed educational metadata needs to be
            enriched, that is, transformed into structured and formal descriptions. This
            may lead to wider interoperability by linking to existing vocabularies,
            schemas and taxonomies.

The most recent development of the Semantic Web [1] is Linked Data (LD) [2] which
has successfully established a set of principles to expose data and metadata on the
Web and has led to the widespread availability and use of schemas, vocabularies and
data sets spanning across all application domains. As such, the Linked Data approach
offers opportunities to substantially alleviate the challenges outlined above.
     In this paper we introduce a Linked Data-driven & Service-oriented architecture
to address the issues discussed above. The four major contributions are:

       1.   Linked Data principles are applied to model and expose metadata of both
            educational resources and educational services and APIs. In this way, not
            only are resources connected, but service descriptions and resources are
            also exposed in a standardized and accessible way.
       2.   Existing heterogeneous and distributed learning repositories, i.e. their Web
            interfaces (services), are integrated on the fly by reasoning over and
            processing Linked Data-based service semantics (see item 1).
       3.   Metadata retrieved from heterogeneous Web repositories, for instance IEEE
            LOM resource metadata, is automatically lifted into RDF and exposed as
            Linked Data, accessible according to Linked Data principles.

1
    http://dublincore.org/documents/dces/
       4.   A set of RESTful APIs is developed on top of the integration framework to
            allow third party applications to consume and interact with the data exposed
            by our approach.
    The remainder of the paper is organized as follows. Section 2 discusses the
background and motivation of our work. Section 3 gives an overview of our proposed
architecture. Section 4 describes the different parts of the architecture, while Section 5
introduces an educational application from the biomedical field. Section 6 discusses
related work and Section 7 concludes the paper.


2 Background and Motivation

One use case motivating a general architecture to address the above research problems is
a federated search for educational data across distributed Web repositories. More
specifically, the user would search for educational resources based on, for instance, a set
of keywords, content types, educational subjects or preferred language. That, however,
requires: (1) integrating suitable resource stores via their APIs and Web services that
can deliver correct educational resources; (2) dynamically invoking the suitable services
and mediating the results into a single standard and meaningful response; (3) transforming
the response and automatically exposing it as structured, well-interlinked Linked Data.
   One way to enable educational resources inside individual learning repositories to
be accessed and integrated through distributed environments is to develop Web
services and open APIs on top of educational repositories. However, this naturally
introduces a heterogeneous API landscape, as is characteristic of the current Web.
For example, one Web service interface from the PubMed2 repository provides an
OAI-PMH-based RESTful service where response messages are based on XML.
Other Web services offering educational Linked Data from The Open University3
provide a SPARQL [25] endpoint where response messages are RDF-based.
Furthermore, different services may also operate in a particular language depending on
their target user groups. Meanwhile, there are multiple types of educational resource
metadata schema such as IEEE LOM and Dublin Core. Therefore, dynamic mediation
between different metadata schemas and API interfaces is required. However,
metadata following the current standards is mainly stored and exposed based on XML and
relational databases and consists largely of poorly structured text keywords, lacking
formal semantics and interlinking with established vocabularies to allow more
advanced discovery and inference mechanisms. This also leads to highly ambiguous
descriptions, which, considering the medical domain, for instance, can critically
affect the value and quality of resource descriptions.
   Although some efforts have been made on providing an IEEE LOM-RDF binding4,
this early work was (a) discontinued a few years ago and (b) only focused on the
binding aspect rather than further working towards a Linked Data-principles
compliant approach, e.g. by reusing elements of established Linked Data schemas or
linking metadata to established Linked Data vocabularies.

2
  http://www.ncbi.nlm.nih.gov/pubmed/
3
  http://data.open.ac.uk/
4
  http://dublincore.org/educationwiki/DCMIIEEELTSCTaskforce/RDFPAR
   While our work considers the integration of arbitrary data on the Web, we use the
mEducator educational resource RDF description schema (see mEducator project5) as
a schema according to which we lift existing (non-RDF) metadata for further
processing and exposure as fully interlinked Linked Data. The mEducator RDF
schema6 is fully compliant with Linked Data principles.
   The other important factor of the challenge is the distributed and Service-oriented
environment. The services operating on educational repositories are very dynamic, in
that they might change behaviors and interfaces according to new requirements. Also,
new services (and stores) regularly join the environment while others might
disappear. Therefore, extensibility of the environment is a very important
requirement, chiefly to support the modification, addition or removal of
services without any impact on upper layers (e.g. existing educational applications and
user interfaces). Consequently, facilitating easy-to-use service representation based on
standard service vocabularies (e.g. SAWSDL [15] and WSMO-Lite [13]) is an
important requirement to allow third party service providers to independently provide
meaningful service descriptions and interlink their services with the environment.
   In this paper, we build on two fundamental technologies: (a) Semantic Web
technologies, to facilitate data interoperability, and (b) Service-orientation, to
allow interoperability at the repository level.
   The four principles of LD are [2]:

     1.   Use URIs to identify things.
     2.   Use HTTP URIs so that these things can be referred to and looked up
          ("dereferenced") by people and user agents.
     3.   Provide useful information about the “thing” when its URI is dereferenced,
          using standard formats such as RDF/XML.
     4.   Include links to other, related URIs in the exposed data to improve discovery
          of other related information on the Web.
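
As a minimal sketch of principles 2 and 3, the following Python fragment prepares
an HTTP request that asks for an RDF/XML representation of a resource via content
negotiation. The function name and the choice of Accept header value are
illustrative assumptions and not part of the architecture described in this paper;
the request is only constructed, not sent.

```python
from urllib.request import Request


def build_dereference_request(uri: str) -> Request:
    """Prepare (but do not send) an HTTP GET that asks for RDF/XML."""
    if not uri.startswith(("http://", "https://")):
        # Principle 2: only HTTP URIs can be looked up by user agents.
        raise ValueError("Linked Data URIs should be HTTP URIs: " + uri)
    # Principle 3: request a standard RDF serialization of the "thing".
    return Request(uri, headers={"Accept": "application/rdf+xml"})


# Example URI taken from the enrichment example later in this paper.
request = build_dereference_request(
    "http://purl.bioontology.org/ontology/SNOMEDCT/89551006"
)
```

Whether a given server honours the Accept header depends on its configuration;
passing the prepared request to urllib.request.urlopen would perform the lookup.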

The above principles have proved largely successful throughout the past years,
leading to an ever increasing amount of LD-compliant schemas and data sets7 as well
as general-purpose tools and APIs for generating and interacting with LD.
   Linked services adopt the LD principles to support publishing of service
annotations as LD, expressed in terms of a simple generic service conceptual model
that can be consumed by both humans and machines. Among a growing set of
tools supporting the development and processing of Linked Services, the iServe
platform8 is one implementation for publishing Linked Services [16]. In order to
cater for interoperability, iServe uses what can essentially be considered the
maximum common denominator between existing Semantic Web Services (SWS)


5
  http://www.meducator.net/
6
  http://purl.org/meducator/ns/
7
  http://richard.cyganiak.de/2007/10/lod/
8
  http://iserve.kmi.open.ac.uk/
formalisms that we refer to as the Minimal Service Model (MSM). Services are
defined as having a number of operations. Each operation has an Input and Output
MessageContent and Faults. MessageContent is defined to may have MessageParts.
Additionally, iServe adopts the SAWSDL, WSMO-Lite and hRESTS [14]
vocabularies. The SAWSDL vocabulary is used to capture modelReference,
liftingSchemaMapping and loweringSchemaMapping that are important for lifting
and integrating services’ outputs. WSMO-Lite extends SAWSDL with a model to
specify the semantics of the particular service annotations. The hRESTS vocabulary
allows modeling of additional information necessary for Web APIs.
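
The structure of the MSM described above can be pictured with a few Python data
classes. This is only an illustrative sketch: the class and field names mirror
the terms used in the text, whereas the actual MSM is an RDF schema, and the
example service and its example.org URIs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MessagePart:
    name: str
    model_reference: Optional[str] = None  # cf. SAWSDL modelReference


@dataclass
class MessageContent:
    parts: List[MessagePart] = field(default_factory=list)
    lifting_schema_mapping: Optional[str] = None   # cf. liftingSchemaMapping
    lowering_schema_mapping: Optional[str] = None  # cf. loweringSchemaMapping


@dataclass
class Operation:
    name: str
    input: MessageContent
    output: MessageContent
    faults: List[MessageContent] = field(default_factory=list)


@dataclass
class Service:
    name: str
    operations: List[Operation] = field(default_factory=list)


# Hypothetical annotation of a keyword search service.
search = Operation(
    name="search",
    input=MessageContent(
        parts=[MessagePart("keyword", "http://example.org/vocab#Keyword")],
        lowering_schema_mapping="http://example.org/mappings/lowering.xsl",
    ),
    output=MessageContent(
        lifting_schema_mapping="http://example.org/mappings/lifting.xsl",
    ),
)
service = Service(name="ExampleRepositorySearch", operations=[search])
```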


3 Overall Architecture

In this section, we provide an overview of a general-purpose framework which aims
at (i) integrating heterogeneous educational resources and (ii) exposing their metadata as
well-structured and interlinked Linked Data. Our overall proposed architecture
includes three layers, shown in Figure 1: the (Web) data and service layer, the data and
service integration layer, and the application and presentation layer.

     • The (Web) data and service layer consists of available educational resource
       metadata, Web services and data sources such as those that are part of the
       Linked Data cloud.
     • The data and service integration layer is based on the Linked Services
       approach, exploiting Linked Data based service annotation, the APIs to
       broker services and an RDF repository for exposing enriched educational
       resource data. The iServe [16] & SmartLink [6] repositories store two
       different kinds of service annotations separately, namely functional and non-
       functional service annotations.
     • The application and presentation layer uses the APIs provided by the data &
       services integration layer to interact with underlying data & services and
       provides an interface to end-users.
Fig. 1. Architecture overview.
The proposed architecture supports the following methodology:

 1.   Query across distributed educational repositories: The Linked Services
      environment allows distributed queries across distributed and heterogeneous
      educational repositories (via their services/APIs) on the fly while query
      results will be lifted into RDF automatically. The Linked Services
      component facilitates dynamic discovery of suitable stores/services (for
      instance services matching a particular user language or subject) and
      execution by adhering to particular execution constraints. New repository
      services can be added by simply adding semantic annotations of the services
      via our Linked Services annotation environments. The service discovery
      engine searches the semantically suitable services by matching service
      annotations with service request parameters that are passed through the
      application layer. Suitable services for a given query are invoked
      sequentially. Finally, the different service outputs are lifted into RDF
      compliant with the mEducator RDF schema. That way, heterogeneous service
      responses are consolidated into a single schema that simplifies further
      processing by the application layer.
 2.   Storage of selected metadata in RDF store: Retrieved results are exposed as
      Linked Data/RDF in an RDF store. The RDF repository serves two main
      purposes: (1) it allows content providers who do not have a metadata
      publishing platform to publish their educational resource metadata through
      the application layer; (2) it allows metadata sets from the distributed learning
            repositories (Step 1) to be enriched and exposed in a Linked Data-compliant
            way. The RDF store is implemented based on a Sesame RDF store with a
            BigOWLIM and mEducator RDF schema compliant repository and a
            dedicated REST API. Each resource entity has a unique URI identifier that
            can be dereferenced.
      3.    Data enrichment and interlinking: Metadata can be enriched based on LD
            datasets, e.g. particular ones from the biomedical field. That is of particular
            importance for metadata properties such as keywords, discipline or subject
            where established vocabularies exist on the Web, and in particular the LD
            cloud. Enrichment takes advantage of available APIs such as the ones
            provided by Bioportal9, which allow access to a vast number of established
            taxonomies and vocabularies, such as SNOMED10, MeSH11 or Galen12. That
            way, unstructured free text, for instance the keyword “Thrombolysis”, is
            enriched with unique URIs of structured LD entities - such as
            http://purl.bioontology.org/ontology/SNOMEDCT/89551006 which refers to
            a related concept within the SNOMED Clinical Terms ontology – which
            not only allow further reasoning on related concepts but also enable users to
            query for resources by using well-defined concepts and terms as opposed to
            ambiguous free text.
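
The enrichment idea in step 3 can be sketched in a few lines of Python. The
lookup table below is a hypothetical local stand-in for a vocabulary service such
as BioPortal (no remote API is called), and its single entry is the
“Thrombolysis” example given above; the function name and result layout are
assumptions made for illustration only.

```python
from typing import Dict, List

# Hypothetical local lookup table standing in for a vocabulary service;
# the only entry is the keyword/URI pair mentioned in the text.
KEYWORD_TO_CONCEPT: Dict[str, str] = {
    "thrombolysis": "http://purl.bioontology.org/ontology/SNOMEDCT/89551006",
}


def enrich_keywords(keywords: List[str]) -> Dict[str, List[str]]:
    """Split free-text keywords into linked concept URIs and unresolved text."""
    enriched: Dict[str, List[str]] = {"concepts": [], "unresolved": []}
    for keyword in keywords:
        uri = KEYWORD_TO_CONCEPT.get(keyword.strip().lower())
        if uri is not None:
            enriched["concepts"].append(uri)
        else:
            # Keep free text that could not be linked, rather than dropping it.
            enriched["unresolved"].append(keyword)
    return enriched


result = enrich_keywords(["Thrombolysis", "bedside teaching"])
```

Exact string matching is used here only to keep the sketch short; accurate
enrichment in practice is an open issue involving entity recognition and ontology
mapping, as noted in Section 7.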


4 Data and Services integration: the Linked Services Approach

Our current implementation of the data and service integration layer builds on
existing Semantic Web Services research, namely the Linked Services approach. As
discussed in Section 2, the Linked Services approach uses a rather lightweight service
annotation schema and applies Linked Data principles to the services domain. Based
on RDF models describing core elements of the service (e.g., operations, input,
output) services are discovered and executed based on a given set of service consumer
constraints.
   To this end, we use the iServe open platform for publishing semantic annotations
of services based on a direct application of LD principles [16]. As introduced
previously, the MSM – iServe’s underlying RDF schema – focuses on functional
properties while the SmartLink service annotation model focuses on non-functional
properties (NfP) as an extension to the MSM. SmartLink NfP data are stored in a
dedicated RDF store that is synchronized with the iServe repository at runtime.
   The whole lifecycle of the Linked Services approach includes three major steps:

     1.    Editing and publishing service annotations through the SmartLink service
           annotation online interface. SmartLink allows service providers to use a Web
           form to easily annotate service properties step-by-step following the extended
           MSM schema. Meanwhile, service requesters can also look up services
9
  http://www.bioontology.org/wiki/index.php/BioPortal_REST_services
10
   http://www.ihtsdo.org/snomed-ct/
11
   http://www.nlm.nih.gov/mesh/
12
   http://www.co-ode.org/galen/
          manually apart from using our automatic API-based service discovery and
          invocation approach.
     2.   Services discovery based on service request: a set of RESTful APIs has been
          developed to let third-party applications send service requests distinguished by
          different parameter values to discover semantically suitable educational
          resource services.
     3.   Services invocation and lifting: finally, the identified services are invoked and
          heterogeneous service responses are lifted to comply with the mEducator
          Learning Content RDF description schema.

The rest of this section introduces the above steps in detail.
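
Before going into detail, the three lifecycle steps can be sketched end to end
over an in-memory registry. Everything in this fragment is a simplifying
assumption made for illustration: annotations are plain Python attributes instead
of RDF, matching is by string equality instead of semantic matching, and the two
toy services and their field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


@dataclass
class AnnotatedService:
    """Step 1: a service together with its published annotations."""
    name: str
    category: str
    subject: str
    language: str
    invoke: Callable[[str], Record]   # calls the service, returns native output
    lift: Callable[[Record], Record]  # maps native output to the common schema


def discover(registry: List[AnnotatedService],
             category: Optional[str] = None,
             subject: Optional[str] = None,
             language: Optional[str] = None) -> List[AnnotatedService]:
    """Step 2: keep services whose annotations match every given parameter."""
    wanted = {"category": category, "subject": subject, "language": language}
    wanted = {key: value for key, value in wanted.items() if value is not None}
    if not wanted:
        raise ValueError("at least one request parameter must be specified")
    return [service for service in registry
            if all(getattr(service, key) == value
                   for key, value in wanted.items())]


def federated_search(registry: List[AnnotatedService], query: str,
                     **request: str) -> List[Record]:
    """Step 3: invoke matching services one after another and lift each result."""
    return [service.lift(service.invoke(query))
            for service in discover(registry, **request)]


# Two toy services that return differently shaped native records.
registry = [
    AnnotatedService("store-a", "search", "medicine", "en",
                     invoke=lambda q: {"dc_title": "Intro to " + q},
                     lift=lambda native: {"title": native["dc_title"]}),
    AnnotatedService("store-b", "search", "medicine", "de",
                     invoke=lambda q: {"lom_title": "Grundlagen: " + q},
                     lift=lambda native: {"title": native["lom_title"]}),
]
results = federated_search(registry, "thrombolysis",
                           subject="medicine", language="en")
```

Adding a repository in this sketch only requires appending another annotated
entry to the registry, which mirrors the extensibility requirement from Section 2.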


4.1. SmartLink: Linked Services editor and search environment

In order to provide a Linked Services editor that allows the annotation of Web
services and Web APIs without any pre-existing documentation, we developed a
service annotation and search tool, SmartLink13 (SeMantic Annotation enviRonmenT for
Linked services). SmartLink allows annotation of Web services and Web APIs based
on the MSM from scratch, that is, without any pre-existing services documentation
such as WSDL or HTML files, as assumed by existing annotation tools (Section 1).
SmartLink operates on top of LD stores such as iServe and is an open environment
accessible to users simply via OpenID14 authentication.
   SmartLink exploits an extension of the MSM schema including a number of
additional non-functional properties. These non-functional properties cover, for
instance, contact person, developer name, Quality of Service (QoS), development
status and service license. MSM-schema properties are directly stored in iServe, while
additional properties are captured in a complementary RDF store based on OpenRDF
Sesame15. Figure 2 depicts the overall architecture of the SmartLink
environment.


13
   http://smartlink.open.ac.uk & http://kmi.open.ac.uk/technologies/name/smartlink
14
   http://openid.net/
15
   http://www.openrdf.org/
Fig. 2. SmartLink process and architecture.
In keeping with Linked Data compliance, one of the core features of the MSM is the
capability to associate service descriptions with so-called model references that refer
to RDF descriptions in external vocabularies defining the semantics of the service or
its parts. That way, for instance, a particular service response message can be
associated with an external RDF description which details and further describes the
nature of the response. However, while this feature is useful and even necessary in
order to provide meaningful service models, finding appropriate model references
across the entire Web of data is a challenging task. Therefore, SmartLink uses
established Linked Data APIs – currently the WATSON16 API – to identify and
recommend suitable model references to the user.


4.2. A RESTful API for Linked Services discovery and execution

A RESTful API was developed and designed to allow third party applications to
interact with our RDF annotations of educational services and APIs, for instance, to
discover and execute services.
   The service discovery function of our RESTful API takes three service
requirement parameters, namely category, subject and language, each of which can be
referenced by RDF entities defined in the Linked Data cloud, such as
http://www.daml.org/2003/09/factbook/languages#English. One example vocabulary
for service categories is the Service-finder ontology17. The subject specifies the
educational domain that the service operation deals with. Different vocabularies can
be used to define subjects, for instance the OpenLearn classification vocabulary18, which

16
     http://watson.kmi.open.ac.uk/
17
   http://www.service-finder.eu/ontologies/ServiceCategories
18
   http://meducator.open.ac.uk/ontologies/open-learn-classification.rdf
describes OpenLearn19 learning subject categories. The language specifies the
language requirement for service operation output and thereby allows the discovery of
services which offer access to educational data in a specific user language. All the
requirement properties are optional (but at least one property should be specified). By
taking these service request parameters into account, the discovery API issues a
SPARQL query to the iServe and SmartLink service annotation repositories to
obtain information about suitable services. The discovery response message is an
RDF-based output that contains service name, service description, subject, endpoint and
output language. Listing 1 shows an example of the service discovery RDF for a
service request.


Listing 1 Service description excerpt.
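
Separately from Listing 1, the way the three optional request parameters might be
turned into a SPARQL query can be sketched as follows. This is not the query used
by the discovery API: the ex: prefix, the class ex:Service and the properties
ex:category, ex:subject, ex:outputLanguage and ex:endpoint are hypothetical
placeholders and do not correspond to the actual MSM or SmartLink terms.

```python
from typing import Optional

# Hypothetical placeholder vocabulary; not the real annotation schema.
PREFIXES = "PREFIX ex: <http://example.org/service-annotations#>"


def build_discovery_query(category: Optional[str] = None,
                          subject: Optional[str] = None,
                          language: Optional[str] = None) -> str:
    """Build a SPARQL SELECT with one triple pattern per given parameter."""
    constraints = {
        "ex:category": category,
        "ex:subject": subject,
        "ex:outputLanguage": language,
    }
    patterns = [f"  ?service {prop} <{uri}> ."
                for prop, uri in constraints.items() if uri is not None]
    if not patterns:
        # All parameters are optional, but at least one must be given.
        raise ValueError("at least one request parameter must be specified")
    lines = [
        PREFIXES,
        "SELECT ?service ?endpoint WHERE {",
        "  ?service a ex:Service ;",
        "           ex:endpoint ?endpoint .",
        *patterns,
        "}",
    ]
    return "\n".join(lines)


query = build_discovery_query(
    language="http://www.daml.org/2003/09/factbook/languages#English"
)
```

The sketch interpolates the URIs directly into the query text; a real
implementation would need to validate them first.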
A particular invocation method is provided by the API that supports the lifting of
service responses – whatever the response message format (e.g., XML or JSON) – into
RDF compliant with the mEducator RDF schema. The service invocation API method
takes service invocation parameters provided by the application layer and returns
RDF output compliant with the mEducator RDF schema.


Fig. 3. Service invocation and lifting process.
Service invocation and lifting: based on service input and output annotations and
corresponding lifting/lowering schema description, an RDF service invocation
message is generated by dynamically matching parameters (specified by client users)

19
     http://www.open.ac.uk/openlearn
to service semantic annotations. As shown in Figure 3, the RDF input message
passes through the “lowering” process of the invocation API, which converts it to the
actual input format of the service. The service is then invoked with the lowered input.
Meanwhile, each individual service may return output in a different format, such as
XML, RDF or JSON. In order to provide service response messages compliant with the
mEducator RDF schema, native output from service invocations is transformed via the
“lifting” step to RDF compliant with the mEducator RDF schema. Not only does the RDF
output enable data mashup at the semantic level, it could also be used, directly or in
combination with other RDF data, for the invocation of further services through the
invocation API.
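
A minimal sketch of the lowering and lifting steps, assuming a service that takes
a URL query string and answers with XML containing Dublin Core elements. The
parameter mapping, the sample record and the property names formed by appending
Dublin Core element names to the mEducator namespace are assumptions made for
illustration; the actual transformations are driven by the lifting and lowering
schema mappings attached to the service annotations.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple
from urllib.parse import urlencode

MEDUCATOR_NS = "http://purl.org/meducator/ns/"  # schema namespace (footnote 6)
DC_NS = "http://purl.org/dc/elements/1.1/"

Triple = Tuple[str, str, str]


def lower(parameters: Dict[str, str], mapping: Dict[str, str]) -> str:
    """Lowering: rename semantic parameters to the service's native keys."""
    native = {mapping[name]: value for name, value in parameters.items()}
    return urlencode(native)


def lift(xml_text: str, resource_uri: str) -> List[Triple]:
    """Lifting: turn Dublin Core elements of an XML record into triples."""
    root = ET.fromstring(xml_text)
    triples: List[Triple] = []
    for element in root.iter():
        if element.tag.startswith("{" + DC_NS + "}") and element.text:
            local_name = element.tag.split("}", 1)[1]
            # Hypothetical property URI: schema namespace + element name.
            triples.append((resource_uri, MEDUCATOR_NS + local_name,
                            element.text.strip()))
    return triples


# Hypothetical native response of a repository service.
record = (
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Thrombolysis in acute stroke</dc:title>"
    "<dc:language>en</dc:language>"
    "</record>"
)
query_string = lower({"keyword": "thrombolysis"}, {"keyword": "q"})
triples = lift(record, "http://example.org/resource/1")
```

Because every service response ends up as triples in one schema, the application
layer can merge results from several repositories without knowing their native
formats.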


5 MetaMorphosis+: An Application Layer Combining Social
  Computing with Semantic Data and Services Integration

The data and services integration architecture presented in previous sections is fully
exploited in MetaMorphosis+, which merges the paradigms of the semantic and social
web to produce an environment for sharing educational resources in health sciences.
MetaMorphosis+ (or MM+) realizes the integration of a novel social environment as
the application layer and user interface with the semantic data and services linking
architecture presented here.
   At the application layer, MetaMorphosis+ can be viewed as two distinct and
interacting networks. The first one is a network of persons, including authors,
potential authors and final users of learning objects (students, teachers or others, e.g.
educational managers, etc.). The second is a network of educational resources. The
network of persons functions in a way similar to other social networks. Persons
can interact with each other via their personal blogs, declare friends and create their
own interest groups. At a different level, educational resources themselves create an
equivalent social network. Educational resources in MetaMorphosis+ can be
resources residing in a Learning Management System (LMS), in another educational
repository, or merely available on the Web. Resources (like humans) are represented
in MetaMorphosis+ by their profile. Educational resources as social objects can
exhibit different aspects of ‘object sociality’ [12]: (a) the obvious connections via
common tags; (b) connections based on collective usage and other related interaction
of human users; (c) social connections based on inheritance as realized via
educational content repurposing; (d) semantic connections realized via semantic
annotations and linking of educational resources. This latter social dimension is fully
realized via semantic data and service linking as described in the previous sections.
   The application layer, via the Resource Profile View, allows viewing,
management and annotation (Figure 4) of the educational resource metadata as they
are lifted from the RDF triple store. Additionally, a new resource and its metadata can
be declared via the application layer, with the respective updating of the triple store.
Fig. 4. Resource annotation in MetaMorphosis+.
   The application layer fully exploits the search and retrieval APIs, searching for
relevant resources via two distinct mechanisms: (a) search in the RDF store; or (b)
search on the distributed learning repositories semantically linked via the data and
services integration layer. The search can be focused on general keywords or keywords
found inside specific profile fields of each resource (Figure 5).


Fig. 5. Search interface of MetaMorphosis+. The user can perform a simple search by entering
keywords or an advanced search by specifying the metadata fields of interest.
6 Related Work

A peer-to-peer architecture (LOP2P) for sharing educational resources among
different learning institutions is proposed in [18]. LOP2P aims at helping different
educational institutions to create course material by using shared educational resource
repositories. The major advantage of the peer-to-peer approach is that the learning
repositories can be easily integrated after adding the LOP2P plugin and mediation
layer to each different repository. A similar peer-to-peer architecture has also been
proposed in the EduLearn project [17] and [8]. Meanwhile, the Simple Query Interface
(SQI), introduced in [23], is designed to query different learning repositories using a
common query language. However, query format and result format have to be agreed
among different repository providers before using the query functionalities, which
means that a wrapper service is required to ensure compliance of all involved
repositories with the agreed format. These approaches share the following
disadvantages: (1) instead of accepting the heterogeneous landscape of the Web, all
approaches impose either a common schema or interface approach on the underlying
stores. Therefore, the heterogeneity issue has not been solved. (2) The sharing
functionalities are limited by using a defined mediation layer and the mediation is
based on syntactic matching, which is not an efficient mechanism to deal with an
open and distributed environment.
    The work described in [20] and [21] utilizes Semantic Web as well as Web service
technologies to enable adaptation to different learning contexts by introducing a
matching mechanism to map between a specific context and available learning data.
However, this work neither considers approaches for automatic service discovery nor
is it based on common standards. Hence, the reuse and automatic allocation of a
variety of services or the mediation between different metadata standards is not
supported. These issues apply to the idea of “Smart Spaces” [22] for learning as well.
The work in [3] follows the idea of using a dedicated personalization Web service that
makes use of semantic learning object descriptions to identify and provide appropriate
learning content. Neither the integration of several distributed learning services
nor the allocation of services at runtime is within the scope of that research. Further
related research in [9] and [10] allows mediation between different services based
on a so-called “connector service”.


7 Conclusion and Future Work

Integrating educational resources is becoming more and more important, since plenty of
educational resources are published online while a range of interfaces
and description approaches are being used. In this paper, we introduced a Linked
Data-driven & Service-oriented architecture to resolve the integration issues by
addressing interoperability issues in a distributed and heterogeneous environment via
dynamic metadata mediation towards an extensible and fully Linked Data-compliant
resource description layer. The MetaMorphosis+ application, an open environment for
(biomedical) education, has been developed based on our proposed architecture.
    While the presented work is ongoing research and tackles a number of distinct
challenges such as metadata interoperability, services discovery or data mediation,
there are plenty of opportunities for future work. In the short term and most
importantly, these cover: (1) investigating ways to enable efficient, accurate and
dynamic enrichment of educational data, which involves research fields such as text
mining, entity recognition and ontology mapping; (2) extending the framework with
additional open repositories and data stores to further showcase and evaluate our
services integration approach; (3) integrating the APIs of the “data & services
integration layer” with additional third party applications to further evaluate the
performance and scalability of the architecture. In particular, (1) also needs to cover
the consideration of security and authentication aspects while context-awareness is an
important aspect to be taken into account with respect to (2). Additionally, many
aspects of our proposed framework are domain-independent and it is intended to
deploy similar approaches in different areas, such as eScience.


References

1.  Berners-Lee, T., Hendler, J., and Lassila, O. (May 17, 2001). The Semantic Web. Scientific
    American Magazine. March 29, 2009.
2. Bizer, C., T. Heath, et al. (2009). Linked data - The Story So Far. Special Issue on Linked
    data, International Journal on Semantic Web and Information Systems (IJSWIS).
3. Baldoni, M., Baroglio, C., Brunkhorst, I., Henze, N., Marengo, E. and Patti, V. (2006). A
    Personalization Service for Curriculum Planning. In proceedings of 14th Workshop on
    Adaptivity and User Modeling in Interactive Systems, Hildesheim.
4. Davies, J., Domingue, J., Pedrinaci, C., Fensel, D., Gonzalez-Cabero, R., Potter, M.,
    Richardson, M., and Stincic, S. (2009). Towards the open service web. BT Technology
    Journal, 26(2).
5. Dimitrov, M., Simov, A., Konstantinov, M., Momtchev, V. WSMO Studio – a Semantic
    Web Services Modelling Environment for WSMO (System Description). In: Proceedings
    of the 4th European Semantic Web Conference (ESWC). Number 4519 in LNCS,
    Innsbruck, Austria (2007) 749–758.
6. Dietze, S., Yu, H.Q., Pedrinaci, C., Liu, D. and Domingue, J. (2011) SmartLink: a Web-
    based editor and search environment for Linked Services, 8th Extended Semantic Web
    Conference (ESWC), Heraklion, Greece
7. Dietze, S., Gugliotta, A. and Domingue, J. Supporting Interoperability and Context-
    Awareness in E-Learning through Situation-driven Learning Processes, Special Issue on
    Web-based Learning of International Journal of Distance Education Technologies (JDET),
    IGI Global, 2008.
8. Hadzic, M., D’Souza, R., Hadzic, F., Dillon, T. Thinking PubMed: an Innovative System
    for Mental Health Domain. Computer-Based Medical Systems, IEEE Symposium on, pp.
    330-335, 2008 21st IEEE International Symposium on Computer-Based Medical Systems,
    2008
9. Henze, N. (2006). Personalized E-Learning in the Semantic Web. Extended version of 4.
    International Journal of Emerging Technologies in Learning (iJET), 1(1).
10. Henze, N., Dolog, P., and Nejdl, W. (2004). Reasoning and Ontologies for Personalized E-
    Learning. Educational Technology & Society, 7(4).
11. IEEE, IEEE Standard for Learning Object Metadata, IEEE Std 1484.12.1-2002,
    pp. i-32, 2002, doi: 10.1109/IEEESTD.2002.94128,
    URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1032843&isnumber=22180
12. Kaldoudi E, Dovrolis N, Giordano D, Dietze S., Educational Resources as Social Objects
    in Semantic Social Networks, Proceedings of the Linked Learning 2011: 1st International
    Workshop on eLearning Approaches for the Linked Data Age, Herakleio, May 2011
13. Kopecky, J., Vitvar, T., and Gomadam, K. 2008. MicroWSMO. Deliverable, Conceptual
    Models for Services Working Group, URL:
    http://cms-wg.sti2.org/TR/d12/v0.1/20090310/d12v01_20090310.pdf.
14. Maleshkova, M., Pedrinaci, C., and Domingue, J. (2009). Supporting the creation of
    semantic restful service descriptions. In Workshop: Service Matchmaking and Resource
    Retrieval in the Semantic Web (SMR2) at 8th International Semantic Web Conference.
15. Maleshkova, M., Kopecky, J., and Pedrinaci, C. (2009). Adapting SAWSDL for semantic
    annotations of restful services. In Workshop: Beyond SAWSDL at OnTheMove Federated
    Conferences & Workshops.
16. Pedrinaci, C., Liu, D., Maleshkova, M., Lambert, D., Kopecky, J., and Domingue, J.
    (2010) iServe: a Linked Services Publishing Platform, Workshop: Ontology Repositories
    and Editors for the Semantic Web at 7th Extended Semantic Web Conference.
17. Prakash, L. S., Saini, D. K., and Kutti, N. S., 2009. Integrating EduLearn learning content
    management system (LCMS) with cooperating learning object repositories (LORs) in a
    peer to peer (P2P) architectural framework. SIGSOFT Softw. Eng. Notes 34, 3 (May 2009),
    1-7. DOI=10.1145/1527202.1527212 http://doi.acm.org/10.1145/1527202.1527212
18. Rafael de Santiago, Andre L.A. Raabe, "Architecture for Learning Objects Sharing among
    Learning Institutions-LOP2P," IEEE Transactions on Learning Technologies, pp. 91-95,
    April-June, 2010.
19. Sheth, A. P., Gomadam, K., and Ranabahu, A. (2008). Semantics enhanced services:
    Meteor-s, SAWSDL and SA-REST. IEEE Data Eng. Bull., 31(3):8–12.
20. Schmidt, A., Winterhalter, C. User Context Aware Delivery of E-Learning Material:
    Approach and Architecture, Journal of Universal Computer Science (JUCS). 10(1), 2004.
21. Schmidt, A. “Bridging the Gap Between E-Learning and Knowledge Management with
    Context-Aware Corporate Learning (Extended Version)”, In proceedings of Professional
    Knowledge Management (WM 2005), Springer.
22. Simon, B., Dolog., P., Miklós, Z., Olmedilla, D. and Sintek, M. Conceptualising Smart
    Spaces for Learning. Journal of Interactive Media in Education. 2004(9), from
    http://www-jime.open.ac.uk/2004/9, 2004.
23. Simon, B., Massart, D., Assche, F., Ternier, S., Duval, E., Brantner, S., Olmedilla, D., and
    Miklos. Z. A simple query interface for interoperable learning repositories. In Proceedings
    of the 1st Workshop On Interoperability of Web-Based Educational Systems, pages 11-18,
    2005.
24. Ternier, S., Duval E., and Vandepitte, P. LOMster: peer-to-peer learning object metadata,
    Proc. of World Conference on Educational Multimedia, Hypermedia and
    Telecommunications, 2002, ISBN 1-880094-45-2.
25. World Wide Web Consortium, W3C Recommendation, SPARQL query language for
    RDF, 2008, (http://www.w3.org/TR/rdf-sparql-query/).
       Zenaminer: driving the SCORM standard
              towards the Web of Data

    Ernesto Mudu (1), Luca Schiatti (1), Giuseppe Rizzo (1,2), and Antonio Servetti (1)

     (1) Dipartimento di Automatica e Informatica, Politecnico di Torino
         Corso Duca degli Abruzzi, 24, 10129 Torino, Italy
         ernesto.mudu@studenti.polito.it, luca.schiatti@studenti.polito.it,
         giuseppe.rizzo@polito.it, antonio.servetti@polito.it
     (2) Department of Multimedia Communication, EURECOM
         2229, Route des Crêtes, 06560, Sophia Antipolis, France
         giuseppe.rizzo@eurecom.fr


       Abstract. In this paper we present Zenaminer, a software architecture
       for linking the SCORM standard to the Linked Open Data cloud. The
       main idea is to overcome some limitations of the SCORM Content
       Packaging standard, the most widely used standard for sharing
       e-learning contents, in order to migrate this material into the Web of
       Data. We analyze the design of a RESTful Web Service for SCORM where
       each single SCO (Sharable Content Object) is exposed on the web as raw
       data that can be created, read, updated, and deleted through the HTTP
       protocol. In addition, borrowing from the concept of web mashup, we
       propose to build the Learning Management System interface as a mashup
       that uses the Zenaminer web API to access SCO data and that customizes
       the presentation and interaction with the help of additional CSS and
       Javascript. To achieve this goal, for the creation of the HTML
       material we suggest using a “light formalism” defined by the W3C under
       the name Slidy, so that the presentation remains separated from the
       content. Finally, Zenaminer allows users to enrich the SCORM resources
       with comments. Comments can be contents themselves that extend and
       improve the original material of the teacher. Unstructured comments are
       automatically annotated with Spotlight, linking them to the LOD cloud
       through DBpedia.

       Keywords: SCORM, e-learning, LOD, DBpedia, web service, REST,
       learning object


1    Introduction
The evolution of digital communications and the growing presence of the Web
in our society have led to significant changes in the e-learning scenario as
well. A new term, “e-learning 2.0” [6], is being used to describe this change.
Among the various innovations that characterize the analogous shift of the Web
from 1.0 to 2.0, in this work we focus on the new paradigm known as the “Web
of Data”. That is, the old Web of mostly
human-readable documents has to change into a Web of micro contents that can
be easily processed by machines [1] and by users. The big advantage is the
ability to manipulate every single part of a document on the fly, with the
possibility to aggregate and remix the data in order to generate new documents
of enhanced value. Micro contents are the building blocks on which Linked Data
can operate: they represent contents that convey one idea and that can be
accessed through a single URL in the space of the Web. Examples of micro
contents are the arrival and departure times of an airplane flight or the
abstract of a long publication. As a consequence of this transformation, the
Web is changing from a read-only platform to a read-write platform where users,
starting from data available online, can also share, remix and create new and
original material [12]. Web users are becoming both consumers and producers,
i.e., prosumers.

    These ideas are not new at all in the e-learning scenario. Here, teaching
material is defined in the form of micro contents named Sharable Content Objects
(SCOs). SCOs form the basis of usability, interoperability, and adaptation. They
are the building blocks that can be independently produced, stored, indexed,
composed and evaluated. SCOs can be considered a compromise between raw
data (e.g., image, video, audio, text) and a whole lesson: they have a context,
albeit a small one, so they can be flexibly arranged to form new lessons or
presented in a different shape by changing the appearance of their elements
(e.g., logos, titles). For example, in this work we have defined a SCO as a
group of slides on the same topic inside a lesson. Thus, the production of
teaching material implies the production of a number of SCOs that can be
collected to create an archive of sharable micro contents. Nevertheless, no
services have been defined to migrate this information into the Web of Data.
SCOs are shared as documents (ZIP archives) that a user needs to download and
extract before being able to use them in a new project.

    The paradigm of the “Web as a Platform” [10] requires instead that these
contents be exposed on the Web through an Application Programming Interface
(API) that allows a set of operations on them, i.e., create, read, update,
delete. From a Linked Data point of view, a Learning Object Repository (LOR)
can be seen as a Web Service (WS) where, briefly, each Learning Object (LO) is
a resource (and so identified by a URI) that can be processed using the HTTP
methods (GET, PUT, POST, and DELETE) of the Representational State Transfer
(REST) style [7]. In this paper, we propose a novel architecture to expose on
the Web, as a Linked Data resource, the teaching material of a Sharable Content
Object Reference Model (SCORM) package, the de facto standard in the production
of e-learning content. Our objective is twofold: the design of a new Web
interface for SCORM packages and the migration of LOs into the Linked Data
cloud, where the “consumer” can infer, connect and aggregate data from
different repositories in order to create new value-added resources. The added
value is not in the data itself, but in how contents are combined in new ways,
how they are presented with new interfaces, and how they are enriched far
beyond the original content defined by the teacher.
    The remainder of this paper is organized as follows. A review of the current
state of the art is presented in Section 2. The key ideas of our approach, with
some details on the issues of exporting SCORMs by means of a REST web
service, are described in Section 3. Then, in Section 4, we discuss the concept of
separation of concerns, which is at the basis of our tool. In Section 5 we describe
the architecture of the web service and, finally, in Section 6 the use case used to
test our work. Conclusions follow in Section 7, where we stress the benefits
that the tool brings to the e-learning community.


2   Related Work
E-learning systems have assumed an important role in the Web community because
of the value of the contents they share. In this context, contents are named
sharable content objects (SCOs); they hold information about lectures together
with metadata that better describe the contents. Based on the assumption that
the Web provides the best opportunity to maximize both access to and reuse of
learning contents, the overall objective of these systems is to make sharable
content objects usable, reusable and interoperable. To address this, our goal
is to define and implement a model to export teaching materials in a way that
adheres to the Web of Data.
    Many efforts have been made to make SCORM environments interoperable
(footnote 3) within the Web cloud. Many of them focused on making LMSs
interoperable in a cloud infrastructure [5] and [4]; others, instead, on
achieving interoperability by sharing learning objects. Although the idea of
building a network of services that handle teaching materials is interesting
with regard to managing author information, security and network reliability,
it requires additional formalisms to make the atomic information interoperable.
In addition, it is far from the idea of making the Web of Data linkable, which
is one of the best practices of Linked Data.
    Vossen and Westerkamp [14] introduced the need to share SCOs by means of
services. They proposed a service-oriented architecture (SOA), but no specific
architecture or implementation was described. In [3], Redondo et al. discussed
typical problems of SCORM-related standards and proposed a web service-oriented
approach as a solution. They introduced the concept of Everything as a
Service (EaaS) and mapped each single SCO to a service. A WSDL description
exposed information about the services available to gather information from
each SCO. Gonzalez et al. [8] extended Moodle (footnote 4) to the SOA paradigm.
Modules and courses were made available on the Web and sharable among different
kinds of applications, making it a portable tool. To sum up, these approaches
try to expose features of the LMSs using a service-oriented architecture. Our
work, instead, exposes the data of each single SCO in raw form, in order to
make this data linkable and navigable from Web applications.
3 According to the Open Knowledge Initiative (OKI) definition, interoperability is
  about making the integration as simple and cost effective as technologically possible.
4 http://moodle.org/

3      A web service for SCORM

SCORM, i.e., the Sharable Content Object Reference Model, is a reference model
for the creation of web-based e-learning material with the objective to ensure
interoperability, reusability, and accessibility. The SCORM content packaging
section specifies how the course material should be packaged, usually in a ZIP
file, and described. The course is defined as a collection of Sharable Content
Objects (SCOs) that can be associated to a lesson or a part of it. A Content
Object must be a web-deliverable learning unit, that is usually designed as an
HTML page with CSS and Javascript so that it can be launched in a web browser.
     SCORM packages are managed by a Learning Management System (LMS)
that, among other important functions such as administration, tracking, and
reporting, also provides a means to display this information to the users. In
order to correctly deliver the contents, the LMS parses the package metadata to
understand the course structure, known as the “activity tree”, and to know how
to launch each SCO. Since most of the time the material takes the form of slides,
“traditional” LMSs use a fixed HTML frameset structure, as shown in Fig. 1,
that includes a list of the available lessons, a set of navigation buttons, and a
panel where the slides are shown.


Fig. 1. Screenshot of the web interface of the Moodle Learning Management System.


      This kind of interface presents some limitations:

    – The SCO designer can completely define the content and presentation of
      each single SCO, but cannot control at all the interface used to present
      and navigate it (which is part of the LMS functions);
    – The SCO designer’s work area is limited by the dimensions of the HTML
      frame defined by the LMS, thus it is difficult to integrate different media
      and materials, e.g., a video of the teacher, slideshows, comments;
 – SCOs are included in the LMS with the graphical style defined by their own
   SCO designer, a style that can be inhomogeneous if an LMS integrates
   different SCO sources. If content and presentation were independent, it
   would instead be easy to restyle each SCO in order to obtain a homogeneous
   presentation;
 – A fixed interface and the impossibility of separating presentation from
   content make it hard to adapt the interface to different displays, e.g.,
   smartphones, tablets, netbooks, etc.

    To overcome these limitations, we present Zenaminer, a web service that
exports SCORM packages through a RESTful web API, that is, a web service
implemented using HTTP and the principles of REST (REpresentational State
Transfer). In this architecture a SCORM package is not shared as a single
object (the ZIP package), but as a collection of smaller objects, the SCOs,
each of them accessible independently from the others. It is an implementation
of the new paradigm of the “Web of Data”: SCOs are publicly available as
read/write raw data that can be retrieved or updated using a REST interface, as
described in the next sections.
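
    As a minimal sketch of this idea, the mapping from HTTP methods to
operations on individual SCOs could look as follows. The in-memory store, the
class name ScoStore and URI paths such as /scorm/demo/sco are assumptions made
for illustration only; they are not taken from the Zenaminer implementation.

```python
class ScoStore:
    """Toy in-memory repository; each SCO is addressed by its own URI path.

    Hypothetical illustration, not the Zenaminer code base.
    """

    def __init__(self):
        self.resources = {}   # URI path -> SCO data
        self._next_id = 1     # counter used to mint child URIs on POST

    def handle(self, method, path, body=None):
        """Dispatch an HTTP-like request and return (status_code, payload)."""
        if method == "GET":
            if path in self.resources:
                return 200, self.resources[path]
            return 404, None
        if method == "POST":
            # POST to a collection URI creates a new child resource.
            new_path = f"{path.rstrip('/')}/{self._next_id}"
            self._next_id += 1
            self.resources[new_path] = body
            return 201, new_path
        if method == "PUT":
            # PUT creates or replaces the resource at a known URI.
            created = path not in self.resources
            self.resources[path] = body
            return (201 if created else 200), body
        if method == "DELETE":
            if path not in self.resources:
                return 404, None
            del self.resources[path]
            return 204, None
        return 405, None      # method not allowed
```

Each SCO gets its own URI, so a client can read or update one SCO without
downloading the whole package.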
    The key factors that enable Zenaminer to provide such a functionality are
two:

 – the knowledge of the SCORM content packaging standard, to import the
   ZIP files;
 – the knowledge of the content “internals”, i.e., how the slide information is
   defined, to separate the content from the presentation.

    Zenaminer, for the creation of HTML documents, suggests the introduction
of a “light formalism” proposed by the W3C with the name Slidy [11]. Slidy
defines simple keywords for the class attribute of HTML tags that can be used
to identify HTML elements as slides, titles, sidebar, incremental lists, etc. –
each of them with a particular presentation style defined by CSS or behavior
implemented in Javascript.
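
    To illustrate how such class keywords make the content separable from the
presentation, the following sketch extracts slides from a Slidy-style
document using only the Python standard library. It assumes that each slide
is a div element whose class attribute contains the keyword "slide" and that
the slide title is its h1 heading; the function and class names are
hypothetical and this is not the parser used in Zenaminer.

```python
from html.parser import HTMLParser


class SlidyExtractor(HTMLParser):
    """Collect the title and text of every <div class="slide"> element."""

    def __init__(self):
        super().__init__()
        self.slides = []        # finished slides: {"title": str, "text": str}
        self._current = None    # slide being read, or None outside a slide
        self._depth = 0         # nesting depth of <div> inside the slide
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._current is None:
            if tag == "div" and "slide" in classes:
                self._current = {"title": "", "text": []}
                self._depth = 1
        elif tag == "div":
            self._depth += 1
        elif tag == "h1":
            self._in_title = True

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if tag == "h1":
            self._in_title = False
        elif tag == "div":
            self._depth -= 1
            if self._depth == 0:   # closing tag of the slide itself
                text = " ".join(self._current["text"])
                self.slides.append({"title": self._current["title"], "text": text})
                self._current = None

    def handle_data(self, data):
        if self._current is None or not data.strip():
            return
        if self._in_title:
            self._current["title"] += data.strip()
        else:
            self._current["text"].append(data.strip())


def extract_slides(html):
    """Return the list of slides found in a Slidy-style HTML document."""
    parser = SlidyExtractor()
    parser.feed(html)
    parser.close()
    return parser.slides
```

The extracted dictionaries carry no styling information, which is what allows
a different CSS or Javascript layer to be applied to the same content.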
    In short, SCORM contents are available as resources of the web service,
each at a specific URI. The SCO designer can freely define the interface;
contents are then dynamically retrieved from Zenaminer using AJAX
(Asynchronous Javascript and XML) calls. As illustrated in Section 5, the web
API gives access to each single slide of the course as well as to the table of
contents for navigation. In addition, Zenaminer extends the SCORM metadata to
include references to additional information such as the video recording of a
lesson, synchronization information between video and slides, etc.
    Taking into consideration another characteristic of what is defined as
e-learning 2.0, that is collaborative learning, Zenaminer enables students to
enrich the SCORM resources with comments on the course, the lesson, or the
slide. Comments can be contents themselves that extend and improve the original
material provided by the teacher. Furthermore, unstructured comments are
automatically annotated with DBpedia Spotlight (footnote 5), linking them to
the Linked Open Data cloud through DBpedia [1][2].


4     Content presentation

In order to structure the presentation of contents we adopted the Slidy
formalism, which uses HTML for the description of its items. The choice of
HTML is important in our approach because it opens the content to all devices
able to connect to and display Web pages and, more importantly, it
intrinsically supports the separation of content from presentation (also
called the view). Accordingly, we can define the data used to exchange
information separately from how this data is used to build the presentation;
the latter is specified in a Cascading Style Sheets (CSS) file. By means of
it, a view maker can define models to render specific items or groups of them.
These models are named classes and are referenced from an HTML page. The
interaction between the HTML page and the CSS is performed by means of
selectors, which point to specific items or classes of an HTML page. In this
context, Javascript, a scripting language, makes it possible to select an item
or a set of items and to change the previously defined behaviour of the view.
In addition, it works as a tool to enrich the presentation dynamically, e.g.
building a table of contents (TOC), suggesting the expansion of an acronym, or
changing font or window dimensions. It also plays an important role in
dynamically accessing structured data without any layout details, i.e. raw
data, coming from different archives spread across the Web cloud. The
technology that allows gathering data from external sources is Asynchronous
Javascript and XML (AJAX).
    Our work exploits the Separation of Concerns (SoC) principle, introduced
above, in the context of a Learning Management System. The distinction between
data and view allows users to navigate through resources and makes it possible
to build customizable views that better respond to the needs of those who
provide contents (authors) and those who use, reuse and redistribute them
(users). Data is presented raw, without layout details, but it is rich in
semantics: data is stored together with information about its links to other
data and is exposed to users by means of REST APIs. Our approach exploits the
MVC (Model-View-Controller) paradigm, in which the View is the set of
presentation rules for the data, the Model is the raw data available for a
generic topic, and the Controller is the set of methods needed to create the
communication channel between the Model and the View. Our data Model is
deployed in the system that holds the SCORM packages, while the View is
created on each user machine whenever a user requires it. The Controller,
instead, is located both on the user machine where the view is created and on
the LMS where the SCORM packages are available.
5 http://dbpedia.org/spotlight

    To provide maximum interoperability and reusability, the contents indexed
and managed by our LMS are slides, lectures and courses. All of these are Web
resources available at time-invariant URIs. This feature is important because
we need to guarantee the durability and reusability of each Web resource. Slide
contents are described by means of the HTML Slidy formalism; each slide is
composed of a title, pictures, vertical and horizontal scroll bars, and
interactive items (buttons, check boxes, etc.). Lectures are composed of slides
and video fragments. A fragment is a part of a video that is related to a
particular lecture. To map lectures to fragments, we introduced a
synchronization file. A video is available and accessible by means of a URI
with the above requirements. Courses, lectures and slides exploit the SoC
principle, separating the view from the data model. The video, instead, carries
the information about its own presentation and is shared unchanged.
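
    The format of the synchronization file is not detailed here. As a hedged
sketch, assume it can be read as a list of (start time in seconds, slide
index) pairs sorted by start time; a view could then look up the slide to
display for a given playback position as follows. The function name and the
data layout are hypothetical.

```python
from bisect import bisect_right


def slide_at(sync_points, seconds):
    """Return the slide index shown at playback time `seconds`.

    sync_points: list of (start_seconds, slide_index) pairs, sorted by
    start time (an assumed layout, not the actual Zenaminer sync format).
    """
    starts = [start for start, _ in sync_points]
    position = bisect_right(starts, seconds) - 1
    if position < 0:
        return None  # playback has not reached the first cue yet
    return sync_points[position][1]
```

For instance, with the cues [(0.0, 1), (42.5, 2), (90.0, 3)], a playback
position of 60 seconds falls between the second and third cue, so the view
would show slide 2.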

5   Architecture
The architecture is structured in four building blocks that are summarized in
Figure 2:


Fig. 2. Zenaminer architecture. The REST controller is the interface between
Zenaminer and the Internet; the W3C validator is used to check that imported
(X)HTML files are well-formed; the Spotlight client provides automatic annotation;
the database stores imported SCORMs together with comments and annotations.


1. The REST controller is the interface between Zenaminer and the Internet.
   Its features are grouped in three sets (as shown in Figure 3): user
   management and authentication (login), retrieval and posting of comments
   (comment), and management of SCORM packages (scorms, item, outline, sync
   and slide). Each of these features is implemented following the REST
   architecture, thus resources are available through URIs on the Internet.
   REST calls that read (GET) or modify (POST, PUT, DELETE) existing
   contents can be restricted using the existing facility for user management
   and authentication.
 2. A local validator component is used to verify that (X)HTML files are valid
    and well-formed before they are parsed. When a new SCORM package is
    imported, each HTML file that contains slides is validated through the
    remote interface offered by the official W3C validator (footnote 6).
 3. When a comment (an enrichment) is received, it is passed to the Spotlight
    client. The Spotlight client sends a request to the DBpedia Spotlight
    service and receives an annotated version of the comment.
 4. The SQL database is used to store imported SCORM packages together
    with comments and annotations.


Fig. 3. Detail of the REST controller, its three main features are: user management,
comment management and SCORM management.


6 http://validator.w3.org

   Figure 4 shows the workflow for the upload of a SCORM package into
Zenaminer, which is performed in the four steps below:

 1. the client sends a POST request to the /scorm page, including the SCORM
    package in the request body;
 2. the REST controller receives the package, unpacks it, identifies the files
    containing slides and sends them to the W3C validator;
 3. the W3C validator validates the individual files and reports the result to
    the REST controller;
 4. if all files have been validated successfully, the package is imported into
    Zenaminer. Otherwise, Zenaminer sends the client an error message
    containing the errors generated by the W3C validator.
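
    The upload workflow can be sketched as follows. This is an illustration
under stated assumptions rather than the Zenaminer code: the function name
import_package is hypothetical, and the remote W3C validator is replaced by a
caller-supplied validate function that maps markup to a list of error
messages, so that the sketch runs without network access.

```python
import io
import zipfile


def import_package(zip_bytes, validate):
    """Unpack a SCORM ZIP and validate every HTML file before importing.

    `validate` is a caller-supplied function (markup -> list of error
    strings) standing in for the W3C validator.
    Returns (imported_files, errors); nothing is imported if any file fails.
    """
    files = {}
    errors = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            data = archive.read(name)
            files[name] = data
            if name.lower().endswith((".html", ".htm", ".xhtml")):
                problems = validate(data.decode("utf-8", errors="replace"))
                if problems:
                    errors[name] = problems
    if errors:
        return {}, errors   # step 4, failure branch: report validator errors
    return files, {}        # step 4, success branch: package is imported
```

Rejecting the whole package when a single file fails mirrors step 4 above; a
real service would return the collected errors in the HTTP response body.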


Fig. 4. Workflow describing the upload of a SCORM package in Zenaminer: 1. a
SCORM package is received from a client; 2. (X)HTML files are forwarded to the
validator; 3. (X)HTML files are validated by sending requests to the official W3C
validator; 4. if all documents are valid, the SCORM package is stored in the database.


    Zenaminer allows its users to enrich a given content item (e.g. a slide)
by adding a textual comment to it. Such comments are named enrichments because
they are meant to add more information to the content of a SCO; for example,
an enrichment to a slide could be a proof of a theorem, a correction to its
content, an in-depth explanation of a concept, etc. To prevent the content of
an enrichment from being buried in the database, we added automatic
annotation, which enables machines to access the content, enhancing search and
making it possible to display complementary information.
    Figure 5 shows the actions performed by Zenaminer when a comment is
received:

 1. the client sends a POST request to the /comment page; the message body
    contains the text of the comment.
    An example of an enrichment is:
    President Obama called Wednesday on Congress to extend a tax break
    for students included in last year’s economic stimulus package,
    arguing that the policy provides more generous assistance.
 2. the text is forwarded to the Spotlight client;
 3. the Spotlight client sends a request to DBpedia Spotlight and obtains in
    response an automatically annotated version of the comment. As additional
    parameters of the annotate request we can pass minimum confidence and
    support values, which affect the precision and recall of the annotation
    itself.
    The request URI to annotate the enrichment in the previous example is:
    GET http://spotlight.dbpedia.org/rest/annotate?
    text=President%20Obama%20called%20Wednesday%20on%20Congress%20to
    %20extend%20a%20tax%20break%20for%20students%20included%20in%20last
    %20year%27s%20economic%20stimulus%20package,%20arguing%20that
    %20the%20policy%20provides%20more%20generous%20assistance.
    &confidence=0.4
    &support=20
    The response (in JSON format below) contains a reference to the annotated
    term in the enrichment (@surfaceForm) and the DBpedia URI relative to
    the annotated content (@URI):

    {
      "@URI": "http://dbpedia.org/resource/United_States_Congress",
      "@support": "9321",
      "@types": "Organisation,Legislature",
      "@surfaceForm": "Congress",
      "@offset": "44",
      "@similarityScore": "0.14802740514278412",
      "@percentageOfSecondRank": "0.6257434730652487"
    }
 4. finally, the enrichment and its annotations are stored into the database.
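
    Steps 3 and 4 can be sketched with the Python standard library. The
endpoint and the confidence and support parameters are those quoted in the
example request; the function names are hypothetical, and the response is
assumed to be a JSON list of objects shaped like the one shown in step 3,
which may differ from the full response envelope of the service.

```python
import json
from urllib.parse import quote, urlencode

# Endpoint as quoted in the example request above.
SPOTLIGHT_ANNOTATE = "http://spotlight.dbpedia.org/rest/annotate"


def build_annotate_uri(text, confidence=0.4, support=20):
    """Build the GET URI for an annotate request."""
    query = urlencode(
        {"text": text, "confidence": confidence, "support": support},
        quote_via=quote,  # encode spaces as %20 rather than '+'
    )
    return f"{SPOTLIGHT_ANNOTATE}?{query}"


def extract_links(response_body):
    """Map each annotated surface form to its DBpedia URI.

    Assumes `response_body` is a JSON list of annotation objects carrying
    the "@surfaceForm" and "@URI" keys.
    """
    annotations = json.loads(response_body)
    return {item["@surfaceForm"]: item["@URI"] for item in annotations}
```

The resulting mapping from surface forms to DBpedia URIs is what would be
stored next to the enrichment, so that a view can later turn annotated terms
into links.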
A SCORM package includes an XML file named “manifest”, which describes a course
and lists all the lectures in it, all the resources (files) associated with
each lecture and the sequencing rules that define how the user accesses the
SCOs. Figure 6 shows, as a tree structure, how the manifest file is
represented in the relational database. In addition, a SCORM package contains
several SCOs; each SCO includes an HTML file containing slides, a CSS file
describing the style, and a Javascript file describing the behavior.
Additional files can be included in a SCO: videos, subtitles or files
describing the synchronization between a video and the slides. The files
listed in the manifest are then packaged together into a ZIP file. Currently
Zenaminer does not support sequencing rules, thus the management of such rules
is up to the SCO designer.
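
    As an illustration of how the resources listed in a manifest can be read,
the sketch below walks the XML tree with the Python standard library. It
assumes the usual content-packaging layout, in which resource elements carry
an identifier attribute and contain file elements with an href attribute, and
it ignores XML namespaces so that it does not depend on a particular SCORM
version. The function names are hypothetical and sequencing rules are not
handled.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def list_resources(manifest_xml):
    """Return {resource identifier: [file hrefs]} from a manifest string."""
    root = ET.fromstring(manifest_xml)
    resources = {}
    for element in root.iter():
        if local(element.tag) == "resource":
            files = [
                child.get("href")
                for child in element
                if local(child.tag) == "file"
            ]
            resources[element.get("identifier")] = files
    return resources
```

The returned dictionary corresponds to one level of the tree in Figure 6:
each resource (a lecture) with the files associated with it.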


Fig. 5. Workflow describing the reception of a comment in Zenaminer: 1. a comment
is received from a client; 2. the content of the comment is forwarded to the Spotlight
client; 3. the Spotlight client contacts DBpedia Spotlight to obtain an annotated
version of the comment; 4. the comment together with its annotations is stored in the
database.


Fig. 6. A tree representation of a SCORM package in Zenaminer. A SCORM package
is a set of SCOs, each SCO represents a lesson in a course, each SCO contains several
files like HTML, CSS, JS or videos. HTML files containing a presentation are parsed
and slides are extracted.


6    Use case
The validation phase of the Web service was conducted during the course
“Multimedia Environments” (Academic Year 2010/2011) for the Master of Science
in Cinema and Media Engineering at the Politecnico di Torino. Zenaminer was
conceived as a service for designing e-learning projects (SCORM packages)
for the course “Multimedia Environments”. The objective for the students was
to create a SCORM package defining both content and presentation. Students
were divided into 20 teams; each team defined a personal learning environment
by building different interfaces (using the SoC concept). Students acted as SCO
designers; their projects were the use case for Zenaminer and allowed us to
verify the potential of the separation of content from presentation. The demo
is available at http://eridano.polito.it:8080. Figures 7 and 8 show two
different views of the same raw data, built by two different groups of
students. Such interfaces are able to show the same contents in two different
ways. The controller of each interface collects the list of SCORM packages
stored in Zenaminer. Depending on the SCORM package selected by the user, the
controller gathers the entire set of lectures for that package and displays
the related Table of Contents (ToC). The user is then free to navigate the
lectures (SCOs) following the ToC; when a lecture is selected, the controller
requests all files associated with that lecture and the view is updated
accordingly. For example, in both figures the selected SCO had a set of slides
and a video associated with it. Thanks to the sync file, the view is able to
synchronize the video with the slides.


Fig. 7. One of the user interfaces designed by students for the course Multimedia
Environments.


Fig. 8. Another example of the user interfaces designed by students for the course
Multimedia Environments.


   The Web Service was developed with the Pylons 1.0 framework (footnote 7).
The source code (footnote 8) is released under the GNU GPLv3. In order to
better balance and distribute the load, we decided to use the Apache 2.2 HTTP
Server (footnote 9) as the interface for the requests directed to the Web
Service. As relational database we used PostgreSQL 9 (footnote 10); it stores
the data extracted from the SCORM packages by the web service. Raw data stored
in the database is available through REST calls and serialized in JSON format.
The W3C Markup Validation Service is used to validate the HTML of the
lectures, which must conform to W3C standards. Finally, we used Spotlight to
automatically annotate the learners' enrichments.

7 http://pylonshq.com
8 https://sourceforge.net/projects/zenaminer/


7      Conclusions

In this paper we proposed and motivated a Web service architecture for
e-learning. Zenaminer goes beyond the limitations of SCORM and allows a
complete separation of content and presentation. Content creators are free to
define learning environments without the constraints of traditional LMSs.
Zenaminer does not limit the definition of the interfaces, the graphic design
or the learner's interaction with a SCO. In the field of e-learning the
presentation (the interface) not only involves aesthetic appeal and usability
but also affects the learning process. In fact, interface design [13] and the
adaptation of interfaces to the learner [9] are two key factors for e-learning
success.
9 http://httpd.apache.org
10 http://www.postgresql.org/

    The “Separation of Concerns” (SoC) allows SCO designers to improve
e-learning acceptance. For example, a SCO designer could create different
environments for the same course according to the learners’ computer literacy,
or could improve the interoperability of SCOs by designing different
interfaces for different devices (e.g. smartphones). Equally important is the
possibility to integrate and manage multimedia contents as entities separate
from the learning content. In traditional LMSs multimedia content must be
integrated in a SCO while, in Zenaminer, content types other than text (e.g.
videos) can be managed and integrated independently of SCOs. This feature does
not compromise SCORM compatibility with traditional LMSs.
    Finally, Zenaminer allows collaborative learning. Learners have an active role
in the definition of content thanks to the possibility of enriching lectures with personal
contributions. Content creators are free to define different models of collaborative
learning; they can implement a Wiki model or they can choose a model with
less collaboration. In addition to collaborative learning, it is possible to integrate
content by means of instruments offered by the Semantic Web. Automatic
annotation integrates the content with hypertext link navigation
in order to satisfy the information needs of the learner more effectively.
Furthermore, the SCO designer could decide to allow learners to disambiguate or
supplement automatic annotations. The use of ontology-based annotation moves
Zenaminer towards being a Linked Data LMS. The architecture proposed in this paper
allows instructional designers to better implement learning theories (e.g. Cognitivism,
Constructivism) and teaching strategies. For example, they could choose
to minimize the content of a course and focus on collaborative learning; this
approach satisfies the key tenet of Constructivism, which considers collaboration
between learners fundamental [15]. Moreover, this work represents a step towards
publishing LMS-related data to the Web of Data, and further work
is planned. First of all, to be fully compliant with the Linked Data paradigm,
we want to investigate the conversion of LMS-related data to RDF and
expose them through a SPARQL endpoint. Then we want to investigate the
use of a disambiguation process based on DBpedia Spotlight. The idea is to
offer users the possibility to better understand ambiguous concepts by means
of the semantic disambiguator. This tool may highlight ambiguous concepts in
a SCO and suggest multiple references for each ambiguous concept; the
user may then choose the most interesting one.
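
    As a minimal sketch of the planned conversion to RDF, the following Python
fragment serialises one record as N-Triples. The record layout, the base URI
http://example.org/zenaminer/ and the choice of the Dublin Core "title" and
"creator" properties are assumptions made for illustration; they do not describe
the actual Zenaminer data model.

```python
# Hypothetical sketch: serialise one LMS record as N-Triples lines.
BASE = "http://example.org/zenaminer/"   # assumed base URI, not a real endpoint
DCT = "http://purl.org/dc/terms/"        # Dublin Core terms namespace


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def record_to_ntriples(record):
    """Map an assumed {'id', 'title', 'creator'} record to a list of triples."""
    subject = f"<{BASE}sco/{record['id']}>"
    triples = []
    for field in ("title", "creator"):
        if field in record:
            value = escape_literal(record[field])
            triples.append(f'{subject} <{DCT}{field}> "{value}" .')
    return triples


if __name__ == "__main__":
    example = {"id": "42", "title": "Introduction", "creator": "A. Author"}
    for line in record_to_ntriples(example):
        print(line)
```

Triples produced in this way could then be loaded into any triple store that
offers a SPARQL endpoint; which store to use is left open here.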


Acknowledgments

Thanks to all the students of the “Multimedia Environments” course (academic year
2010/2011) at the Politecnico di Torino. They were the developers and analysts of
our Web Service, and their vibrant spirit of cooperation was fundamental in
achieving this important goal. Special thanks go to the SCO designers Jacopo Berta and
Federica Tina Bossa (who developed the SCO view shown in Figure 7), and Lucia
Marengo and Alice Ferrari (who developed the SCO view shown in Figure 8).

References
 1. Berners-Lee, T.: Linked data. International Journal on Semantic Web and Infor-
    mation Systems 4(2) (2006)
 2. Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hell-
    mann, S.: DBpedia - a crystallization point for the web of data. Web Semantics
    Science Services and Agents on the World Wide Web 7(3), 154–165 (2009)
 3. Díaz Redondo, R., Fernández Vilas, A., Pazos Arias, J.: Educateca: A web 2.0
    approach to e-learning with SCORM. In: Cellary, W., Estevez, E. (eds.) Software
    Services for e-World, IFIP Advances in Information and Communication Technol-
    ogy, vol. 341, pp. 118–126. Springer Boston (2010)
 4. Dong, B., Zheng, Q., Qiao, M., Shu, J., Yang, J.: BlueSky cloud framework: An
    e-learning framework embracing cloud computing. Lecture Notes in Computer Sci-
    ence Cloud Computing 5931, 577–582 (2009)
 5. Dong, B., Zheng, Q., Yang, J., Li, H., Qiao, M.: An e-learning ecosystem based on
    cloud computing infrastructure. In: Proceedings of the 2009 Ninth IEEE Interna-
    tional Conference on Advanced Learning Technologies. pp. 125–127. ICALT ’09,
    IEEE Computer Society, Washington, DC, USA (2009)
 6. Downes, S.: E-learning 2.0. eLearn 2005 (October 2005)
 7. Fielding, R.T., Taylor, R.N.: Principled design of the modern web architecture.
    ACM Transactions on Internet Technology 2(2), 115–150 (2002)
 8. Gonzalez, M., Penalvo, F., Guerrero, M., Forment, M.: Adapting LMS architecture
    to the SOA: An architectural approach. In: Proc. of the Fourth International
    Conference on Internet and Web Applications and Services, ICIW ’09. pp. 322–327
    (May 2009)
 9. Mödritscher, F., Barrios, V.M.G., Gütl, C.: Enhancement of SCORM to support
    adaptive e-learning within the scope of the research project AdeLE. In: Nall,
    J., Robson, R. (eds.) Proceedings of World Conference on E-Learning in Corpo-
    rate, Government, Healthcare, and Higher Education 2004. pp. 2499–2505. AACE,
    Washington, DC, USA (2004)
10. Murugesan, S.: Understanding web 2.0. IT Professional 9(4), 34–41 (2007)
11. Raggett, D.: Slidy - a web based alternative to Microsoft PowerPoint. W3C (2006)
12. Richardson, L., Ruby, S.: RESTful Web Services. O’Reilly Media, Inc. (May 2007)
13. Selim, H.M.: Critical success factors for e-learning acceptance: Confirmatory factor
    models. Computers & Education 49(2), 396–413 (September 2007)
14. Vossen, G., Westerkamp, P.: Why service-orientation could make e-learning stan-
    dards obsolete. International Journal of Technology Enhanced Learning 1(1), 85–97
    (2008)
15. Wilson, B.G.: Constructivist Learning Environments: Case Studies in Instructional
    Design. Educational Technology Publications, illustrated edn. (1996)
         Developing controlled vocabularies for educational
                 resources sharing: a case study

          Daniela Giordano (1), Isaak Kavasidis (1), Concetto Spampinato (1),
                            Panagiotis Bamidis (2)

     (1) Dipartimento di Ingegneria Elettrica, Elettronica e Informatica, Università di Catania,
                              Viale A. Doria 6, 95125 Catania, Italy
     (2) Lab of Medical Informatics, Medical School, Aristotle University of Thessaloniki,
                              PO Box 323, 54124 Thessaloniki, Greece
                        {dgiordan, ikavasidis, cspampin}@dieei.ing.unict.it
                                      {bamidis}@med.auth.gr


         Abstract. We present a novel controlled vocabulary for the classification of
         Educational Resource Type and Media type that has been developed within the
         mEducator Best Practice Network, and discuss the rationale behind its design
         and its SKOS implementation. Several available controlled vocabularies were
         analyzed, but none was found entirely satisfactory for the project's purposes.
         Although the driving motivation of this effort was to account for and deal with
         multi-type content variety that is especially characteristic of the medical field,
         the result of this process and the proposed organization generalise fairly well to
         broader educational/learning contexts. We discuss the proposed vocabularies
         pointing out the multifaceted role they may fulfil in the linked learning era.
         Keywords: Resource Type, Media Type, controlled vocabularies, SKOS,
         Linked Data, metadata reuse


1 Introduction

Thesauri and taxonomies (which extend simple lists of terms by also expressing the
relationships between terms) are currently acknowledged as an essential tool in any
successful knowledge management effort [1]. In a taxonomy, terms are arranged or
linked in a tree, so that narrower, more specific “children” terms fall under broader,
more generic “parent” terms; in a thesaurus, relationships of affinity, synonymy and
relatedness among terms can also be expressed. Controlled vocabularies, more simply,
consist of a flat list of terms used for indexing or categorizing; their goal is to ensure
consistency in indexing, tagging, or categorizing and to guide the user to the desired
information. In this respect, controlled vocabularies can be used to assist the users
in filling in metadata for resource description and, conversely, can be offered in
the search interfaces to assist the users during the search process. This latter practice
partly addresses a recent criticism of taxonomies/controlled vocabularies, i.e. that
they are often too far from the users' language and are therefore unlikely to be used
spontaneously for searching, which calls into question their support for effective
resource retrieval. On the other hand, “folksonomies” emerging from social tagging of
resources do provide a collection of terms closer to the end-user language.
Taxonomies and folksonomies are not mutually exclusive or incompatible, and there
is a fast-growing body of literature pointing out the benefit of their integration [2], [3].
   In systems that resort to user-generated metadata, as content sharing solutions often
do, there is a tension between the design choice of enforcing controlled vocabularies
and that of allowing free tagging and filling of the metadata fields. This tension arises
because in the indexing phase, when a resource is published and should
be described as effectively as possible, the user, who is not a professional indexer,
should be aware of the vocabularies and know how to use them. Often this is not the
case, and even when the user is aware of the existence of a classification system, its
actual usage, especially in the case of vocabularies with a very large number of terms
(e.g., MeSH), will depend on how easily it can be browsed and understood. On the
other hand, filling metadata fields with unconstrained vocabularies often results in
poor quality because of semantic ambiguity in the names of the fields [4].
   From a Linked Data (LD) perspective [5] some of these issues can be solved or
ameliorated at a fundamental level. In fact, the LD paradigm places emphasis on the
reuse of available metadata fields even from different schemas, which inherently
favours the reuse of the most useful and meaningful fields; concerning the
actual choice of controlled terms, there is the freedom to choose any system that suits
the needs of the community, as long as it is treated consistently with the LD principles
and links are created with other data sets dealing with similar terminologies. An even
better practice is to create explicit mappings through RDF links, by resorting to RDF
or OWL properties between similar terms in different classification systems (e.g.,
through the use of OWL’s "sameAs", or "relates to", or SKOS’s "closeMatch",
"exactMatch", etc.). One clear advantage of this approach is that it enables automatic
metadata enrichment by resorting to the LD Cloud (e.g., [6], [7]): this practice may
result in more robust search and retrieval systems, also by assisting the users with
dynamic suggestion of terms during metadata publishing or during searches.
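
   The explicit mappings mentioned above can be sketched in a few lines of Python
that emit one RDF triple per mapping. The namespace and the five mapping property
names are those of the SKOS vocabulary; the two term URIs in the usage example are
invented placeholders and do not correspond to published identifiers.

```python
# Sketch: write explicit mappings between terms of two classification systems
# as RDF triples (N-Triples syntax, which is also valid Turtle).

SKOS = "http://www.w3.org/2004/02/skos/core#"

# The SKOS mapping properties.
MAPPING_PROPERTIES = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


def mapping_triple(source_uri, relation, target_uri):
    """Return one triple linking a term to a similar term in another scheme."""
    if relation not in MAPPING_PROPERTIES:
        raise ValueError(f"not a SKOS mapping property: {relation}")
    return f"<{source_uri}> <{SKOS}{relation}> <{target_uri}> ."


if __name__ == "__main__":
    # Placeholder URIs, assumed for illustration only.
    print(mapping_triple(
        "http://example.org/resource-type/atlas",
        "closeMatch",
        "http://example.org/other-scheme/atlases",
    ))
```

Restricting the relation to a fixed set is a design choice of this sketch: it
keeps accidental, non-standard property names out of the published links.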
   The goal of this paper is twofold. First, we present a controlled vocabulary
concerning Educational Resource Type and Media Type that has been developed in the
context of the mEducator project, funded within the eContentPlus Programme and
concerned with best practices in innovative technical solutions for educational content
sharing and repurposing. This vocabulary has a specific raison d’être in the context of
a very wide and open network of stakeholders broadly concerned with medical
education and also with the practical issue of repurposing educational resources that
are often overspecialised and very expensive to develop. Second, we discuss
generalisation issues and, based on the experience accrued within the project, we
derive some implications in the context of the Linked Learning movement.


2 Background: multi-type content sharing and repurposing

The mEducator project ("Multi-type Content Sharing and Repurposing in Medical
Education", www.meducator.net) is a Best Practice Network funded by the European
Programme eContentPlus and is concerned with the evaluation of two innovative
ways to share medical educational content across European institutions and across
the community of educators (i.e., one solution utilizing semantic web services
technology and one solution based on mashup technology). Among the project's goals
is the provision of recommendations to relevant standardization bodies (e.g. to
Medbiquitous, which currently employs the HLOM, the Healthcare extension of the
IEEE LOM [8]) regarding suitable additional extensions to the metadata schema to
effectively support both the sharing and the repurposing process. In this regard, an
action of the project has been concerned with the actual design and development of
these extensions. After several rounds of polling the partners about inclusion of the
available fields from the HLOM, and a critical re-examination of the HLOM by the
Technical Reference Group of the mEducator Consortium, an RDF-based meducator
schema was developed [9], reusing existing metadata fields and including new
meducator fields dealing specifically with the repurposing history of a resource and
other pedagogical aspects of the resource, e.g. medical educational outcomes.
   One line of investigation of mEducator that is at the basis of the work presented in
this paper concerns the identification of educational content types that are specific to
medical education and the testing of the descriptive adequacy of the currently available
e-learning metadata standards against these content types. For example, a Virtual
Patient (VP) is a specific medical content type that employs a problem-based learning
approach and allows the learners to follow either linear or branched paths to diagnose
the patient and suggest treatments [10]. Similarly, interactive medical images (e.g., X-
rays) where the learners can draw tracings to identify anatomical structures or
pathologies/abnormalities and obtain visual feedback are a novel resource type
especially suitable for the development and refinement of visual perception skills
[11]. Initially, the distinction between content type and resource type was blurred, and
the only agreement within the Consortium on the semantics of the field was that it
should try to capture the nature of the educational resource without any reference to
topics or disciplines. Thus, during the first iteration of testing and validation of the
proposed meducator metadata schema, the field "content type" was included, and it
was left as free text to fill (i.e., the initial decision was not to use any controlled
vocabulary). The rationale of this decision was to gather data about the variability of
the type of educational items that would be contributed by the content providers of the
Consortium, and use this information to inform subsequent decisions about the
meducator schema and the potential need to resort to controlled vocabularies.


3   Methodology

3.1 Deriving the requirements

The first version of the schema, where content type was a free text field, was used by
the Consortium Partners to describe the educational resources that each one had
agreed to contribute to the project. This was done by resorting to MetaMorphosis, a
social network based on the ELGG platform [12], where the users from the participating
institutions could register and contribute their resources by filling in the metadata forms.
The data from 350 forms contributed by about 100 users from the various institutions
were analysed by means of content analysis.
    From the analysis of how the users filled in the field, a conflation was apparent
between format (e.g. PPT presentation), educational content type (e.g., a tutorial, or a
case study) and tool/interaction type. For example, we had users who referred to
"webTraces" to indicate that their contents were images annotated with graphical
feedback, since such content had been developed within the LCMS "WebTrace" [11].
This conflation of several aspects into the same field pointed to the need to clearly
differentiate "media/format" aspects from "content type", and to attempt a
differentiation between the pedagogical role of the resource and the rhetoric of
presentation/interaction.
    Some semantic overlap with the notion of “subject/topic” was noted in the way users
filled in the form, although this was not entirely unexpected. In addition, the analysis of
the content to be shared revealed that, consistently with the aim of the mEducator
project, the types of materials to be shared were broader in scope, and would
include design documents such as curricula, or elaborations over official documents
(typically, clinical guidelines) to be considered jointly (i.e. as companion resources);
other cases would include referral to general resources available on the web, where
up-to-date materials would be issued regularly on a given topic or discipline.
    The broad typology of sharing needs of the mEducator community's target users
was an additional reason to support the choice of resorting to a controlled vocabulary.
    The general requirements for the controlled vocabulary were:
  •    Maximise reuse from existing vocabularies, also with a view to ensuring machine
       processability and compatibility with existing standards.
  •    Cater for the need of sharing materials that might not be "strictly" educational.
       This implies complementing a focus on the instructional/pedagogical aspect of
       the resource (e.g. a tutorial) with a focus on the informational model (e.g. a
       "news", regardless of whether it is being delivered through TV, Internet, or
       Newspaper). A clear separation of the informational model from the media of
       delivery would facilitate repurposing across different media.
  •    In the characterization of the media, novel types emerging from the Web 2.0
       paradigm should be considered (e.g., Wikis, blogs, and the like).
  •    Take into account the variety of pedagogical approaches that are in place in
       the community and are reflected in community specific "content types". This
       requirement might be referred to as pedagogical flexibility [13].
  •    Completeness, with respect to the specific "content types" in use in the
       community of medical educators.
  •    Support efficiency in the retrieval, but also in the annotation process, i.e., the
       vocabulary should be as lean as possible and understandable at a glance.


3.2 Related work

A widely shared criticism of current e-learning standards is that they fail to address
adequately the "learning" [14], and, in general, miss crucial information required for
efficient searching and automated processing [13]. From the perspective of the e-
learning community, two works are especially relevant. In [13] the LOM "learning-
resource type" field is criticised on the ground that the controlled values mix
instructional (e.g., exercise, simulation, experiment) and format information (e.g.,
diagram, figure, slide, table) that need to be separated. This is exactly the situation
we ran into by leaving the field free to be filled in by the users. It should be noted
that the same problem occurs in the HLOM extension, where both instructional (e.g.,
tutorial, Virtual Patient, simulation) vocabulary values and format values (e.g.,
narrative text) have been added. The solution proposed in [13] is an ontology, with
root class Instructional Object. The philosophy of this work is to subsume under the
subclass "Concept" the specific domain of interest (definition, facts, laws and
processes) and to complement "Concept" with "Satellite" elements, a subclass that
captures the instructional information associated with any concept of the domain (e.g.,
real world problem, exercise, exploration, example, non-example, explanation,
remark, evidence, etc). Format and media aspects are not addressed in this ontology.
   On a similar note, but with a different goal, i.e., supporting automated annotation,
[15] proposes the ALOCOM ontology to capture both the content structure and the
content type of a Learning Object (LO). The need to express the structure of the LO
stems from having to deal effectively with the different granularity levels of a LO,
which is seen as consisting of "content fragments" (media that cannot be further
decomposed), "content objects" (aggregations of fragments with added navigational
features), and "learning objects" (aggregations of content objects around a learning
objective). The proposed classes for learning object are: lesson, course, chapter, test
and tutorial. From the content type perspective, the pedagogical roles are captured in
the "supporting" subclass of content object, and at the moment include: example,
exercise, reference, description, illustration, question, and answer. This is somewhat
similar to the "satellite" subclass in [13].
   There exist several other Resource Type and Media Type vocabularies. Some have
been developed within broader metadata standard specifications, e.g., Dublin Core
[16]; some have been specifically developed by educational communities, such as the
RDN/LTSN (Resource Discovery Network/Learning and Teaching Support Network)
[17]; a comprehensive listing of the variety of vocabularies developed in the attempt
to capture pedagogical aspects is provided in [18], where some general requirements
that should inform their development are also provided.
  Concerning the media, in addition to the MPEG-7 ontology [19] there are
specialisations in narrower domains, e.g., the RDA/ONIX framework for resource
categorization in the media publishing domain [20]; another set of relevant
terminology is available through the categories "publication format" and "publication
component" of the widely accepted medical thesaurus MeSH [21]. Another relevant work is
the SIOC ontology [22], which provides terms especially relevant to characterizing
emergent social media.


3.3 Design considerations

The methodology used to craft the taxonomies or the controlled vocabularies for
mEducator was first to analyze existing vocabularies, taxonomies and ontologies, to
evaluate their fit for reuse. The most appropriate terms were then selected
based on their coverage of the meducator requirements, and the missing concepts
that emerged from the first field testing of the mEducator schema were added.
   Critical analysis of the available resources, however, pointed out some
shortcomings with respect to mEducator needs. A common problem was a lack of
expressivity in the description of learning/teaching activities implied or supported by
the resource; some vocabularies were too biased towards lecturing-based pedagogy or
tutorial expository modes; others were too oriented toward media fragments (e.g., [19]).
Others were too focused on various granularity levels in content organization
(e.g. [15]), whereas others used the more practical approach of condensing in one term
more than one aggregation level (e.g., "course/module/unit" in the RDN/LTSN
vocabulary [17]). This latter approach was adopted as a design criterion, since it was
deemed a good practice, alternative to pursuing generalisation into more abstract,
catch-all terms that often tend to be semantically ambiguous. The expected benefit is
lessening the cognitive load on the user without sacrificing precision of retrieval, at
least as far as identification of the nature of the resource is concerned.
    Analysis of the sources mentioned in section 3.2 resulted in the selection
of 45 concepts, obtained by picking terms from the sources, merging them with mEducator
community-specific terms, and in some cases rewording some concepts. Hardly any
of them were mutually exclusive; thus, from the user annotation perspective, the
classification grid must enable multiple selection to characterise the
resource to be shared reasonably well.
   In general, the term selection and refinement process was carried out taking into
consideration that in the end the resources will be indexed by the end-users and not
by professional cataloguers, and therefore aimed at obtaining manageable lists (e.g., an
overly long list should be categorised in a way meaningful to the end-user), with terms
that are readily visualised and familiar (ideally, self-explanatory).


4 The mEducator Resource Type Taxonomy

Resource Type in the context of the mEducator project is defined as:

  A classification of the Resource based on its informational qualities, taking into
  account the nature of the information provided, how it is organized, presented or
  collected (e.g., a tutorial, a textbook, a simulation) and the nature of the
  interactions that are expected to take place between the users and the content
  (e.g., reading, practicing, experimenting, assessing, etc.). Resource type includes
  all the artifacts that traditionally have a role in the context of educational
  practice, and any document, product or tool that has a role during the practice
  of healthcare professions and that can be either the focus or a supporting
  resource for the design of an educational experience (e.g., a laboratory test, or a
  clinical practice guideline).

  On purpose, aspects strictly related to media, format and implementations were not
considered, to give primacy to the nature of the information provided by the resource.
The 45 concepts were classified according to the following three broader categories:
1) Educational practice artefact
     Definition: any document, product or tool designed and developed specifically
     for use in any phase of the teaching/learning process, including the supporting
     institutional and administrative activities, or generated as a result of the
     educational activity.
     Comment: this category includes all the artefacts that have a role strictly within
     the context of educational practice, and as such, would be of no particular value
     in the context of practicing a healthcare profession (e.g., a Game, a Study Guide,
     a Reading List).
2) Professional practice artefact
     Definition: any document, product or tool that is used during the practice of
     healthcare professions.
     Comment: The emphasis is on the artefacts that have their “raison d'être” in the
     professional context (e.g. a clinical record or diagnostic test) but that can be
     effectively introduced in the educational activities to design realistic, authentic
     learning experiences. Resources of this type might not be readily available in
     digital format; therefore the capability to identify and retrieve them would
     facilitate their repurposing to customize learning/teaching activities.
3) Reference Material
     Definition: any collection of information that is not routinely used in
     professional or educational practice, but is used only for reference purposes.
     Comment: for this category a complete mapping to a selection of MeSH terms
     was possible, and the MeSH definitions were reused.

To facilitate reuse, the vocabulary has been further organised into a "Core Resource
Type" and a "Medical Resource Type" vocabulary, the latter being an extension of the
Core Resource Type. Table 1 lists the terms under each category, and definitions for
each term are provided in the Appendix. It must be noted that a Resource can be indexed
according to multiple terms from the same or distinct categories; for example, a
“course/module/unit” might contain a “simulation”, a “problem/exercise with
feedback” and a “reading list”, and therefore should be indexed with these four terms.
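
Indexing a Resource with multiple terms can be sketched as a simple inverted index,
shown here in Python. The resource identifiers and the helper names build_index and
find are hypothetical; only the term labels come from the vocabulary.

```python
from collections import defaultdict


def build_index(resources):
    """Build an inverted index: term -> set of resource identifiers."""
    index = defaultdict(set)
    for resource_id, terms in resources.items():
        for term in terms:
            index[term].add(resource_id)
    return index


def find(index, *terms):
    """Return the resources indexed with all of the given terms."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    # Hypothetical resources described with terms from the vocabulary.
    resources = {
        "course-1": {"Course/Module/Unit", "Simulation",
                     "Problem/exercise (with Feedback)", "Reading List"},
        "sim-7": {"Simulation"},
    }
    index = build_index(resources)
    print(sorted(find(index, "Simulation")))
    print(sorted(find(index, "Simulation", "Reading List")))
```

With such an index, the course in the example is retrieved both by a query on any
single one of its four terms and by a query that combines several of them.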


4.1 Comments on the vocabulary values: towards generalisation

It is interesting to note that a good proportion of the items that have been identified
are quite general and would apply to any teaching/learning setting, regardless of the
specific domain, although this analysis has not been carried out with a view to
completeness with respect to other domains. It is easy to see that for many items
collected under the Medical Resource Type, equivalent ones can be found in other domains.
  In the educational practice artefacts, a Virtual Patient could be seen as a type of
Simulation, and Clinical Case Study as a domain-based specialisation of Case Study;
making these specialisations explicit as medical type extensions increases precision.
  "Teaching file" was an interesting case of semantic differences across communities,
since medical groups such as radiologists refer to a well established practice of
creating files with Images to be discussed/diagnosed by the students, whereas other
communities (paediatrics, general educators) used the term to refer to a set of
organized resources meant for the teacher. Thus the approach was to keep the term as
used by radiologists and include it in Medical Resource Type, and introduce the
notion of Resource Pack that generalizes the second sense of "Teaching file" (see
definition in Appendix).
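
The relation between core terms and their medical specialisations can be sketched as
a small broader-term map, in the spirit of SKOS broader/narrower links. The two links
below are the ones suggested in this section; the dictionary layout and the function
names are assumptions made for the example and do not reproduce the actual SKOS file.

```python
# Sketch: medical terms as specialisations of core terms.
BROADER = {
    "Virtual Patient": "Simulation",
    "Clinical Case Study": "Case Study",
}


def ancestors(term):
    """Return the chain of broader terms above `term`, nearest first."""
    chain = []
    seen = {term}
    while term in BROADER:
        term = BROADER[term]
        if term in seen:  # guard against accidental cycles
            break
        seen.add(term)
        chain.append(term)
    return chain


def matches(query_term, resource_term):
    """True if a resource term equals the query term or specialises it."""
    return resource_term == query_term or query_term in ancestors(resource_term)


if __name__ == "__main__":
    print(matches("Simulation", "Virtual Patient"))
    print(matches("Virtual Patient", "Simulation"))
```

Under this sketch a search for "Simulation" would also return resources indexed as
"Virtual Patient", while a search for "Virtual Patient" stays restricted to the
more specific term.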


Table 1. The mEducator Resource Type taxonomy. Definitions are listed in Appendix 1.

                           mEducator Resource Type Taxonomy

Core Resource Type

  Educational Practice Artefact     Professional Practice Artefact          Reference Material
  Course/Module/Unit                Professional Practice Guideline         Atlas
  Lecture (recording)               Scientific Journal Article              Bibliography
  Lecture Slide/Presentation        Diagnostic algorithm                    Dictionary
  Lecture Notes/Handout             Table                                   Encyclopaedia
  Textbook/Chapter                  Dataset                                 Handbook
  Tutorial                          Database                                Index
  Reading List                      Software                                Legislation
  Worked example                                                            Laboratory manuals
  Demonstration                                                             Terminology
  Simulation
  Case Study
  Practical
  Problem/Exercise
  Problem/exercise (with Feedback)
  Game/Serious Game
  Assessment Item/Instrument
  Student generated content
  Resource Pack
  Study Guide
  Syllabus/Programme/Curriculum
  Educational Policy

Medical Resource Type

  Educational Practice Artefact     Professional Practice Artefact          Reference Material
  Virtual Patient                   Clinical Practice Guideline             Anatomical Atlas
  Teaching File                     Diagnostic/Laboratory test              Pharmacopoeias
  Clinical Case Study               Diagnostic/Laboratory test (annotated)
                                    Medical/Diagnostic algorithm
                                    Medical Form
                                    Clinical record
                                    Patient Education Handout

   Concerning the "Professional Practice" collection of terms, it can be noted that
many specific terms can have a counterpart in other highly specialized professional
domains (e.g., law, economics, psychology, etc.) that have professional codes, ethics,
practice guidelines, data collection tools, and software to support the job activity. For
example, a "Diagnostic algorithm" or a "Diagnostic/laboratory test" makes sense also
in most engineering/technical disciplines and, as such, could be part of other
domain-specific extensions. Similar considerations apply to the Reference materials. As
mentioned before, the reason why one would index and share these types of resources,
which are in a sense "raw materials" (not re-elaborated for educational purposes), is that
they can be well suited to be the building blocks of constructivist and "authentic"
learning experiences. Also, it might be argued, they can indirectly foster the process
of community building (which is most often mediated by sharing tools and tricks of the
trade) that can make a system for sharing resources successful or not.


5 The mEducator Media Type Taxonomy

The property “Media type” of the mEducator learning resource refers to the media
type of the resource, or the media types embedded in it, if this is a complex one. It is
possible to have multiple values for this field, e.g., an image sequence accompanied
by an audio clip, or a blog post embedding a video clip. Media type may refer also to
the packaging standards that are used to interoperate the resource across different
learning platforms (e.g., SCORM package, MVP-Medbiquitous Virtual Patient) but
leaves out details such as file digital format and size.
   In order to facilitate identification of “fragments” for repurposing, and of classic e-
learning “packages” for interoperability, Media Type has been organized in four
categories: Primary Media, MultiMedia, Web/Social Media and Media Package. The
Web/Social Media category reuses terms from the SIOC ontology [22].
   Table 2 lists the terms selected for each category, whereas Table 3 provides the
definitions for Primary Media and MultiMedia.


Table 2. The mEducator Media Type taxonomy.

                            mEducator Media Type Taxonomy

Primary Media                  MultiMedia               Web/Social Media   Media Package
Text                           Interactive Multimedia   Site               SCORM
Image                          Immersive Environment    Forum              IMS
Sketch/Graphical Annotation                             Blog               DICOM
Animation                                               Post               MVP (VirtualPatient)
Audio                                                   Podcast
Video                                                   Webinar
3D Model                                                Wiki
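
As an illustration only, the four categories of Table 2 can be held in a simple lookup structure, which also makes it easy to categorise a multi-valued Media type field. The sketch below is not part of the mEducator implementation: the names `MEDIA_TYPE_TAXONOMY`, `category_of` and `categories_for` are hypothetical, and the case-insensitive matching is an assumption made for convenience.

```python
# Illustrative sketch only: the Table 2 terms as a plain dictionary.
# MEDIA_TYPE_TAXONOMY, category_of and categories_for are hypothetical names.
MEDIA_TYPE_TAXONOMY = {
    "Primary Media": [
        "Text", "Image", "Sketch/Graphical Annotation", "Animation",
        "Audio", "Video", "3D Model",
    ],
    "MultiMedia": ["Interactive Multimedia", "Immersive Environment"],
    "Web/Social Media": [
        "Site", "Forum", "Blog", "Post", "Podcast", "Webinar", "Wiki",
    ],
    "Media Package": ["SCORM", "IMS", "DICOM", "MVP (VirtualPatient)"],
}


def category_of(term):
    """Return the category of a media type term, or None if it is not listed."""
    wanted = term.strip().lower()
    for category, terms in MEDIA_TYPE_TAXONOMY.items():
        if any(wanted == listed.lower() for listed in terms):
            return category
    return None


def categories_for(terms):
    """Map each value of a multi-valued Media type field to its category."""
    return {term: category_of(term) for term in terms}
```

For a blog post embedding a video clip, `categories_for(["Blog", "Video"])` would associate the first value with Web/Social Media and the second with Primary Media.
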
Table 3. mEducator Media Type Taxonomy: definitions for Primary Media and MultiMedia

                      Definitions for mEducator Media Type Taxonomy

                                Primary Media Sub-headings
Text                  A resource consisting primarily of words for reading. Example: an e-book without
                      interactive features or multimedia elements.
                      Comment: same as http://purl.org/dc/dcmitype/Text

Image                 A static visual representation other than text.
                      Same as: http://purl.org/dc/dcmitype/StillImage

Sketch/Graphical      An image or text augmented by layer information. Attaching text labels to
Annotation            graphical elements is a natural visual notation that appears in many kinds of
                      hand-drawn diagrams, such as those appearing in user manuals, to indicate
                      part-whole relationships.
                      Narrower than Image.

Animation             A series of visual representations imparting an impression of motion when shown in
                      succession. Includes visual output from a simulation.
                      Note: specialises http://purl.org/dc/dcmitype/MovingImage

Audio/Sound           A resource primarily intended to be heard.
                      Same as: http://purl.org/dc/dcmitype/Sound

Video                 A camera-based recording of visual and audible components. Audio might not be
                      included.
                      Note: specialises http://purl.org/dc/dcmitype/MovingImage

3D Model              A three-dimensional digital representation of an object.

                                   MultiMedia Sub-headings
Interactive           A resource that aggregates any combination of text, audio, still images, animation,
Multimedia            and video, and requires interaction from the user to be understood, executed, or
Resource              experienced. Examples include AJAX Web pages, Applets, e-books with multimedia
                      elements.

Immersive             An artificial, interactive, computer-created scene or "world" within which a user can
Environment/Virtual   immerse themselves. May resort to gestural controls, motion tracking, and computer
Reality               vision to respond to the user's actions and movements. Examples include Second Life,
                      or a virtual reality simulation of surgical procedures.
                      Narrower than Interactive Resource.


6 Implementation

IMS Vocabulary Definition Exchange (VDEX) [23] is a standard currently used
for the interoperability of eLearning vocabularies. However, since mEducator places
emphasis on compatibility with Semantic Web applications, and one of the solutions
to be tested is based on Semantic Web Services and Linked Data, SKOS
[24] was chosen to represent the mEducator vocabularies/taxonomies, also in accordance
with the JISC recommendations on terminology services and technologies [25].
SKOS (Simple Knowledge Organization System) is the W3C specification for
representing thesauri, classifications, subject headings, taxonomies, and
folksonomies, and fully supports multilingualism.
The vocabulary namespaces are:
    • http://purl.org/meducator/resourceType/
    • http://purl.org/meducator/mediaType/
whereas the RDF files are available at:
    • http://www.meducator.net/mdc/resourceType.rdf
    • http://www.meducator.net/mdc/mediaType.rdf
respectively.
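
To give an idea of what a SKOS representation of such a taxonomy might look like, the following minimal sketch builds three concepts as RDF/XML with the Python standard library. It is not taken from the published RDF files: the concept local names (`primaryMedia`, `image`, `sketchGraphicalAnnotation`), the Italian label and the treatment of the category "Primary Media" as a broader concept are all assumptions made for illustration. Only the namespace and the "Narrower than Image" relation come from the text.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Namespace as given in the text; the local names used below are assumptions.
MEDIA = "http://purl.org/meducator/mediaType/"

ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def add_concept(parent, local_name, labels, broader=None):
    """Append a skos:Concept with one skos:prefLabel per language."""
    node = ET.SubElement(parent, f"{{{SKOS}}}Concept",
                         {f"{{{RDF}}}about": MEDIA + local_name})
    for lang, text in labels.items():
        label = ET.SubElement(node, f"{{{SKOS}}}prefLabel", {XML_LANG: lang})
        label.text = text
    if broader is not None:
        # skos:broader points from the narrower concept to the broader one.
        ET.SubElement(node, f"{{{SKOS}}}broader",
                      {f"{{{RDF}}}resource": MEDIA + broader})
    return node


root = ET.Element(f"{{{RDF}}}RDF")
add_concept(root, "primaryMedia", {"en": "Primary Media"})
# The Italian label is a made-up example of SKOS multilingual labelling.
add_concept(root, "image", {"en": "Image", "it": "Immagine"},
            broader="primaryMedia")
add_concept(root, "sketchGraphicalAnnotation",
            {"en": "Sketch/Graphical Annotation"}, broader="image")

skos_xml = ET.tostring(root, encoding="unicode")
print(skos_xml)
```

A dedicated RDF library would normally be preferable for real vocabularies; the standard library is used here only to keep the sketch self-contained.
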
   The schema used for attaching Resource and Media types to the educational
resources is the mEducator schema, which can be found at www.purl.org/meducator/ns.
Figure 1 shows an excerpt of an instance of a resource that uses the implemented
vocabularies.

Resource “Biomolecular Structure Function analysis practicals” type is
“Practical”, “Simulation”, “Software”, “Resource Pack”:
<mdc:resourceType rdf:resource="http://purl.org/meducator/resourceType#practical"/>
<mdc:resourceType rdf:resource="http://purl.org/meducator/resourceType#simulation"/>
<mdc:resourceType rdf:resource="http://purl.org/meducator/resourceType#software"/>
<mdc:resourceType rdf:resource="http://purl.org/meducator/resourceType#resourcePack"/>

Resource “Biomolecular Structure Function analysis practicals” media is
“animation”, “3D model”:
<mdc:mediaType rdf:resource="http://purl.org/meducator/mediaType#3DModel"/>
<mdc:mediaType rdf:resource="http://purl.org/meducator/mediaType#animation"/>

Fig. 1. An example of the instance section characterizing the learning resource "Biomolecular
Structure Function analysis practicals". mdc is the alias of the mEducator schema.
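
A consumer of such descriptions might read the type annotations back as sketched below, using only the Python standard library. This is an illustration rather than mEducator code: the namespace URI bound to `mdc`, the enclosing `rdf:Description` element and the `example.org` resource URI are assumptions added to obtain a well-formed document, and `profile` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed namespace URI for the mdc prefix; the text only gives the schema
# location www.purl.org/meducator/ns.
MDC = "http://purl.org/meducator/ns/"

# Wrapper elements and the example.org URI are placeholders for illustration.
SNIPPET = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:mdc="{MDC}">
  <rdf:Description rdf:about="http://example.org/resource/1">
    <mdc:resourceType rdf:resource="http://purl.org/meducator/resourceType#practical"/>
    <mdc:resourceType rdf:resource="http://purl.org/meducator/resourceType#simulation"/>
    <mdc:mediaType rdf:resource="http://purl.org/meducator/mediaType#3DModel"/>
    <mdc:mediaType rdf:resource="http://purl.org/meducator/mediaType#animation"/>
  </rdf:Description>
</rdf:RDF>
"""


def profile(rdf_xml):
    """Collect the term identifiers used as resource and media types."""
    root = ET.fromstring(rdf_xml)
    result = {"resourceType": [], "mediaType": []}
    for prop in result:
        for node in root.iter(f"{{{MDC}}}{prop}"):
            uri = node.get(f"{{{RDF}}}resource")
            # Keep only the fragment after '#', i.e. the term identifier.
            result[prop].append(uri.rsplit("#", 1)[-1])
    return result


print(profile(SNIPPET))
```
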


7 Discussion

   Resource Type and Media Type have been designed to work jointly to profile the
resources. Another option could have been to include in the medical resource type
extensions some other specific cases where the Resource Type implies the Media, as
is the case for VPs (to a certain extent), or for traceable medical images (that imply
both images and graphical annotation). This might lead to some redundancy, but the
trade-off could be more immediate recognition of a concept by the users.
This possibility will be re-evaluated after completing the testing of the vocabulary
in its current form with the end-users.
   Resource Type and Media Type are not the only (mandatory) controlled
vocabularies that were adopted in the mEducator schema. With respect to controlled
vocabularies, a richer profiling of the learning resource is achieved by properties
regarding the IPR licensing schema (property reuse and values from Dublin Core),
the educational level of the intended audience and a learning outcomes taxonomy. The
latter was developed anew and, given its specificity for the medical domain, its
description is outside the scope of this paper. However, it should be noted that this
learning outcomes taxonomy plays an important role, similar to the characterization
of the content with respect to the domain (e.g., fact, procedure, law, etc.) included in
the ontologies in [13] and [15].
   The development of the proposed vocabularies can be considered from two angles:
1) an exploration of what constitutes a content type from an educational perspective,
and 2) an attempt at a characterization that is meaningful for the community of
medical educators and that is useful in assisting the retrieval of content with
some repurposing goal in mind. To this aim it should be remarked that the level of
granularity that has been adopted has been very much affected by the notion of
repurposing that is key to mEducator. In fact, repurposing is a middle ground between
the notion of reusing a learning object in its entirety, and reusing some of its media
fragments or content fragments [15], this latter requiring a much more fine-grained
approach to annotation. On the other hand, repurposing presupposes that a fairly
complex and elaborated resource keeps its general connotation (e.g., a virtual patient),
but is repurposed to take into account different contexts, e.g., different cultures,
different languages, different measurement systems, or to cover yet an additional
learning outcome. In this sense it seems reasonable to annotate at a level that does not
explicitly take into account content structure, and yet may allow specifying the nature
of the components included in the resource.
   Concerning the practical ways to support the adoption and the correct use of
controlled vocabularies (to reap their benefits), issues more directly related to the
users must be taken into account. One lesson from the vocabulary development
activity was that any classification system, no matter how "self-explaining"
its categories and terms are, becomes increasingly meaningful to the user
engaged in the indexing process as he or she gains a full view of the facets
used in the classification. Interestingly, this leads us into the realm of user-interface
design, where the recommendation is to show contextually all the top and first level
categories from which the users can select their indexes. This requirement is even
more important when the coexistence of the folksonomic approach is allowed (i.e., a
user who does not recognize a suitable controlled term to describe the resource can
enter a new term), as is the case in our project. For example, if the user thinks that
the resource content type is a Wiki, s/he should be immediately aware that this option
has been contemplated in the Media Type section.
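
The interplay between controlled terms and free tags described above can be sketched as a small check performed when the user types a term. The code is a hypothetical illustration, not the project's interface logic: `CONTROLLED_TERMS` holds an abbreviated selection of terms from the two vocabularies, and `resolve_term` and its status values are invented names.

```python
# Hypothetical sketch: an abbreviated selection of terms per facet.
CONTROLLED_TERMS = {
    "Resource Type": ["Practical", "Simulation", "Study Guide", "Virtual Patient"],
    "Media Type": ["Video", "Blog", "Wiki", "3D Model"],
}


def resolve_term(user_term, current_facet):
    """Classify a term typed by the user while indexing `current_facet`.

    Returns a (status, facet) pair:
      ("controlled", facet)  the term exists in the facet being edited
      ("elsewhere", facet)   the term exists, but in another facet
      ("folksonomy", None)   no controlled term matches; keep it as a free tag
    """
    wanted = user_term.strip().lower()
    found_in = [
        facet for facet, terms in CONTROLLED_TERMS.items()
        if wanted in (term.lower() for term in terms)
    ]
    if current_facet in found_in:
        return ("controlled", current_facet)
    if found_in:
        # Point the user to the facet where the term is already defined.
        return ("elsewhere", found_in[0])
    return ("folksonomy", None)
```

With these assumptions, a user entering "WIKI" under Resource Type would be pointed to the Media Type section instead of creating a redundant free tag.
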


8 Concluding remarks

   We are currently involved in two lines of activity. The first one is concerned with
evaluating and validating the proposed vocabularies, as implemented in the two
mEducator solutions to share medical content. The proposed vocabulary will then be
presented to Medbiquitous (www.medbiq.org), the international consortium dedicated
to developing information technology standards for healthcare education and competence
assessment, as part of the goals of mEducator.
   The second line of action regards the formalisation of the mappings of the chosen
terms towards DC, HLOM, MeSH and other sources used in developing the
vocabulary. This will also take into account the output of a recent JISC
project on the Vocabulary Mapping Framework [26], where a mapping of
vocabularies from source standards is provided, with the aim of supporting the building
of transformations between any of them. It is expected that through this activity and
through the RDF linking of related vocabularies, a valuable resource to assist in the
tailoring of vocabularies for specific communities will be made available.
   From a research perspective, we plan to investigate how the network of mappings
that becomes realised by connecting vocabularies can complement the information
that can be extracted by mining the free text fields of the mEducator schema, as
proposed in [27], to further facilitate the resource retrieval process.


Acknowledgements
This work was supported by the project mEducator (Multi-type Content Sharing and
Repurposing in Medical Education), funded by the eContentplus Programme, a
multiannual Community programme to make digital content in Europe more
accessible, usable and exploitable (Grant ECP 2008 EDU 418006).


References

1. Lambe, P.: Organising knowledge: taxonomies, knowledge and organisational
    effectiveness. Chandos Publishing, Oxford (2007)
2. Kiu C., Tsui, E.: TaxoFolk: a hybrid taxonomy–folksonomy classification for enhanced
    knowledge navigation. Knowledge Management Research & Practice 8, 24--32 (2010)
3. Bamidis P.D., Kaldoudi E., Pattichis C.: From Taxonomies to Folksonomies: a roadmap
    from formal to informal modeling of medical concepts and objects. In: 9th IEEE
    International Conference on Information Technology and Applications in Biomedicine,
    Larnaca, Cyprus, 5-7 November (2009)
4. Heath B.P., McArthur, D.J., et al.: Metadata lessons from the iLumina digital library.
    Commun. ACM 48, 7, 68--74 (2005)
5. Bizer, C., Heath, T. et al.: Linked data - The Story So Far. Special Issue on Linked data,
    International Journal on Semantic Web and Information Systems (IJSWIS) (2009).
6. Mannens, E., Troncy, R. et al.: Automatic metadata enrichment in news production. In:
    10th Workshop on Image Analysis for Multimedia Interactive Services, pp. 61--64, IEEE
    (2009)
7. Dietze, S., Benn, N., Yu, H. et al.: Comprehensive service semantics and light-weight
    Linked Services: towards an integrated approach. In: 8th International Semantic Web
    Conference (ISWC 2010), Shanghai, China, November (2010)
8. http://www.medbiq.org/std_specs/standards/index.html#HCLOM
9. Mitsopolou, E., Taibi, D., Giordano, D., Dietze, S., Yu, H.Q., Bamidis, P., Charalampos, B.,
    Woodham, L.: Connecting medical educational resources to the Linked Data cloud: the
    mEducator RDF Schema. In: 1st Linked Learning Workshop, ESWC, Heraklion, Crete,
    May 29 (2011)
10. Kaldoudi, E., Bamidis, P., Papaioakeim, M., Vargemezis, V.: Problem-Based Learning via
    Web 2.0 Technologies. In: 21st IEEE International Symposium on Computer-Based
    Medical Systems, pp. 391--396, IEEE (2008)
11. Giordano, D., Leonardi R.: Web-trace and the learning of visual discrimination skills. In: 1st
    International Workshop on Pen-based Learning Technologies, Catania, Italy, 24-25 May,
    pp. 80--85. IEEE CPS (2007)
12. Kaldoudi, E., Dovrolis, N., Konstantinidis, S., Bamidis, P.D.: Social networking for
    learning object repurposing in medical education. J. Inform Techn Healthcare 7, 4,
    233--243 (2009)
13. Ullrich, C.: The learning-resource-type is dead, long live the learning-resource-type!
    Learning Objects and Learning Designs 1, 1, 7--15 (2005)
14. Jonassen, D., Churchill, D.: Is there a learning orientation in learning objects?
    International Journal on E-learning 3(2), 32--41 (2004)
15. Jovanovic, J., Gasevic, D., Devedic, V.: Ontology-based Automatic Annotation of Learning
    Content. International Journal on Semantic Web and Information Systems, 2, 2, 91--119
    (2006)
16. http://dublincore.org/documents/dcmi-type-vocabulary/
17. Barker P., Cross, P., Fernandez, A. et al. : RDN/LTSN resource type vocabulary- v.1.0.
    Available at: http://www.intute.ac.uk/publications/rdn-ltsn/types/
18. Currier S., MacNeill S., et al.: Vocabularies for describing pedagogical approach in e-
    learning: a scoping study. In: DC-2006 International Conference on Dublin Core and
    Metadata Applications: metadata for knowledge and learning.
19. MPEG-7 Ontology, http://www.w3.org/2005/Incubator/mmsem/XGR-mpeg7/
20. Dunsire, G.: Distinguishing Content from Carrier: The RDA/ONIX Framework for
    Resource Categorization. D-Lib Magazine 13, 1/2 (2007)
21. MeSH, http://www.ncbi.nlm.nih.gov/mesh
22. The SIOC ontology. http://sioc-project.org/
23. http://www.imsglobal.org/vdex/vdexv1p0/imsvdex_bestv1p0.html
24. SKOS, Simple Knowledge Organization System, www.w3.org/2004/02/skos
25. Tudhope D., Koch, T., Heery R.: Terminology Services and Technology - JISC state of the
    art review, http://www.ukoln.ac.uk/terminology/JISC-review2006.html (2006)
26. The vocabulary mapping project, http://cdlr.strath.ac.uk/VMF/index.htm
27. Giordano, D., Faro, A., et al.: Feeding back learning resources repurposing patterns into
    the “information loop”: opportunities and challenges. In: 9th IEEE International
    Conference on Information Technology and Applications in Biomedicine, Larnaca, Cyprus,
    5-7 November (2009)


Appendix: Definitions for Educational practice artifacts

                         Definitions for Educational Practice Artifacts

Course/Module/Unit    A sequence of activities designed to advance student skills, knowledge, and
                      attitudes in a particular discipline and to help students meet requirements as
                      prescribed in a curriculum.
                      Example: a course delivered through an LCMS (e.g., Moodle)

Lecture (Recording)   The video or audio recording, or the transcript, of the exposition of a given subject
                      delivered before an audience, for instructional purposes.
                      Example: a speaker presentation at a conference

Lecture               Materials that are projected to support the delivery of lectures or presentations in a
Slide/Presentation    module or a course.
                      Example: an annotated PPT presentation, an image, picture or diagram.

Lecture               Notes containing topical information prepared as study/reference material for
Notes/Handout         lecture(s) or a course.
                      Example: a document distributed to the participants in a seminar

Textbook/Chapter      Book or chapter of a book explicitly meant for the study of a specific subject,
                      characterised by the systematic presentation of knowledge about the subject.

Tutorial              A resource that provides guided, practical information about a subject.
                      Example: on-line primer to a software for statistical analysis

Reading List          A list of recommended or required sources which provide additional information on
                      the subject being studied.

Worked Example        A written problem or exercise designed to illustrate step-by-step how to perform a
                      task or how to solve a problem.

Demonstration         A video or audio recording, or a transcript, demonstrating a skill or a procedure in
                      practice.

Simulation            A representation of a process, activity or situation. Designed to support
                      problem-based learning or exploratory learning, or the comprehension of the
                      dynamics of a bio-physical or other complex process.
                      Example: a simulated patient interview, a software simulating a spring oscillation at
                      the varying of the spring parameters and the force applied to the spring.

Virtual Patient       An interactive computer simulation that allows the learner to take the role of a health
                      care professional and develop clinical skills such as making diagnoses and
                      therapeutic decisions.

Teaching              The presentation of a clinical case through diagnostic imaging modalities. The
file/Clinical Case    images contained in the file are typically supplemented with the following
Study                 information: case title, history/presentation, findings, diagnosis, discussion.

Case Study            A detailed account of a process or activity, prepared to assist the student in
(Non Clinical)        determining what factors led to its success or failure.
                      Example: a detailed analysis of an institutional change/innovation, a description of a
                      critical incident.

Practical             An activity to develop the practical skills of a subject, or an examination of such
                      practical skills.
                      Example: an activity to be carried out in the laboratory.

Problem/exercise      A task, or other effort to be performed by the student to develop, maintain, or
(no feedback)         increase skill or cognitive abilities. May include Projects and Fieldwork.

Problem/exercise      A task, or other effort to be performed by the student to develop, maintain, or
(with feedback)       increase skill, including problem-solving, and for which feedback is readily available
                      to the student.

Game/Serious          A structured activity, usually undertaken for enjoyment, used to teach about a
Game                  subject or to develop specific skills while playing.

Assessment            An item, activity, system or instrument designed to measure student learning. May
Item/Instrument       include question/answer pairs or practical skill demonstration.
                      Example: Self-assessment questions, Multiple choices, Objective Structured Clinical
                      Examination (OSCE)

Student generated     Any content generated by a student or a group of students as a result of coursework.
content               Can be used to exemplify good and bad practices, or shared for reference.
                      Examples: entries in a Wiki, final coursework, a discussion thread in a Forum

Resource Pack         A complete package for a course, module or unit including learning resources of
                      several types and supporting documentation (e.g., teacher/learning instructions).
                      Sometimes referred to as “teaching file” in educational contexts other than medicine.

Study Guide           A written guide created to provide direction, and point out critical information to the
                      students. It may include techniques for problem solving and hints to manage the
                      study process (e.g., topic sequencing and timing, learning and testing strategies).

Syllabus/Programme/   A document describing the contents of units, courses, and courses of study. May
Curriculum            include information about access requirements, delivery, activities and assessment
                      modalities.

Educational Policy    A document containing statements about how an educational system should
                      operate.

    Applying and Extending Semantic Wikis for
              Semantic Web Courses

                    Lloyd Rutledge1 and Rineke Oostenrijk2
                   1 Faculty of Informatics, Open Universiteit
                           Amsterdam, The Netherlands
                               Lloyd.Rutledge@ou.nl
                 2 Netherlands Environmental Assessment Agency
                            Bilthoven, The Netherlands


      Abstract. This work describes the application of semantic wikis in distance
      learning for Semantic Web courses. The resulting system focuses its
      application of existing and new wiki technology on making a wiki-based
      interface that demonstrates Semantic Web features. A new layer of wiki
      technology, called “OWL Wiki Forms”, is introduced for this Semantic
      Web functionality in the wiki interface. This new functionality includes a
      form-based interface for editing Semantic Web ontologies. The wiki then
      includes appropriate data from these ontologies to extend existing wiki
      RDF export. It also includes ontology-driven creation of data entry and
      browsing interfaces for the wiki itself. As a wiki, the system provides
      students with an educational tool that they can use anywhere while still
      sharing access with the instructor and, optionally, other students.


      Keywords: wikis, semantic wikis, distance learning, linked data


1   Introduction

A wiki helps multiple people at a distance to cooperate in writing and sharing
documents. The Semantic Web enables people at multiple institutions to create
data that machines on the web can exchange and share. A semantic wiki combines
these types of systems: it lets multiple people at multiple places cooperate
in making documents and data, along with the programs that retrieve, organize
and present this data. Distance learning is another form of distant collaboration,
in which not just documents and data are exchanged, but also skills and knowledge,
and the means of conveying them. The ability of semantic wikis to enable
distance and collaborative development of Semantic Web systems has special
ramifications for distance learning about the Semantic Web.
    The Master’s course Capita Selecta Thema - Semantic Web of the Informatics
faculty at the Open Universiteit applied semantic wikis as a distance education
tool. Now the software and the didactic technique developed for and used in this
course are being adapted and extended for an upcoming Bachelor’s course on
the Semantic Web. As with most Open Universiteit courses, these Semantic Web
courses apply distance learning techniques. Students in both courses develop,
as part of their graded assignments, Semantic Web systems on semantic wikis.
This work explains and demonstrates this application of semantic wikis as course
material in distance learning about the Semantic Web.
   This paper starts with a discussion of related work in semantic wikis. We
then describe Semantic Web educational activities at the Open Universiteit and
our use of semantic wikis in them. The last section presents OWL Wiki Forms,
an extension of existing semantic wikis developed for Semantic Web courses.


2     Related Work

A variety of related research and systems apply to our wiki-based approach
for developing courseware for distance learning about the Semantic Web. This
section starts by presenting a broad selection of concepts and tools that apply
to Semantic Web courses. We then present the central tool we apply: Semantic
MediaWiki. This section concludes with the wiki interface tool we use most:
Semantic Forms.


2.1   Foundational Semantic Web tools

Our application of semantic wikis as Semantic Web courseware is influenced and
supported by Semantic Web tools beyond wikis. This section starts by presenting
Protégé, an ontology editor that many Semantic Web courses use. We then
present a progression of different systems that lead to semantic wikis.
    Protégé-OWL (or just Protégé) is a GUI-editor for developing ontologies,
queries and rules in Semantic Web formats [7]. Protégé supports development
and visualization of RDF, RDFS, OWL and SWRL. It is easily installed and
free of charge. Protégé is frequently used in courses about the Semantic Web.
Protégé is a good implementation of many technical Semantic Web components.
However, it functions mostly as a stand-alone back-end developer’s interface and,
as such, lacks a direct end-user interface.
    Semantic browsers such as Tabulator [3] form one type of easily accessible
end-user interface to Semantic Web data. While they offer search and analysis
interfaces in a variety of ways, they usually have a view for a single resource,
shown as a table of the resource’s properties with their assigned values. In earlier
work, we presented a semantic browser whose focus was generating outline
structure, whose links then lead to such tabular annotation displays for given
resources [12].
    Wikis provide an important and large-scale online source for highly collaborative
authoring of information. Initiatives such as Wikiwijs [13] provide platforms
on which teachers share education material with each other. Wikis are
also helpful for distance learning because their interfaces facilitate online
collaboration.
    The best-known wiki is Wikipedia 1 . Wikipedia’s infoboxes provide annota-
tions of pages using a simple editing format, which results in tabular displays of
facts for given Wikipedia pages. Infobox displays resemble semantic browsing in
their structured presentation of properties and linked values. An important dif-
ference, however, is that each page’s infobox is human-edited separately, whereas
semantic browsers automatically generate their displays.
    The DBpedia project brings Wikipedia’s infoboxes into the Semantic Web
by converting them to RDF and providing Semantic Web-based querying and
browsing interfaces to the infobox data [1]. DBpedia data is then integrated
into the Linked Open Data dataset [4]. All this makes the infobox format on
Wikipedia a large and important source of annotations on the Semantic Web.
The most important technology for wikis is MediaWiki 2 , which is used for, among
other things, Wikipedia.


2.2   Semantic MediaWiki

Semantic wikis are wikis that add crowd-editing of data to that of documents.
With semantic wikis, people can add, manage, distribute and present data with
wiki interfaces as they do with documents, but so that machines process the data
as well. The most important software for semantic wikis is Semantic MediaWiki
(SMW) [9], an extension of MediaWiki. This tool supports making systems that
follow the vision and high-level functionality of the Semantic Web. However, the
individual Semantic Web components only become explicitly visible in SMW’s
RDF export feature.
    Semantic MediaWiki, like MediaWiki, is free of charge and open source, and
therefore not only easily acquired, but also easy to extend in assignments
for student projects. There are at least 216 active, public Semantic MediaWiki
installations online with several types of applications, for fields such as companies,
culture, training and management, and 33 specifically for education [14].
The industrial use of SMW for data management is growing, as shown by the
development and adoption of industry-oriented Semantic MediaWiki extensions
such as SMW+3 .
    Table 1 shows how various Semantic Web constructs are mimicked by or directly
implemented in features from the incremental layers of SMW technologies.
Semantic MediaWiki starts by recognizing that MediaWiki itself has features
similar to Semantic Web components, as indicated in the “MediaWiki” column
of Table 1. The next column shows features of the SMW extension that build
on top of MediaWiki to add Semantic Web features. The final column shows
the indirect Semantic Web support provided by Semantic Forms, which is an
extension on top of Semantic MediaWiki.
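
The difference between the first two layers can be made concrete with a small sketch that reads triples out of wiki text: a plain `[[xxx]]` link yields an untyped triple, whereas an SMW annotation of the form `[[pred::obj]]` yields a typed one. The code is an illustration only and does not reproduce SMW's parser. The regular expression covers just these two forms (plus an optional `|label` part), `triples` is a hypothetical helper, and the sample sentence is invented.

```python
import re

# Matches [[target]] or [[property::target]]; an optional "|label" is ignored.
LINK = re.compile(r"\[\[(?:([^\[\]|:]+)::)?([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def triples(page, wikitext):
    """Yield (subject, predicate, object) for every link in the wiki text.

    Untyped links get the predicate None, mirroring the unspecified
    property of a plain MediaWiki link.
    """
    for match in LINK.finditer(wikitext):
        predicate, obj = match.group(1), match.group(2)
        yield (page, predicate.strip() if predicate else None, obj.strip())


# Invented sample text for a page called "Amsterdam".
text = "Amsterdam lies in [[located in::Netherlands]], see also [[Rotterdam]]."
print(list(triples("Amsterdam", text)))
```
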
    First, SMW treats each MediaWiki link like a triple joining the current
page with the link end via an unspecified property. The foundation of Semantic
1 http://www.wikipedia.org
2 http://www.mediawiki.org/
3 http://www.ontoprise.de/en/solutions/smw-plus/
Table 1. Ontological features implemented in progressive layers of Semantic
MediaWiki technologies

     Feature               MediaWiki [9]*        SMW [9]*                     Semantic Forms [8]
RDF  triples               [[xxx]] link          [[pred::obj]]                infobox assigns
                           (untyped),            queried and exported         SMW annotation
                           infoboxes [1]         as triple
     type                  category              export                       infobox assigns
                                                                              category
     Property                                    property                     infobox/form row
RDFS datatype                                    SMW datatype: affects        specialized input field
                                                 query, display, sorting
                                                 and export
     Resource              page                  export as URI from           each page annotated
                                                 wiki                         by infobox/form
     Class                 category              exported                     usually own
                                                                              infobox/form
     domain                                                                   property assignment
                                                                              to infobox/form
     range                                                                    autocompletion on
                                                                              category
                                                                              “Has default
                                                                              form” property
     subClassOf            categories in         in queries and export
                           categories
     subPropertyOf                               SMW subproperty:
                                                 queries and export
OWL  inverseOf                                   invert-property
                                                 template [15]
     sameAs,               #REDIRECT             in query processing [15]
     equivalentClass/                            export uses SMW’s
     Property                                    equivalent URI [16]
                                                 and ontology import
                                                 [15]
*all content of this column cites this reference unless stated otherwise


MediaWiki is its typed links, which extend MediaWiki link syntax by adding
specified properties. This handling as triples becomes literal in the RDF export
feature.
    In addition to this Semantic Web-based foundation of typed links, SMW has
many smaller, albeit sometimes more indirect, equivalences of Semantic Web
functionality. SMW offers a variety of datatypes. These influence how data is
displayed and sorted, and how it is explicitly encoded in RDF export. MediaWiki’s
special page namespace for categories offers a rough equivalent of Semantic Web
classes in internal behavior. SMW extends this by class-like handling of categories
in queries, and by explicitly exporting categories as classes in RDF.
Semantic MediaWiki adds a wiki page namespace for properties, whose mem-
bers then function as equivalents of Semantic Web properties, and are exported
explicitly as such.
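
The reading of wiki links and categories as triples can be pictured with a minimal Python sketch. It is an illustration only and not SMW's own PHP code: the function name page_triples, the placeholder predicate UNSPECIFIED_LINK for untyped links, and the simplified regular expression are assumptions made here.

```python
import re

# Matches [[target]] and [[property::target]]; an optional |label part is ignored.
# This pattern is a simplification of real wiki link syntax.
LINK = re.compile(r"\[\[(?:([^\[\]:|]+)::)?([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def page_triples(page, wikitext):
    """Return (subject, predicate, object) triples for one wiki page."""
    triples = []
    for prop, target in LINK.findall(wikitext):
        target = target.strip()
        if prop:
            # Typed link: the property name becomes the predicate.
            triples.append((page, prop.strip(), target))
        elif target.startswith("Category:"):
            # Category membership plays the role of rdf:type.
            triples.append((page, "rdf:type", target[len("Category:"):]))
        else:
            # Plain link: a triple with an unspecified property.
            triples.append((page, "UNSPECIFIED_LINK", target))
    return triples
```

For example, a page "Berlin" containing `[[capital of::Germany]]`, a plain link `[[Spree]]` and `[[Category:City]]` would yield one typed triple, one untyped triple and one rdf:type triple.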
    Categories of categories in MediaWiki have subclass-like inheritance in inter-
nal Semantic MediaWiki queries. SMW has its own subproperties, with similar
internal inheritance, and also with explicit RDF export. The invert-property
template distributed with Semantic MediaWiki provides interface behavior sim-
ilar to that indicated by owl:inverseOf. MediaWiki’s #REDIRECT is treated by
SMW queries in some ways as owl:sameAs. Most important for the ability to link
internal data to the Linked Data cloud are Semantic MediaWiki’s import ontol-
ogy and equivalent URI functions, which link given internal pages to previously
existing external Web resources.
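
The subclass-like inheritance of categories in queries can be sketched as follows. This is a hedged illustration in Python, not SMW's query engine; the dictionaries parent_of and page_categories and both function names are hypothetical stand-ins for the wiki's internal data.

```python
def subcategories(category, parent_of):
    """All categories at or below `category`.

    parent_of maps each category to the set of categories it is placed in.
    """
    found = {category}
    changed = True
    while changed:
        changed = False
        for child, parents in parent_of.items():
            if child not in found and parents & found:
                found.add(child)
                changed = True
    return found


def members(category, parent_of, page_categories):
    """Pages in `category` or in any of its subcategories, sorted by name."""
    wanted = subcategories(category, parent_of)
    return sorted(p for p, cats in page_categories.items() if cats & wanted)
```

A query for the members of a category thus also returns pages that are only annotated with one of its subcategories, which mirrors rdfs:subClassOf inheritance.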
    The Semantic MediaWiki Linked Data extension (SMW-LDE) is an extension
on top of SMW+ that gives a wiki improved access to external linked data
[2]. This improved access includes unification of data from multiple external
sources in internal querying and in other aspects of the wiki interface. SMW-
LDE illustrates the growing role of semantic wikis for linked data. While such
external data integration is an important part of Semantic Web systems in gen-
eral, our work here focuses more on ontology modeling on the wiki and the
exporting and integration of data from these ontologies made on the wiki.

2.3   Semantic Forms
Semantic Forms (here just “Forms”) is an extension on top of Semantic Media-
Wiki that facilitates the entry of data and automates its presentation for brows-
ing [8]. Data entry facilitation comes as a form-based user interface for adding
annotations to wiki pages. Semantic Forms then generates presentations of this
form-entered data for each wiki page in tables similar to Wikipedia infoboxes,
providing automatically rendered semantic browsing of data in the wiki. The
rightmost column of Table 1 shows how Semantic Forms implements certain
Semantic Web system features. Here we make our own observations and com-
parisons, since Forms does not explicitly target Semantic Web functionality but
instead implements more general data management.
    The infoboxes in Semantic Forms are MediaWiki templates. MediaWiki tem-
plates serve as functions that accept parameters in generating wiki page code.
Each parameter in a Forms infobox template corresponds with a SMW property.
A call to an infobox template from a wiki page generates SMW code that assigns
the parameter value to the given property for that page.
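
The way an infobox template call turns parameter values into SMW annotations can be sketched in Python. This is an assumption-laden illustration rather than the template code that Semantic Forms generates: the function expand_infobox and the parameter-to-property mapping are hypothetical, and only the `[[property::value]]` annotation syntax is taken from SMW.

```python
def expand_infobox(param_to_property, call_params):
    """Return the SMW annotations that one infobox template call produces.

    param_to_property: template parameter name -> SMW property name
    call_params: parameter values given in the template call on a page
    """
    annotations = []
    for param, value in call_params.items():
        prop = param_to_property.get(param)
        if prop is None or not value:
            # Parameters unknown to the template, or left empty, assign nothing.
            continue
        annotations.append(f"[[{prop}::{value}]]")
    return annotations
```

Each returned string assigns the parameter value to the corresponding property for the calling page.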
    Forms offers special form-based interfaces to let the user enter the specifica-
tions that drive the generation both of these templates for infobox displays and
of the forms that populate them. Through these special forms, the user indicates
which properties appear in each infobox/form pair, and optionally which cate-
gory the infobox puts its pages into. Semantic Forms applies the SMW datatype
of each property to determine interface details of how its forms accept data input
for that property. By specifying in this manner the properties asked for in each
form and the appearance of each type of infobox, the user sets up the equivalent
of a data model for the wiki.
    Some Semantic Forms features reflect certain aspects of the properties
rdfs:domain and rdfs:range. That a property is assigned to a form and an infobox
suggests that the form’s class is included in its domain. Similarly, the
“autocompletion on category” parameter in Forms’s input field specifications
suggests an rdfs:range for the property. In addition, some rdfs:range
interface behavior shows in the assignment of default forms to pages that are
linked from a given property but have not yet been created.
    These features of MediaWiki, Semantic MediaWiki and Semantic Forms offer
a semantic browsing environment with forms for entering data. Installations of
these tools let users create and export RDF code for integration with Semantic
Web data sets such as the Linked Data Cloud. In addition, administrators can
fine-tune their installations so that the user interface reflects additional constructs
from the Semantic Web. In the next section, we discuss our application of these
tools to Semantic Web educational activities. The section after that presents the
OWL Wiki Forms extension of these tools, for use in upcoming courses.


2.4   OntoWiki

There are other semantic wikis beyond Semantic MediaWiki. One that is particularly
relevant to this work and its venue is OntoWiki [6]. While the collection of
semantic features OntoWiki offers is similar to that of Semantic MediaWiki and
its extensions, it is not built on top of MediaWiki. Advantages that OntoWiki
has over SMW include smoother integration with existing Linked Data sources and
direct internal support of OWL constructs. For our work here, we apply Semantic
MediaWiki and its extensions because they comprise a larger development
endeavor with a wider variety of data processing features.


3     Applying Semantic Wikis to Semantic Web Courses

The semantic wiki technologies from the previous section apply to making a
variety of Semantic Web systems. This section describes our application of these
technologies to administer classes and student projects about the Semantic Web.


3.1   Bachelor’s Thesis Project

The Semantic Friendly Forms Bachelor’s thesis project at the Open Universiteit
built an extension of Semantic Forms [5]. The primary functionality it adds is
a form-based interface for modifying pre-existing code for input forms and data
presentation tables. With Semantic Forms alone, by contrast, the special pages
for forms and infoboxes could only create them; any subsequent changes had to
be edited manually.
    Semantic Friendly Forms also provides forms for assigning certain RDF,
RDFS and OWL properties to any property the semantic wiki uses. These
Semantic Web standard properties are rdf:type, rdfs:label, rdfs:comment,
rdfs:domain, rdfs:range and owl:inverseOf. The form also lets users indicate
that the current property is an owl:SymmetricProperty. In addition, Semantic
Friendly Forms processes rdfs:domain annotations to determine which
properties to include in autogeneration of forms and infoboxes.
    Semantic Friendly Forms is implemented in PHP as MediaWiki
“Special pages”, just as all of Forms and much of SMW are.


3.2    Master’s Course

The Open Universiteit recently gave a one-time Master’s “Capita Selecta
Thema” course about the Semantic Web [11]. This course used Semantic MediaWiki,
along with Semantic Forms and Semantic Friendly Forms, as the core with
which students shared their data and interfaces with the teacher and with each
other [11]. It was also used for the general course administration and as a platform
for large student projects. Validation and some higher-level Semantic Web
processing were carried out by external tools, including Protégé.
    The challenge for each student was to make a system with the functionality
and interface of the Semantic Web. Each student developed an ontology in OWL
for their project on the semantic wiki. These ontologies were then populated
and presented with Semantic Forms-based interfaces that the students made
for them, in part with Semantic Friendly Forms. Each project also produced
additional interfaces for its data in the form of SMW query result tables and code
for showing them with the various presentation formats SMW offers. Finally,
the students set up their projects so that the RDF export feature of Semantic
MediaWiki exported each as an RDF file that conformed to their OWL-defined
ontology.
    The shared use of the course wiki allowed students to share their work in
progress with the teacher. The teacher could access, use, and even edit the
students’ systems at any time from any web browser. In this Master’s course, all
students worked on different projects with different topics on a common wiki.
Students could thus see each other’s projects and code. In addition, student projects
sometimes used components of other student projects, as was encouraged.
    Another communal online aspect of the course was its series of online sessions.
We used the online lecture tool Elluminate4 for a series of lectures. In these
sessions, both teachers and students gave presentations. In addition, students
were able to use Elluminate’s Application Share tool to present and give demos
of their projects on the common semantic wiki. Fig. 1 shows a reproduction of
an online session of this course, with a demonstration of a student’s project on
the wiki. Other forms of communication in the course were email, a discussion
group, individual student-teacher online sessions with demos and, of course, the
wiki itself.
    The definition of ontologies was a core part of each project. The students
defined these in OWL as separate OWL files. They then used Semantic Forms
and Semantic Friendly Forms to build the user interfaces that provided entry,
browsing and RDF exporting of data conforming to these ontologies. The ability
of Semantic Friendly Forms to assist in the modification of existing infobox
templates and forms was exploited by the students in building and implementing
their ontologies. However, the defining of the data interface and of the OWL code
it reflects remained technically separate tasks.

4 http://www.elluminate.com/

           Fig. 1. Screen display of an online class with a semantic wiki

3.3   Master’s Course Project on OWL
One of the student projects from this Master’s course was about OWL itself
[10]. This project provided form interfaces for annotating wiki pages with OWL
properties and classes. It thus enabled users to enter OWL-defined ontologies
as SMW pages and Forms infoboxes. The project also processed some of these
OWL constructs for OWL-related inferencing inside the wiki. As with Semantic
Friendly Forms, this project offered a form page for defining properties. It also
had form pages for annotating classes, restrictions and OWL properties on any
resource. These four forms provided annotation with the majority of RDF, RDFS
and OWL constructs. The project thus allowed OWL-defined ontologies to be entered
inside the wiki instead of as external files. However, these internally defined
ontologies still had no automated influence on the creation of the entry forms
and infoboxes for data populating these ontologies.
    Like all students in the course, this student had no administrator access to
the computer hosting the wiki while developing this project. Thus the project
involved no PHP code. Instead, all functionality was encoded as SMW page
and template code. While programming this functionality in PHP instead could
improve performance, this project shows how much the wiki interface alone can
define such interaction and processing of a semantic wiki installation.


3.4    Bachelor’s Course

The Open Universiteit is developing a new Bachelor’s course about the Semantic
Web 5 . The techniques applied in this course are for the most part adaptations
and extensions of those for the Master’s course. One technical extension is of
the semantic wiki course tools, which we package in the tool OWL Wiki Forms,
described in the next section. This extension adds internal wiki interface behavior
that more directly reflects the Semantic Web constructs used.
    The Bachelor’s course follows much of the approach of the Master’s course.
This includes communication via e-mail, discussion groups, wikis, individual
online meetings and scheduled online lectures. As with the Master’s course, the core
of the Bachelor’s course is that each student builds a Semantic Web system on a
semantic wiki. A key difference is that each student has the same assignments,
and each student implements these on their own wiki, which the teacher can
also access. In addition, all students can access a common course wiki for gen-
eral course information as well as manuals and examples of the semantic wiki
technology the course applies.
    While the Master’s course projects were general systems with high-level fea-
tures, the Bachelor’s course assignments are smaller in scale and focused more on
understanding and applying individual components of Semantic Web technolo-
gies. This motivates one requirement in developing OWL Wiki Forms: that its
interface directly reflects more individual Semantic Web technology components.
    As with the Master’s course, the Bachelor’s course’s system development is
split up into assignments that represent different levels of Semantic Web tech-
nologies. These are: RDF and data, SPARQL and data presentation, RDFS,
OWL and rules. When each assignment is given a passing grade, the starting
point of the next assignment, which is in part the teacher’s answer to the previ-
ous assignment, is loaded onto the student’s wiki. In this way, all students start
each assignment with the same foundation.


4     Extending SMW for Semantic Web Courses

This section presents the software that we developed for use in courses about
the Semantic Web. The features and techniques from the software developed
for the Semantic Friendly Forms project and the Master’s class OWL project
are unified and applied to the development of new software, OWL Wiki Forms
(OWF). OWL Wiki Forms focuses on direct support for standard Semantic Web
constructs in the wiki interface itself. Table 2 shows how OWF provides support
for various Semantic Web components. It identifies four areas of support:
ontology editing, querying, browsing and data input. Ontology editing consists
of form-based interfaces for defining ontologies inside the wiki. Semantic MediaWiki
queries should return not only the triples asserted in the wiki page code
and infobox parameters but also the triples inferred by processing the relevant
RDF, RDFS and OWL code.

5 http://www.studieaanbod.ou.nl/T64211.htm

Table 2. Ontological features implemented in OWL Wiki Forms

Feature               Ontology editing        Browsing                  Data Input
RDF
  Property            inherited by object and datatype properties
RDFS
  datatype            pull-down menu on       associated SMW datatype   determines Forms
                      property form           sets display              input type
  Resource            own form
  Class               own form                determines infobox        determines form
  domain              property form value     determines row in         determines row in
                      fields                  infobox                   form
  range               autocomplete on class   new page links to form    value autocompletion
  subPropertyOf       property form value
                      fields, autocomplete
                      on property
OWL
  inverseOf           property form value     shows in both infoboxes
                      fields, autocomplete
                      on property
  equivalentProperty  property form value
                      fields, autocomplete
                      on property
  ObjectProperty      own form
  DatatypeProperty    own form
  SymmetricProperty   checkbox in property    shows in both infoboxes
                      form
  TransitiveProperty  checkbox in property    shows in chain of
                      form                    infoboxes
  (Inverse)FunctionalProperty
                      checkbox in property
                      form
  Restriction         own form
italics = still under development

    OWL Wiki Forms also generates data display infoboxes based on the ontologies
entered, which provide the central browsing interface for the wiki. The tool similarly
generates data input forms. Each of the following subsections discusses one of
these semantic wiki interface components.


4.1    Ontology editing

The editing of ontologies is an important part of a complete Semantic Web
system. One aim of this work is thus to include ontology editing features in OWL
Wiki Forms. Its primary interface for letting users edit ontologies is encoded
using Semantic Forms. This gives the user a form-based interface for entering
annotations using classes and properties from RDF, RDFS and OWL.
    Our adaptation of Semantic Forms infobox templates provides the page dis-
play of this ontological data. It also ensures the corresponding annotations with
categories and properties that we define for standard Semantic Web constructs
by using Semantic MediaWiki. Our code for these constructs ensures that SMW’s
RDF export of any wiki content data includes the RDF, RDFS and OWL triples
relevant for the ontologies around that data.
    OWL Wiki Forms provides roughly the same forms for entering ontologies
that the Master’s course OWL project did: forms for general resources, classes,
properties and restrictions. These correspond with, respectively, the Semantic
Web standard classes rdfs:Resource, rdfs:Class, owl:DatatypeProperty and
owl:ObjectProperty, and owl:Restriction. Each of these forms allows the entry
of value assignments for properties from RDF, RDFS and OWL. Assignments
using Semantic MediaWiki’s equivalent URI and ontology import facilities cause
all Semantic Web definitions entered with these ontology forms to appear
appropriately in RDF exports from the wiki. Fig. 2 shows screen displays for forms
for classes and object properties.
    Each of the five main OWF forms for editing ontologies provides for the entry
of values for standard properties. Their entry here results in not only explicit
RDF export of what is entered but also corresponding internal wiki behavior for
data display and entry. The property forms have value entry fields for various
standard properties. Two of these are rdfs:domain and rdfs:range. The values
entered for these two particular properties have later influence on the entry and
presentation of data in OWL Wiki Forms, as described later. For object property
forms, the rdfs:range field accepts a class name. For datatype property forms,
it has a pull-down menu for selecting the rdfs:Datatype of the property.
    The object property form also has three standard properties that refer
to other properties. These are rdfs:subPropertyOf, owl:inverseOf and
owl:equivalentProperty. The form assists ontologists in entering values for
these properties by providing autocompletion for existing property names.
    Some OWL property classes have no additional OWL properties of
their own. These include owl:SymmetricProperty, owl:TransitiveProperty,
owl:FunctionalProperty and owl:InverseFunctionalProperty. Protégé uses
checkboxes rather than class assignments to indicate that properties are members
of these classes. Similarly, Semantic Friendly Forms uses a checkbox to indicate
that a property is an owl:SymmetricProperty [5]. OWL Wiki Forms takes the
same approach, using the checkbox feature in Semantic Forms. When such a box
is checked, it generates code that assigns the given property to the corresponding
class. All four of these checkboxes are on object property forms. A
checkbox for owl:FunctionalProperty is also on datatype property forms.
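
The mapping from checked boxes to class assignments can be sketched as follows. This Python fragment is only an illustration under stated assumptions: the checkbox keys, the function checkbox_triples and the triple representation are hypothetical, while the OWL class names and the placement of the checkboxes on the two kinds of property form follow the description above.

```python
# Checkboxes available on each kind of property form (keys are hypothetical).
OBJECT_PROPERTY_BOXES = {
    "symmetric": "owl:SymmetricProperty",
    "transitive": "owl:TransitiveProperty",
    "functional": "owl:FunctionalProperty",
    "inverse_functional": "owl:InverseFunctionalProperty",
}
DATATYPE_PROPERTY_BOXES = {"functional": "owl:FunctionalProperty"}


def checkbox_triples(prop, kind, checked):
    """Return rdf:type triples for a property form with the given checked boxes.

    kind is "object" or "datatype"; checked maps checkbox key -> bool.
    """
    if kind == "object":
        base, boxes = "owl:ObjectProperty", OBJECT_PROPERTY_BOXES
    else:
        base, boxes = "owl:DatatypeProperty", DATATYPE_PROPERTY_BOXES
    triples = [(prop, "rdf:type", base)]
    for key, owl_class in boxes.items():
        if checked.get(key):
            # A checked box assigns the property to the corresponding class.
            triples.append((prop, "rdf:type", owl_class))
    return triples
```

Boxes that a form does not offer, such as a symmetric box on a datatype property form, are simply ignored in this sketch.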
    By providing relevant internal ontological structure with each RDF export,
OWL Wiki Forms facilitates the integration of wiki-generated triples into broader
Semantic Webs such as the Linked Data Cloud. For Semantic Web courses, such
exports let students see their wiki-edited ontologies in other Semantic Web tools
such as Protégé, and thus practice with the integrability of the data their wiki
implementations create. The ontologies entered via OWF also drive the inferencing,
presentation and data input interfaces described later in this section. One
limitation is that all of this processing currently only applies to ontological
information entered via the wiki. Potential future work includes processing external
ontological data for inferencing, browsing and data input assistance.

Fig. 2. OWL Wiki Forms screen displays for a class form, an object property form,
a page annotation form and an infobox

4.2   Querying and Browsing

Semantic MediaWiki has its own query language for internal data. One goal of
OWL Wiki Forms is to have results from these queries include the inferencing
that is defined by Semantic Web constructs entered in the wiki. Our initial means
of programming this is as queries in infobox templates that find property
assignments elsewhere in the wiki from which properties of the current page
should be inferred, and then transform these query results into SMW code for the
corresponding OWL assignments.
    One important type of query result to include these inferences in is the prop-
erty value display in infoboxes. In Wikipedia, Semantic Forms and OWL Wiki
Forms, infobox templates provide a consistent table-based data report for the
pages in a given category. Fig. 2 shows an infobox generated by OWL Wiki
Forms. In Semantic Forms, a user determines the infobox rows and the proper-
ties they display. In OWF, on the other hand, infobox template code is generated
automatically based on relevant ontologies in the system. OWL Wiki Forms de-
termines each infobox’s rows and how they appear by processing Semantic Web
ontology constructs.
    OWF also extends the processing of specific Semantic Web constructs
in generating infoboxes. The most important aspect of the infoboxes, the
assigning of the rows themselves, is generated by OWL Wiki Forms by processing
the rdfs:domain property. The processing by Semantic Friendly Forms of
property domains [5] is applied here: if a property has a class as an rdfs:domain,
then the property gets a row in the infobox for that class.
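
The domain rule stated above amounts to grouping properties by their rdfs:domain classes. A minimal Python sketch of that grouping follows; the function name infobox_rows and the dictionary input are hypothetical, and the alphabetical row order is an assumption made only to keep the output deterministic.

```python
def infobox_rows(domains):
    """Map each class to the properties that get a row in its infobox.

    domains: property name -> list of classes given as its rdfs:domain
    """
    rows = {}
    for prop, classes in domains.items():
        for cls in classes:
            # A property gets a row in the infobox of every class in its domain.
            rows.setdefault(cls, []).append(prop)
    return {cls: sorted(props) for cls, props in rows.items()}
```

The same grouping would serve for the rows of the corresponding data input forms.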
    OWL Wiki Forms’s generation of infoboxes applies the rdfs:range property
by having links to nonexistent pages go to forms for the corresponding classes,
so that the pages are registered with this class when the user later adds them. To
do so, it generates code using Semantic Forms’s “Has default form” property.
Not only links from infoboxes trigger such forms; so do links anywhere in
the wiki that lead to new pages which range-based inferencing assigns
to a class.
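
The range-based choice of a form for a not yet existing page can be sketched as follows. This Python function is a hypothetical illustration of the decision only; it does not reproduce the wiki code that sets the “Has default form” property, and all names and data structures are assumptions.

```python
def form_for_new_page(prop, target, ranges, class_form, existing_pages):
    """Pick the form that a link to a nonexistent page should open.

    prop: property whose value is the link target
    ranges: property name -> list of rdfs:range classes
    class_form: class name -> name of the form that creates pages of that class
    Returns None if the page already exists or no range class has a form.
    """
    if target in existing_pages:
        return None
    for cls in ranges.get(prop, []):
        if cls in class_form:
            return class_form[cls]
    return None
```

When a property has several range classes, this sketch takes the first one that has a form; how such cases should be resolved is left open here.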
    OWF infoboxes show for each property not just the direct assignments from
that page’s form, as Semantic Forms does, but also the inferred values of the
property. For example, infobox displays of values of symmetric properties show
all values of this property for the current page, including names of pages with
annotations with the same property pointing to the current page. Properties
with an owl:inverseOf have each page in its triples show in the infobox of
the other, albeit, unlike with owl:SymmetricProperty, under different property
names. Similarly, we plan development for owl:TransitiveProperty by having
values for such properties appear in all infoboxes along the chain of pages linked by
the property.
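
The symmetric and inverse cases can be made concrete with a short Python sketch over a list of triples. It is an illustration under assumptions, not the infobox template queries themselves: the function displayed_values, its arguments and the in-memory triple list are hypothetical, and the planned transitive case is not covered.

```python
def displayed_values(page, prop, triples, symmetric=(), inverse_of=None):
    """Values an infobox row for `prop` on `page` would show, with inferences.

    triples: iterable of (subject, property, object) assertions in the wiki
    symmetric: names of properties declared owl:SymmetricProperty
    inverse_of: property name -> name of its owl:inverseOf property
    """
    inverse_of = inverse_of or {}
    # Direct assignments made on the page itself.
    values = {o for s, p, o in triples if s == page and p == prop}
    if prop in symmetric:
        # Pages pointing at this page with the same property.
        values |= {s for s, p, o in triples if o == page and p == prop}
    inverse = inverse_of.get(prop)
    if inverse:
        # Pages pointing at this page with the inverse property.
        values |= {s for s, p, o in triples if o == page and p == inverse}
    return sorted(values)
```

With a symmetric property, both pages show each other under the same property name; with inverse properties, each page shows the other under a different name.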


4.3   Data Input

OWL Wiki Forms aims to process ontologies in order to guide the inter-
face in assisting both system designers and data enterers. As with infoboxes,
the rdfs:domain property determines the rows in a class’s input form. The
rdfs:range property, on the other hand, plays a different role here. Fig. 2 shows
a screen display for a page’s data input form, as generated by OWF processing
of these and other properties.
    One means of assisting data enterers is autocompletion on property value
fields. To provide this, OWF processes the rdfs:range properties of each prop-
erty in order to feed autocompletion in entering its values. Here, as the user
types, autocompletion offers a pull-down menu of the wiki pagenames that are
members of the classes in the property’s range.
    OWL Wiki Forms achieves this by generating corresponding Semantic Forms
“autocomplete on category” parameter code. On the Semantic Web level, this
gives the autocompletion the names of pages representing resources within the
given classes. The user can still enter new names outside this list. As described
earlier, such a new name will then appear in the resulting infobox as a link to a
form for one of the range classes that generates a new page of that name.
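
The range-driven autocompletion can be pictured with the following Python sketch. It only illustrates the selection of candidate page names; the function autocomplete, the case-insensitive prefix match and the dictionaries are assumptions, and the actual behavior comes from the Semantic Forms parameter named above.

```python
def autocomplete(prefix, prop, ranges, page_categories):
    """Page names offered while the user types a value for `prop`.

    ranges: property name -> list of rdfs:range classes
    page_categories: page name -> set of categories (classes) of that page
    """
    wanted = set(ranges.get(prop, []))
    prefix = prefix.lower()
    return sorted(
        page
        for page, categories in page_categories.items()
        if categories & wanted and page.lower().startswith(prefix)
    )
```

An empty result does not block the user: a name outside the list can still be entered and later leads to a form for a new page.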


5   Conclusion

The Informatics faculty of the Open Universiteit offers Semantic Web courses
that apply semantic wikis. These wikis support the didactic approach of these
courses, in which knowledge of the Semantic Web and its components and tech-
nologies is tested by means of system development assignments. The function-
ality of this semantic wiki supports writing and sharing course material, the
implementation of student programming assignments, communication between
the teacher and each student, and between the students. Since distance learning
applies here, the communicative aspects of the wiki are particularly useful. The
technical contributions of this work include new support for Semantic Web com-
ponents directly within a semantic wiki interface. These functional extensions
of current semantic wikis are programmed in the software package OWL Wiki
Forms. More information about OWL Wiki Forms is available at its website at
http://icommas.ou.nl/lru/OWLwikiForms/.


6   Acknowledgements

Funding for this work comes in part from the Open Universiteit IPO online
begeleiding (online teaching) project and from the OUNL’s Informatics department.
Students of the OUNL course Capita selecta thema - Semantic Web provided
feedback from their use of the implementation in the execution of their class
projects. The Semantic Friendly Forms implementation discussed in this work
was part of an OUNL Bachelor’s degree team project. The students from this
project, Henri Gerrits and Rens te Kloese, also gave comments beyond the scope
of their project that were helpful for this work. Open Universiteit colleagues Stef
Joosten and Laura Benvenuti contributed insightful discussions about this work. Harold
Pootjes and Leo Wiegerink collaborated in developing the Bachelor’s Semantic
Web course, during which they helped with many of the concepts presented here.
References
 1. S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, and Z. Ives. DBpedia: A nucleus for
    a web of open data. In 6th Intl Semantic Web Conference, Busan, Korea, pages
    11–15. Springer, 2007.
 2. C. Becker, C. Bizer, M. Erdmann, and M. Greaves. Extending SMW+ with a
    linked data integration framework. In 9th International Semantic Web Conference
    (ISWC2010), November 2010.
 3. T. Berners-Lee, Y. Chen, L. Chilton, D. Connolly, R. Dhanaraj, J. Hollenbach,
    A. Lerer, and D. Sheets. Tabulator: Exploring and analyzing linked data on the
    semantic web. In Proceedings of the 3rd International Semantic Web User Inter-
    action, 2006.
 4. C. Bizer, R. Cyganiak, and T. Heath. How to publish linked data on the web. http:
    //www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/, 2007. Revised
    2008. Accessed 04/03/2011.
 5. H. Gerrits and A. L. te Kloese. Semantic Friendly Forms. http://icommas.ou.nl/
    lru/OWLwikiForms/index.php?title=Semantic Friendly Forms, 2010. Accessed
    04/03/2011.
 6. N. Heino, S. Dietzold, M. Martin, and S. Auer. Developing semantic web appli-
    cations with the ontowiki framework. In T. Pellegrini, S. Auer, K. Tochtermann,
    and S. Schaffert, editors, Networked Knowledge - Networked Media, volume 221 of
    Studies in Computational Intelligence, pages 61–77. Springer, Berlin / Heidelberg,
    2009.
 7. H. Knublauch, M. A. Musen, and A. L. Rector. Editing description logic ontologies
    with the Protégé OWL plugin. CEUR, 104, 2004.
 8. Y. Koren. Semantic Forms. MediaWiki extension, http://www.mediawiki.org/
    wiki/Extension:Semantic Forms, 2011. Accessed 04/03/2011.
 9. M. Krötzsch, D. Vrandecic, M. Völkel, H. Haller, and R. Studer. Semantic
    wikipedia. Journal of Web Semantics, 5:251–261, September 2007.
10. R. Oostenrijk. Implementeren en toepassen van Web Ontology Language (OWL) in
    Semantic MediaWiki (SMW). Master’s course student project, Informatics Faculty,
    Open Universiteit, 2009.
11. L. Rutledge. Semantic wiki’s voor onderwijs - linking data, kennis en leren. In
    Nationale Informatica Onderwijs Congres (NIOC 2009), Utrecht, The Netherlands,
    2009.
12. L. Rutledge, J. van Ossenbruggen, and L. Hardman. Making RDF presentable:
    integrated global and local Semantic Web browsing. In Proceedings of the 14th
    international conference on World Wide Web, WWW ’05, pages 199–206, New
    York, NY, USA, 2005. ACM.
13. R. Schuwer and F. de Vries. Structuring OER using learning trajectories. In Open
    Ed Conference 2010, Barcelona, Spain, 2010.
14. Semantic MediaWiki Community Wiki. Browse data page for public active
    sites.    http://smw.referata.com/w/index.php?title=Special:BrowseData&
    cat=Sites&Public=1&Status=Active, 2011. Accessed 04/03/2011.
15. Semantic MediaWiki project.               Semantic MediaWiki.             http://
    semantic-mediawiki.org/, 2011. Accessed 04/03/2011.
16. D. Vrandecic and M. Krötzsch. Reusing ontological background knowledge in
    semantic wikis. In M. Völkel, S. Schaffert, and S. Decker, editors, Proceedings
    of the First Workshop on Semantic Wikis – From Wikis to Semantics, Budva,
    Montenegro, June 2006.
   Navigating learning resources through linked data: a
  preliminary report on the re-design of Organic.Edunet

      Miguel-Angel Sicilia1, Hannes Ebner2, Salvador Sánchez-Alonso1, Fernando
                Álvarez, Alberto Abián, Elena García-Barriocanal

           1 Information Engineering Research Unit, Computer Science Department,
            University of Alcalá, Polytechnic Building, Ctra. De Barcelona km. 33.6
                            28871 Alcalá de Henares (Madrid), Spain
      {msicilia, salvador.sanchez, fernandol.alvarez, alberto.abian, elena.garciab}@uah.es
       2 School of Computer Science and Communication, Royal Institute of Technology,
                        Lindstedtsvägen 3, 10044 Stockholm, Sweden
                                         hebner@kth.se


      Abstract. Learning object repositories have grown and matured in recent
      years and are now a cornerstone of open education. Several current systems
      offer metadata openly through mainstream harvesting protocols or provide
      standardized query interfaces. The use of standardized vocabularies or
      ontologies is also becoming more common as a way to provide a degree of
      semantic interoperability. However, learning object metadata is typically not
      linked across repositories, and it does not provide a way to navigate by using
      other sources of data available on the Web. The linked open data (LOD)
      approach provides a framework for the evolution of learning object
      repositories into a more flexible system for sharing learning resource metadata.
      This paper describes how linked data has been integrated in the design and
      re-design of the export mechanisms of Organic.Edunet, a federation of learning
      repositories in the domain of organic agriculture that uses an RDF store and
      several ontologies to browse and search resources. The paper focuses on how
      the existing search and semantic browsing mechanisms can benefit from the use
      of LOD across repositories.

      Keywords: learning objects, repositories, linked data, ontologies,
      Organic.Edunet


1 Introduction

Learning object repositories have grown and matured in recent years, and are
currently considered an essential component of e-learning (Ochoa and Duval, 2009).
Each of these repositories provides a particular interaction style for searching,
browsing and navigating across learning resources, typically including text and
metadata search along with some form of topical navigation. Some repositories
also provide capabilities that extend their reach to resources or metadata distributed
across different repositories (Klemke et al., 2010). This is currently achieved mainly
by two mechanisms: harvesting and distributed search. In federated systems using
harvesting, the metadata from a number of repositories is collected into a central
location and regularly updated. Protocols like OAI-PMH are typically used for that
process (van de Sompel et al., 2004). In distributed search, a query is distributed to
several repositories and the query results are merged and possibly ranked
according to some criteria. The Simple Query Interface (SQI) is an example of a
protocol enabling that kind of arrangement (Simon et al., 2005).
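   The difference between the two mechanisms can be illustrated with a minimal
sketch. It does not use OAI-PMH or SQI; the two in-memory repositories, their
records and the function names are hypothetical stand-ins chosen only to contrast
a central copy of the metadata with a query fanned out to each repository.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical in-memory stand-ins for two repositories (not real data).
REPO_A: List[Record] = [
    {"id": "a1", "title": "Composting basics"},
    {"id": "a2", "title": "Soil fertility in organic farming"},
]
REPO_B: List[Record] = [
    {"id": "b1", "title": "Organic soil management"},
]


def harvest(repositories: List[List[Record]]) -> List[Record]:
    """Harvesting: copy every record into one central store."""
    central: List[Record] = []
    for repo in repositories:
        central.extend(repo)
    return central


def search(records: List[Record], term: str) -> List[Record]:
    """Naive case-insensitive title search over a list of records."""
    return [r for r in records if term.lower() in r["title"].lower()]


def distributed_search(repositories: List[List[Record]], term: str) -> List[Record]:
    """Distributed search: query each repository, then merge and rank."""
    merged: List[Record] = []
    for repo in repositories:
        merged.extend(search(repo, term))
    # Alphabetical order by title is only a placeholder ranking criterion.
    return sorted(merged, key=lambda r: r["title"])


central_store = harvest([REPO_A, REPO_B])
harvested_hits = [r["id"] for r in search(central_store, "soil")]
distributed_hits = [r["id"] for r in distributed_search([REPO_A, REPO_B], "soil")]
print(harvested_hits, distributed_hits)
```

In both variants the result is a flat list of records: nothing in it points to
related resources or to terms in a shared vocabulary.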
   While harvesting and distributed search have proved effective for aggregating
repositories for the purpose of search, they do not allow actual browsing and
navigation across different systems. In systems based on harvesting,
navigation is still local to the centralized metadata store, and thus restricted to the
repositories inside the federation. In distributed search, browsing is also limited,
because only the metadata records of query results are provided, without links to
categorizations of resources or to other resources outside the same repository, so
cross-repository navigation is not possible. The common problem is that these systems
restrict navigation either by imposing a concrete request/response format or by
confining it to the metadata already harvested in a central location, which in most
cases does not provide navigation to other resources in their original systems. A
distributed browsing and navigation system across repositories would require a way to
expose metadata or query results that provides links to permanently identifiable
resources, along with links to elements of terminologies or taxonomies that can mediate
navigation to other systems.
   The collection of practices and recommendations for exposing linked open data
(Bizer, Heath and Berners-Lee, 2009) provides the infrastructure required to make
that distributed navigation possible, by exposing resource information in a
common RDF format and enabling navigation through linked open data (LOD) to
other systems. However, deploying the infrastructure that realizes this
concept requires a re-conceptualization of the way repositories commonly offer
metadata and search facilities, along with the development of specialized browsers
capable of traversing the Web of linked data. This paper provides the ideas
and initial design elements for the implementation of such an approach, using the
Organic.Edunet repository1 as an example. The research objective of the present work
is to lay out the foundations and requirements for exposing LOD in learning object
repositories in a way that enables cross-repository search and navigation. Consequently,
the proposals presented focus on how to support these scenarios
and do not address other infrastructural aspects (e.g. bootstrapping or the
practicalities of migrating/extending current repositories).
   The rest of this paper is structured as follows. Section 2 briefly describes the
requirements from the data provider and consumer perspectives. Then, Section 3
describes the main decisions in the way IEEE LOM metadata is exported as linked
data. Section 4 describes how the availability of generalized linked data following the
conventions presented could be exploited for browsing across repositories. Finally,
conclusions and outlook are provided in Section 5.


1   http://portal.organic-edunet.eu/
2 Requirements

   Data provider requirements are not essentially different from those of exposing
linked data in other domains in which some metadata schemas are already in place.
The most widespread specific standard for providing descriptive metadata of learning
objects is IEEE LOM (IEEE, 2002). A number of systems provide
ontologies or RDF mappings of IEEE LOM, but these are not standardized.
Nilsson et al. (2008) published a mapping of IEEE LOM into the Dublin Core
Abstract Model that is discussed within the joint DCMI/IEEE LTSC Taskforce2. This
mapping was also used within Organic.Edunet to annotate resources using the
repository tool Confolio (Ebner, 2009). The mapping sets the basis for exposing both
Dublin Core and IEEE LOM metadata by using a shared format.
   Further, there is a need to consider the particular nature of learning objects and the
fact that a LOM metadata record describes two very distinct types of information:
(a) factual information on the resource, e.g. identifier, language, technical description,
and (b) contextual information that describes the resource for a particular
educational context. For example, difficulty depends on a particular target user
population, and another metadata record for the same resource may have a
different value because it considers a different target. Such contextualisation is
supported by the Confolio tool and its backend (Ebner, 2008). One common use case
is to harvest metadata from generic repositories and to provide context by adding
educational metadata. Separate named graphs with relations between them are used to
express metadata (in our case educational metadata) and meta-metadata such as
provenance information and access control.
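   The separation of factual metadata, contextual metadata and meta-metadata
into named graphs can be sketched with plain quads. This is only an illustration
under stated assumptions: the graph names, the resource URI, the difficulty values
and the use of dcterms:creator for provenance are hypothetical and do not
reproduce the Confolio data model.

```python
# Each statement is a quad: (graph, subject, predicate, object).
RESOURCE = "http://example.org/resource/271"  # hypothetical resource URI

quads = [
    # Factual information, valid in any context.
    ("g:factual", RESOURCE, "lom-gen:language", "en"),
    # Contextual information: one graph per educational context.
    ("g:context-secondary", RESOURCE, "lom-edu:difficulty", "difficult"),
    ("g:context-university", RESOURCE, "lom-edu:difficulty", "easy"),
    # Meta-metadata: statements about the graphs themselves (provenance).
    ("g:meta", "g:context-secondary", "dcterms:creator", "repository-A"),
    ("g:meta", "g:context-university", "dcterms:creator", "repository-B"),
]


def values_by_graph(quads, subject, predicate):
    """Return {graph: value} for one property of one subject."""
    return {g: o for g, s, p, o in quads if s == subject and p == predicate}


def provenance(quads, graph):
    """Return the creator recorded for a graph in the meta-metadata graph."""
    for g, s, p, o in quads:
        if g == "g:meta" and s == graph and p == "dcterms:creator":
            return o
    return None


difficulty = values_by_graph(quads, RESOURCE, "lom-edu:difficulty")
for graph, value in sorted(difficulty.items()):
    print(graph, value, provenance(quads, graph))
```

Keeping each context in its own graph lets two difficulty values for the same
resource coexist without contradiction, since each is attributable to its source.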
   The general availability of learning object metadata as LOD across repositories
would enable a number of applications that are beyond the reach of current systems.
From the viewpoint of filtering and selection, it would be possible to build systems
that mediate search through terminologies or ontologies representing educational
systems and competencies required for particular profiles. In particular, it would be
possible to navigate through these representations to browse and search,
complementing current distributed query and harvesting mechanisms. Another key
application would be that of aggregating metadata descriptions for the same learning
object, creating new opportunities for filtering based on quality, and having a way to
measure the reusability of the resource across different educational contexts (Sicilia
and García-Barriocanal, 2003).
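   As a rough sketch of the aggregation scenario, metadata records from different
repositories can be grouped by the URI of the learning object they describe. The
records and field names below are hypothetical, and counting distinct educational
contexts is only a crude stand-in for a reusability indicator, not the measure
discussed in the cited work.

```python
from collections import defaultdict

# Hypothetical metadata records obtained from different repositories.
records = [
    {"resource": "http://example.org/obj/1", "repository": "A",
     "context": "higher education", "difficulty": "easy"},
    {"resource": "http://example.org/obj/1", "repository": "B",
     "context": "school", "difficulty": "difficult"},
    {"resource": "http://example.org/obj/2", "repository": "A",
     "context": "school", "difficulty": "medium"},
]


def aggregate(records):
    """Group metadata records by the URI of the learning object described."""
    by_resource = defaultdict(list)
    for record in records:
        by_resource[record["resource"]].append(record)
    return dict(by_resource)


def distinct_contexts(records_for_resource):
    """Count the distinct educational contexts in which a resource is described."""
    return len({r["context"] for r in records_for_resource})


aggregated = aggregate(records)
for uri, recs in sorted(aggregated.items()):
    print(uri, len(recs), distinct_contexts(recs))
```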
   In order to enable cross-repository search and browsing through LOD, there is a
need for some conventions on exposing resource metadata and also some descriptions
of the repositories themselves. These are described in what follows.


3 Exporting IEEE LOM as linked data

   Organic.Edunet features two different approaches for storing metadata: (1) the
distributed repository tools using a triple-store with an abstraction of named graphs

2   http://dublincore.org/educationwiki/DCMIIEEELTSCTaskforce
and an implementation of the DCMI/IEEE draft using DCAM (Ebner, 2009), and (2)
on the federated portal side, an OWL-based repository based on HP Jena with a
relational datastore backend, using an OWL representation of IEEE LOM combined
with several ontologies, including one with a specific terminology for organic
agriculture and agroecology (Sánchez-Alonso, 2009). The repository tools within the
federation expose the metadata following the linked data approach and according to
the above-mentioned DCMI/IEEE draft. In contrast, exporting
linked data through the portal required the construction of a new module that uses the
existing SPARQL endpoint to translate the native RDF/OWL representation to the
RDF export described below, combined with a module that creates additional RDF links
automatically whenever viable.
   Identifiers should represent at least two types of entities: the objects
themselves (i.e. the Web contents) and the metadata records, as the same object may
be described several times in different repositories (Downes, 2004). In our case, all
the resources are external to the portal and identified by URIs, so we expose
only the second kind of entity. The resource identifiers themselves are
specified in the IEEE LOM element General.Identifier. In our case, that URI is the
subject of all the statements about the resource, with the exception of the
Meta-metadata category, whose statements have as subject the document requested.
Consequently, a fragment expressed in Turtle syntax for the description of an example
resource would be as follows (namespace declarations are omitted for brevity):
  # Meta-metadata information
  <>
      lom-meta:schema lom-meta:LOMv1.0 ;
      lom-meta:language <http://dbpedia.org/page/English_language> ;
      …
  # The description
  <http://confolio.vm.grnet.gr/scam/6/resource/271>
      lom-gen:language <http://dbpedia.org/page/English_language> ;
      lom-gen:keyword [
        a rdf:Alt ;
        rdf:_1 "energy consumption"@en ;
        rdf:_2 "energiforbruk"@no
      ] ;
      lom-gen:structure lom-gen:atomic ;
      lom-gen:coverage <http://dbpedia.org/page/Austria> ;
      lom-edu:interactivityType lom-edu:expositive ;
      lom-life:editor [
            lom-life:contributor [
                   foaf:name "Petros Lameras" ;
                   foaf:workplaceHomepage <http://www.ea.gr> ;
                   …
              ]
      ] ;
      lom-life:validator [
      …
   There are several cases in which links to other linked data sites can be provided,
including the language of the metadata record and of the resource, which can be
mapped to DBpedia resources. The lifecycle of the resources in Organic.Edunet
requires metadata on the content provider (editor) and on the party that carried out
the quality assessment (validator). Both are stored internally as vCards; in the
linked data export, these are converted into FOAF statements linking to
organizations where available. Another interesting case is lom-gen:coverage, in
which spatial coverages are linked to DBpedia, and can also be linked to other LD
datasets representing geopolitical information. In general, the vocabulary values used
(e.g. lom-edu:expositive) are represented using the (namespace, value) construct
proposed as a general mechanism in Nilsson et al. (2003).
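   The two conversions mentioned above can be sketched as follows. This is not
the export module of the portal: the lookup table, the function names and the
sample vCard are assumptions made for illustration, and only the vCard FN and
URL properties are handled. The DBpedia URI is the one used in the fragment above.

```python
# Hypothetical lookup from language codes to linked data URIs.
LANGUAGE_LINKS = {"en": "http://dbpedia.org/page/English_language"}

# Assumed mapping from vCard properties to FOAF properties.
VCARD_TO_FOAF = {"FN": "foaf:name", "URL": "foaf:workplaceHomepage"}


def language_link(code):
    """Return the linked data URI for a language code, or None if unmapped."""
    return LANGUAGE_LINKS.get(code)


def vcard_to_foaf(vcard_text):
    """Convert the supported vCard lines into (predicate, Turtle term) pairs."""
    pairs = []
    for line in vcard_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue
        # Drop vCard parameters such as "URL;TYPE=work".
        prop = VCARD_TO_FOAF.get(key.split(";")[0].upper())
        if prop is None:
            continue
        if prop == "foaf:workplaceHomepage":
            pairs.append((prop, "<%s>" % value))   # URI reference
        else:
            pairs.append((prop, '"%s"' % value))   # plain literal
    return pairs


SAMPLE_VCARD = "\n".join([
    "BEGIN:VCARD",
    "VERSION:3.0",
    "FN:Petros Lameras",
    "URL:http://www.ea.gr",
    "END:VCARD",
])

foaf_pairs = vcard_to_foaf(SAMPLE_VCARD)
for predicate, term in foaf_pairs:
    print(predicate, term, ";")
```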
   Linking to ontology/terminology elements is achieved by exposing the
Organic.Edunet ontology as linked data by using the following convention:
<http://www.organic-edunet.eu/ont/element>, where element is one of the terms,
instances or properties in the ontology. The RDF description enables navigation
through the is-a hierarchy but also laterally by following predicates. For example:
  <http://www.organic-edunet.eu/ont/biological-fertilizer>
   oe:is-made-from
     <http://www.organic-edunet.eu/ont/plant-origin-processed-product> .
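   A minimal sketch of the two navigation directions over such descriptions is
given next. The oe:is-made-from statement is the one shown above; the subclass
statement, the fertilizer term and the use of rdfs:subClassOf for the is-a
hierarchy are assumptions added only to make the example complete.

```python
OE = "http://www.organic-edunet.eu/ont/"
IS_A = "rdfs:subClassOf"  # assumed predicate for the is-a hierarchy

triples = [
    (OE + "biological-fertilizer", "oe:is-made-from",
     OE + "plant-origin-processed-product"),
    # Hypothetical statement, not taken from the ontology.
    (OE + "biological-fertilizer", IS_A, OE + "fertilizer"),
]


def parents(term, triples):
    """Upward navigation: follow the is-a hierarchy."""
    return [o for s, p, o in triples if s == term and p == IS_A]


def lateral_links(term, triples):
    """Lateral navigation: follow any other predicate."""
    return [(p, o) for s, p, o in triples if s == term and p != IS_A]


term = OE + "biological-fertilizer"
print(parents(term, triples))
print(lateral_links(term, triples))
```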
   A special case is that of element Classification which is used for various
purposes in Organic.Edunet. IEEE LOM provides a vocabulary of purposes for this
field, but these have been extended in the project with additional ones that were
selected to provide a better description of classifications. The following fragment
shows an example of IEEE LOM and extended classification purposes.
  oe:details <http://www.organic-edunet.eu/ont/soil-fertility>
  The oe:details property is an rdfs:subPropertyOf the vocabulary property
lom-cls:educationalObjective. This is one of the IEEE LOM purpose values
currently refined in the portal; a similar approach could be used for others, for
example prerequisites.
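   The practical effect of the sub-property relation is that a consumer asking for
the generic IEEE LOM purpose also receives statements made with the refined
property. A minimal sketch of that inference, assuming a hypothetical resource
URI and a hand-written sub-property table rather than a real RDFS reasoner:

```python
# Refined property -> the more general property it specializes.
SUBPROPERTY_OF = {"oe:details": "lom-cls:educationalObjective"}

RESOURCE = "http://example.org/resource/271"  # hypothetical resource URI
triples = [
    (RESOURCE, "oe:details", "http://www.organic-edunet.eu/ont/soil-fertility"),
]


def is_subproperty_of(prop, target):
    """True if prop equals target or specializes it (the table must be acyclic)."""
    while prop is not None:
        if prop == target:
            return True
        prop = SUBPROPERTY_OF.get(prop)
    return False


def objects(triples, subject, predicate):
    """Values of predicate for subject, including those stated via sub-properties."""
    return [o for s, p, o in triples
            if s == subject and is_subproperty_of(p, predicate)]


generic = objects(triples, RESOURCE, "lom-cls:educationalObjective")
print(generic)
```

A repository that only understands the standard IEEE LOM purposes can therefore
still use the refined classifications.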
   The abovementioned conventions are the basis for exposing metadata in LOD
format, including links to arbitrary terminologies. An additional step is that of
providing links to other resources. This is implemented by extending the meanings of
the IEEE LOM Relation.Kind element (which has a value space based on Dublin
Core). As there are no data for this element in Organic.Edunet, descriptions are
extracted from other parts of the portal. Currently and for demonstration purposes,
links based on co-occurrence of user-created tags are automatically added.
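   One possible way to derive such links is sketched below. The resources, the
tags, the threshold of two shared tags and the ex:sharesTagsWith predicate are all
hypothetical; the rule actually used in the portal is not specified here.

```python
from itertools import combinations

# Hypothetical user-created tags per resource.
tags = {
    "http://example.org/resource/1": {"compost", "soil", "fertility"},
    "http://example.org/resource/2": {"soil", "fertility", "erosion"},
    "http://example.org/resource/3": {"beekeeping"},
}


def cooccurrence_links(tags, min_shared=2):
    """Link every pair of resources that share at least min_shared tags."""
    links = []
    for a, b in combinations(sorted(tags), 2):
        if len(tags[a] & tags[b]) >= min_shared:
            links.append((a, "ex:sharesTagsWith", b))
    return links


links = cooccurrence_links(tags)
print(links)
```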


4 Browsing across repositories by using IEEE LOM

Architecting a browser that works over repositories publishing linked data following
the conventions described above requires components to start the search process and to
traverse links across metadata to reach related resources. We limit our discussion to
reaching resources that are related through common reference to entities or concepts
in terminologies that are also published as linked data. Organic.Edunet features a
form of “berrypicking” search as described by Bates (1989). The existing
implementation is based on the traversal of ontology relations, namely subclass
relations for the main navigation and arbitrary properties for an alternative traversal.
One or several terms are selected iteratively by navigation, and searches are run
against the central metadata repository, which regularly harvests the provider systems. A
REST-based interface is invoked from a Flash-based visualization, using JSON to
transfer the data for rendering the traversal tree. The re-design to exploit
linked data consumption entails two main aspects:
1. Changing the interfaces serving terms during navigation to provide ontology
   terms across distributed linked terminologies. This allows the visualization to
   render any kind of relationship that is provided as backlinks in the description of
   the terms.
2. Changing the invocation from a local search to a distributed one. This can be done
   in two ways: (i) using text descriptions to go through conventional distributed
   query systems and (ii) selecting repositories or aggregators depending on the
   self-declaration of the terminologies used by repositories.
   In both cases, repositories need to expose some form of standardized query
interface such as SQI. The second approach requires knowing which
terminologies are in use by each repository, so that RDF links as applicable in
Classification are used for the selection. A kind of self-declaration from
repositories may follow the linked data approach by exposing information about
themselves using linked data. An example is as follows:
     <http://www.organic-edunet.eu/lor>
         dc:requires <http://www.organic-edunet.eu/ont/>;
         dc:requires <http://aims.fao.org/aos/agrovoc/>;
          ...
The W3C VoID3 vocabulary for expressing metadata about RDF datasets can be used
for the same purpose, specifically through the void:vocabulary property. Also, a
property similar to void:sparqlEndpoint could be used to provide the SQI endpoint
of the repository. While many repositories do not use any kind of controlled
vocabulary, some relevant cases are starting to use them, such as the LRE Thesaurus4,
the Organic.Edunet ontology or the different classification systems used in MACE5.
In other cases, links to vocabulary terms can be generated automatically from
metadata, as has been experimentally tested with AGROVOC in Organic.Edunet by using
the KEA6 keyphrase extraction mechanisms. As an example, consider the current semantic
navigational interface in Organic.Edunet shown in Figure 1. The OE ontology
provides mappings via owl:sameAs to terms in the AGROVOC thesaurus, which has
very recently started to be exposed as linked data7. This enables the navigation to
move to AGROVOC for the collection of query terms. Once the user has finished
formulating the query, the selected terms would be sent via SQI to the underlying
repository (in this case Organic.Edunet) or to any other repository that exposes its
description. The dc:requires statements serve to select the most appropriate
repositories according to the terminologies used. Repositories can be discovered by
crawling or by trying the prefixes of the URIs of resources included in the relation
statements of the records obtained.
   The result snippets in Figure 1 would then potentially correspond to resources in
several repositories without the need for central harvesting or registries of
repositories. Also, relations between learning objects and to other linked data sets
(e.g. as in coverage) can be exploited for query expansion.
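   The selection and discovery steps described in this section can be sketched as
follows. The declarations mirror the dc:requires example above; the second
repository, the AGROVOC term identifier and the function names are hypothetical,
and the prefix enumeration is only one possible reading of "trying the prefixes
of the URIs".

```python
from urllib.parse import urlsplit

# Terminologies each repository declares (e.g. through dc:requires).
DECLARATIONS = {
    "http://www.organic-edunet.eu/lor": {
        "http://www.organic-edunet.eu/ont/",
        "http://aims.fao.org/aos/agrovoc/",
    },
    # Hypothetical second repository using an unrelated vocabulary.
    "http://example.org/other-lor": {"http://example.org/vocab/"},
}


def select_repositories(term_uris, declarations):
    """Keep the repositories that declare a terminology used by the query terms."""
    selected = []
    for repo, vocabularies in sorted(declarations.items()):
        if any(t.startswith(v) for t in term_uris for v in vocabularies):
            selected.append(repo)
    return selected


def candidate_prefixes(resource_uri):
    """List URI prefixes, longest first, at which a repository description might be found."""
    parts = urlsplit(resource_uri)
    base = f"{parts.scheme}://{parts.netloc}"
    segments = [s for s in parts.path.split("/") if s]
    prefixes = [base + "/"]
    for i in range(1, len(segments)):
        prefixes.append(base + "/" + "/".join(segments[:i]) + "/")
    return prefixes[::-1]


# Hypothetical AGROVOC term identifier collected during navigation.
query_terms = ["http://aims.fao.org/aos/agrovoc/c_0000"]
targets = select_repositories(query_terms, DECLARATIONS)
prefixes = candidate_prefixes("http://confolio.vm.grnet.gr/scam/6/resource/271")
print(targets)
print(prefixes)
```

The selected repositories would then receive the query through their query
interface; that step is omitted here.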


3   http://www.w3.org/TR/void/
4   http://lre.eun.org/node/87
5   http://portal.mace-project.eu/
6   http://www.nzdl.org/Kea/
7   http://aims.fao.org/website/Linked-Open-Data/sub
Fig. 1. The “berrypicking” interface exposed in Organic.Edunet


5 Conclusions and outlook

The linked data approach represents an opportunity to evolve existing learning object
repository technology towards the open exposure of metadata in a form that enables
novel approaches to search and navigation that are not restricted to centralized
metadata stores and that enable navigating across repositories without a need for
integrating them beyond the provision of links to other linked data sources.
   Linked data support in the Organic.Edunet portal is currently in a prototype stage
and in ongoing development, so it may still be subject to change. It has been used in a
different way from the beginning within the repository tools of the providers to the
portal based on Confolio, complying with DCMI/IEEE drafts for encoding the
metadata. The support for distributed browsing is currently based only on pilot
prototypes, as there are no other learning object repositories exposing linked data that
could be used for integration testing. However, the re-design of the browsing
interfaces shows that the approach is feasible to implement. Future work should deal
with the evaluation and refinement of the conventions used to provide semantic
relations between learning resources and their metadata, moving to a formal
specification that could be used in the many repositories currently exposing IEEE
LOM metadata through OAI-PMH or SQI interfaces. It should be noted that the
success of the approach presented here relies on the adoption of a shared set of linked
data exposure conventions for repositories, which can only come from a kind of
community consensus.


Acknowledgments

  The work presented in this paper has been funded with support from the European
Commission through the projects Organic.Edunet and TEL-Map.


References

Bates, M.: The Design of Browsing and Berrypicking Techniques for the Online Search
   Interface. Online Review 13: 407-424 (October 1989).
Bizer, C., Heath, T. and Berners-Lee, T.: Linked data - the story so far. International Journal
   on Semantic Web and Information Systems (IJSWIS) 5(3): 1-22 (2009).
Downes, S.: Resource profiles. Journal of Interactive Media in Education (2004). Available at:
   http://www-jime.open.ac.uk/2004/5.
Ebner, H., Palmér, M.: A Mashup-friendly Resource and Metadata Management Framework.
   In: Wild, Kalz, Palmér (Eds.): Mash-Up Personal Learning Environments, Proceedings of
   the 1st Workshop MUPPLE, European Conference on Technology Enhanced Learning
   (EC-TEL), Maastricht, The Netherlands, 17-19 September (2008).
Ebner, H., Manouselis, M., Palmér, M., Enoksson, F., Palavitsinis, N., Kastrantas, K., Naeve,
   A.: Learning Object Annotation for Agricultural Learning Repositories. IEEE International
   Conference on Advanced Learning Technologies, Riga, Latvia, 15-17 July (2009).
IEEE: Draft Standard for Learning Object Metadata. IEEE Learning Technology
   Standards Committee, IEEE 1484.12.1-2002, 15 July 2002.
Klemke, R., Ternier, S., Kalz, M. and Specht, M.: Implementing infrastructures for
   managing learning objects. British Journal of Educational Technology 41: 873-882 (2010).
Nilsson, M.: Draft Recommended Practice for Expressing IEEE Learning Object Metadata
   Instances Using the Dublin Core Abstract Model. Draft IEEE P1484.12.4™/D1 (2008).
Ochoa, X. and Duval, E.: Quantitative analysis of learning object repositories. IEEE
   Transactions on Learning Technologies 2(3): 226-238 (2009).
Van de Sompel, H., Nelson, M.L., Lagoze, C. and Warner, S.: Resource harvesting within the
   OAI-PMH framework. D-Lib Magazine 10(12), ISSN 1082-9873 (2004).
Sánchez-Alonso, S.: Enhancing availability of learning resources on organic agriculture and
   agroecology. The Electronic Library 27(5): 792-813 (2009).
Sicilia, M.A. and García, E.: On the Concepts of Usability and Reusability of Learning
   Objects. International Review of Research in Open and Distance Learning 4(2) (2003).
Simon, B., Massart, D., van Assche, F., Ternier, S., Duval, E., Brantner, S., Olmedilla, D.,
   Miklos, Z.: A Simple Query Interface for Interoperable Learning Repositories. In:
   Proceedings of the 1st Workshop on Interoperability of Web-based Educational Systems,
   pp. 11-18 (2005).
               Educational Resources as Social Objects
                   in Semantic Social Networks

      Eleni Kaldoudi1, Nikolas Dovrolis1, Daniela Giordano2 and Stefan Dietze3
         1 School of Medicine, Democritus University of Thrace, Alexandroupoli, Greece
                         kaldoudi@med.duth.gr, ndovroli@alex.duth.gr
  2 University of Catania, Dipartimento di Ingegneria Elettrica, Elettronica e Informatica, Viale
                              A. Doria 6, 95125, Catania, Italy
                                     dgiordan@diit.unict.it
  3 The Knowledge Media Institute, The Open University, MK7 6AA, Milton Keynes, UK.
                                      s.dietze@open.ac.uk


       Abstract. Educational content is often shared among different educators and is
       enriched, adapted and in general repurposed so that it can be re-used in different
       contexts. This paper exploits the various aspects of object-centered sociality
       that the educational resource can manifest in a participatory virtual community
       of authors and learners. These social facets include: (a) the obvious connections
       via common tags; (b) connections based on collective usage and other related
       interaction of human users; (c) social connections based on the inheritance as
       realized via educational content repurposing; (d) semantic connections and
       similarities of accompanying metadata. The concept of exploiting the various
       aspects of educational resources as social objects is shown in the
       implementation of the MetaMorphosis+ social network for educational resource
       sharing in health sciences.

       Keywords: educational resource, social networking, semantic technologies,
       Linked Open Data.


1 Introduction

During the past ten years we have experienced the realization of the so-called
“information society”, characterized by mass information seeking and based on the
distribution, mainly via the internet, of pre-defined and standardized digital data.
Current and emerging trends in information systems and communication services
mark the shift towards a “knowledge society”, that is, a society based on knowledge
as a value. Thus, educational resources (in the broad sense) become a central issue in
our society. This paper exploits the social aspect of educational resources within a
participatory virtual community of authors and learners as realized via the
MetaMorphosis+ environment for sharing educational resources in medical education.
   This work is conducted partly within the mEducator project, an EU funded
best practice network (under the eContentPlus2008 programme, Contract Nr: ECP
2008 EDU 418006) with the aim to implement and critically evaluate existing
standards and reference models in the field of e-learning in order to enable specialized
state-of-the-art medical educational content to be discovered, retrieved, shared and re-
used across European higher academic institutions. The following sections discuss
some of the different facets of the educational object sociality, including (a) the
obvious connections via common tags; (b) connections based on collective usage and
other related interaction of human users; (c) social connections based on the
inheritance as realized via educational content repurposing; (d) semantic connections
and similarities of accompanying metadata. Then, the concept of exploiting the
various aspects of educational resources as social objects is shown in the
implementation of the MetaMorphosis+ semantic social network [1].


2 Educational Content as a Social Object

The social Web, or Web 2.0 [2], has become an important trend during the last few
years. Among the prominent social web tools, social networking websites focus on
creating online communities of individuals who publish their content and activities
while exploring others’ content and activities, thus creating virtual on-line social
groups and associations.
   This communication paradigm has been taken up by the community of researchers
and academics and nowadays there is a thriving number of social networks dedicated
to science and professional relations. Most of these virtual communities provide a
way for researchers to create and populate their online profiles and share their work
with others. Recently the term object-centered sociality was introduced [3] to describe
the fact that strong social relationships are built mainly when individuals are grouped
together around a shared object that mediates the ties between them. Therefore, we
can assume that each content item on a social network site can be a source of social
connectivity, catalyzing social networking in virtual spaces. This new approach to
sociality has drawn attention, and current state-of-the-art research in the area involves
various ways to exploit object-centered sociality to the benefit of the community. For
example, a semantic tag model has been recently introduced for representing tag data
in the form of reusable constructs at a semantic level in order to share tag metadata
across heterogeneous social networking sites [4]. A similar work proposes to use
existing information on Web 2.0 and convert it to RDF which can be used as a
flexible model for describing and integrating data [5],[6].
   In this paper we take a different, novel view of merging semantic technologies
with the social web in order to enhance object-centered sociality and data integration.
Here the social object is the educational resource. In our work, we view and exploit
the ‘social aspects’ of educational resources in a variety of different ways, as
described below.
2.1   Building a Social Profile for Educational Resources

The most straightforward social dimension of an educational resource as a social
object in a network can be realized in the conventional way of connections among
profile tags. This requires a standardized metadata set to describe concisely an
educational resource and thus create its social profile. Standardizing metadata for
describing digital educational resources constitutes one of the main research topics in
the e-learning community. Currently, the IEEE LOM (Learning Object Metadata)
XML scheme seems the most prominent standard for describing learning objects [7]
as it derives from a number of related standardization initiatives. LOM defines a wide
range of metadata to classify and characterize learning resources, which include:
overall description (cataloguing, annotations, and associations and relationships with
other learning resources), technical data (file size, format, installation/usage
descriptions), educational data (educational purpose, learning objectives,
classification), and management data (intellectual property rights). However, there are
no generally accepted conventions for properly describing learning objectives or the
learning context, and although attempts have been made (such as CLEO and
Educational Markup Languages), these only capture some of the semantics; thus more
complex models are needed [8]. As a result, different communities give different
meanings and use different descriptions (e.g. extensions/alterations to LOM, such as
Healthcare LOM, etc.), effectively breaking down the notion of the standard. Whereas the above
difficulties relate to issues of relevance, semantics and usability, a more severe
critique of learning object metadata standards [9] points out that the assumption
implied in the current ways of describing an educational resource is that “information
equates learning”.
   Since metadata currently describes only simplified technical and structural
characteristics of learning objects, putting emphasis on the educational nature
and context of a resource (as opposed to its technical integration in the digital world)
requires resorting to metatags that describe the nature of the learning activities in
relation to the knowledge states that may result, or the thinking processes stimulated
by the activities entailed by the resource. This approach is taken by the mEducator
consortium in their recent proposal of a metadata scheme to capture pedagogical
aspects of a learning resource as well as patterns of activity (e.g. repurposing
episodes, among others). The mEducator metadata scheme [10] includes a number of
fields addressing different aspects of the educational resource: (a) general fields:
resource title, unique identifier, URL, URN, intellectual property rights
clearance/license, quality stamp (if any); (b) fields related to a general resource
description: resource authors, creation date, citation (i.e. how the resource should be
formally cited), keywords, content description, technical description (including any
technical requirements to access and use the resource); (c) fields related to the
educational aspect of the resource: educational context (for which the resource is
intended), teaching/using instructions, educational objectives, expected learning
outcomes, suggested assessment methods, educational prerequisites; (d) fields related
to classification/taxonomy information: resource language, type, discipline, discipline
subspecialty, educational level; and (e) fields addressing repurposing: resource
parents, repurposing context, repurposing description.
  These user generated description metadata, when treated as social tags, create a
complex and dynamic organization of educational resources in a similar fashion as in
any conventional social network, thus realizing the resources’ social network.


2.2   Resource Sociality based on Collective Usage Interaction

Educational resources are accessed, used, shared, repurposed, and also rated,
commented upon, and can be organized in a number of user specified ways in
collections. Capturing and sharing information about the attention that users spend on
resources in specific contexts can provide a different aspect of sociality based on the
personal views and mental models of the users. This way one can build the profile of
a resource as it appears to the external user, as opposed to the profile of the resource
according to the view of its creator, as depicted in the description metadata.
   The users’ perspective and attention are normally captured by recording contextual
attention metadata [11]. This includes data about the users’ attention and activities
that relate both to semantically rich actions on and interactions with educational
content items as well as data on indirect interactions amongst content items.
Additionally, basic interaction metadata can also be considered which includes all
other basic user-system interactions that provide some kind of basic attention
information (not necessarily semantically and contextually rich).


2.3   Resource Family Trees based on Repurposing History and Inheritance

The term ‘repurposing’ refers to changing a learning resource initially created and
used for a specific educational purpose in a specific educational context in order to fit
a different new educational purpose in the same or different educational context.
Although not formally addressed as such, educational content repurposing is what any
educator is routinely engaged in when preparing a new educational experience,
including preparing the educational content itself. Customarily, when an educator sets
the context and goals of a new educational experience, he/she will review existing
content and/or search for new relevant content and then repurpose and re-organize
content to fit the purpose of the new educational experience.
   There can be a variety of situations where repurposing educational content is
desired. These situations, referred to as “repurposing contexts”, can be of a
pedagogical nature, a technical nature or both, and include the following [12],[13]: re-
purposing (1) in terms of the actual content; (2) to different languages; (3) to different
cultures; (4) for different pedagogical approaches; (5) for different educational levels;
(6) for different disciplines or professions; (7) to different content types; (8) for
different content delivery media and/or technology; (9) to educational content from an
initial content type that is not intended for education; and (10) for people with special
needs.
   Considerable research work has targeted the field of automatic learning resource
repurposing, e.g. [14],[15],[16],[17]. Only a few works have concentrated on
modelling the repurposing history of the content. The problem of the granularity level at
which the modifications should be described is posed in [18], where the approach is to
track content changes only in structural, layout and content aspects of the content (i.e.
insert, delete, replace, rearrange). A shift from tracking structural changes towards
studying changes in the educational context is seen in the repurposing of a specific
content type (that of virtual patients) as addressed in the eVip/ReVip projects [13].
Finally, the MURLLO project from the community of language teachers considers
repurposing by tracking the changes in the “contextual metadata” that describe the
intended teaching context [19]. Interestingly, metadata modifications that account for
repurposing are carried out collaboratively based on Wiki technology.
   In our work, we address repurposing as a means to provide a different kind of
sociality for the educational resources. Thus repurposing history and inheritance are
used as a basic social relationship among educational resources in order to cluster
resources into families. Each repurposed resource declares its parent resource(s).
By iteratively following the ‘parents’ along a chain of repurposing ancestors, the entire
‘family’ tree of the particular resource can be compiled.
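
The compilation of a family tree from parent declarations can be sketched as a simple graph traversal. The following Python fragment is only an illustration under stated assumptions: the resource identifiers and the `PARENTS` table are hypothetical sample data, and the functions `ancestors` and `family` are names introduced here, not part of MetaMorphosis+.

```python
from collections import deque

# Hypothetical sample data: each resource ID maps to the parent IDs it declares.
PARENTS = {
    "ecg-lecture-en": [],
    "ecg-lecture-el": ["ecg-lecture-en"],
    "ecg-quiz-el": ["ecg-lecture-el"],
    "ecg-case-mix": ["ecg-lecture-en", "cardio-atlas"],
    "cardio-atlas": [],
    "anatomy-video": [],
}

def ancestors(resource, parents):
    """Return every ancestor reachable by following declared parents."""
    seen = set()
    queue = deque(parents.get(resource, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # guard against duplicate paths or accidental cycles
        seen.add(current)
        queue.extend(parents.get(current, []))
    return seen

def family(resource, parents):
    """Return all resources connected to `resource` through repurposing links."""
    # Build an undirected neighbour map from the parent declarations.
    neighbours = {}
    for child, declared in parents.items():
        neighbours.setdefault(child, set())
        for parent in declared:
            neighbours[child].add(parent)
            neighbours.setdefault(parent, set()).add(child)
    seen = {resource}
    queue = deque([resource])
    while queue:
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen
```

In this sketch a resource with no repurposing links forms a family of its own, which matches the idea of inheritance trees that are not connected to each other.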


2.4   Semantic Links between Resources by Harvesting the Linked Data Cloud

Finally, the social dimension of educational resources can be further expanded and
enriched by harvesting semantically rich information existing in the Linked Data
cloud. The Linked Open Data (LOD) approach is simply about “using the Web to
create typed links between data from different sources. These may be as diverse as
databases maintained by two organisations in different geographical locations, or
simply heterogeneous systems within one organisation that, historically, have not
easily interoperated at the data level” [20]. The Linking Open Data Project [21] is a
community project of the World Wide Web Consortium’s Semantic Web Education and
Outreach Group (W3C SWEO). The goal of the project is to extend the Web with a data
commons by publishing various open data sets on the Web, and making links between
data items from different data sources. Since its inception in June 2007, the
cloud has grown rapidly and already includes a large variety of open data sets,
including several research and medical data sets. This wealth of information can be
used to automatically enrich educational resource metadata with references to
external vocabularies, and in particular domain specific vocabularies, thus creating a
rich domain specific profile and extending the resource’s social connections to other
web objects. For example, in medical education the BioPortal [22] enables searching,
sharing, visualizing, and analyzing a large set of biomedical ontologies,
terminologies, and annotations exposed as LOD.


3 A Semantic Social Network for Educational Content Sharing

The various ways of exploiting educational resources as social objects have been
implemented in the MetaMorphosis+ social network [1], which can be viewed as two
distinctive and interacting networks. The first one is a network of persons, including
authors, potential authors and final users of learning objects (students, or teachers or
others, e.g. educational managers, etc). The second is a network of published learning
resources. The network of persons is functioning in a way similar to other social
networks. Persons can interact with each other via their personal blogs, declare
friends and create their own interest groups. At a different level, learning resources
themselves create an equivalent social network with interactions with other learning
resources as well as with persons. These interactions are variable and dynamic, thus
creating an evolving, user centric and goal oriented organization of resources and
persons, based on social dynamics.
   From the point of view of the resources’ social network, interactions are more
complex. Each resource is described by a variety of fields that capture its basic
characteristics as well as features pertaining to repurposing. This collection of fields
forms the resource profile and is a virtual representation of the resource in the social
network of resources. Educational resources are distributed, and they can reside
anywhere on the Web (e.g. within a Learning Management System, another on-line
repository, a Web page, etc), as long as their URL is known. MetaMorphosis+ only
holds their metadata description and the pointer to their actual location.
   The organization of educational resources is dynamically created around the four
different social aspects presented in the previous section, as shown in Fig. 1. The most
straightforward organization is created on the basis of user generated tags that
constitute the resource profile as a set of metadata.
   A second type of organization is a hierarchical one, describing the repurposing
history of each resource. Each repurposed resource declares its parent resource(s)
and thus a dynamic organization of the resources in ‘families’ is generated. In
MetaMorphosis+, a force-directed graph is used to depict the specific resource’s
family and inheritance patterns. Each node in the graph represents a resource, while
the directed edges represent repurposing relationships, with the arrows pointing from
the “source” objects to their “repurposed” descendants. The nodes also state the
‘repurposing context’ and act as active links to the resource profile, where more
information on the repurposing description can be obtained. For the entire resource
collection, a circular directed graph representation is used which depicts all the
resources with the various individual inheritance trees, which are usually not
interconnected. A resource inheritance tree is a group of resources that have a
relationship based on repurposing – this can also be viewed as a resource ‘family’.
Fig. 1. A schematic representation of the double network organization in MetaMorphosis+
showing the different social aspects of the educational resources.

   The third dimension in the social organization of resources is based on a model for
framing the representation and treatment of information gathered from the reuse and
repurposing of learning resources from distributed repositories [23]. The model takes
into account as sources of information both static user-edited or automatically
generated metadata fields and the emerging, dynamic information clouds that
surround a learning resource when users comment on it, tag it, or explicitly link it to
other learning resources, i.e. by a combined use of strict taxonomies/controlled
vocabularies with folksonomies (dynamic, users’ tags facilitators/aggregators). By
coordinating these separate information layers, we hope to reduce the semantic gap
occurring when unanticipated contexts of use are to be described by resorting only to
predefined vocabularies, and thus to improve the relevance of the retrieved resources
after a query.
   Finally, the social profile of the educational resources is augmented by semantic
tagging. The architectural framework for semantic data and service linking and
federating of disparate educational resource pools that powers the social environment
is described in detail elsewhere [24]. At the lower level of this semantic technology
framework, a Web data and service layer employs distributed Web services that
harvest educational resource metadata from heterogeneous data sources on the Web.
In the upper layer, semantic data and service integration is achieved based on the
Linked Services approach and on semantic technologies such as iServe [25] and
SmartLink [26]. An RDF repository exposes harvested educational resource metadata
as triples.
   Metadata harvested from Web sources can also be enriched with existing LOD
vocabularies. This is of particular importance to extend rather unstructured metadata,
such as keywords or free text subject and discipline descriptions with structured data
based on well-known vocabularies. This is achieved by exploiting a variety of
medical domain ontologies and the expanding LOD cloud to semantically annotate
the existing RDF description of a resource and then expose its metadata back to the
LOD cloud for further exploitation by third parties which make use of the web of
LOD.
   Biomedical ontologies provide essential domain knowledge to drive data
integration, information retrieval, data annotation, natural-language processing and
decision support. BioPortal (http://bioportal.bioontology.org) is an open repository of
biomedical ontologies that provides access via Web services and Web browsers to
ontologies developed in various formats including OWL, RDF, OBO format and
Protégé frames [22]. In MetaMorphosis+ we have utilized the NCBO BioPortal’s
RESTful Web services programming interface to access and incorporate terms and
concepts from the more than 260 ontologies provided to this day, corresponding to
more than 4.5 million medical and life sciences terms. This way the MetaMorphosis+
user can annotate an educational resource with suggested standardized terms and
concepts from a variety of ontologies, enriching the RDF output with dereferenceable
standardized terms as values for the various fields, e.g. keywords, discipline,
specialty, etc. The ontologies used include, among others, prominent medical
ontologies such as SNOMED-CT (Systematized Nomenclature of Medicine – Clinical
Terms), ICD9/10 (International Statistical Classification of Diseases and Related Health
Problems), Body System (body system terms used in ICD11), MeSH (Medical
Subject Headings), NCI (Meta)Thesaurus, Galen (the high level ontology for the
medical domain), HL7 (the Normative RIM model v2), and Biomedical Resource
Ontology (BRO, a controlled terminology of resources to improve sensitivity and
specificity of Web searches).
   As an example, suppose a user intends to describe an educational resource by using
the term/concept Telemedicine, in the list of ‘Keywords’ or in the ‘Discipline’ and
‘Specialty’ fields of the metadata description of the educational resource. Semantic
annotation in MetaMorphosis+ can suggest a number of related standardized terms
from the available ontologies. For example, the equivalent term from the NCI
Thesaurus is represented by a dereferenceable URI (namely
http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Telemedicine), thus enabling
rich linking and reasoning within the semantic Web. Figure 2 shows an example of
RDF triples in the mEducator scheme before and after semantic annotation in
MetaMorphosis+.
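
The substitution of free-text values by ontology URIs can be illustrated with a short Python sketch. It assumes a small in-memory lookup table (`TERM_URIS`, hypothetical) in place of a real ontology search service, and the field names `keyword` and `discipline` are placeholders rather than the exact mEducator property names. Only the Telemedicine URI is taken from the example above.

```python
# Hypothetical lookup table standing in for an ontology search service.
# The Telemedicine URI is the NCI Thesaurus example quoted in the text.
TERM_URIS = {
    "telemedicine": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Telemedicine",
}

def annotate(metadata, term_uris):
    """Replace free-text field values with ontology URIs where a match is known."""
    enriched = {}
    for field, values in metadata.items():
        # Values without a known ontology term are kept as plain strings.
        enriched[field] = [term_uris.get(v.strip().lower(), v) for v in values]
    return enriched

# Placeholder resource description with free-text values.
resource = {
    "keyword": ["Telemedicine", "home monitoring"],
    "discipline": ["telemedicine"],
}
enriched = annotate(resource, TERM_URIS)
```

Terms that have no match stay as literals, so the enriched description can mix linked and unlinked values.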
   The current deployment of the MetaMorphosis+ social network is implemented using
the Elgg open source social engine (http://elgg.com) and is available on-line at
http://metamorphosis.med.duth.gr. Additionally, the FOAF (Friend-of-a-Friend) ontology
is used for integration of human profiles, while Triplify [27] is employed to publish
metadata profile fields as Linked Data. Graph representation was implemented with
the Prefuse information visualization toolkit (http://www.prefuse.org) and is based
on the GraphML (http://graphml.graphdrawing.org/) standard and a FOAF/GraphML
standard interface so as to dynamically extract the data and form the graphical
representation at the time of request.


Fig. 2. RDF/XML snippets showing the differences in the RDF output
before (above) and after (below) semantic enrichment via the BioPortal. Before enrichment
the values for the ‘Keyword’ and the ‘Discipline’ fields appear as strings with no link to an
ontology, while after enrichment these values are replaced with dereferenceable URIs
corresponding to standardized ontology terms.

   The main goal of this implementation is to collect and organize pilot educational
content within the mEducator project and test and re-engineer the metadata scheme
for describing educational content in medical education. After the first few months
of deployment the environment has more than 100 registered users and more than
350 educational resources, including 80 repurposed resources. Although about half of
the resources are in English, more than 15 other European languages are also
represented. The resources included in the environment are distributed
among the various educational levels, 33% intended for undergraduate medical
education, 23% intended for postgraduate/resident studies and 21% for continuing
life-long education, while 22% are intended for educating the public. The majority of
resources are of conventional content types, such as lecture notes and books (34%),
lecture presentations (12%) and graphs/diagrams/figures/images (16%). Clinical
cases, teaching files and virtual patients are 12% of available resources, while there is
a small but notable representation of Web 2.0 type resources (4%), serious games
(2%), algorithms (4%), and simulators (1%). When it comes to the repurposed
resources, 84% of them have only one parent, while the rest declare two parent
resources. A total of 42% of the repurposed resources have declared 2 repurposing
contexts and 22% more than two repurposing contexts. All repurposing contexts are
well represented, with repurposing to “different technology” and “different
educational level” being the most common.


4 Discussion

In retrospect, it is possible to identify three generations of information technology
supported learning. The first generation is based on multimedia technology support,
such as videos, CD-ROMs or other stand-alone educational software. The second
generation employs telematic technologies and it is basically set up as teaching via the
Web, where conventional educational material, and entire educational courses, are
delivered via the network to remote students. The last, emerging generation is about
Web based learning, where the Internet is used as a means to create active, context
based, personalized learning experiences. This last generation of e-learning shifts the
emphasis from ‘teaching’ to ‘learning’ and from the notion of technology as a didactic
mediator to the notion of a sociable, peer-supported, involved learner. This new
learning paradigm inevitably places emphasis on educational resources and virtual
communities of practice. Although a lot of effort has been put into the area of
educational content development, description, and sharing, currently there is no
prominent, clear and standards-based solution for the seamless sharing of educational
content in medicine and in general.
   Current efforts mainly address the problem of educational content sharing
via centralized or distributed repositories, but such approaches do not consider
notions such as author/learner participation and collaboration, nor do they address
issues of usage and inheritance as resources are shared, re-used and repurposed. The
MetaMorphosis+ social network described in this paper offers a novel approach for
capturing the multi-faceted social aspect of educational resources, as they form the
social-object basis of a human network of educators and learners. The goal is to
provide a conceptually different approach to educational resource search and retrieval
via ‘social’ associations amongst learning resources and their authors and ultimate
consumers.

Acknowledgments. The authors acknowledge invaluable comments and input from
the mEducator consortium partners. This work is funded in part by the mEducator
project (Contract Nr: ECP 2008 EDU 418006 mEducator) under the eContentplus
programme, a multiannual Community programme to make digital content in Europe
more accessible, usable and exploitable.


References

1. MetaMorphosis+, http://metamorphosis.med.duth.gr
2. O’Reilly, T.: What is Web 2.0: Design Patterns and Business Models for the Next
   Generation of Software (2005).
   http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
3. Engeström, J.: Why Some Social Network Services Work and Others Don’t. The Case for
   Object-Centered Sociality (2005).
   http://www.zengestrom.com/blog/2005/04/why_some_social.html
4. Kim, H.-L., Decker, S., Breslin, J.G.: Representing and Sharing Folksonomies with
   Semantics. J. Inf. Sci. 36, 57-72 (2010)
5. Bojārs U., Breslin, J.G., Finn, A., Decker, S.: Using the Semantic Web for Linking and
   Reusing Data across Web 2.0 Communities. Web Semantics: Science, Services and Agents
   on the World Wide Web, 6, 21-28 (2008)
6. Bojārs U., Breslin, J.G., Peristeras, V., Tummarello, G., Decker, S.: Interlinking the Social
   Web with Semantics. IEEE Intelligent Systems, 23, 29-40 (2008)
7. LOM working draft v4.1 (2000) http://ltsc.ieee.org/doc/wg12/LOMv4.1.htm
8. Rehak, D.R., Mason, R.: Keeping the Learning in Learning Objects. In: Littlejohn, A. (ed.):
   Reusing Online Resources. pp. 20-34, Routledge, London (2003)
9. Jonassen, D., Churchill, D.: Is There a Learning Orientation in Learning Objects?
   International Journal on E-learning, 3, 32-41 (2004)
10. mEducator Schema, http://purl.org/meducator/ns
11.Wolpers, M., Najjar, J., Verbert, K., Duval, E.: Tracking Actual Usage: the Attention
   Metadata Approach. Educational Technology & Society, 10, 106-121 (2007)
12.Kaldoudi, E., Dovrolis, N., Konstantinidis, S., Bamidis, P.D.: Social Networking for
   Learning Object Repurposing in Medical Education. J. Inform Techn Healthcare, 7,
   233–243 (2009)
13.Balasubramaniam, C., Poulton, T., Huwendiek, S.: Repurposing Existing Virtual Patients;
   an Anglo-German Case Study. Bio-Algorithms and Med-Systems, 5, 91-98 (2009)
14.Zaka, B., Kulathuramaiye, N., Balke, W.-T., Maurer, H.: Topic-Centered Aggregation of
   Presentations for Learning Object Repurposing. In Proc. World Conference on E-Learning
   in Corporate, Government, Healthcare, & Higher Education (E-Learn), Las Vegas (2008)
15.Jovanović, J., Gašević, D., Verbert, K., Duval, E.: Ontology of Learning Object Content
   Structure. In Proc. 12th International Conference on Artificial Intelligence in Education, pp.
   322–329. Amsterdam (2005)
16.Lenski, W., Wette-Roch, E.: The TRIAL-SOLUTION Approach to Document Re-use
   Principles and Realization. In Proc. of Workshop on. Electronic Media in Mathematics,
   Coimbra (2001)
17.Najjar, J., Klerkx, J., Vuorikari, R., Duval, E.: Finding Appropriate Learning Objects: an
   Empirical Evaluation. in Research and Advanced Technology for Digital Libraries, LNCS,
   vol. 3652, pp. 323-335. Springer Berlin, Heidelberg (2005)
18.Meyer, M., Bergstraesser, S., Zimmermann, B., Rensing, C., Steinmetz, R.: Modeling
   Modifications of Multimedia Learning Resources Using Ontology-Based Representations.
   In: Advances in Multimedia Modeling, LNCS vol. 4351, pp. 34-43 (2006)
19.Wang, C., Dickens, K., Davis, H., Wills, G.: Community Tools for Repurposing Learning
   Objects. In: Second European Conference on Technology Enhanced Learning, pp. 378-392,
   Crete, Greece (2007)
20.Bizer, C., Heath, T., Berners-Lee, T.: Linked Data – The Story so Far. International Journal
   on Semantic Web and Information Systems (IJSWIS), 5, 1-22 (2009)
21.LOD, http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
22.Noy, N.F., Shah, N.H., Whetzel, P.L., Dai, B., Dorf, M., Griffith, N., Jonquet, C., Rubin, D.L.,
   Storey, M.A., Chute, C.G., Musen, M.A.: BioPortal: ontologies and integrated data resources
   at the click of a mouse. Nucleic Acids Res. 37, W170-3 (2009)
23.Giordano, D., Faro, A., Maiorana, F., Pino, C., Spampinato, C.: Feeding Back Learning
   Resources Repurposing Patterns into the “Information Loop”: Opportunities and Challenge.
   In The Proceedings of ITAB2009: 9th International Conference on Information Technology
   and Applications in Biomedicine, Larnaca, Cyprus (2009)
24.Yu, H.Q., Dietze, S., Li, N., Pedrinaci, C., Taibi, D., Dovrolis, N., Stefanut, T., Kaldoudi, E.,
   Domingue, J.: A Linked Data-driven & Service-oriented Architecture for Sharing
   Educational Resources. In the Proceedings of the Linked Learning 2011: 1st International
   Workshop on eLearning Approaches for the Linked Data Age, (2011) in press.
25.Pedrinaci, C., Liu, D., Maleshkova, M., Lambert, D., Kopecky, J., Domingue, J.: iServe: a
   Linked Services Publishing Platform. In the Workshop in Ontology Repositories and Editors
   for the Semantic Web at 7th Extended Semantic Web Conference (2010)
26.Dietze, S., Yu, H.Q., Pedrinaci, C., Liu, D. and Domingue, J. SmartLink: a Web-based
   editor and search environment for Linked Services, 8th Extended Semantic Web Conference
   (ESWC), Heraklion, Greece (2011)
27.Auer, S., Dietzold, S., Lehmann, J., Hellmann, S., Aumueller, D.: Triplify: Light-Weight
   Linked Data Publication from Relational Databases. In the Proceedings of the 18th
   International Conference on World Wide Web, pp. 621-630. ACM, NY (2009)
            Generating educational assessment items
         from Linked Open Data: the case of DBpedia

                                     Muriel Foulonneau

                                   Tudor Research Centre,
                                   29, av. John F. Kennedy
                             L-1855 Luxembourg, Luxembourg
                                 muriel.foulonneau@tudor.lu


       Abstract. This work uses Linked Open Data for the generation of educational
       assessment items. We describe the streamline to create variables and populate
       simple choice item models using the IMS-QTI standard. The generated items
       were then imported in an assessment platform. Five item models were tested.
       They allowed identifying the main challenges to improve the usability of
       Linked Data sources to support the generation of formative assessment items, in
       particular data quality issues and the identification of relevant sub-graphs for
       the generation of item variables.

       Keywords: Linked Data, open data, DBpedia, eLearning, e-assessment,
       formative assessment, assessment item generation, data quality, IMS-QTI


1 Introduction

   Assessment takes a very important role in education. Tests are created to evaluate
what students have learned in the class, to assess their level at the beginning of a
cycle, to enter a prestigious university, or even to obtain a degree. Increasingly,
assessment is also praised for its contribution to the learning process through
formative assessment (i.e., assessment to learn, not to measure) and/or self-
assessment, whereby the concept of a third party controlling the acquisition of
knowledge is totally taken out of the assessment process. The role of assessment in
the learning process has considerably widened. The New York Times even recently
published an article entitled “To Really Learn, Quit Studying and Take a Test” [1],
reporting on a study by Karpicke et al. [2] which suggests that tests are actually the
most efficient knowledge acquisition method.
   The development of e-assessment has been hampered by a number of obstacles, in
particular the time and effort necessary to create assessment items (i.e., test questions)
[3]. Therefore, automatic or semi-automatic item generation has gained attention in
recent years. Item generation consists in using an item model to create
multiple items from that model, automatically or semi-automatically.
   The Semantic Web can provide relevant resources for the generation of assessment
items because it includes models of factual knowledge and structured datasets for the
generation of item model variables. Moreover, it can provide links to relevant
learning resources, through the interlinking between different data sources.
   Using a heterogeneous factbase to support the learning process, however, raises
issues related, for instance, to potential disparities in data quality. We implemented
a streamline to generate simple choice items from DBpedia. Our work aims at
identifying the potential difficulties and the feasibility of using Linked Open Data to
generate items for low stake assessment, in this case formative assessment.
   We present existing approaches to the creation of item variables, the construction
of the assessment item creation streamline, and the experimentation of the process to
generate five sets of items.


2 Existing work
    Item generation consists in creating multiple instances of items based on an item
model. The item model defines variables, i.e., the parts which change for each item
generated. There are different approaches to the generation of variables, depending on
the type of items under consideration.
    In order to fill item variables for mathematics or science, the creation of
computational models is the easiest solution. Other systems use natural language
processing (NLP) to generate for instance vocabulary questions and cloze questions
(fill in blanks) in language learning formative assessment exercises ([4], [5], [6]).
Karamanis et al. [7] also extract questions from medical texts.
    The generation of variables from structured datasets has been explored in
particular in the domain of language learning. Lin et al. [8] and Brown et al. [9] for
instance generated vocabulary questions from the WordNet dataset, which is now
available as RDF data on the Semantic Web. Indeed, the semantic representation of
data can help extract relevant variables. Sung et al. [10] use natural language
processing to extract semantic networks from a text and then generate English
comprehension items.
    Linnebank et al. [11] use a domain model as the basis for the generation of entire
items. This approach requires experts to elicit knowledge in specifically dedicated
models. However, the knowledge happens to already exist in many data sources (e.g.,
scientific datasets), contributed by many different experts who would probably never
gather in long modeling exercises. Those modeling exercises would have to be
repeated over time, as the knowledge of different disciplines evolves. Moreover, in
many domains, the classic curricula, for which models could potentially be developed
and maintained by authorities, are not suitable. This is the case of professional
knowledge for instance.
    Given the potential complexity of the models for generating item variables, Liu
[12] defines reusable components of the generation of items (including the heuristics
behind the creation of math variables for instance). Our work complements this
approach by including the connection to semantic datasets as sources of variables.
Existing approaches to item generation usually focus on language learning [13] or
mathematics and physics, where variables can be created from formulae [14]. We aim
to define approaches applicable in a wider range of domains (e.g., history) by reusing
existing interlinked datasets.


3 Generating item variables from a SPARQL endpoint

An item model includes a stem, options, and potentially auxiliary information [15].
Only the stem (i.e., the question) is mandatory. Response options are provided in the
case of a multiple choice item. Auxiliary information can be a multimedia resource
for instance. In some cases, other parameters can be adapted, including the feedback
provided to candidates after they answer the item.


            Figure 1 - Semi-automatic item generation from semantic datasets
   In order to investigate the use of Linked Data as a source of assessment items, we
built a streamline to generate simple choice items from a SPARQL endpoint on the
Web. The item generation process is split into different steps, which are detailed in this
section. Figure 1 shows the item model represented as an item template, the queries to extract
data from the Semantic Web, the generation of a set of potential variables as a
variable store, the organization of all the values of variables for each item in data
dictionaries, and the creation of items in QTI-XML format from the item template and
item data dictionaries.
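
The step from a variable store to per-item data dictionaries can be sketched as follows. This Python fragment is an illustration under assumptions, not the implementation used in the experiment: the variable store is assumed to be a list of (country, capital) pairs, distractors are assumed to be drawn at random from the capitals of other countries, and the `correctResponse` key is a hypothetical addition. The keys `prompt`, `responseCodeN` and `responseOptionN` follow the variable names of the item template.

```python
import random

def build_data_dictionaries(variable_store, n_options=3, seed=0):
    """Turn (country, capital) rows into one data dictionary per item."""
    rng = random.Random(seed)  # fixed seed keeps the generated items reproducible
    dictionaries = []
    for country, capital in variable_store:
        # Assumption: distractors are capitals of other countries in the store.
        pool = sorted({other for _, other in variable_store if other != capital})
        if len(pool) < n_options - 1:
            continue  # not enough distinct distractors for this item
        options = rng.sample(pool, n_options - 1) + [capital]
        rng.shuffle(options)
        entry = {"prompt": country}
        for index, option in enumerate(options, start=1):
            entry[f"responseCode{index}"] = f"choice{index}"
            entry[f"responseOption{index}"] = option
        # Hypothetical key recording which choice is the correct one.
        entry["correctResponse"] = f"choice{options.index(capital) + 1}"
        dictionaries.append(entry)
    return dictionaries

# Hand-written sample rows for illustration.
store = [("France", "Paris"), ("Greece", "Athens"), ("Italy", "Rome"), ("Spain", "Madrid")]
items = build_data_dictionaries(store)
```

Random selection of distractors is the simplest possible choice; a real item model would probably constrain distractors further, for instance to capitals from the same region.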


3.1 Creating an IMS QTI-XML template

In order to generate items which are portable to multiple platforms, it is necessary to
format them in IMS-QTI (IMS Question & Test Interoperability Specification)1. IMS-
QTI is the main standard used to represent assessment items [16]. It specifies
metadata (as a Learning Object Metadata profile), usage data (including psychometric
indicators), as well as the structure of items, tests, and test sections. It allows
representing multimedia resources in a test. IMS-QTI has an XML serialization.


1 http://www.imsglobal.org/question/
<choiceInteraction responseIdentifier="RESPONSE" shuffle="false" maxChoices="1">
    <prompt>What is the capital of {prompt}?</prompt>
    <simpleChoice identifier="{responseCode1}">{responseOption1}</simpleChoice>
    <simpleChoice identifier="{responseCode2}">{responseOption2}</simpleChoice>
    <simpleChoice identifier="{responseCode3}">{responseOption3}</simpleChoice>
</choiceInteraction>


   Figure 2 - Extract of the QTI-XML template for a simple choice item

   No language exists for assessment item templates. We therefore used the syntax of
JSON templates for an XML-QTI file (Figure 2). All variables are represented with
the variable name in curly brackets. Unlike RDF and XML template languages, JSON
templates can define variables for an unstructured part of text in a structured
document. For instance, in Figure 2, the {prompt} variable covers only part of
the content of the <prompt> XML element. The question itself can therefore be
stored in the item model, while only the relevant part of the question is represented
as a variable.
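As an illustration of the curly-bracket convention, the following minimal Python sketch lists the variables that a template expects, whether they occur in element text or in attribute values. The two-option template excerpt and the function name `template_variables` are assumptions made for this example; they are not part of the implementation described in this paper.

```python
import re

# Hypothetical two-option excerpt modelled on Figure 2.
TEMPLATE = (
    "<prompt>What is the capital of {prompt}?</prompt>\n"
    '<simpleChoice identifier="{responseCode1}">{responseOption1}</simpleChoice>\n'
    '<simpleChoice identifier="{responseCode2}">{responseOption2}</simpleChoice>'
)


def template_variables(template):
    """Return the distinct {name} placeholders in order of first appearance."""
    names = []
    for name in re.findall(r"\{(\w+)\}", template):
        if name not in names:
            names.append(name)
    return names
```

Such a list can be used to check that a data dictionary provides a value for every variable before an item is generated.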


3.2 Collecting structured data from the Semantic Web

In order to generate values for the variables defined in the item template, data sources
from the Semantic Web are used. The Semantic Web contains data formatted as RDF.
Datasets can be interlinked, for instance to complement the knowledge about
a given resource. They can be accessed through browsing, through data dumps, or
through a SPARQL interface made available by the data provider. For this
experiment, we used the DBpedia SPARQL query interface (Figure 3). The query
results only provide a variable store from which items can be generated. All the
response options are then extracted from the variable store (Figure 1).


                          SELECT ?country ?capital
                          WHERE {
                          ?c <http://dbpedia.org/property/commonName> ?country .
                          ?c <http://dbpedia.org/property/capital> ?capital
                          }
                          LIMIT 30


                Figure 3 - SPARQL query to generate capitals in Europe
   Linked Data resources are represented by URIs. However, the display of variables
in an assessment item requires finding a suitable label for each concept. In the case
presented in Figure 3, the ?c variable represents the resource as identified by a URI.
The <http://dbpedia.org/property/commonName> property allows finding a suitable
label for the country. Since the range of the <http://dbpedia.org/property/capital>
property is a literal, it is not necessary to find a distinct label.
    The label is however not located in the same property in all datasets and for all
resources. In the example of Figure 3, we used the property
<http://dbpedia.org/property/commonName>, which provides the country names as
literals. However, other properties, such as foaf:name, are used for the same
purpose. In any case, the items always need to be generated from a path in a semantic
graph rather than from a single triple. This makes Linked Data of particular relevance
since the datasets can complement each other.
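The step from query results to a variable store can be sketched as follows, assuming that the endpoint returns its results in the standard SPARQL JSON results format. The function name `to_variable_store` and the two sample rows are illustrative assumptions, not the code or data used in the experiment.

```python
import json


def to_variable_store(results_json):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(results_json)
    names = doc["head"]["vars"]
    store = []
    for binding in doc["results"]["bindings"]:
        # A variable that is unbound in a row is simply absent from the binding.
        store.append({n: binding[n]["value"] for n in names if n in binding})
    return store


# Illustrative sample shaped like the result of the query in Figure 3.
SAMPLE = json.dumps({
    "head": {"vars": ["country", "capital"]},
    "results": {"bindings": [
        {"country": {"type": "literal", "value": "Bulgaria"},
         "capital": {"type": "literal", "value": "Sofia"}},
        {"country": {"type": "literal", "value": "Latvia"},
         "capital": {"type": "literal", "value": "Riga"}},
    ]},
})
```

Each row of the resulting store holds one value pair, here a country label and its capital.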


3.3 Generating item distractors

The SPARQL queries aim to retrieve statements from which the stem variable and the
correct answer are extracted. However, a simple or multiple choice item also needs
distractors. Distractors are the incorrect answers presented as options in the items. In
the case of Figure 3, the query retrieves different capitals, from which the distractors
are randomly selected to generate an item. For instance, the capital of Bulgaria is
Sofia. Distractors can be Bucharest and Riga.
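A minimal sketch of this random selection is given below. It assumes a variable store held as a list of dictionaries with a `capital` key; the function name `pick_distractors` and its parameters are hypothetical.

```python
import random


def pick_distractors(store, correct, key="capital", count=2, rng=random):
    """Randomly pick `count` distinct wrong answers from the variable store."""
    # Deduplicate values and drop the correct answer so it cannot appear twice.
    candidates = sorted({row[key] for row in store if row[key] != correct})
    if len(candidates) < count:
        raise ValueError("not enough distinct values to build distractors")
    return rng.sample(candidates, count)
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which can help when a generated item has to be traced back to its source data.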


3.4 Creating a data dictionary from Linked Data

The application then stores all the variables for the generated items in data
dictionaries. Each item is therefore represented natively with this data dictionary. We
created data dictionaries as Java objects conceived for the storage of QTI data. We
also recorded the data as a JSON data dictionary.
In addition to the variables, the data dictionary includes provenance information, such
as the creation date and the data source.
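The shape of such a data dictionary can be sketched in Python as follows. All field names (`variables`, `correctResponse`, `provenance`, `created`, `source`) and the choice identifiers are assumptions made for illustration; the paper does not specify the layout of its Java objects or of its JSON serialization.

```python
import json
from datetime import datetime, timezone


def build_data_dictionary(prompt, correct, distractors, source, created=None):
    """Collect the variables of one item together with provenance information."""
    options = [correct, *distractors]
    variables = {"prompt": prompt}
    for i, option in enumerate(options, start=1):
        variables[f"responseCode{i}"] = f"choice{i}"
        variables[f"responseOption{i}"] = option
    return {
        "variables": variables,
        "correctResponse": "choice1",  # the correct option is listed first here
        "provenance": {
            "created": created or datetime.now(timezone.utc).isoformat(),
            "source": source,
        },
    }


entry = build_data_dictionary(
    "Bulgaria", "Sofia", ["Bucharest", "Riga"],
    source="http://dbpedia.org/sparql",   # assumed endpoint address
    created="2011-05-29T00:00:00+00:00",
)
serialized = json.dumps(entry, indent=2)
```

In this sketch the correct option always comes first, so the order of options would still have to be shuffled before delivery.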


3.5 Generating QTI Items

QTI-XML items are then generated from the variables stored in the data dictionary
and the item model formalized as a JSON template. We replaced all the variables
defined in the model by the content of the data dictionary. If the stem is a picture, this
can be included in the QTI-XML structure as an external link.
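The replacement step can be sketched as below. The sketch additionally escapes XML special characters in the values and checks that the result is well-formed; both are precautions assumed for this example rather than features reported for the original system, and `render_item` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Template excerpt taken from Figure 2.
TEMPLATE = """<choiceInteraction responseIdentifier="RESPONSE" shuffle="false" maxChoices="1">
  <prompt>What is the capital of {prompt}?</prompt>
  <simpleChoice identifier="{responseCode1}">{responseOption1}</simpleChoice>
  <simpleChoice identifier="{responseCode2}">{responseOption2}</simpleChoice>
  <simpleChoice identifier="{responseCode3}">{responseOption3}</simpleChoice>
</choiceInteraction>"""


def render_item(template, variables):
    """Fill every {name} placeholder and verify the result is well-formed XML."""
    def substitute(match):
        # Escape &, <, > and double quotes so that values are safe
        # both in element text and in attribute values.
        return escape(variables[match.group(1)], {'"': "&quot;"})

    xml_text = re.sub(r"\{(\w+)\}", substitute, template)
    ET.fromstring(xml_text)  # raises ParseError if the output is malformed
    return xml_text
```

A missing variable raises a `KeyError`, which surfaces incomplete data dictionaries before an item is published.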


4 The DBpedia experiment
In order to validate this process, we experimented with the generation of assessment
items for five single choice item models. We used DBpedia as the main source of variables.
The item models illustrate the different difficulties which can be encountered and help
assess the usability of Linked Data for the generation of item variables.
4.1 The generation of variables for five item models

Q1 - What is the capital of { Azerbaijan }?

    The first item model uses the query presented in Figure 3. This query uses the
http://dbpedia.org/property/ namespace, i.e., the Infobox dataset. This dataset
however is not built on top of a consistent ontology. It rather transforms the properties
used in Wikipedia infoboxes. Therefore, the quality of the data is a potential issue2.
    Out of 30 value pairs generated, 3 were not generated for a country (Neuenburg am
Rhein, Wain, and Offenburg). For those, the capital was represented by the same
literal as the country. Two distinct capitals were found for Swaziland (Mbabane, the
administrative capital, and Lobamba, the royal and legislative capital). The Congo is
identified as a single country, whereas it has been split into two distinct countries, and
its capital Leopoldville has since been renamed Kinshasa. The capital of Sri Lanka is a
URI, whereas the range of the capital property is in practice usually a literal. Finally,
the capital of Nicaragua is represented with display markup (“Managua right|20px”).
Overall, 7 value pairs out of 30 were deemed defective.

Q2 - Which country is represented by this flag?


    SELECT ?flag ?country
    WHERE {
    ?c <http://xmlns.com/foaf/0.1/depiction> ?flag .
    ?c <http://dbpedia.org/property/commonName> ?country .
    ?c <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
    <http://dbpedia.org/class/yago/EuropeanCountries>
    }
    LIMIT 30


   Q2 uses the Infobox dataset to identify the label of the different countries.
However, the FOAF ontology also helps identify the flag of the country and the
YAGO (Yet Another Great Ontology) [17] ontology ensures that only European
countries are selected. This excludes data which do not represent countries.
   Nevertheless, it is more difficult to find flags for non-European countries while
ensuring that only countries are selected. In the YAGO ontology,
<http://dbpedia.org/class/yago/EuropeanCountries> is a subclass of
<http://dbpedia.org/class/yago/Country108544813>. However, most European countries
are not retrieved when querying the dataset with
<http://dbpedia.org/class/yago/Country108544813>, because the SPARQL endpoint
does not provide access to inferred triples. It is necessary to perform a set of queries
to retrieve relevant subclasses and use them for the generation of variables.
   Out of 30 items including pictures of flags used as stimuli, 6 URIs did not resolve
to a usable picture (HTTP 404 errors or encoding problems).
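The set of queries needed to compensate for missing inferences can be sketched as a client-side traversal of the class hierarchy. In the Python sketch below, `fetch_direct` is a hypothetical callable that would run `SUBCLASS_QUERY` against the endpoint for one class and return the URIs of its direct subclasses; neither name comes from the system described here.

```python
from collections import deque

# Query for the direct subclasses of one class; {cls} is filled per request.
SUBCLASS_QUERY = """
SELECT ?sub
WHERE {{ ?sub <http://www.w3.org/2000/01/rdf-schema#subClassOf> <{cls}> }}
"""


def collect_subclasses(root, fetch_direct):
    """Return `root` plus all its transitive subclasses (breadth-first)."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for sub in fetch_direct(current):
            if sub not in seen:  # also guards against cycles in the hierarchy
                seen.add(sub)
                queue.append(sub)
    return seen
```

The resulting set of classes can then be used to generate one item query per class, or a single query that enumerates them.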


2
    http://wiki.dbpedia.org/Datasets
Q3 - Who succeeded { Charles VII the Victorious } as ruler of France?

SELECT DISTINCT ?kingHR ?successorHR
WHERE {
?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/KingsOfFrance> .
?x <http://dbpedia.org/property/name> ?kingHR .
?x <http://dbpedia.org/ontology/successor> ?z .
?z <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/KingsOfFrance> .
?z <http://dbpedia.org/property/name> ?successorHR
}
LIMIT 30


   Q3 uses the YAGO ontology to ensure that the resource retrieved is indeed a king
of France. Out of 30 results, one was incorrect (The Three Musketeers). The query
generated duplicates because of the multiple labels associated with each king. The same
king was named for instance Louis IX, Saint Louis, Saint Louis IX. Whereas
de-duplication is a straightforward process in this case, the risk of inconsistent naming
patterns among options of the same item is more difficult to tackle. An item was
indeed generated with the following 3 options: Charles VII the Victorious, Charles 09
Of France, Louis VII. They all use a different naming pattern, with or without the
king’s nickname and with a different numbering pattern.
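One possible way to handle both problems is sketched below in Python. It assumes that the query is extended to also return the resource URI (the ?x variable), so that labels can be grouped per king, and it prefers labels of the form "name plus Roman numeral". The regular expression, the fallback to the shortest label and the function names are illustrative assumptions, not the approach implemented in this experiment.

```python
import re
from collections import defaultdict

# "Name + Roman numeral" and nothing else, e.g. "Louis IX".
REGNAL = re.compile(r"^[A-Z][a-z]+ [IVXLC]+$")


def labels_by_resource(rows):
    """Group (resource URI, label) pairs so that each resource appears once."""
    grouped = defaultdict(set)
    for uri, label in rows:
        grouped[uri].add(label)
    return grouped


def preferred_label(labels):
    """Prefer a label matching the regnal pattern, else the shortest label."""
    matching = sorted(label for label in labels if REGNAL.match(label))
    if matching:
        return matching[0]
    return min(labels, key=lambda label: (len(label), label))
```

Resources for which no label matches the pattern (such as "Charles 09 Of France") could be flagged for manual review instead of being used as options.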

Q4 - What is the capital of { Argentina }? With feedback


SELECT ?countryHR ?capitalHR ?pictureCollection
WHERE {
?country <http://dbpedia.org/property/commonName> ?countryHR .
?country <http://dbpedia.org/property/capital> ?capitalHR .
?country <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/EuropeanCountries> .
?country <http://dbpedia.org/property/hasPhotoCollection> ?pictureCollection
}
LIMIT 30


   The above question is a variation of Q1. It adds a picture collection from a distinct
dataset in the response feedback. It uses the YAGO ontology to exclude countries
outside Europe and resources which are not countries. A feedback section is added:
when candidates answer the item, they receive feedback if the platform
allows it. In the feedback, additional information or formative resources can be
suggested. Q4 uses the linkage of the DBpedia dataset with the Flickr wrapper
dataset. However, the Flickr wrapper data source was unavailable when we performed
the experiment.
Q5 - Which category does { Asthma } belong to?


SELECT DISTINCT ?diseaseName ?category
WHERE {
?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Disease> .
?x <http://dbpedia.org/property/meshname> ?diseaseName .
?x <http://purl.org/dc/terms/subject> ?y .
?y <http://www.w3.org/2004/02/skos/core#prefLabel> ?category
}
LIMIT 30


   Q5 aims to retrieve diseases and their categories. It uses SKOS and Dublin Core
properties. The Infobox dataset is only used to find labels, and labels from the MeSH
vocabulary are even available. Nevertheless, the SKOS concepts are not related to a
specific SKOS scheme, so the categories retrieved are heterogeneous: they range from
Skeletal disorders to childhood. For instance, the correct answer to the question on
Obesity is childhood.


4.2 The publication of items on the TAO platform

The TAO platform3 is an open source semantic platform for the creation and delivery
of assessment tests and items. It has been used in multiple assessment contexts,
including large-scale assessment in the PIAAC and PISA surveys of the OECD,
diagnostic assessment and formative assessment.
We imported QTI items generated for the different item models into the platform, in
order to validate the overall Linked Data based item creation streamline. Figure 4
presents an item generated from Q1 (Figure 3) imported into the TAO platform.


                        Figure 4 - Item preview on the TAO platform


3
    http://www.tao.lu
5 Data analysis
   The streamline was thus tested with SPARQL queries which use various ontologies
and which collect various types of variables. The experiment raised
two types of issues for which future work should find relevant solutions: the quality
of the data and the relevance of particular statements for the creation of an assessment
item.


5.1 Data quality challenges

   In our experiment, the chance that an item will have a defective prompt or a
defective correct answer is equal to the proportion of defective variables among those
used for the item creation. Q1 uses the most challenging dataset in terms of data
quality: 7 out of 30 questions had a defective prompt or a defective correct answer
(23.33%).
   The chance that an item will have defective distractors is represented by the
following formula, where D is the total number of distractors, d(V) is the number of
defective variables and V is the total number of variables:


   We used 2 distractors. Among the items generated from Q1, 10 items had a
defective distractor (33.33%). Overall, 16 out of 30 items had neither a defective
prompt nor a defective correct answer nor a defective distractor (53.33%).
As a comparison, the proportion of items generated from unstructured content (text)
that are deemed usable without editing was measured between 3.5% and 5% by Mitkov
et al. [18] and between 12% and 21% by Karamanis et al. [7]. The difficulty of generating
items from structured sources should be lower. Although a manual selection is
necessary in any case, the mechanisms we have implemented can be improved.
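To give an order of magnitude for the chance of a defective distractor, the following Python sketch computes the probability that at least one of D distractors is defective under one simple sampling assumption: the distractors are drawn uniformly and without replacement from all V variables, d(V) of which are defective. This is an assumption made for illustration and not necessarily the formula used in the paper; it also ignores that the correct answer is excluded from the pool.

```python
from math import comb


def p_defective_distractor(total, defective, distractors):
    """P(at least one drawn distractor is defective), drawing without replacement."""
    clean_draws = comb(total - defective, distractors)
    return 1 - clean_draws / comb(total, distractors)


# Values reported for Q1: V = 30, d(V) = 7, D = 2.
estimate = p_defective_distractor(30, 7, 2)
```

Under this assumption the expected rate for Q1 is about 42%, to be compared with the 10 items out of 30 (33.33%) observed in the experiment.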

The ontology
Q1 used properties from the Infobox dataset, which has no proper underlying
ontology. Q1 can therefore be improved by using ontologies provided by DBpedia, as
demonstrated by Q2 for which no distractor issue was identified. We present Q1 and
Q2 to illustrate this improvement but it should be noted that there is not always a
direct equivalent to the properties extracted from the Infobox dataset.
Q5 could be improved either if the dataset were linked to a more structured
knowledge organization system (KOS) or through an algorithm which would verify
the nature of the literals provided as a result of the SPARQL query.

The labels
The choice of the label for each concept to be represented in an item is a challenge
when concepts are represented by multiple labels (Q3). The selection of labels and
their consistency can be ensured by defining representation patterns or by using
datasets with consistent labeling practices.

Inaccurate statements
Most statements provided for the experiment are not inaccurate in their original
context but they sometimes use properties which are not sufficiently precise for the
usage envisioned (e.g., administrative capital). In other cases, the context of validity
of the statement is missing (e.g., Leopoldville used to be the capital of a country
called Congo). The choice of DBpedia as a starting point can increase this risk in
comparison to domain specific data sources provided by scientific institutions for
instance. Nevertheless, the Semantic Web raises quality challenges similar to those
encountered in heterogeneous and distributed data sources [19]. Web 2.0 approaches,
as well as the automatic reprocessing of data, can help improve the usability of the
Semantic Web statements. This requires setting up a traceability mechanism between
the RDF paths used for the generation of items and the items generated.

Data linkage
Data linkage clearly raises an issue because the mechanism relies on the availability
of different data sources. Q2 provided 6 problematic URIs out of 30 (i.e., 20%). Q4
generated items for which no URI from the linked data set was resolvable since the
whole Flickr wrapper data source was unavailable. This clearly makes the generated
items unusable. The creation of infrastructure components such as the SPARQL
Endpoint status for CKAN4 registered data sets5 can help provide solutions to this
quality issue over the longer run.
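On the data consumer side, a simple safeguard is to test whether a linked URI resolves before it is used in an item. The Python sketch below issues an HTTP HEAD request with the standard library; the function name `is_resolvable` and the injectable `opener` parameter are assumptions made for this example, and some servers may reject HEAD requests even though the resource exists.

```python
import urllib.request


def is_resolvable(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if a HEAD request for `url` answers with a 2xx status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError (e.g. 404) and timeouts;
        # ValueError covers malformed URLs.
        return False
```

Value pairs whose picture or collection URI fails this check could be dropped from the variable store before items are generated.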

Missing inferences
Finally, the SPARQL endpoint does not provide access to inferred triples. Our
streamline does not tackle transitive closures on the data consumer side (e.g., through
repeated queries), as illustrated with Q2. Further consideration should be given to the
provision of data including inferred statements. Alternatively, full datasets could be
imported. Inferences could then be performed in order to support the item generation
process.


Different strategies can therefore be implemented to cope with the data quality issues we
encountered. Data publishers can improve the usability of the data, for instance with
the implementation of an upper ontology in DBpedia. However, other data quality
issues require data consumers to improve their data collection strategy, for instance to
collect as much information as possible on the context of validity of the data,
whenever it is available.


5.2 Data selection

The experiment also showed that not all Linked Data statements are equally suitable for
assessment: they need to be selected. The
suitability of an assessment item for a test delivered to a candidate or a group of
candidates is measured in particular through such information as the item difficulty.

4
    http://www.ckan.net
5
    http://labs.mondeca.com/sparqlEndpointsStatus/index.html
The difficulty can be assessed through a thorough calibration process in which the
item is given to beta candidates for extracting psychometric indicators. In low-stakes
assessment, however, the evaluation of the difficulty is often manual (candidate or
teacher evaluation) or implicit (the performance of previous candidates who took the
same item). In the item generation models we have used, each item has a different
construct (i.e., it assesses a different piece of knowledge). In this case, the psychometric
variables are more difficult to predict [20]. A particular model is necessary to assess
the difficulty of items generated from Semantic Web sources. For instance, it is likely
that for a European audience, the capital of the Cook Islands will raise a higher rate of
failure than the capital of Belgium. There is no information in the datasets that can
support the idea of a higher or lower difficulty. Moreover, the difficulty of the item
also depends on the distractors, which in this experiment were generated on a random
basis from a set of equivalent instances. As the generation of items from structured
Web data sources becomes more elaborate, it will therefore be necessary to
design a model for predicting the difficulty of generated items.


6 Conclusion and future work
The present experimentation shows the process for generating assessment items
and/or assessment variables from Linked Data. The performance of the system in
comparison with other approaches shows its potential as a strategy for assessment
item generation. It is expected that data linkage can provide relevant content for
instance to propose formative resources to candidates who failed an item or to
illustrate a concept with a picture published as part of a distinct dataset.
The experimentation shows the quality issues related to the generation of items based
on such a resource as DBpedia. It should be noted that the measurements were made
with a question which raises particular quality issues. It can be easily improved as
shown with other questions. Nevertheless the Linked Data Cloud also contains
datasets published by scientific institutions, which may therefore raise fewer data
accuracy concerns. In addition, the usage model we are proposing is centered on
low-stakes assessment, for which we believe that the time saved makes it worthwhile
having to clean some of the data, while the overall process remains valuable.
    Nevertheless, additional work is necessary both on the data and on the assessment
items. The items created demonstrate the complexity of generating item variables for
simple assessment items. We aim to investigate the creation of more complex items
and the relevance of formative resources which can be included in the item as
feedback. Moreover, the Semantic Web can provide knowledge models from which
items could be generated. Our work is focused on semi-automatic item generation,
where users create item models, while the system aims to generate the variables.
Nevertheless, the generation of the items from a knowledge model as in [11] requires
that more complex knowledge is encoded in the data (e.g., what happens to water
when the temperature decreases). The type and nature of data published as Linked
Data therefore need to be further analyzed in order to support the development of
such models for the fully automated generation of items based on knowledge models.
    We will focus our future work on the creation of an authoring interface for item
models with the use of data sources from the Semantic Web, on the assessment of
item quality, on the creation of different types of assessment items from Linked Data
sources, on the traceability of items created, including the path on the Semantic Web
datasets which were used to generate the item, and on the improvement of data
selection from semantic datasets.

Acknowledgments. This work was carried out in the scope of the iCase project on
computer-based assessment. It has benefited from the TAO semantic platform for
e-assessment (https://www.tao.lu/) which is jointly developed by the Tudor Research
Centre and the University of Luxembourg, with the support of the Fonds National de
la Recherche in Luxembourg, the DIPF (Bildungsforschung und
Bildungsinformation), the Bundesministerium für Bildung und Forschung, the
Luxembourgish ministry of higher education and research, as well as OECD.


 References

    1.  Belluck, P. To Really Learn, Quit Studying and Take a Test. New York Times.
        January 20th, 2011
    2. Karpicke, J. D., & Blunt, J. R. Retrieval Practice Produces More Learning than
        Elaborative Studying with Concept Mapping. Science. (2011)
    3. Gilbert, L., Gale, V., Warburton, B., & Wills, G. Report on Summative E-Assessment
        Quality (REAQ). Joint Information Systems Committee, Southampton. (2008)
    4. Aldabe, I., Lopez de Lacalle, M., Maritxalar, M., Martinez, E., Uria, L. Arikiturri: an
        Automatic Question Generator Based on Corpora and NLP techniques, ser. Lecture
        Notes in computer science, vol. 4053, pp. 584–594. Springer, Heidelberg (2006)
    5. Lee, J. S. Y. Automatic correction of grammatical errors in non-native English text.
        PhD dissertation at The Massachusetts Institute of Technology. (2009)
    6. Goto, T., Kojiri, T., Watanabe, T., Iwata, T., & Yamada, T. Automatic Generation
        System of Multiple-Choice Cloze Questions and its Evaluation. Knowledge
        Management & E-Learning: An International Journal (KM&EL), 2(3), 210. (2010)
    7. Karamanis, N., Ha, L. A., & Mitkov, R. Generating multiple-choice test items from
        medical text: a pilot study. In Proceedings of the Fourth International Natural
        Language Generation Conference, pp. 111–113. (2006)
    8. Lin, Y.C., Sung, L.C., and Chen, M.C. An Automatic Multiple-Choice Question
        Generation Scheme for English Adjective Understanding. Workshop on Modeling,
        Management and Generation of Problems/Questions in eLearning, the 15th
        International Conference on Computers in Education (ICCE 2007), pages 137-142.
        (2007)
    9. Brown, J. C., Frishkoff, G. A., & Eskenazi, M. Automatic question generation for
        vocabulary assessment. In Proceedings of the conference on Human Language
        Technology and Empirical Methods in Natural Language Processing (pp. 819–826).
        (2005)
    10. Sung, L.-C., Lin, Y.-C., Chen, M. C. The Design of Automatic Quiz Generation for
        Ubiquitous English E-Learning System. Technology Enhanced Learning Conference
        (TELearn 2007), pp. 161-168, Jhongli, Taiwan. (2007)
    11. Linnebank, F., Liem, J., and Bredeweg, B. Question generation and answering.
        DynaLearn, EC FP7 STREP project 231526, Deliverable D3.3. (2010)
    12. Liu, B. SARAC: A Framework for Automatic Item Generation. In 2009 Ninth IEEE
        International Conference on Advanced Learning Technologies (ICALT), pp. 556-558,
        Riga, Latvia. (2009)
13. Xu, Y., Seneff, S. Speech-Based Interactive Games for Language Learning: Reading,
    Translation, and Question-Answering. Computational Linguistics and Chinese
    Language Processing Vol. 14, No. 2, pp. 133-160. (2009)
14. Lai, H., Alves, C., & Gierl, M. J. Using automatic item generation to address item
    demands for CAT. In Proceedings of the 2009 GMAC Conference on Computerized
    Adaptive Testing. (2009)
15. Gierl, M.J., Zhou, J., Alves, C. Developing a Taxonomy of Item Model Types to
    Promote Assessment Engineering. Journal of Technology, Learning, and Assessment,
    7(2). (2008)
16. Sarre, S., Foulonneau, M. Reusability in e-assessment: Towards a multifaceted
    approach for managing metadata of e-assessment resources. Fifth International
    Conference on Internet and Web Applications and Services. (2010)
17. Suchanek, F. M., Kasneci, G., & Weikum, G. Yago: a core of semantic knowledge. In
    Proceedings of the 16th international conference on World Wide Web (pp. 697–706).
    (2007)
18. Mitkov, R., An Ha, L., & Karamanis, N. A computer-aided environment for
    generating multiple-choice test items. Natural Language Engineering, 12(02), 177–
    194. (2006)
19. Foulonneau, Muriel, Cole, Timothy W. Strategies for reprocessing aggregated
    metadata. European Conference on Digital Libraries. Lecture notes in computer
    science 3652 , 290-301 (2005)
20. Bejar, I. I., Lawless, R. R., Morley, M. E., Wagner, M. E., Bennett, R. E., &
    Revuelta, J. A feasibility study of on-the-fly item generation in adaptive testing.
    Educational Testing Service. (2002)
    A Mobile and Adaptive Language Learning
       Environment based on Linked Data

       Davy Van Deursen1, Igor Jacques2, Stefan De Wannemacker2,
       Steven Torrelle1, Wim Van Lancker1, Maribel Montero Perez2,
       Erik Mannens1, and Rik Van de Walle1

       1 Ghent University - IBBT,
         Gaston Crommenlaan 8/201, B-9050 Ledeberg-Ghent, Belgium
         firstname.lastname@ugent.be
       2 ITEC - Interdisciplinary research on Technology, Education and Communication,
         K.U. Leuven Campus Kortrijk, Etienne Sabbelaan 53, B-8500 Kortrijk, Belgium
         firstname.lastname@kuleuven-kortrijk.be


      Abstract. The possibilities within e-learning environments have increased
      dramatically over the last couple of years. They are more and more deployed
      on the Web, allow various types of tasks and fine-grained feedback, and
      they can make use of audiovisual material. On the other hand, we are
      confronted with an increasing heterogeneity in terms of end-user devices
      (smartphones, tablet PCs, etc.) that are able to render advanced
      Web-based applications and consume multimedia content. Therefore, the
      major contribution of this paper is an adaptive, Web-based e-learning
      environment that is able to provide rich, personalized e-learning experiences
      to a wide range of devices. We discuss the global architecture
      and data models, as well as how the integration with media delivery can
      be realized. Further, we give a detailed description of a reasoner, which
      is responsible for the adaptive selection of learning items, based on the
      usage environment and the user profile.

      Keywords: Adaptive, Language Learning, Mobile, Web-based


1   Introduction
In recent years, the use of e-learning environments has increased spectacularly,
not only in formal educational settings, but also in working and private environments.
At the same time, the possibilities within these e-learning environments have
increased dramatically: learning environments for instance have become easier
and more pleasant to use, they allow various types of tasks and fine-grained feedback,
and they can make use of audiovisual material. Moreover, while e-learning
environments were traditionally offered as applications on stand-alone computers,
nowadays they are more and more being rendered over the Internet. It is
clear that these evolutions are related to technological evolutions, and the wide
availability of fast multimedia computers and internet access.
    Next to the fact that e-learning environments are more and more deployed
over the Web, we are confronted with an increasing heterogeneity in terms of
end-user devices that are able to connect to the Web and consume multimedia
content. Therefore, personal devices such as tablet PCs and smartphones could
be used as learning devices, next to traditional desktop and laptop devices. Also,
the role of personalization within e-learning environments has become more and
more important. Personalization can be applied both at the learning level (i.e.,
adjust learning sessions according to the learner’s capabilities) and at the
environmental level (i.e., adjust the rendering of the learning environment according
to the characteristics of the usage environment).
    The above described challenges are exactly the ones that are currently tackled
in the IBBT MAPLE project (Mobile, Adaptive & Personalized Learning
Experience3), which aims to make adaptive mobile e-learning possible. Therefore,
in this paper, we present a Web-enabled e-learning environment that is
able to offer personalized learning sessions on any device, primarily focused on
language learning making optimal use of digital multimedia. In order to realize
such an environment, we need the following key components:

 – a common, machine-understandable data model that is independent of usage
   environments and is able to express both learning content and metadata
   about the learning content;
 – a logging framework that allows capturing the behaviour and performance
   of the learner on a detailed level;
 – a reasoner that is able to select learning items based on the learner’s
   capabilities and behaviour;
 – a media delivery platform taking into account usage environment
   characteristics and restrictions.

    In the remainder of this paper, we provide an overview of the architecture
of our adaptive e-learning platform. Further, we discuss the above described key
components in more detail. Finally, we discuss related work, draw a number of
conclusions, and discuss some future work.


2     MAPLE platform

In order to offer a highly adaptive e-learning platform that can also deal with
(mobile) multimedia delivery, we designed the architecture that is depicted in
Fig. 1. Two major parts can be distinguished: the e-learning platform and the
media delivery platform. The e-learning platform relies on two RDF stores, i.e.,
a store for learning exercises and a store for learner profiles. The learning items
store is filled through the learning item ingest service. More details regarding
the creation of learning items and the data model according to which they are
modeled are provided in Section 3. Further, the learner profile store is built up
based on the learners’ actions and preferences (see Section 3.5). The reasoner
is responsible for selecting the most adequate exercise, based on the learner’s
profile and environment and the available learning items. Detailed information
3
    http://www.ibbt.be/en/projects/overview-projects/p/detail/maple-2
[Figure: architecture diagram. Media delivery platform: media ingest service, media
store, selection & packaging, ADTE service, media delivery channels. E-learning
platform: learning item ingest service, learning item DB, learner profile DB,
reasoner, learning endpoint.]

                        Fig. 1. The MAPLE e-learning platform


regarding the reasoner is provided in Section 4. Finally, the learning endpoint is
the communication point between learner devices and the e-learning platform.
    The media delivery platform corresponds to NinSuna4 , which is a metadata-
driven media adaptation and delivery platform [25]. At its core, format-independent
modules for temporal selection and packaging of media content are present. Al-
most all existing media delivery channels are supported by NinSuna: RTSP,
RTMP, HTTP progressive download, and HTTP adaptive streaming. Moreover,
native support for Media Fragments 1.0 [24] is provided, which enables the de-
livery of media fragments (i.e., temporal or track fragments) in a standardized
way [15]. Finally, NinSuna comes with an Adaptation Decision-Taking Engine
(ADTE), which is able to 1) detect the capabilities of the device issuing the
request and 2) take a decision regarding which quality version of the requested
media resource is the most adequate for the detected device. A more detailed
description of the NinSuna platform can be found in [25].
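
The second ADTE step, choosing a quality version for a detected device, can be illustrated with a minimal Python sketch. Capability detection itself is left out: a capability record is assumed to be given. The `DeviceCapabilities` and `MediaVersion` records, the example URIs, and the selection rule (highest bit rate that fits both the screen width and the available bandwidth) are assumptions made for illustration and are not taken from NinSuna.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DeviceCapabilities:
    """Hypothetical capability record for the requesting device."""
    screen_width: int      # pixels
    bandwidth_kbps: int    # estimated downlink bandwidth


@dataclass(frozen=True)
class MediaVersion:
    """Hypothetical description of one quality version of a media resource."""
    uri: str
    width: int             # pixels
    bitrate_kbps: int


def choose_version(device: DeviceCapabilities,
                   versions: Sequence[MediaVersion]) -> Optional[MediaVersion]:
    """Return the highest-bit-rate version that fits the device, or None."""
    fitting = [v for v in versions
               if v.width <= device.screen_width
               and v.bitrate_kbps <= device.bandwidth_kbps]
    return max(fitting, key=lambda v: v.bitrate_kbps, default=None)


# Illustrative versions of one resource (placeholder URIs).
versions = [
    MediaVersion("http://example.org/media/clip_low", 320, 300),
    MediaVersion("http://example.org/media/clip_mid", 640, 800),
    MediaVersion("http://example.org/media/clip_high", 1280, 2500),
]
phone = DeviceCapabilities(screen_width=640, bandwidth_kbps=1000)
best = choose_version(phone, versions)
```

Returning `None` when no version fits leaves it to the caller to decide whether to fall back to a non-media exercise.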
    The presented e-learning platform exposes its data (i.e., learning content and
accompanying media resources) as linked data. More specifically, it follows the
guidelines regarding the publication of linked data5 : use dereferenceable HTTP
URIs as names for things, provide useful information using the standards (RDF,
SPARQL), and include links to other URIs. Hence, within our platform, the
learning items and learner profiles are available through a SPARQL endpoint,
while the metadata of the media resources is published as RDF at dereferenceable URIs. This
way, services such as the reasoner and the ADTE can rely on the linked data
and can start reasoning over it.
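
To make the SPARQL access concrete, the following Python sketch builds a query for exercises of a given level and type and encodes it as a SPARQL-protocol GET request. The endpoint address is a placeholder, since the actual address is not given here, and the query shape is an assumption; the property names follow the learning item model of Section 3.1. The request is only constructed, not sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint (assumption); not the real MAPLE endpoint address.
ENDPOINT = "http://example.org/sparql"

MPLC = "http://multimedialab.elis.ugent.be/organon/ontologies/maple/content#"
LLOMP = "http://multimedialab.elis.ugent.be/organon/ontologies/maple/llomp#"


def exercises_query(level: str, exercise_type: str) -> str:
    """Build a SPARQL SELECT for exercises of a given level and type."""
    return f"""
PREFIX mplc: <{MPLC}>
PREFIX llomp: <{LLOMP}>
SELECT ?exercise WHERE {{
  ?exercise a llomp:Exercise ;
            mplc:exerciseType mplc:{exercise_type} ;
            llomp:educational [ llomp:level llomp:{level} ] .
}}""".strip()


def request_url(query: str) -> str:
    """Encode the query in the 'query' parameter of a GET request URL."""
    return ENDPOINT + "?" + urlencode({"query": query})


query = exercises_query("A2", "MultipleChoice")
url = request_url(query)
```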
4
    http://ninsuna.elis.ugent.be
5
    http://www.w3.org/DesignIssues/LinkedData.html
      A typical e-learning scenario using this architecture is then as follows:
(1) the learner logs into the Web-based e-learning application using his/her mobile
    device, which contacts the learning endpoint of the e-learning platform; the
    endpoint calls the reasoner, which provides a personalized overview
    of the available courses;
(2) based on the course selected by the learner, the reasoner selects an exercise
    from the learning item store, taking into account the learner profile and the
    available exercises within that course;
(3) when the selected exercise contains media content (audio, video, or images),
    the ADTE of NinSuna is contacted in order to select the media resource
    version that fits best for the current device;
(4) the learning endpoint renders the selected exercise in HTML and sends the
    response to the learner;
(5) while the learner is solving the selected exercise, his/her answers and his/her
    behaviour in terms of clicks and timing are logged and sent back to the
    e-learning platform;
(6) the received answers and behaviour information are used to update the
    learner’s profile.
   In the next sections, more detailed information regarding a number of com-
ponents in the architecture is provided.


3      Data Models and Instance Generation
A number of different data models need to be developed in order to structure
and define the content used on the e-learning platform. More specifically, we
need the following data models:
 – model for the learning items and their metadata (e.g., question, possible
   answers, difficulty level);
 – model for the learning domain;
 – model for the metadata of the media resources (e.g., bit rate);
 – model for the learner profile;
 – model for the logging.
    In the following subsections, we provide more information regarding these dif-
ferent models and how they are populated. Note that all ontologies are modelled
in OWL and published online.

3.1     Model for learning items and their metadata
The model for learning items consists of two ontologies: one for the learning items
themselves6 and one for their metadata7 . An example instance of a learning
6
    http://multimedialab.elis.ugent.be/organon/ontologies/maple/content
7
    http://multimedialab.elis.ugent.be/organon/ontologies/maple/llomp
     Listing 1.1. Representing a learning item and its metadata in RDF (in Turtle).
 1    @prefix mplc: <http://multimedialab.elis.ugent.be/organon/ontologies/maple/content#> .
      @prefix llomp: <http://multimedialab.elis.ugent.be/organon/ontologies/maple/llomp#> .
      @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
      @prefix dc: <http://purl.org/dc/terms/> .
 5
      <http://ninsuna.elis.ugent.be/rdf/resource/maple/blcc_47363>
         a llomp:Exercise ;
         dc:title "47363" ;
         mplc:exerciseType mplc:MultipleChoice ;
10       mplc:media <http://ninsuna.elis.ugent.be/Media/Maple/FLAA2V0#t=0,19> ;
         mplc:task "What do Belgians eat, according to the reporter?"@en ,
                 "Wat eten de Belgen volgens de reporter?"@nl ;
         mplc:answerSpace "Les Belges mangent ..." ;
         mplc:input [
15          a mplc:Input ;
            mplc:answer [
               a mplc:Choice ;
               mplc:isCorrect "false"^^xsd:boolean ;
               mplc:scoreCorrect "0"^^xsd:int ;
20             mplc:scoreFalse "0"^^xsd:int ;
               mplc:text "des frites"@fr
            ] ;
            mplc:answer [
               a mplc:Choice ;
25             mplc:isCorrect "true"^^xsd:boolean ;
               mplc:scoreCorrect "1"^^xsd:int ;
               mplc:scoreFalse "0"^^xsd:int ;
               mplc:text "de la glace"@fr
            ]
30       ] ;
         mplc:maxScore "1" ;
         mplc:minScore "0" ;

        llomp:educational [
35        a llomp:Educational ;
          llomp:difficulty llomp:medium ;
          llomp:level llomp:A2 ;
          llomp:learningComponent :learningComponent_44854
        ] ;
40      llomp:lifeCycle :lifeCycle_47363 .


item modelled according to our model is shown in Listing 1.1. We explain and
illustrate both ontologies based on this example. The model is heavily based
on the Learning Object Metadata standard (LOM) [2]. LOM specifies a conceptual data
schema and the corresponding XML binding for metadata of learning items.
We started from LOM and defined a number of extensions in order to provide
improved support for learning subject, feedback and scoring, as well as better
integration with media resources. Further, as mentioned before, we split our
model between learning items and their metadata.
    We describe not only the metadata of learning items, but also the exer-
cises themselves. This way, they are formally represented, independent of any
rendering. Moreover, they can be easily integrated with their metadata and cor-
responding media resources. Also, the reasoner (Section 4) will not only rely on
the learning item metadata, but also on the items themselves (e.g., this type of
exercise is preferred by the learner). For the moment, six mplc:exerciseTypes
are supported (focussed on language learning):

 – Multiple Choice: given a number of answers, the learner has to choose exactly
   one answer;
 – Multiple Response: given a number of answers, the learner has to choose one
   or more answers;
 – Fill Gaps: given a text with some gaps, the learner needs to fill in missing
   text in text boxes;
 – Dropdown: same as Fill Gaps, but instead of free text fields, the learner can
   choose between a number of predefined answers;
 – Click on Text: given a text, the learner needs to click/tap on one or more
   words;
 – Click on Zone: given an image or video, the learner needs to click/tap on one
   or more regions within the image or video.

Note that media elements can also occur within the first five types of exercises.
For instance, a movie can be played followed by the question to solve. Only the
last type (Click on Zone) uses multimedia in an interactive way as described
in [19].
    In Listing 1.1, a multiple choice exercise is used as example (line 9). A link
to a movie fragment is provided via the mplc:media property (line 10), which
takes as value a Media Fragment URI (see Section 3.3). The mplc:task descrip-
tion (line 11) provides the question or task in multiple languages (based on the
level of the learner, the reasoner can choose if the language is presented in the
native language of the learner or not). Further, the mplc:answerSpace (line 13)
corresponds to the zone where the learner can enter his/her answers. Within such
an answer space, mplc:input is provided (line 14), where each mplc:answer
corresponds to one possible answer. In case of a multiple choice type, each an-
swer corresponds to a mplc:Choice. It contains information such as ‘is this
possible answer the correct one?’, ‘how much does the learner score when (s)he
selects this one?’, and the possible answer itself. LOM-specific elements such
as llomp:lifeCycle (line 40) and llomp:educational (line 34) are present as
well.
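
The Media Fragment URI used as the value of mplc:media in Listing 1.1 carries its temporal range in the URI fragment (`#t=0,19`). As a minimal sketch, the Python function below extracts that range. It handles only plain seconds, optionally prefixed with `npt:`; clock-time formats and the other fragment dimensions are not handled. The function name is hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def temporal_fragment(uri: str) -> Optional[Tuple[float, Optional[float]]]:
    """Extract (start, end) in seconds from a '#t=start,end' fragment.

    Accepts forms such as 't=0,19', 't=10' and 't=,19'. A missing start
    defaults to 0.0; a missing end is returned as None (until the end).
    Returns None when the URI has no temporal dimension.
    """
    for part in urlsplit(uri).fragment.split("&"):
        name, _, value = part.partition("=")
        if name != "t":
            continue
        if value.startswith("npt:"):
            value = value[len("npt:"):]
        start_text, _, end_text = value.partition(",")
        start = float(start_text) if start_text else 0.0
        end = float(end_text) if end_text else None
        return (start, end)
    return None


# URI taken from Listing 1.1.
span = temporal_fragment("http://ninsuna.elis.ugent.be/Media/Maple/FLAA2V0#t=0,19")
```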
    As a part of the aforementioned LOM extensions, we added the learning
component property to the educational component. Since the MAPLE project
focusses on language learning, we extended this learning component property
with specific support for language learning. The learning component is split up
into three separate subcomponents: target language, theme and language compo-
nent. The latter component can have one or more of the following subproperties:

 – knowledge property: vocabulary, pronunciation, etc.;
 – skill property: reading, listening, writing or speaking.

We also defined a hierarchical structure for the range of the knowledge property
based on which the exact knowledge URIs can be deduced. This was done in
a language-independent way extendable with language-specific elements. As the
           Listing 1.2. Representing a learning component in RDF (in Turtle).
      @prefix lang: <http://kuleuven-kortrijk.be/itec/ext/ontologies/itec_elearning_ontology/languagecomponent/#> .
      @prefix llomp: <http://multimedialab.elis.ugent.be/organon/ontologies/maple/llomp#> .

      <http://ninsuna.elis.ugent.be/rdf/resource/maple/learningComponent_40001>
         a llomp:LearningComponent ;
         llomp:theme "agriculture" ;
         llomp:targetLanguage "en-UK" ;
         llomp:languageComponent [
           a llomp:LanguageComponent ;
           llomp:knowledge <http://kuleuven-kortrijk.be/itec/ext/ontologies/itec_elearning_ontology/languagecomponent/grammar/partsOfSpeech/substantive> ;
           llomp:skill lang:writing
         ] .


skill and knowledge properties exist next to each other, it is possible to specify
the subject of an exercise very accurately. Listing 1.2 shows an example instance
of a learning component. The exercise in this instance trains the skill of
writing substantives related to agriculture.
    Within the MAPLE project, we use learning items from Televic Education
(TEDU)8 . Currently, TEDU stores its learning items and accompanying metadata
in a SQL store, which can be accessed from outside through XML feeds.
Hence, we implemented a converter that takes the XML feeds as input and produces
RDF learning items according to the model described above.

3.2      Model for the learning domain
The learning items are not physically arranged into courses. Which learning objects
belong together is determined by the metadata, namely the learning component
within the educational component of each item. The domain model consists
of two types of relations: prerequisite and hierarchical relations. In the project,
the domain model is assumed to be simple. It is a three-level hierarchical model
in which the items are distinguished first by their target language, secondly by
their theme, and thirdly by their language component. Additionally, there exist
prerequisite relations between the language components, expressing that one
language component depends on the knowledge of another. The reasoner takes
these prerequisites into account when determining which courses are available for
the learner.
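
The three-level hierarchy and the prerequisite filter can be sketched as follows in Python. The flat `Component` record, the notion of a set of "mastered" language components, and the example prerequisite (that 'Imparfait' requires 'Present tense') are hypothetical and serve only to illustrate the idea; the actual reasoner also takes learning goals into account (Section 4).

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class Component(NamedTuple):
    """Hypothetical flat view of a learning component."""
    target_language: str
    theme: str
    language_component: str


def build_course_tree(components: Iterable[Component]) -> Dict[str, Dict[str, List[str]]]:
    """Group components as target language -> theme -> [language components]."""
    tree: Dict[str, Dict[str, List[str]]] = {}
    for c in components:
        leaves = tree.setdefault(c.target_language, {}).setdefault(c.theme, [])
        if c.language_component not in leaves:
            leaves.append(c.language_component)
    return tree


def available(components: Iterable[Component],
              prerequisites: Dict[str, Set[str]],
              mastered: Set[str]) -> List[Component]:
    """Keep components whose prerequisite language components are all mastered."""
    return [c for c in components
            if prerequisites.get(c.language_component, set()) <= mastered]


components = [
    Component("French", "General", "Present tense"),
    Component("French", "General", "Imparfait"),
]
prerequisites = {"Imparfait": {"Present tense"}}  # illustrative assumption

tree_for_beginner = build_course_tree(available(components, prerequisites, set()))
tree_after_present = build_course_tree(
    available(components, prerequisites, {"Present tense"}))
```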

3.3      Model for media metadata
To model media resources, we rely on the W3C Media Annotations ontology [11],
which is supposed to foster the interoperability among various kinds of metadata
formats currently used to describe media resources on the Web. Moreover, it
8
     http://www.televic-education.com/en/
               Listing 1.3. Representing a learner profile in RDF (in Turtle).
      @prefix itec: <http://kuleuven-kortrijk.be/itec/ext/ontologies/itec_elearning_ontology#> .
      @prefix foaf: <http://xmlns.com/foaf/0.1/> .
      @prefix mplc: <http://multimedialab.elis.ugent.be/organon/ontologies/maple/content#> .
      @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

      <http://kuleuven-kortrijk.be/itec/ext/ontologies/itec_elearning_ontology/maple/learners#blcc_piet_lambrecht>
         foaf:nick "piet_lambrecht" ;
         foaf:firstName "Piet" ;
         foaf:lastName "Lambrecht" ;
         itec:hasProficiency [
           a itec:Proficiency ;
           itec:hasLearningSubject :learningComponent_47584 ;
           itec:hasScoredEvaluation [
              a itec:ScoredEvaluation ;
              itec:score "3.2"^^xsd:float ;
              itec:scoreVariance "1.1"^^xsd:float ;
              itec:scoreScale itec:defaultEuropeanLanguageLevelScale
           ]
         ] ;
         itec:hasLearningGoal [
           a itec:ScoredEvaluationLearningGoal ;
           itec:hasScoredEvaluation [
              a itec:ScoredEvaluation ;
              itec:score "4"^^xsd:float ;
              itec:scoreScale itec:defaultEuropeanLanguageLevelScale
           ] ;
           itec:hasLearningSubject :learningComponent_47584
         ] ;
         itec:preferredExerciseType mplc:DropDown .


already contains mappings to many other existing metadata formats. Further,
the ontology also provides support for Media Fragment URIs.


3.4     Model for the learner profile

In order to steer the decision making of the reasoner, an up-to-date learner
profile is required for each of the learners in the learning system. This profile
holds proficiency score estimations for each of the appropriate learning subjects.
Each of these values is supplemented with a reliability parameter, namely the
variance of the estimator. As we focus on language learning, the proficiency scores
are expressed on a continuous scale based on the discrete European Language
Levels [4]. Level A1 corresponds to a score of 0, A2 to 1, B1 to 2, etc. Also,
the profile keeps a list of the learning goals which were set for that learner. An
example of such a learning goal could be “Achieve the B2 level for the French
verb form imparfait”. The type of learning items the learner prefers can also be
saved in the profile. An example instance can be found in Listing 1.3.
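
The mapping between discrete levels and the continuous score scale can be sketched in Python as below. The scores for A1, A2 and B1 are those stated above; extending the same pattern to B2, C1 and C2, rounding a continuous score down to the highest level reached, and the goal test are assumptions of this sketch.

```python
# European Language Levels in ascending order; index = score (A1 -> 0, A2 -> 1, ...).
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def level_to_score(level: str) -> float:
    """Score on the continuous scale corresponding to a discrete level."""
    return float(LEVELS.index(level))


def score_to_level(score: float) -> str:
    """Highest discrete level reached by a continuous score (assumed floor rule)."""
    index = min(max(int(score), 0), len(LEVELS) - 1)
    return LEVELS[index]


def goal_reached(score: float, goal_level: str) -> bool:
    """True when the estimated score is at least the goal level's score."""
    return score >= level_to_score(goal_level)
```

For instance, under these assumptions the score of 3.2 in Listing 1.3 falls within level B2.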
    The properties in the model are set either automatically or by
means of preference settings. The learner’s favourite learning item types can be
edited through a preference menu and the learner’s proficiency scores will be
updated by a module of the reasoner. Additionally, the ontological model sup-
            Listing 1.4. Representing a log excerpt in RDF (in Turtle).
      @prefix itec: <http://kuleuven-kortrijk.be/itec/ext/ontologies/itec_elearning_ontology#> .
      @prefix learners: <http://kuleuven-kortrijk.be/itec/instances/maple/learners#> .
      @prefix log: <http://kuleuven-kortrijk.be/itec/instances/maple/logging#> .
      @prefix maple: <http://ninsuna.elis.ugent.be/rdf/resource/maple/> .
      @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

      log:learnerSession12452
        a itec:LearnerSession ;
        itec:hasSessionStart "2010-10-26T21:32:52.126"^^xsd:dateTime ;
        itec:hasSessionStop "2010-10-26T21:38:52.526"^^xsd:dateTime ;
        itec:hasLearner learners:blcc_piet_lambrecht ;
        itec:hasSubSession [
            a itec:LearningSession ;
            itec:hasSessionStart "2010-10-26T21:32:52.229"^^xsd:dateTime ;
            itec:hasSessionStop "2010-10-26T21:38:52.501"^^xsd:dateTime ;
            itec:hasItemObjectSession [
                a itec:ItemObjectSession ;
                itec:hasItemObject maple:blcc_47363 ;
                itec:hasSessionStart "2010-10-26T21:32:56.233"^^xsd:dateTime ;
                itec:hasSessionStop "2010-10-26T21:32:59.999"^^xsd:dateTime ;
                itec:hasAnswerSubmittedEvent [
                    itec:hasInputObject maple:inputObject_57495 ;
                    itec:hasGivenAnswer maple:answer_57495 ;
                    itec:dateTime "2010-10-26T21:32:59.526"^^xsd:dateTime
                ]
            ]
        ] .


ports properties like motivation, learning style, learner strategy, and cognitive
abilities, but currently these are not used in the MAPLE e-learning platform.


3.5     Model for logging the learner’s activity

Finally, we developed a model for describing logging information. For instance,
the model is able to express information such as the start and stop of a learner
session or the learner’s course selection. Once the learner has chosen a course, a
learning session is initiated in which the reasoner successively selects a new learn-
ing item, each time resulting in a learning item session which lasts for the time
the learner interacts with the item. During such an item session, a learner can give
an answer, request a hint, or change his/her answer. All these
interactions are logged by the system. This results in a large amount of information,
which is consumed in two ways. Firstly, a part of the logging information is
used at run-time by the reasoner. For instance, a score attained by the learner
will affect the proficiency score of a learner’s profile through the functionality of
the reasoner’s proficiency manager. Secondly, after runtime, the logged informa-
tion will be used as input for statistical research tracing how certain interactions
of the learner give information about the learning process. Listing 1.4 shows an
example instance. The learner session and the learning session are connected by
the itec:hasSubSession relation, and the learning session and the item session by
the itec:hasItemObjectSession relation.
    The resulting triples are partially generated in the core of the reasoner,
e.g., the start and stop of the learner and learning sessions. The low-level
interactions concerning one specific exercise are generated at the client and sent
back to the reasoner, which processes the logging information and stores it in the learner
profile RDF store.
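
As an illustration of how such logging information might be consumed, the Python sketch below mirrors the item session of Listing 1.4 in plain data classes and derives two timing measures from it. The class and method names are hypothetical; only the timestamps and identifiers are copied from the listing.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class AnswerSubmittedEvent:
    given_answer: str
    date_time: datetime


@dataclass
class ItemObjectSession:
    item_object: str
    session_start: datetime
    session_stop: datetime
    answer_events: List[AnswerSubmittedEvent] = field(default_factory=list)

    def duration_seconds(self) -> float:
        """Total time the learner spent on the item."""
        return (self.session_stop - self.session_start).total_seconds()

    def seconds_to_first_answer(self) -> Optional[float]:
        """Time from item start to the first submitted answer, if any."""
        if not self.answer_events:
            return None
        first = min(e.date_time for e in self.answer_events)
        return (first - self.session_start).total_seconds()


def ts(text: str) -> datetime:
    """Parse an xsd:dateTime lexical form without time zone."""
    return datetime.fromisoformat(text)


# Values copied from Listing 1.4.
session = ItemObjectSession(
    item_object="maple:blcc_47363",
    session_start=ts("2010-10-26T21:32:56.233"),
    session_stop=ts("2010-10-26T21:32:59.999"),
    answer_events=[AnswerSubmittedEvent("maple:answer_57495",
                                        ts("2010-10-26T21:32:59.526"))],
)
```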


4   Adaptive Learning Item Selection

The reasoner, introduced in Section 2, is a crucial component within the MAPLE
learning system architecture as it is responsible for the adaptive learning item
selection. If a learner logs in, the reasoner will first of all provide a short list
of courses from which the learner can choose. As the reasoner is aware of the
learning goals for each learner through the learner profile model, only courses
that contribute to the not yet attained learning goals can be selected. Next, once
the learner has chosen a course, the reasoner will start up a learning session and
will successively decide on the exact exercise to deliver to the learner.
    The reasoner takes into account the learner profile as well as some real time
environmental properties. For the environmental adaptivity, both the screen ca-
pacity and connection quality of the user’s device are sources of adaptivity. In
case the screen size is too small, the reasoner will avoid the use of exercises with
media. A slow network connection will also result in avoiding media exercises.
For the learner profile adaptivity, there are two main policies which can steer
the decision process. The first one is based on a theory stating that the exercise
difficulty needs to be increased each time a learner has answered a series of
four exercises correctly. Similarly, when four consecutive exercises are answered
incorrectly, the difficulty is decreased [12]. The second policy is based on a pedagogical
theory which tries to keep the learner’s motivation high by targeting a predefined
(e.g., 70 %) correct-answer probability. This probability can be estimated based
on Item Response Theory (IRT) [5] by combining the current proficiency estimation with the
level and difficulty of the exercise [28, 7]. The aforementioned policies are supplemented
with an event-driven feedback system. The system allows the sequencer
to insert a feedback item (instead of an exercise) to explain a learning subject
once a specific, predefined condition is met, for instance, “the learner made
five errors in a row against the same learning subject”. This feedback item is
chosen based on the learning component property which both the feedback and
the exercise item have in their metadata. For both policies, the preferred
exercise types of the learner are also taken into account by favouring them without
completely excluding the other exercise types.
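
The second policy can be illustrated with a minimal Python sketch. It assumes the one-parameter logistic (Rasch) IRT model and an item difficulty expressed on the same scale as the proficiency score; how the exercise's level and difficulty are combined into that single value is not specified here and is left out. The `Item` record and the `media_allowed` flag, which stands in for the environmental adaptivity, are hypothetical, and preferred exercise types are not modelled.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Item:
    """Hypothetical view of a learning item for selection purposes."""
    item_id: str
    difficulty: float      # assumed to be on the proficiency scale
    has_media: bool = False


def p_correct(proficiency: float, difficulty: float) -> float:
    """Rasch (one-parameter logistic) probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(proficiency - difficulty)))


def select_item(items: Sequence[Item], proficiency: float,
                target: float = 0.7, media_allowed: bool = True) -> Optional[Item]:
    """Pick the item whose estimated success probability is closest to target.

    Items with media are skipped when media_allowed is False, e.g. because
    the screen is too small or the connection too slow.
    """
    candidates = [i for i in items if media_allowed or not i.has_media]
    if not candidates:
        return None
    return min(candidates,
               key=lambda i: abs(p_correct(proficiency, i.difficulty) - target))


items = [
    Item("easy_video", 0.0, has_media=True),
    Item("medium_text", 1.0),
    Item("hard_text", 2.0),
]
chosen = select_item(items, proficiency=1.0)
chosen_without_media = select_item(items, proficiency=1.0, media_allowed=False)
```

With a proficiency of 1.0, the item of difficulty 0.0 has an estimated success probability of about 0.73, which is closest to the 70 % target; when media are ruled out, the next-best candidate is chosen instead.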
    To fulfil the aforementioned tasks, the architecture of the reasoner (shown
in Fig. 2) consists of six modules, supplemented by a facade for communicating
with the learning endpoint. The six reasoner modules are the Learner manager,
Environment manager, Learning task decision manager, Sequence manager, Log-
ging manager and Proficiency manager. We elucidate the functionality of these
modules by means of the following example.
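[Figure 2 (reasoner architecture diagram). The reasoner comprises the Learner manager, Environment manager, Learning task decision manager, Sequence manager, Proficiency manager, and Logging manager; it connects to the learner profile DB and the learning item DB, and to the learning endpoint through the Facade.]

                          Fig. 2. The reasoner architecture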


    Suppose a learner’s initial profile was set by a teacher thereby providing the
learning goal “Achieve the B2 level for the French verb form imparfait” and also
providing an estimation for the learner’s initial level, namely A2, for “the French
verb form imparfait”. When the learner logs in, the Learner manager produces
a learner session. Consequently, the Learning task decision manager loads the
learner’s learning goals in order to compose a three-level tree representation of all
courses relevant for this learner, as explained in Section 3.2. This tree is sent to
the learning endpoint which produces a representation such that the learner can
navigate through the tree. Let us assume that the learner first selects ‘French’
followed by the theme ‘General’ and finally the language component ‘Imparfait’.
In addition, the learner opens the preferences menu and sets the dropdown exercise
type as his/her favourite one.
    Next, the Learning task decision manager composes a learning task object,
which is sent to the Sequence manager. Here, the learning task is to sequence the
items (exercises and feedback) with the first policy of adaptivity, starting from
level A2, having as a stop criterion the achievement of level B2, and taking
into account the learner’s preferred exercise types and environmental properties.
Subsequently, the Sequence manager loads the sequencer necessary for the
learning task. The sequencer makes use of the Environment manager,
which is an access point for information on the current connection quality and
the screen size of the learner’s device. At this point, the sequencer can successively
decide on the id of the next item and pass its choice to the learning
endpoint, which automatically generates a visual representation and makes use
of the delivery platform in case media are present.
    Once the learner finishes the exercise or has read the feedback in case of a
feedback item, the logging information about the interactions of the learner with
the item is sent back to the Logging manager of the reasoner. The latter sends
this information as a specific logging object to a couple of observer objects which
all have different functionalities. For instance, there is an observer writing these
logs to the learner profile RDF store. Another observer warns the sequencer when
for example four exercises have been consecutively answered correctly and yet
another sends the learner’s score to the Proficiency manager together with the
level, difficulty and the learning subject of the answered exercise. The Proficiency
manager keeps the proficiency scores up to date. Prior to every decision of the
sequencer, the stop criterion is tested based on a proficiency that is retrieved from
the Proficiency manager. If this criterion is reached, the sequencer sequences a
special concluding feedback item announcing the end of the learning session to
the learner.
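
The interplay between the Logging manager and its observers can be sketched in Python as follows. The sketch shows only the observer that warns the sequencer after four consecutive correct (or incorrect) answers, as used by the first adaptivity policy. All class and method names are hypothetical, and resetting the streak counter after each notification is an assumption.

```python
from typing import Callable, List


class StreakObserver:
    """Calls back after `threshold` consecutive correct or incorrect answers."""

    def __init__(self, on_streak: Callable[[int], None], threshold: int = 4) -> None:
        self._on_streak = on_streak
        self._threshold = threshold
        self._streak = 0   # > 0: consecutive correct, < 0: consecutive incorrect

    def notify(self, correct: bool) -> None:
        if correct:
            self._streak = self._streak + 1 if self._streak > 0 else 1
        else:
            self._streak = self._streak - 1 if self._streak < 0 else -1
        if abs(self._streak) >= self._threshold:
            # +1 asks for a higher difficulty, -1 for a lower one.
            self._on_streak(1 if self._streak > 0 else -1)
            self._streak = 0   # assumed: start counting afresh


class LoggingManager:
    """Forwards each logged answer to all registered observers."""

    def __init__(self) -> None:
        self._observers: List[StreakObserver] = []

    def register(self, observer: StreakObserver) -> None:
        self._observers.append(observer)

    def log_answer(self, correct: bool) -> None:
        for observer in self._observers:
            observer.notify(correct)


difficulty_steps: List[int] = []   # stands in for the sequencer's reaction
manager = LoggingManager()
manager.register(StreakObserver(difficulty_steps.append))

for correct in [True] * 4 + [False] * 3 + [True] + [False] * 4:
    manager.log_answer(correct)
```

In this run, the four initial correct answers trigger one step up; the three incorrect answers that follow are interrupted by a correct one, so only the final run of four incorrect answers triggers a step down.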


5    Related Work

The architecture of the reasoner builds further on existing proposals for generic
learning system architectures such as in [20]. These architectures however have
mostly been designed having an adaptive hypermedia learning system in mind.
Even though most systems currently developed are based on providing learner
control based on adaptive links, e.g. [3], our system is specialized in adaptive
curriculum sequencing, meaning that the learning objects are sequenced in an
automated way. To create an adaptive learning system the method of using on-
tologies has often been proposed in literature, e.g., in [23, 17, 8]. We partially rely
on existing ontologies and data models, and introduced new data models such as
a model for describing learning exercises and language-learning specific information.
The latter were all developed in collaboration with educationalists. Additionally,
both the delivery platform and the reasoner take into account connection quality
and screen size, either to choose the right video format or to avoid sending
any media to a device if they cannot be delivered in an optimal way. This way,
our system implements a part of the context-awareness which has been claimed
to be crucial in mobile learning [23, 27].
    The ontology for the learner profile is a compact non-exhaustive synopsis of
the most common learner characteristics found in literature [21, 13, 10] which
can be used in steering an adaptive learning system. For the preservation of
the learner’s knowledge we used what is classified as an overlay model in [13].
To date, the IEEE Learning Object Metadata (LOM) standard is considered to be
the standard for many repositories storing thousands of learning objects with
metadata. There have been attempts to transform the LOM metadata model
into an RDF version (e.g., [18]). However, the model provided by LOM was not
sufficient for our purposes. Hence, we adopted part of the LOM model (by relying on previous
LOM RDF efforts) and extended it to meet our own needs.
    Our realizations in this project largely replace the functionality of the restric-
tive SCORM standard [1]. SCORM, an abbreviation for Sharable Content Object
Reference Model, is a collection of specifications imposing a format for bundling
Web-based exercises into courses, thereby imposing LOM for the metadata, as
well as a data model for communicating learning scores between server and client.
The standard was updated in 2004, now supporting a limited set of instructions
for adaptive behavior. In practice, however, the imposed syntax for adaptivity
has low expressivity while remaining very complicated [14]. Although in the past
SCORM had an important impact on the sharing of bundled learning courses on
the Web and although many tried to improve the SCORM standard [16, 22, 29],
we think its starting point has become outdated. After all, we believe grouping
learning objects in a container format conflicts with the principle of the Semantic
Web of data in which objects are scattered over the Web. Additionally, its extensibility
turned out to be low [14, 6] and the data model for exchanging learning
results is limited to the exchange of a single score, thereby not fulfilling our
need for more advanced reporting of a learner’s interactions with the exercises.
Our formalized representation model for recording scores and interactions with
exercises makes it possible to develop truly interoperable exercises that are able
to report learning results in a universal way. Until now, the importance of an
extensible yet universally understandable learning result reporting system for
adaptive learning systems has largely been ignored.
    Gang et al. proposed a framework for mobile learning in [9] that approaches
the challenges in a way similar to ours. More specifically, a media delivery system
was developed, as well as an adaptive module for learning item selection.
However, they relied on MPEG-21 technology while we use the NinSuna platform,
which is based on MPEG-21 principles but has been shown to be more efficient
and generic [26]. Further, their learning item selection is not based on educational
properties such as skills or experience, but solely on environmental properties.


6    Conclusions and Future Work

In order to exploit the possibilities of Web-based e-learning environments, we
proposed an e-learning architecture that is able to provide rich, personalized
e-learning experiences to a wide range of devices. We discussed the various data
models used within the e-learning framework. Moreover, we provided details of
the reasoner, a crucial component that selects learning items based on the
usage environment and the learner profile.
    Future work consists of exploiting the possibilities of the Semantic Web even
more by linking learning items to the Linked Open Data cloud. Further, data
models could be optimized and linked to upcoming efforts (e.g., how to represent
the life cycle of a learning item as provenance information on the Web). Also,
more detailed domain models should be investigated. Regarding the reasoner,
future work consists of taking into account more information obtained from the
logging framework, as well as investigating how error-specific feedback could be
generated (e.g., link frequently occurring errors to answers).
Acknowledgments
The research activities as described in this paper were funded by Ghent Uni-
versity, the Interdisciplinary Institute for Broadband Technology (IBBT, 50%
co-funded by industrial partners), the Institute for the Promotion of Innovation
by Science and Technology in Flanders (IWT), the Fund for Scientific Research-
Flanders (FWO-Flanders), and the European Union.


References
 1. SCORM 2004 4th Edition Version 1.1 Overview,
    http://www.adlnet.gov/Technologies/scorm/SCORMSDocuments/2004%204th%20Edition/Overview.aspx
 2. Standard for learning object metadata,
    http://standards.ieee.org/findstds/standard/1484.12.1-2002.html
 3. Grapple, a generic responsive adaptive personalized learning environment.
    http://www.grapple-project.org (Jun 2010)
 4. European language levels - self assessment grid (2011),
    http://europass.cedefop.europa.eu/LanguageSelfAssessmentGrid/en
 5. Baker, F.B., Kim, S.H. (eds.): Item Response Theory: Parameter Estimation Tech-
    niques, Second Edition (Statistics: A Series of Textbooks and Monographs). CRC
    Press, 2 edn. (July 2004)
 6. Bohl, O., Scheuhase, J., Sengler, R., Winand, U.: The sharable content object
    reference model (SCORM) - a critical review. In: Computers in Education, 2002.
    Proceedings. International Conference on. pp. 950–951, vol. 2 (2002)
 7. Chen, C.M., Lee, H.M., Chen, Y.H.: Personalized e-learning system using item
    response theory. Computers & Education 44(3), 237–255 (2005)
 8. Chi, Y.L.: Ontology-based curriculum content sequencing system with semantic
    rules. Expert Syst. Appl. 36, 7838–7847 (May 2009)
 9. Gang, Z., Zongkai, Y.: Learning Resource Adaptation and Delivery Framework
    for Mobile Learning. In: Frontiers in Education, 2005. FIE ’05. Proceedings 35th
    Annual Conference (October 2005)
10. Jia, B., Zhong, S., Wang, W., Yang, B.: The construction and evolution of learner
    model in adaptive learning system. Computer Technology and Development, In-
    ternational Conference on 1, 148–152 (2009)
11. Lee, W., Bürger, T., Sasaki, F., Malaisé, V., Stegmaier, F., Söderberg, J. (eds.):
    Ontology for Media Resource 1.0. W3C Working Draft, World Wide Web Consor-
    tium (June 2010)
12. Leutner, D.: Instructional design principles for adaptivity in open learning envi-
    ronments. Curriculum, Plans, and Processes in Instructional Design: International
    Perspectives pp. 289–307 (2004)
13. Loc Nguyen, P.D.: Learner model in adaptive learning. World Academy of Science,
    Engineering and Technology 45 (2008)
14. Mackenzie, G.: SCORM 2004 primer, a (mostly) painless introduction to
    SCORM. Tech. rep. (2004),
    http://www.pro-ductivity.com/Compliance21CFR/CTMW/scormintro.pdf
15. Mannens, E., Van Deursen, D., Troncy, R., Pfeiffer, S., Parker, C., Lafon, Y.,
    Jansen, J., Hausenblas, M., Van de Walle, R.: A URI-Based Approach for Ad-
    dressing Fragments of Media Resources on the Web. To appear in Multimedia
    Tools and Applications – Special Issue on Multimedia Data Semantics
16. Melia, M., Barrett, R., Pahl, C.: A model-based approach to SCORM sequencing. In:
    Proceedings of the Sixth Annual Irish Educational Technology Users Conference
    (EdTech06) - Research Track. ILTA (2006)
17. Min, W.X., Wei, C., Lei, C.: Research of ontology-based adaptive learning system.
    In: Proc. Int. Symp. Computational Intelligence and Design ISCID ’08. vol. 2, pp.
    366–370 (2008)
18. Nilsson, M., Palmer, M., Brase, J.: The LOM RDF binding - principles and imple-
    mentation. In: Proceedings of 3rd Annual Ariadne Conference (2003)
19. Perez, M.M., Cornillie, F., Senecaut, M.P., De Wannemacker, S., Desmet, P.:
    Mobile vocabulary learning: activities allowing for interaction through input
    enhancement. In: De Wannemacker, S., Clarebout, G., De Causmaecker, P. (eds.)
    Interdisciplinary Approaches to Adaptive Learning: A Look at the Neighbours.
    pp. 15–27. Springer-Verlag (2011)
20. Rani, S., Ashok, M., Palanivel, K.: Adaptive content for personalized e-learning
    using web service and semantic web. pp. 1–4 (July 2009)
21. Roll, I., Baker, R.S., Aleven, V., Koedinger, K.R.: A metacognitive ACT-R model of
    students’ learning strategies in intelligent tutoring systems. In: Intelligent Tutoring
    Systems. pp. 854–856 (2004)
22. Silva, L., Mustaro, P.N., Stringhini, D., Silveira, I.F.: Using conceptual lattices
    to represent fine granular learning objects through SCORM meta-objects. The
    Electronic Journal of e-Learning 4, 141–148 (2006)
23. Soylu, A., Vandewaetere, M., Wauters, K., Jacques, I., De Causmaecker, P.,
    Desmet, P., Clarebout, G., Van Den Noortgate, W.: Ontology-driven Adaptive
    and Pervasive Learning Environments - APLEs: An Interdisciplinary Approach.
    In: De Wannemacker, S., Clarebout, G., De Causmaecker, P. (eds.) Interdisciplinary
    Approaches to Adaptive Learning: A Look at the Neighbours. pp. 99–115.
    Springer-Verlag (2011)
24. Troncy, R., Mannens, E., Pfeiffer, S., Van Deursen, D. (eds.): Media Fragments
    URI 1.0. W3C Working Draft, World Wide Web Consortium (January 2011)
25. Van Deursen, D., Van Lancker, W., De Neve, W., Paridaens, T., Mannens, E.,
    Van de Walle, R.: NinSuna: a Fully Integrated Platform for Format-independent
    Multimedia Content Adaptation and Delivery based on Semantic Web Technolo-
    gies. Multimedia Tools and Applications – Special Issue on Data Semantics for
    Multimedia Systems 46(2-3), 371–398 (January 2010)
26. Van Deursen, D., Van Lancker, W., De Bruyne, S., De Neve, W., Mannens,
    E., Van de Walle, R.: Format-independent and Metadata-driven Media Resource
    Adaptation using Semantic Web Technologies. Multimedia Systems 16(2), 85–104
    (2010)
27. Wang, Y.K.: Context awareness and adaptation in mobile learning. In: Wireless and
    Mobile Technologies in Education, 2004. Proceedings. The 2nd IEEE International
    Workshop on. pp. 154–158 (2004)
28. Wauters, K., van den Noortgate, W., Desmet, P.: The use of IRT for adaptive item
    selection in item-based learning environments. In: Proceedings of the 2009
    Conference on Artificial Intelligence in Education. pp. 785–786. IOS Press,
    Amsterdam, The Netherlands (2009)
29. Zhu, X.H., Yan, R.Q., Wu, L.P., Zhang, L.F.: An interoperable learning sequencing
    model based on the flow control object. Computer Science and Software Engineer-
    ing, International Conference on 5, 1217–1220 (2008)