<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Generating educational assessment items from Linked Open Data: the case of DBpedia</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Muriel Foulonneau</string-name>
          <email>muriel.foulonneau@tudor.lu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Tudor Research Centre</institution>
          ,
          <addr-line>29, av. John F. Kennedy L-1855 Luxembourg</addr-line>
          ,
          <country country="LU">Luxembourg</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This work uses Linked Open Data for the generation of educational assessment items. We describe the pipeline used to create variables and populate simple choice item models using the IMS-QTI standard. The generated items were then imported into an assessment platform. Five item models were tested. They allowed us to identify the main challenges to improving the usability of Linked Data sources for the generation of formative assessment items, in particular data quality issues and the identification of relevant sub-graphs for the generation of item variables.</p>
      </abstract>
      <kwd-group>
        <kwd>Linked Data</kwd>
        <kwd>open data</kwd>
        <kwd>DBpedia</kwd>
        <kwd>eLearning</kwd>
        <kwd>e-assessment</kwd>
        <kwd>formative assessment</kwd>
        <kwd>assessment item generation</kwd>
        <kwd>data quality</kwd>
        <kwd>IMS-QTI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Assessment plays a very important role in education. Tests are created to evaluate
what students have learned in class, to assess their level at the beginning of a
cycle, to enter a prestigious university, or even to obtain a degree. More and more,
assessment is also praised for its contribution to the learning process through
formative assessment (i.e., assessment to learn, not to measure) and/or
self-assessment, whereby the concept of a third party controlling the acquisition of
knowledge is taken out of the assessment process entirely. The role of assessment in
the learning process has widened considerably. The New York Times even recently
published an article entitled “To Really Learn, Quit Studying and Take a Test” [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
reporting on a study by Karpicke et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] which suggests that tests are actually the
most efficient knowledge acquisition method.
      </p>
      <p>
        The development of e-assessment has been hampered by a number of obstacles, in
particular the time and effort necessary to create assessment items (i.e., test questions)
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Therefore, automatic or semi-automatic item generation has gained attention over
recent years. Item generation consists in using an item model to create
multiple items from that model, automatically or semi-automatically.
      </p>
      <p>The Semantic Web can provide relevant resources for the generation of assessment
items because it includes models of factual knowledge and structured datasets for the
generation of item model variables. Moreover, it can provide links to relevant
learning resources, through the interlinking between different data sources.</p>
      <p>Using a heterogeneous factbase to support the learning process, however, raises
issues related, for instance, to disparities in data quality. We implemented
a pipeline to generate simple choice items from DBpedia. Our work aims at
identifying the potential difficulties and assessing the feasibility of using Linked Open
Data to generate items for low-stakes assessment, in this case formative assessment.</p>
      <p>We present existing approaches to the creation of item variables, the construction
of the assessment item creation pipeline, and an experiment applying the process to
generate five sets of items.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Existing work</title>
      <p>Item generation consists in creating multiple instances of items based on an item
model. The item model defines variables, i.e., the parts which change for each item
generated. There are different approaches to the generation of variables, depending on
the type of items under consideration.</p>
      <p>
        In order to fill item variables for mathematics or science, the creation of
computational models is the easiest solution. Other systems use natural language
processing (NLP) to generate for instance vocabulary questions and cloze questions
(fill in blanks) in language learning formative assessment exercises ([
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]).
Karamanis et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] also extract questions from medical texts.
      </p>
      <p>
        The generation of variables from structured datasets has been experimented in
particular in the domain of language learning. Lin et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and Brown et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] for
instance generated vocabulary questions from the WordNet dataset, which is now
available as RDF data on the Semantic Web. Indeed, the semantic representation of
data can help extract relevant variables. Sung et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] use natural language
processing to extract semantic networks from a text and then generate English
comprehension items.
      </p>
      <p>
        Linnebank et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] use a domain model as the basis for the generation of entire
items. This approach requires experts to elicit knowledge in specifically dedicated
models. However, this knowledge often already exists in many data sources (e.g.,
scientific datasets), contributed by many different experts who would probably never
gather for long modeling exercises. Those modeling exercises would also have to be
repeated over time, as the knowledge of different disciplines evolves. Moreover, in
many domains, the classic curricula, for which models could potentially be developed
and maintained by authorities, are not suitable. This is the case for professional
knowledge, for instance.
      </p>
      <p>
        Given the potential complexity of the models for generating item variables, Liu
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] defines reusable components for the generation of items (including, for instance, the heuristics
behind the creation of math variables). Our work complements this
approach by including the connection to semantic datasets as sources of variables.
Existing approaches to item generation usually focus on language learning [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] or
mathematics and physics, where variables can be created from formulae [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. We aim
to define approaches applicable in a wider range of domains (e.g., history) by reusing
existing interlinked datasets.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Generating item variables from a SPARQL endpoint</title>
      <p>
        An item model includes a stem, options, and potentially auxiliary information [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
Only the stem (i.e., the question) is mandatory. Response options are provided in the
case of a multiple choice item. Auxiliary information can be a multimedia resource
for instance. In some cases, other parameters can be adapted, including the feedback
provided to candidates after they answer the item.
      </p>
      <p>
        In order to investigate the use of Linked Data as a source of assessment items, we
built a pipeline to generate simple choice items from a SPARQL endpoint on the
Web. The item generation process is split into different steps, which are detailed in this section.
Figure 1 shows the item model represented as an item template, the queries to extract
data from the Semantic Web, the generation of a set of potential variables as a
variable store, the organization of all the values of the variables for each item in data
dictionaries, and the creation of items in QTI-XML format from the item template and
the item data dictionaries.
      </p>
      <sec id="sec-3-1">
        <title>3.1 Creating an IMS QTI-XML template</title>
        <p>
          In order to generate items which are portable to multiple platforms, it is necessary to
format them in IMS-QTI (IMS Question &amp; Test Interoperability Specification,
http://www.imsglobal.org/question/). IMS-QTI is the main standard used to represent assessment items [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. It specifies
metadata (as a Learning Object Metadata profile), usage data (including psychometric
indicators), as well as the structure of items, tests, and test sections. It allows
multimedia resources to be represented in a test. IMS-QTI has an XML serialization.
&lt;choiceInteraction responseIdentifier="RESPONSE" shuffle="false" maxChoices="1"&gt;
&lt;prompt&gt;What is the capital of {prompt}?&lt;/prompt&gt;
&lt;simpleChoice identifier="{responseCode1}"&gt;{responseOption1}&lt;/simpleChoice&gt;
&lt;simpleChoice identifier="{responseCode2}"&gt;{responseOption2}&lt;/simpleChoice&gt;
&lt;simpleChoice identifier="{responseCode3}"&gt;{responseOption3}&lt;/simpleChoice&gt;
&lt;/choiceInteraction&gt;
No standard templating language exists for assessment items. We therefore used the syntax of
JSON templates for an XML-QTI file (Figure 2). All variables are represented by
their variable name in curly brackets. Unlike RDF and XML template languages, JSON
templates can define variables within an unstructured part of text in a structured
document. For instance, in Figure 2, the {prompt} variable covers only part of
the content of the &lt;prompt&gt; XML element. Therefore, the question itself can be
stored in the item model, and only the relevant part of the question is represented as a
variable.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Collecting structured data from the Semantic Web</title>
        <p>In order to generate values for the variables defined in the item template, data sources
from the Semantic Web are used. The Semantic Web contains data formatted as RDF.
Datasets can be interlinked in order to complement, for instance, the knowledge about
a given resource. They can be accessed through browsing, through data dumps, or
through a SPARQL interface made available by the data provider. For this
experiment, we used the DBpedia SPARQL query interface (Figure 3). The query
results only provide a variable store from which items can be generated. All the
response options are then extracted from the variable store (Figure 1).</p>
        <p>SELECT ?country ?capital
WHERE {
?c &lt;http://dbpedia.org/property/commonName&gt; ?country .
?c &lt;http://dbpedia.org/property/capital&gt; ?capital
}
LIMIT 30</p>
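        <p>As an illustration of this step, the following minimal Python sketch (illustrative only, not the actual implementation; the endpoint URL and helper names are assumptions) sends the query of Figure 3 to the public DBpedia endpoint and fills a variable store with (country, capital) pairs.</p>
        <p># Sketch: fill a variable store from the DBpedia SPARQL endpoint,
# using only the Python standard library.
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"  # public DBpedia endpoint

QUERY = """
SELECT ?country ?capital WHERE {
  ?c &lt;http://dbpedia.org/property/commonName&gt; ?country .
  ?c &lt;http://dbpedia.org/property/capital&gt; ?capital
} LIMIT 30
"""

def run_query(query):
    # The SPARQL protocol allows passing the query as a URL parameter and
    # requesting JSON results through the Accept header.
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# One (stem variable, correct answer) pair per result binding.
variable_store = [
    (b["country"]["value"], b["capital"]["value"])
    for b in run_query(QUERY)["results"]["bindings"]
]</p>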
        <p>Linked data resources are represented by URIs. However, the display of variables
in an assessment item requires finding a suitable label for each concept. In the case
presented in Figure 3, the ?c variable represents the resource as identified by a URI.</p>
        <sec id="sec-3-2-1">
          <title>The &lt;http://dbpedia.org/property/commonName&gt; property allows finding a suitable</title>
          <p>label for the country. Since the range of the &lt;http://dbpedia.org/property/capital&gt;
property is a literal, it is not necessary to find a distinct label.</p>
        <p>The label is, however, not located in the same property in all datasets and for all
resources. In the example of Figure 3, we used the property
&lt;http://dbpedia.org/property/commonName&gt;, which provides the country names as
literals. However, other properties, such as foaf:name, are used for the same
purpose. In any case, the items always need to be generated from a path in a semantic
graph rather than from a single triple. This makes Linked Data of particular relevance,
since the datasets can complement each other.</p>
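        <p>One way to cope with this variability is to try a list of candidate label properties for each resource and keep the first literal found. The sketch below is an illustration only: the property list and the run_query() helper from the previous sketch are assumptions, not part of the original pipeline.</p>
        <p># Sketch: fall back across several candidate label properties.
LABEL_PROPERTIES = [
    "http://dbpedia.org/property/commonName",
    "http://xmlns.com/foaf/0.1/name",
    "http://www.w3.org/2000/01/rdf-schema#label",
]

def find_label(resource_uri):
    for prop in LABEL_PROPERTIES:
        query = ("SELECT ?label WHERE { &lt;%s&gt; &lt;%s&gt; ?label } LIMIT 1"
                 % (resource_uri, prop))
        bindings = run_query(query)["results"]["bindings"]
        if bindings:
            return bindings[0]["label"]["value"]
    return None  # no suitable label found; the resource cannot be displayed

# e.g. find_label("http://dbpedia.org/resource/Bulgaria")</p>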
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Generating item distractors</title>
        <p>The SPARQL queries aim to retrieve statements from which the stem variable and the
correct answer are extracted. However, a simple or multiple choice item also needs
distractors. Distractors are the incorrect answers presented as options in the items. In
the case of Figure 3, the query retrieves different capitals, from which the distractors
are randomly selected to generate an item. For instance, the capital of Bulgaria is
Sofia; distractors can be Bucharest and Riga.</p>
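        <p>The following sketch illustrates this random selection (an illustration only; variable_store is assumed to hold the (country, capital) pairs collected in Section 3.2).</p>
        <p>import random

def build_options(variable_store, index, n_distractors=2):
    # The correct answer comes from the selected pair; distractors are
    # capitals of other, randomly chosen pairs.
    country, correct = variable_store[index]
    other_capitals = list({capital for i, (_, capital) in enumerate(variable_store)
                           if i != index and capital != correct})
    distractors = random.sample(other_capitals, n_distractors)
    options = distractors + [correct]
    random.shuffle(options)
    return country, correct, options

# e.g. ("Bulgaria", "Sofia", ["Bucharest", "Sofia", "Riga"])</p>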
      </sec>
      <sec id="sec-3-4">
        <title>3.4 Creating a data dictionary from Linked Data</title>
        <p>The application then stores all the variables for the generated items in data
dictionaries. Each item is therefore natively represented by its data dictionary. We
created data dictionaries as Java objects designed for the storage of QTI data. We
also recorded the data as a JSON data dictionary.</p>
        <p>In addition to the variables, the data dictionary includes provenance information, such
as the creation date and the data source.</p>
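        <p>For illustration, the data dictionary of one generated item could be serialized along the following lines (a sketch only; the field names are illustrative and may differ from the actual Java/JSON schema).</p>
        <p>import json
from datetime import date

# Sketch: one item's data dictionary, including provenance information.
item_dictionary = {
    "prompt": "Bulgaria",
    "responseCode1": "A", "responseOption1": "Sofia",
    "responseCode2": "B", "responseOption2": "Bucharest",
    "responseCode3": "C", "responseOption3": "Riga",
    "correctResponse": "A",
    "provenance": {
        "source": "http://dbpedia.org/sparql",
        "created": date.today().isoformat(),
    },
}

with open("item-0001.json", "w") as out:
    json.dump(item_dictionary, out, indent=2)</p>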
      </sec>
      <sec id="sec-3-5">
        <title>3.5 Generating QTI Items</title>
        <p>QTI-XML items are then generated from the variables stored in the data dictionary
and the item model formalized as a JSON template. We replaced all the variables
defined in the model with the corresponding content of the data dictionary. If the stem is
a picture, it can be included in the QTI-XML structure as an external link.</p>
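        <p>A minimal sketch of this substitution step, assuming the curly-bracket placeholders of Section 3.1 and the data dictionary sketched in Section 3.4 (the file names are illustrative):</p>
        <p>import re

def fill_template(template, dictionary):
    # Replace every {variable} placeholder with the value stored in the
    # item data dictionary; unknown placeholders are left untouched.
    def lookup(match):
        return str(dictionary.get(match.group(1), match.group(0)))
    return re.sub(r"\{(\w+)\}", lookup, template)

with open("choice-item-template.xml") as f:   # QTI-XML template with {placeholders}
    template = f.read()

with open("item-0001.xml", "w") as out:
    out.write(fill_template(template, item_dictionary))</p>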
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 The DBpedia experiment</title>
      <p>In order to validate this process, we experimented with the generation of assessment items
for five single choice item models. We used DBpedia as the main source of variables.
The item models illustrate the different difficulties which can be encountered and help
assess the usability of Linked Data for the generation of item variables.</p>
      <sec id="sec-4-1">
        <title>4.1 The generation of variables for five item models</title>
        <sec id="sec-4-1-1">
          <title>Q1 - What is the capital of { Azerbaijan }?</title>
          <p>The first item model uses the query presented in Figure 3. This query uses the
http://dbpedia.org/property/ namespace, i.e., the Infobox dataset. This dataset,
however, is not built on top of a consistent ontology; it rather transforms the properties
used in Wikipedia infoboxes directly. Therefore, the quality of the data is a potential
issue (see http://wiki.dbpedia.org/Datasets).</p>
          <p>Out of 30 value pairs generated, 3 were not generated for a country (Neuenburg am
Rhein, Wain, and Offenburg). For those, the capital was represented by the same
literal as the country. Two distinct capitals were found for Swaziland (Mbabane, the
administrative capital, and Lobamba, the royal and legislative capital). The Congo is
identified as a country, whereas it has since been split into two distinct countries; its capital,
Leopoldville, has since been renamed Kinshasa. The capital of Sri Lanka is a URI, whereas
the range of the capital property is, de facto, usually a literal. Finally, the capital of
Nicaragua is represented with technical display instructions (“Managua right|20px”).
Overall, 7 value pairs out of 30 were deemed defective.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>Q2 - Which country is represented by this flag?</title>
          <p>SELECT ?flag ?country
WHERE {
?c &lt;http://xmlns.com/foaf/0.1/depiction&gt; ?flag .
?c &lt;http://dbpedia.org/property/commonName&gt; ?country .
?c &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt;
&lt;http://dbpedia.org/class/yago/EuropeanCountries&gt;
}
LIMIT 30</p>
          <p>
            Q2 uses the Infobox dataset to identify the labels of the different countries.
In addition, the FOAF ontology helps identify the flag of each country, and the
YAGO (Yet Another Great Ontology) [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ] ontology ensures that only European
countries are selected. This excludes resources which do not represent countries.
          </p>
          <p>Nevertheless, it is more difficult to find flags for non-European countries while
ensuring that only countries are selected. Indeed, in the YAGO ontology,
&lt;http://dbpedia.org/class/yago/EuropeanCountries&gt; is a subclass of
&lt;http://dbpedia.org/class/yago/Country108544813&gt;. But most European countries
are not retrieved when querying the dataset with
&lt;http://dbpedia.org/class/yago/Country108544813&gt;, because the SPARQL endpoint
does not provide access to inferred triples. It is necessary to perform a set of queries
to retrieve relevant subclasses and use them for the generation of variables.</p>
          <p>Out of 30 items including pictures of flags used as stimuli, 6 URIs did not resolve
to a usable picture (HTTP 404 errors or encoding problems).</p>
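          <p>A simple pre-check can filter out such unusable stimuli before an item is generated. The sketch below is an illustration only (not part of the original pipeline): it issues an HTTP HEAD request and keeps only URIs that answer with an image content type.</p>
          <p>import urllib.error
import urllib.request

def picture_is_usable(uri, timeout=10):
    # HEAD request: only the status code and content type are needed,
    # not the picture itself.
    request = urllib.request.Request(uri, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.headers.get("Content-Type", "").startswith("image/")
    except (urllib.error.URLError, ValueError):
        return False  # 404 errors, encoding problems, unreachable hosts, ...</p>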
        </sec>
        <sec id="sec-4-1-3">
          <title>Q3 - Who succeeded { Charles VII the Victorious } as ruler of France?</title>
          <p>SELECT DISTINCT ?kingHR ?successorHR
WHERE {
?x &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt; &lt;http://dbpedia.org/class/yago/KingsOfFrance&gt; .
?x &lt;http://dbpedia.org/property/name&gt; ?kingHR .
?x &lt;http://dbpedia.org/ontology/successor&gt; ?z .
?z &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt; &lt;http://dbpedia.org/class/yago/KingsOfFrance&gt; .
?z &lt;http://dbpedia.org/property/name&gt; ?successorHR
}
LIMIT 30</p>
          <p>Q3 uses the YAGO ontology to ensure that the resource retrieved is indeed a king
of France. Out of 30 results, one was incorrect (The Three Musketeers). The query
generated duplicates because of the multiple labels associated with each king: the same
king was, for instance, named Louis IX, Saint Louis, or Saint Louis IX. Whereas
deduplication is a straightforward process in this case, the risk of inconsistent naming
patterns among options of the same item is more difficult to tackle. An item was
indeed generated with the following 3 options: Charles VII the Victorious, Charles 09
Of France, Louis VII. They all use a different naming pattern, with or without the
king’s nickname and with a different numbering pattern.</p>
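          <p>Deduplication can be handled on the consumer side by keying results on the resource URIs rather than on the labels. The sketch below is illustrative and assumes the query is extended to also return the ?x and ?z URIs.</p>
          <p>def deduplicate(bindings):
    # Keep a single (king, successor) pair per pair of resource URIs,
    # retaining the first label encountered for each resource.
    seen = {}
    for b in bindings:
        key = (b["x"]["value"], b["z"]["value"])  # resource URIs
        if key not in seen:
            seen[key] = (b["kingHR"]["value"], b["successorHR"]["value"])
    return list(seen.values())</p>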
        </sec>
        <sec id="sec-4-1-4">
          <title>Q4 - What is the capital of { Argentina }? With feedback</title>
          <p>SELECT ?countryHR ?capitalHR ?pictureCollection
WHERE {
?country &lt;http://dbpedia.org/property/commonName&gt; ?countryHR .
?country &lt;http://dbpedia.org/property/capital&gt; ?capitalHR .
?country &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt;
&lt;http://dbpedia.org/class/yago/EuropeanCountries&gt; .
?country &lt;http://dbpedia.org/property/hasPhotoCollection&gt; ?pictureCollection
}
LIMIT 30</p>
          <p>The above question is a variation of Q1. It adds a picture collection from a distinct
dataset in the response feedback. It uses the YAGO ontology to exclude countries
outside Europe and resources which are not countries. A feedback section is added.
When candidates answer the item, they then receive feedback if the platform
supports it. In the feedback, additional information or formative resources can be
suggested. Q4 uses the linkage of the DBpedia dataset with the Flickr wrapper
dataset. However, the Flickr wrapper data source was unavailable when we performed
the experiment.</p>
        </sec>
        <sec id="sec-4-1-5">
          <title>Q5 - Which category does { Asthma } belong to?</title>
          <p>SELECT DISTINCT ?diseaseName ?category
WHERE {
?x &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt; &lt;http://dbpedia.org/ontology/Disease&gt; .
?x &lt;http://dbpedia.org/property/meshname&gt; ?diseaseName .
?x &lt;http://purl.org/dc/terms/subject&gt; ?y .
?y &lt;http://www.w3.org/2004/02/skos/core#prefLabel&gt; ?category
}
LIMIT 30</p>
          <p>Q5 aims to retrieve diseases and their categories. It uses SKOS and Dublin Core
properties. The Infobox dataset is only used to find labels. Labels from the MeSH
vocabulary are even available. Nevertheless, the SKOS concepts are not related to a
specific SKOS concept scheme. The categories retrieved range from “Skeletal disorders” to
“childhood”. For instance, the correct answer to the question on Obesity is “childhood”.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2 The publication of items on the TAO platform</title>
        <p>The TAO platform (https://www.tao.lu/) is an open-source semantic platform for the creation and delivery
of assessment tests and items. It has been used in multiple assessment contexts,
including large-scale assessment in the PIAAC and PISA surveys of the OECD,
diagnostic assessment, and formative assessment.</p>
        <p>We imported the QTI items generated for the different item models into the platform, in
order to validate the overall Linked Data based item creation pipeline. Figure 4
presents an item generated from Q1 (Figure 3) imported into the TAO platform.</p>
        <p>The pipeline was therefore tested with SPARQL queries
which use various ontologies and which collect various types of variables. The experiment raised
two types of issues for which future work should find relevant solutions: the quality
of the data and the relevance of particular statements for the creation of an assessment
item.</p>
      </sec>
      <sec id="sec-4-3">
        <title>5.1 Data quality challenges</title>
        <p>In our experiment, the chance that an item will have a defective prompt or a
defective correct answer is equal to the proportion of defective variables used for the
item creation. Q1 uses the most challenging dataset in terms of data quality: 7 out of
30 questions had a defective prompt or a defective correct answer (23.33%).</p>
        <p>The chance that an item will have defective distractors is represented by the
following formula, where D is the total number of distractors, d(V) is the number of
defective variables and V is the total number of variables:</p>
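        <p>A plausible form of this probability, assuming the D distractors are drawn uniformly at random and without replacement from the V variables, is (in LaTeX notation): P(\text{at least one defective distractor}) = 1 - \prod_{i=0}^{D-1} \frac{V - d(V) - i}{V - i}</p>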
        <p>
          We used 2 distractors. Among the items generated from Q1, 10 items had a
defective distractor (33.33%). Overall, 16 out of 30 items had neither a defective
prompt, nor a defective correct answer, nor a defective distractor (53.33%).
As a comparison, the proportion of items generated from unstructured content (text)
deemed usable without editing was measured at between 3.5% and 5% by Mitkov et al.
[
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] and between 12% and 21% by Karamanis et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The difficulty of generating
items from structured sources should be lower. Although a manual selection is
necessary in any case, the mechanisms we have implemented can be improved.
        </p>
        <sec id="sec-4-3-1">
          <title>The ontology</title>
          <p>Q1 used properties from the Infobox dataset, which has no proper underlying
ontology. Q1 can therefore be improved by using ontologies provided by DBpedia, as
demonstrated by Q2, for which no distractor issue was identified. We present Q1 and
Q2 to illustrate this improvement, but it should be noted that there is not always a
direct equivalent to the properties extracted from the Infobox dataset.
Q5 could be improved either by linking the dataset to a more structured
knowledge organization system (KOS) or through an algorithm which would verify
the nature of the literals returned by the SPARQL query.</p>
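          <p>As a simple illustration of such a verification step (ours to illustrate, not part of the original pipeline), a capital label returned by the query can be rejected when it looks like a URI or carries wiki display instructions such as “right|20px”:</p>
          <p>import re

def is_plausible_capital_label(value):
    # Reject URIs, empty strings, and literals carrying wiki display
    # instructions (e.g. "Managua right|20px").
    if not value or value.startswith("http://") or value.startswith("https://"):
        return False
    if "|" in value or re.search(r"\d+px", value):
        return False
    return True</p>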
        </sec>
        <sec id="sec-4-3-2">
          <title>The labels</title>
          <p>The choice of the label for each concept to be represented in an item is a challenge
when concepts are represented by multiple labels (Q3). The selection of labels and
their consistency can be ensured by defining representation patterns or by using
datasets with consistent labeling practices.</p>
        </sec>
        <sec id="sec-4-3-3">
          <title>Inaccurate statements</title>
          <p>
            Most statements provided for the experiment are not inaccurate in their original
context but they sometimes use properties which are not sufficiently precise for the
usage envisioned (e.g., administrative capital). In other cases, the context of validity
of the statement is missing (e.g., Leopoldville used to be the capital of a country
called Congo). The choice of DBpedia as a starting point can increase this risk in
comparison to domain specific data sources provided by scientific institutions for
instance. Nevertheless, the Semantic Web raises similar quality challenges as the ones
encountered in heterogeneous and distributed data sources [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ]. Web 2.0 approaches,
as well as the automatic reprocessing of data, can help improve the usability of
Semantic Web statements. This requires setting up a traceability mechanism between
the RDF paths used for the generation of items and the items generated.
          </p>
        </sec>
        <sec id="sec-4-3-4">
          <title>Data linkage</title>
          <p>Data linkage clearly raises an issue, because the mechanism relies on the availability
of different data sources. Q2 provided 6 problematic URIs out of 30 (i.e., 20%). Q4
generated items for which no URI from the linked dataset was resolvable, since the
whole Flickr wrapper data source was unavailable. This clearly makes the generated
items unusable. The creation of infrastructure components such as the SPARQL
endpoint status service (http://labs.mondeca.com/sparqlEndpointsStatus/index.html) for
CKAN-registered datasets (http://www.ckan.net) can help provide solutions to this
quality issue over the longer run.</p>
        </sec>
        <sec id="sec-4-3-5">
          <title>Missing inferences</title>
          <p>Finally, the SPARQL endpoint does not provide access to inferred triples. Our
pipeline does not tackle transitive closures on the data consumer side (e.g., through
repeated queries), as illustrated with Q2. Further consideration should be given to the
provision of data including inferred statements. Alternatively, full datasets could be
imported. Inferences could then be performed in order to support the item generation
process.</p>
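          <p>On the consumer side, the missing inferences can be partly compensated for by expanding rdfs:subClassOf before generating variables, as in the following sketch (illustrative only; run_query() is the helper sketched in Section 3.2 and the depth bound is an assumption).</p>
          <p>def collect_subclasses(root_class, max_depth=3):
    # Breadth-first expansion of rdfs:subClassOf, since the endpoint does
    # not serve inferred triples.
    classes, frontier = {root_class}, {root_class}
    for _ in range(max_depth):
        next_frontier = set()
        for cls in frontier:
            query = ("SELECT ?sub WHERE { ?sub "
                     "&lt;http://www.w3.org/2000/01/rdf-schema#subClassOf&gt; &lt;%s&gt; }" % cls)
            for b in run_query(query)["results"]["bindings"]:
                next_frontier.add(b["sub"]["value"])
        frontier = next_frontier - classes
        classes |= frontier
    return classes

# e.g. collect_subclasses("http://dbpedia.org/class/yago/Country108544813")</p>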
          <p>Different strategies can therefore be implemented to cope with the data quality issues we
encountered. Data publishers can improve the usability of the data, for instance with
the implementation of an upper ontology in DBpedia. However, other data quality
issues require data consumers to improve their data collection strategy, for instance to
collect as much information as possible on the context of validity of the data,
whenever it is available.</p>
        </sec>
      </sec>
      <sec id="sec-4-4">
        <title>5.2 Data selection</title>
        <p>
          The experiment also showed that Linked Data statements need to be selected before use. The
suitability of an assessment item for a test delivered to a candidate or a group of
candidates is measured in particular through such information as the item difficulty.
The difficulty can be assessed through a thorough calibration process in which the
item is given to beta candidates in order to extract psychometric indicators. In low-stakes
assessment, however, the evaluation of the difficulty is often manual (candidate or
teacher evaluation) or implicit (the performance of previous candidates who took the
same item). In the item generation models we have used, each item has a different
construct (i.e., it assesses a different piece of knowledge). In this case, the psychometric
variables are more difficult to predict [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. A particular model is necessary to assess
the difficulty of items generated from Semantic Web sources. For instance, it is likely
that for a European audience, the capital of the Cook Islands will raise a higher rate of
failure than the capital of Belgium. There is no information in the datasets that can
support an estimate of higher or lower difficulty. Moreover, the difficulty of the item
also depends on the distractors, which in this experiment were selected at random
from a set of equivalent instances. As the generation of items from structured
Web data sources becomes more elaborate, it will therefore be necessary to
design a model for predicting the difficulty of generated items.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6 Conclusion and future work</title>
      <p>The present experiment demonstrates a process for generating assessment items
and/or assessment variables from Linked Data. The performance of the system in
comparison with other approaches shows its potential as a strategy for assessment
item generation. It is expected that data linkage can provide relevant content, for
instance to propose formative resources to candidates who failed an item or to
illustrate a concept with a picture published as part of a distinct dataset.</p>
      <p>The experiment also shows the quality issues related to the generation of items based
on a resource such as DBpedia. It should be noted that the measurements were made
with a question which raises particular quality issues and which can easily be improved, as
shown with the other questions. Nevertheless, the Linked Data Cloud also contains
datasets published by scientific institutions, which may therefore raise fewer data
accuracy concerns. In addition, the usage model we propose is centered on low-stakes
assessment, for which we believe that the time saved makes it worthwhile to clean
some of the data, while the overall process remains valuable.</p>
      <p>
        Nevertheless, additional work is necessary both on the data and on the assessment
items. The items created demonstrate the complexity of generating item variables for
simple assessment items. We aim to investigate the creation of more complex items
and the relevance of formative resources which can be included in the item as
feedback. Moreover, the Semantic Web can provide knowledge models from which
items could be generated. Our work is focused on semi-automatic item generation,
where users create item models, while the system aims to generate the variables.
Nevertheless, the generation of the items from a knowledge model as in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] requires
that more complex knowledge is encoded in the data (e.g., what happens to water
when the temperature decreases). The type and nature of data published as Linked
Data need therefore to be further analyzed in order to support the development of
such models for the fully automated generation of items based on knowledge models.
      </p>
      <p>We will focus our future work on the creation of an authoring interface for item
models with the use of data sources from the Semantic Web, on the assessment of
item quality, on the creation of different types of assessment items from Linked Data
sources, on the traceability of created items, including the path in the Semantic Web
datasets which was used to generate each item, and on the improvement of data
selection from semantic datasets.</p>
      <p>Acknowledgments. This work was carried out in the scope of the iCase project on
computer-based assessment. It has benefited from the TAO semantic platform for
e-assessment (https://www.tao.lu/), which is jointly developed by the Tudor Research
Centre and the University of Luxembourg, with the support of the Fonds National de
la Recherche in Luxembourg, the DIPF (Bildungsforschung und
Bildungsinformation), the Bundesministerium für Bildung und Forschung, the
Luxembourgish ministry of higher education and research, as well as the OECD.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Belluck</surname>
          </string-name>
          , P.
          <article-title>To Really Learn, Quit Studying and Take a Test</article-title>
          .
          <source>New York Times</source>
          , January 20th,
          <year>2011</year>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Karpicke</surname>
            ,
            <given-names>J. D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Blunt</surname>
            ,
            <given-names>J. R.</given-names>
          </string-name>
          <article-title>Retrieval Practice Produces More Learning than Elaborative Studying with Concept Mapping</article-title>
          . Science. (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Gilbert</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gale</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Warburton</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Wills</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>Report on Summative E-Assessment Quality (REAQ)</article-title>
          .
          <source>Joint Information Systems Committee</source>
          , Southampton. (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Aldabe</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , Lopez de Lacalle,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Maritxalar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Martinez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Uria</surname>
          </string-name>
          ,
          <string-name>
            <surname>L.</surname>
          </string-name>
          <article-title>Arikiturri: an Automatic Question Generator Based on Corpora and NLP techniques</article-title>
          ,
          <source>ser. Lecture Notes in computer science</source>
          , vol.
          <volume>4053</volume>
          , pp.
          <fpage>584</fpage>
          -
          <lpage>594</lpage>
          . Springer, Heidelberg (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J. S. Y.</given-names>
          </string-name>
          <article-title>Automatic correction of grammatical errors in non-native English text</article-title>
          .
          <source>PhD dissertation</source>
          at The Massachussets Institute of Technology. (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Goto</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kojiri</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Watanabe</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iwata</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Yamada</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <article-title>Automatic Generation System of Multiple-Choice Cloze Questions and its Evaluation. Knowledge Management &amp; E-Learning:</article-title>
          <source>An International Journal (KM&amp;EL)</source>
          ,
          <volume>2</volume>
          (
          <issue>3</issue>
          ),
          <fpage>210</fpage>
          . (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Karamanis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ha</surname>
            ,
            <given-names>L. A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Mitkov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Generating multiple-choice test items from medical text: a pilot study</article-title>
          .
          <source>In Proceedings of the Fourth International Natural Language Generation Conference</source>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>113</lpage>
          . (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Y.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sung</surname>
            ,
            <given-names>L.C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          <article-title>An Automatic Multiple-Choice Question Generation Scheme for English Adjective Understanding</article-title>
          . Workshop on Modeling, Management and Generation of Problems/Questions in eLearning,
          <source>the 15th International Conference on Computers in Education (ICCE</source>
          <year>2007</year>
          ), pages
          <fpage>137</fpage>
          -
          <lpage>142</lpage>
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frishkoff</surname>
            ,
            <given-names>G. A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Eskenazi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Automatic question generation for vocabulary assessment</article-title>
          .
          <source>In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing</source>
          (pp.
          <fpage>819</fpage>
          -
          <lpage>826</lpage>
          ). (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Sung</surname>
          </string-name>
          , L.-
          <string-name>
            <surname>C. Lin</surname>
            ,
            <given-names>Y.-C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>M. C.</given-names>
          </string-name>
          <article-title>The Design of Automatic Quiz Generation for Ubiquitous English E-Learning System</article-title>
          .
          <source>Technology Enhanced Learning Conference (TELearn</source>
          <year>2007</year>
          ), pp.
          <fpage>161</fpage>
          -
          <lpage>168</lpage>
          , Jhongli, Taiwan.
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Linnebank</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liem</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bredeweg</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <article-title>Question generation and answering</article-title>
          .
          <source>DynaLearn, EC FP7 STREP project 231526, Deliverable D3</source>
          .
          <fpage>3</fpage>
          . (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <article-title>SARAC: A Framework for Automatic Item Generation</article-title>
          .
          <source>In 2009 Ninth IEEE International Conference on Advanced Learning Technologies</source>
          (pp.
          <fpage>556</fpage>
          -
          <lpage>558</lpage>
          ).
          <source>Presented at the 2009 Ninth IEEE International Conference on Advanced Learning Technologies (ICALT)</source>
          , Riga, Latvia. (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seneff</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>Speech-Based Interactive Games for Language Learning: Reading, Translation, and Question-Answering</article-title>
          .
          <source>Computational Linguistics and Chinese Language Processing</source>
          Vol.
          <volume>14</volume>
          , No.
          <issue>2</issue>
          , pp.
          <fpage>133</fpage>
          -
          <lpage>160</lpage>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lai</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alves</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Gierl</surname>
            ,
            <given-names>M. J.</given-names>
          </string-name>
          <article-title>Using automatic item generation to address item demands for CAT</article-title>
          .
          <source>In Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Gierl</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alves</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>Developing a Taxonomy of Item Model Types to Promote Assessment Engineering</article-title>
          .
          <source>Journal of Technology, Learning, and Assessment</source>
          ,
          <volume>7</volume>
          (
          <issue>2</issue>
          ). (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Sarre</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Foulonneau</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Reusability in e-assessment: Towards a multifaceted approach for managing metadata of e-assessment resources</article-title>
          .
          <source>Fifth International Conference on Internet and Web Applications and Services</source>
          . (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Suchanek</surname>
            ,
            <given-names>F. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kasneci</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Weikum</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>Yago: a core of semantic knowledge</article-title>
          .
          <source>In Proceedings of the 16th international conference on World Wide Web</source>
          (pp.
          <fpage>697</fpage>
          -
          <lpage>706</lpage>
          ). (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Mitkov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>An</given-names>
            <surname>Ha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            , &amp;
            <surname>Karamanis</surname>
          </string-name>
          ,
          <string-name>
            <surname>N.</surname>
          </string-name>
          <article-title>A computer-aided environment for generating multiple-choice test items</article-title>
          .
          <source>Natural Language Engineering</source>
          ,
          <volume>12</volume>
          (
          <issue>02</issue>
          ),
          <fpage>177</fpage>
          -
          <lpage>194</lpage>
          . (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Foulonneau</surname>
          </string-name>
          , Muriel, Cole, Timothy W.
          <article-title>Strategies for reprocessing aggregated metadata</article-title>
          .
          <source>European Conference on Digital Libraries. Lecture notes in computer science 3652</source>
          ,
          <fpage>290</fpage>
          -
          <lpage>301</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Bejar</surname>
            ,
            <given-names>I. I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawless</surname>
            ,
            <given-names>R. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morley</surname>
            ,
            <given-names>M. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wagner</surname>
            ,
            <given-names>M. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bennett</surname>
            ,
            <given-names>R. E.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Revuelta</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>A feasibility study of on-the-fly item generation in adaptive testing</article-title>
          .
          <source>Educational Testing Service</source>
          . (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>