<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Generation of Assessment Questions from Textbooks Enriched with Knowledge Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lucas Dresscher</string-name>
          <email>l.l.j.dresscher@students.uu.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Isaac Alpizar-Chacon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergey Sosnovsky</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Utrecht University</institution>
          ,
          <addr-line>Utrecht</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Augmenting digital textbooks with assessment material improves their effectiveness as learning tools. It can be a laborious task requiring a considerable amount of time and expertise. This paper presents an automated assessment generation tool that works as a component of the Intextbooks platform. Intextbooks extracts fine-grained knowledge models from PDF textbooks and converts them into semantically annotated learning resources. With the help of the developed assessment components, these textbooks become interactive educational tools capable of assessing students' knowledge of relevant concepts. The results of an expert-based pilot evaluation show that generated questions are properly worded and have a good range in terms of difficulty. In terms of assessment value, some generated question types fall behind manually constructed assessment, while others obtain comparable results.</p>
      </abstract>
      <kwd-group>
        <kwd>Assessment generation</kwd>
        <kwd>Interactive textbooks</kwd>
        <kwd>Textbook models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Adding assessment to digital textbooks can greatly improve their effectiveness
as learning tools from several perspectives. Being interactive learning activities,
assessment questions allow students to break from mundane consumption of
reading material, thus making learning more engaging [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. They enable
practice and training of knowledge acquired from textbooks, thus allowing students
to work with the learning material on different levels of cognitive complexity
[
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. And finally, they can provide solid evidence of students' knowledge, which is
a crucial step for transforming a textbook into an adaptive educational system
(AES) [
        <xref ref-type="bibr" rid="ref29">29</xref>
          ]. Without such evidence, reliable modelling of students' knowledge
becomes a much harder task and the AES has to make do with less informative
indicators of knowledge comprehension, such as annotations [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], browsing patterns
[
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] or reading time [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        There are three principal approaches to adding such assessment resources to a
textbook: by carefully crafting them [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], by integrating textbooks with external
practice material [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] and by generating assessment directly from the textbook
and/or models attached to it. In this paper, we propose a technology that follows
the latter approach.
      </p>
      <p>
        While the recently published studies on assessment generation do show
promising developments (see [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] for a systematic overview), a number of aspects still
prove to be a challenge. Some of them are related to certain question types. For
example, generation of effective distractors - the incorrect options - for
multiple-choice questions (MCQs) is a long-standing problem. Other issues are much
more specific to the field of cognitive assessment and student modelling, where
questions are supposed to provide evidence of knowledge of an individual concept
rather than estimate the level of mastery in the entire domain. In such a case, it
is crucial that the assessment component can accurately define the scope of the
questions - the key term/concept that should become the target of assessment.
And as the next step, it should be able to formulate a question that is properly
worded, grammatically correct, easy to understand, has a reasonable level of
difficulty, and (most importantly) can be used to assess students' knowledge of
the target concept.
      </p>
      <p>
        To this end, we have developed an automated assessment generation tool
that is used as a component within the Intextbooks platform [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Intextbooks
extracts knowledge models from well-formatted PDF-based textbooks and
transforms them into semantically-annotated educational resources. An important
characteristic of these resources when used as input for assessment generation is
that they become a source of both high-quality learning content and a semantic
model annotating it. The Intextbooks platform can de ne which concept from
the underlying model needs to be tested. As a response, the assessment
component can utilise both the relevant parts of the textbooks as well as the semantic
neighborhood of the target concept to generate a set of questions targeting the
required concept.
      </p>
      <p>The rest of this paper is structured as follows. Section 2 provides a brief
overview of assessment generation research. Section 3 outlines the most important
details of the Intextbooks platform. Section 4 describes the proposed assessment
generation component. Section 5 presents the results of an expert-based
validation study. Finally, Section 6 concludes the paper with a discussion and a
summary of potential directions for future work.
</p>
    </sec>
      <sec id="sec-2-1">
        <title>2 Related work</title>
        <p>
          Automated question generation (AQG) is a well-researched area that has been
studied for more than three decades, with a surge of activity over the past few
years [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. The main purpose of AQG systems is to aid in or to replace the
manual construction of (assessment) questions by experts - a time-consuming
process with an often flawed outcome [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]. Many different systems have been
described over the years that employ different generation methods and generate
questions from varying sources. Text has proven to be the most popular form of
input, rather than structured sources like ontologies [
          <xref ref-type="bibr" rid="ref20 ref7">7, 20</xref>
          ].
        </p>
        <p>
          A system that uses text as input type often employs a rule-based generation
method, an approach that uses rules to specify the conditions and
transformations required to create a certain question [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. It utilizes syntactic and semantic
information of the text to do so, e.g. provided by annotations from a natural
language processing tool. This information is then used to generate different
types of questions, like true-false (yes-no) questions (TFQ) [
          <xref ref-type="bibr" rid="ref11 ref17">11, 17</xref>
          ], cloze
(gap-fill) questions (CQ) [
          <xref ref-type="bibr" rid="ref1 ref25 ref8">1, 8, 25</xref>
          ] or multiple-choice questions (MCQ) [
          <xref ref-type="bibr" rid="ref22 ref23 ref25">22, 23, 25</xref>
          ]. A
TFQ is a simple declarative sentence to which the answer is either true or false.
A CQ consists of a sentence where one word or a sequence of words is replaced
by a gap, to be filled in by the student. An MCQ is any question that contains
multiple options from which the student needs to choose the correct answer.
        </p>
        <p>
          Each question type introduces its own specific set of challenges. Gap
selection for cloze questions and distractor generation for multiple-choice questions
are the most notable ones. Gap selection is concerned with selecting the most
appropriate word(s) in the sentence to be replaced by a gap. One approach for
this is to use a set of features that evaluate and rank each candidate word based
on its syntactic and semantic information [
          <xref ref-type="bibr" rid="ref1 ref25">1, 25</xref>
          ]. One of the biggest challenges
for MCQs is the generation of good distractors [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] - the incorrect answers that
accompany the correct answer (the key) as options. A lot of research has been
done on generating appropriate distractors - concepts that should be
semantically close to the key, but cannot serve as the right answer themselves [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. A dominant
approach is to select distractor concepts based on their similarity with the key
concept [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], e.g. syntactical similarity [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] or contextual similarity [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>3 Intextbooks</title>
        <p>
          The Intextbooks (Intelligent textbooks) system [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] performs the complete
transformation of PDF textbooks into online intelligent educational resources.
After extracting a knowledge model from a PDF textbook, it converts it into an
HTML/CSS representation with a fine-grained DOM (Document Object Model)
enriched with semantic information extracted from the content and formatting
of the textbook. Intextbooks consists of two main components. The offline
component performs textbook modeling and conversion to HTML, while the online
component supports students' interaction with the textbooks. For the current
work, we are interested in the offline component.
        </p>
        <p>
          As the first step, the semantic model of a textbook is extracted by a
rule-based system. Its rule set captures common conventions and formatting
guidelines for textbook structuring and organisation. Such elements
as tables of contents and indices play a crucial role. However, more subtle
aspects, such as formatting styles, repeated texts and commonly used labels, are
employed as well. More information can be found in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. At the next stage, the
domain terms extracted from the textbook index are linked to DBpedia (http://dbpedia.org). As a
result, the model is enriched with additional semantic information [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Finally,
the knowledge model is serialized as an XML file using the Text Encoding
Initiative (TEI, https://tei-c.org/); the additional semantic information from DBpedia is added as
RDFa annotations (http://rdfa.info/). Altogether, three phases, seven main stages, 17 steps, and
54 unique rules have been defined to handle the extraction process (a detailed
description of the complete workflow is provided in [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]).
          </p>
          <p>
            The research presented in this paper mostly benefits from those steps of the
Intextbooks workflow that deal with processing textbooks' indices. Figure 1
illustrates these steps. Index identification processes a variety of different index
sections (multicolumn, flat, hierarchical) to identify individual index terms (main
headings, subentries, locators, cross-references). Each index term has a set of
associated page references, which are identified as well. Then, the term recognition
step identifies the correct reading label and the corresponding sentences for each
index term in its reference pages. The reading label is the right reading order
for hierarchical index terms (e.g., 'gamma distribution' as opposed to 'distribution
gamma'). After that, several steps are used to complete the term linking and term
enrichment phases in order for index terms to become connected to their
corresponding resources in DBpedia. As a result, the index terms are enriched and
annotated with semantic information: abstract, categories, Wikipedia article,
related terms, and domain specificity – the primary relationship of the index term
to the domain of interest [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. Finally, in the TEI model construction step, the
structure, content, index terms, and semantic information are expressed using
TEI and RDFa attributes.
          </p>
          <p>In the resulting knowledge models, each content unit (page, subchapter,
chapter) is annotated with its corresponding index terms. Additionally, each index
term is associated with the exact sentences in which it appears in the reference
pages and with additional semantic information.</p>
        <sec id="sec-2-2-2">
          <title>3 https://tei-c.org/ 4 http://rdfa.info/</title>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>4 Question generation system</title>
        <p>
          Our AQG component broadly follows the pipeline regularly used by rule-based
question generation systems [
          <xref ref-type="bibr" rid="ref1 ref20 ref23 ref25">1, 20, 23, 25</xref>
          ]. However, it uses a unique combination
of both textual and semantic features as input, and therefore deviates from
existing systems in a number of ways. An overview of our AQG component is
shown in Figure 2.
        </p>
        <p>First, the system extracts all sentences from the textbook that are related to
the target domain concepts as defined in the TEI/XML(+RDFa) model. A range
of Natural Language Processing (NLP) tools is applied to annotate sentences
with syntactic and semantic information. This makes it possible to filter out sentences that
are grammatically incongruous. Each remaining sentence is then rated according
to several criteria that utilize NLP annotations, together with additional
information from the model about the sentence's target concept. Finally, the best
phrases are used to generate up to five different question types.</p>
        <sec id="sec-2-3-1">
          <title>4.1 Sentence extraction</title>
          <p>The AQG component uses the TEI/XML(+RDFa) model described in Section
3 to extract all sentences from the textbook relevant to the target concept. The
model specifies in which sections of a textbook the concepts are introduced (as
defined in the index) and links them to all the sentences from these sections
that mention the concepts. In addition, the index terms' enrichments are
extracted from the model. This includes related concepts, their DBpedia abstracts
and Wikipedia pages and their domain specificity. The latter information is used to
filter out concepts (and their corresponding sentences) that are unrelated to the
domain (e.g. terms from other domains used as examples and use cases, such as
epidemic in a statistics textbook). This step results in an initial set of sentences,
corresponding to the target concepts from the main domain of the textbook.
</p>
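          <p>To make this step concrete, the following is a minimal sketch of concept-based sentence extraction. It assumes a hypothetical serialization in which every sentence element carries an attribute listing the index terms it mentions; the actual element and attribute names in the Intextbooks model may differ.</p>
          <preformat>
# A hedged sketch of extracting concept-linked sentences from the TEI/XML
# model. The element name ("s") and the "ana" attribute are assumptions,
# not the actual Intextbooks serialization.
from lxml import etree

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def sentences_for_concept(model_path, concept):
    tree = etree.parse(model_path)
    # Collect the text of every sentence annotated with the target concept.
    return [s.xpath("string()").strip()
            for s in tree.iterfind(".//tei:s", TEI_NS)
            if concept in (s.get("ana") or "")]

# sentences = sentences_for_concept("textbook.tei.xml", "standard deviation")
          </preformat>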
        </sec>
        <sec id="sec-2-3-2">
          <title>4.2 Preprocessing</title>
          <p>
            In the second step, standard preprocessing common to NLP tasks [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] is
performed. We employ the Stanford CoreNLP tool (https://stanfordnlp.github.io/CoreNLP/) for this purpose, which offers
a pipeline of NLP annotators: tokenization, sentence splitting, parts-of-speech
(POS) tagging, named entity recognition (NER), lemmatization and dependency
parsing.
        </p>
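          <p>The component calls the Java-based Stanford CoreNLP pipeline; as a rough illustration, the same annotation layers can be obtained with Stanza, the Stanford NLP group's Python library (an equivalent sketch, not the component's actual code):</p>
          <preformat>
# A sketch of the annotation pipeline using Stanza; the component itself
# uses the Java CoreNLP tool, so this is an illustrative equivalent.
import stanza

# stanza.download("en")  # one-time model download
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse,ner")
doc = nlp("The variance and standard deviation can be used to describe "
          "the variability of a random variable.")
for sentence in doc.sentences:
    for word in sentence.words:
        # POS tag, lemma, and the dependency relation to the head word.
        print(word.text, word.upos, word.lemma, word.head, word.deprel)
          </preformat>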
        <p>Figure 3 displays an example phrase annotated by the Stanford CoreNLP
pipeline. It is a sentence from the statistics textbook OpenIntro Statistics and
has three target concepts: variance, standard deviation and random variable. It
shows each word's part-of-speech (POS) - its function in the sentence - and the
sentence's dependencies, i.e. its grammatical structure and the syntactic relations
between the words.</p>
        <p>Utilizing the above-mentioned annotations, the system filters out several
types of sentences from the initial list of phrases. First, sentences that are
grammatically incorrect or of an unusable structure, like questions or imperative
phrases, are removed. Then, sentences that contain verbal references to
previously defined context are filtered out as well. This involves phrases that start with
a discourse connective (e.g. "so", "because") or a personal/possessive pronoun
(e.g. "I", "theirs") and sentences that contain a demonstrative pronoun/adjective
(e.g. "this", "those"). Sentences that refer to visual elements (e.g. a table, graph
or formula) are also removed. Additionally, the component excludes phrases
that originally served as numerical examples, i.e. ones with a very high ratio
of numbers. Overall, the preprocessing step transforms the initial set of input
phrases into a set of grammatically congruous, standalone (not requiring
additional context) sentences with NLP annotations.</p>
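          <p>A simplified sketch of these filters is shown below; the word lists are illustrative examples, not the exhaustive sets used by the component.</p>
          <preformat>
# A simplified sketch of the preprocessing filters; word lists are
# illustrative, not the component's full sets.
DISCOURSE_CONNECTIVES = {"so", "because", "however", "therefore"}
PRONOUNS = {"i", "we", "they", "theirs", "his", "hers"}
DEMONSTRATIVES = {"this", "that", "these", "those"}
VISUAL_REFERENCES = {"table", "figure", "graph", "formula"}

def keep_sentence(tokens, max_number_ratio=0.3):
    """Return True if the sentence passes all filters."""
    lowered = [t.lower() for t in tokens]
    if not lowered:
        return False
    # Openers that point back to earlier context.
    if lowered[0] in DISCOURSE_CONNECTIVES or lowered[0] in PRONOUNS:
        return False
    # Demonstratives anywhere make the sentence context-dependent.
    if any(t in DEMONSTRATIVES for t in lowered):
        return False
    # References to visual elements cannot stand alone.
    if any(t in VISUAL_REFERENCES for t in lowered):
        return False
    # Numerical examples: too high a ratio of number tokens.
    numbers = sum(t.replace(".", "", 1).isdigit() for t in lowered)
    if numbers / len(lowered) > max_number_ratio:
        return False
    return True
          </preformat>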
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4.3 Sentence selection</title>
      <p>
        The remaining sentences are rated according to a set of criteria, shown in Table 1.
Every criterion has a weight that indicates its relative importance. To compute
the overall sentence score, the weighted sum of all features is taken, i.e. s = Σᵢ fᵢwᵢ,
summing over the n criteria, where s denotes the overall sentence score, fᵢ a feature score and
wᵢ its corresponding weight. Finally, the sentences are compared to a threshold
score, producing a set of potential source phrases for question generation. The
criteria, their weights and the threshold are selected based on existing research
[
        <xref ref-type="bibr" rid="ref1 ref22 ref23 ref25">1, 22, 23, 25</xref>
        ] and our own calibration experiments.
      </p>
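      <p>The scoring itself reduces to a few lines; in the sketch below the feature values, weights and threshold are placeholders, not the calibrated values used by the system.</p>
      <preformat>
# A minimal sketch of the weighted-sum sentence scoring (s = sum of f_i * w_i);
# the weights and threshold here are placeholders, not the calibrated values.
WEIGHTS = {"header_similarity": 0.2, "complexity": 0.2, "length": 0.2,
           "domain_specificity": 0.3, "superlatives": 0.05, "comparatives": 0.05}
THRESHOLD = 0.5  # placeholder

def sentence_score(features, weights=WEIGHTS):
    return sum(features[name] * weights[name] for name in weights)

features = {"header_similarity": 0.4, "complexity": 1.0, "length": 0.8,
            "domain_specificity": 0.9, "superlatives": 0.0, "comparatives": 1.0}
score = sentence_score(features)
if score >= THRESHOLD:
    print(f"kept (score {score:.2f})")
      </preformat>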
      <p>The sentence header similarity feature computes the textual similarity (a vector dot product, cf. https://nlp.stanford.edu/IR-book/html/htmledition/dot-products-1.html)
between the sentence and the header of its chapter/section, highlighting central
sentences of textbook sections. Complexity counts the number of clauses, i.e. a
subject accompanied by a predicate, of the sentence with score being deducted
exponentially for sentences with more than three clauses. It uses the sentence's
parse tree to do so. Similarly, length considers the number of words of the
sentence, with score being deducted exponentially for sentences with more than 25
or fewer than 10 words. Both features aim to select sentences that contain an
optimal amount of context. Domain specificity utilizes the domain specificity of the
terms present in a sentence. This metric is supplied by the TEI/XML(+RDFa) model.
The superlatives and comparatives features detect informative sentences that
contain either one or more superlatives or comparatives, using the sentence's
POS tags.
</p>
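      <p>For the header similarity feature, a bare-bones cosine similarity over term-frequency vectors (per the referenced IR-book chapter) might look as follows; real tokenization and weighting would be more involved.</p>
      <preformat>
# A sketch of header similarity as cosine similarity of term-frequency
# vectors; tokenization and weighting are simplified assumptions.
from collections import Counter
from math import sqrt

def cosine(a, b):
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = sqrt(sum(v * v for v in va.values()))
    nb = sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

print(cosine("Variability of a random variable",
             "The variance and standard deviation can be used to describe "
             "the variability of a random variable."))
      </preformat>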
    </sec>
    <sec id="sec-4">
      <title>4.4 Question type selection</title>
      <p>
The fourth step of the AQG component determines which question types can be
generated from the selected set of remaining sentences. It looks at their structural
and external properties. In systems that generate only a single question type,
        this step is typically incorporated in the sentence selection module as a small
number of additional features (e.g., [
        <xref ref-type="bibr" rid="ref1 ref23">1, 23</xref>
        ]). Our system can generate up to five
types of questions per source sentence: three types of true-false questions, cloze
questions and multiple-choice questions. This step is also responsible for the
final removal of sentences that cannot be used to generate at least one type of
question.
      </p>
      <p>
        The unmodified true-false question (TFU) is a standard true-false question
and only requires the phrase to be a declarative sentence. Such sentences follow
a subject-verb-object (SVO) structure. The negated true-false question (TFN)
is a modified version of the previous type, where the original phrase is negated.
This question type requires the source sentence to consist of a single independent
clause to minimize the chance of generating a poorly-worded question [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. The
substituted true-false question (TFS) modifies the original phrase by replacing
the target concept with a different concept. It requires the original concept to
be substitutable, which means: its label can occur only once in the sentence
and the rest of the sentence cannot provide cues about it. The choice of the
substitute is also an interesting problem that generally follows the same rules
as the selection of distractors for MCQs (see 4.5). Requirements for the source
sentence for a cloze question (CQ) are similar to TFS: the target concept can
occur only once, and the rest of the sentence should not hint towards it. We
also do not generate CQs for concept labels that are longer than three words to
avoid over-complicating the question [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. Finally, the MCQs are implemented
as a CQ for which the response format is multiple-choice instead of free response.
Hence, it has the same requirements for the source sentence and an additional
condition that there are at least three generatable distractors for the sentence's
target concept (see 4.5). As an example, the sentence shown in Figure 3 meets
all the above requirements and can be used to generate all five question types.
      </p>
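      <p>The eligibility rules above can be condensed into a small decision routine; the sketch below uses a hypothetical record of the sentence properties discussed in this section.</p>
      <preformat>
# A condensed sketch of question type selection; Sentence is a hypothetical
# record of the structural/external properties discussed above.
from dataclasses import dataclass

@dataclass
class Sentence:
    is_declarative: bool       # SVO structure
    single_clause: bool        # one independent clause
    concept: str               # target concept label
    label_occurrences: int     # occurrences of the label in the sentence
    gives_cues: bool           # rest of the sentence hints at the concept
    n_distractors: int         # generatable distractors for the concept

def eligible_types(s):
    types = set()
    if s.is_declarative:
        types.add("TFU")
        if s.single_clause:
            types.add("TFN")
    substitutable = s.label_occurrences == 1 and not s.gives_cues
    if substitutable:
        types.add("TFS")
        if len(s.concept.split()) > 3:
            return types  # label too long to gap out
        types.add("CQ")
        if s.n_distractors >= 3:
            types.add("MCQ")
    return types
      </preformat>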
    </sec>
    <sec id="sec-4b">
      <title>4.5 Question construction</title>
      <p>In the final step, all questions are constructed from the definitive input set
of source sentences, to be presented to the student. This requires performing
question-type-specific tasks, like stem negation (TFN), term substitution (TFS),
gap-selection (CQ and MCQ) and distractor generation (MCQ). Each subtask
is discussed below.</p>
      <p>
        Stem negation and term substitution. For a TFU, the source sentence
is directly used as the question stem, to which the answer is true. To generate a
more diverse set of true-false questions (and answers), the system also generates
negated and substituted TFQs. For a TFN, the original simple sentence's
positive verb is modified to a negative verb and vice versa. The system takes into account
different verbal structures by looking at the phrase's POS and dependency
annotations. For a TFS, the target concept is replaced by a related term. To
avoid providing cues to the student, the replacing term matches the original
term's capitalization, and a possibly preceding indefinite article, i.e. a or an, is
modified to match the new term. The replacing term is selected using the
same approach as for the distractor generation (see 4.5). As opposed to TFUs,
the answer to both TFN and TFS questions is false. For example, the TFN of
the sentence shown in Figure 3 would be: The variance and standard deviation
can not be used to describe the variability of a random variable. (Answer: false).
      </p>
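      <p>A toy sketch of this negation step for sentences with a modal or auxiliary verb is given below; the full component handles further verbal structures via the POS and dependency annotations.</p>
      <preformat>
# A toy sketch of TFN stem negation: insert "not" after the first
# auxiliary/modal verb. The full system covers more verbal structures.
AUXILIARIES = {"is", "are", "was", "were", "can", "could", "will",
               "would", "may", "might", "should", "must"}

def negate(tokens):
    for i, tok in enumerate(tokens):
        if tok.lower() in AUXILIARIES:
            return tokens[:i + 1] + ["not"] + tokens[i + 1:]
    return tokens  # a full system would conjugate with "do/does not" here

print(" ".join(negate("The variance and standard deviation can be used "
                      "to describe the variability of a random variable .".split())))
      </preformat>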
      <p>
        Gap selection. Specific to the CQ type is gap selection, where the target term
is replaced by a gap. Gap selection is based on three factors: the target concept's
length (at most three words), its domain specificity (only core domain concepts
are used) and its height in the syntactic tree of the sentence (a term higher in
the tree is scored higher as it contains more context in its sub-trees to create
an unambiguous question with a clearer aim [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). For any term of three words
or less, the average of the other two factors is taken as the overall score. The
highest scoring target concept of a phrase is replaced by a gap and the correct
answer to the CQ is the replaced term. The CQ resulting from the example
sentence would be: The variance and ________ can be used to describe
the variability of a random variable. (Answer: standard deviation).
      </p>
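      <p>Under the stated assumptions (labels of at most three words, a minimum domain specificity for core concepts, and a tree height normalised to [0, 1]), the scoring can be sketched as follows; the numeric values are illustrative only.</p>
      <preformat>
# A sketch of gap-selection scoring; the threshold and feature values are
# illustrative assumptions.
def gap_score(concept, domain_specificity, tree_height, core_threshold=0.5):
    too_long = len(concept.split()) > 3
    core = domain_specificity >= core_threshold
    if too_long or not core:
        return 0.0
    # Average of the two remaining factors.
    return (domain_specificity + tree_height) / 2

candidates = {"variance": (0.9, 0.4),
              "standard deviation": (0.9, 0.7),
              "random variable": (0.8, 0.5)}
best = max(candidates, key=lambda c: gap_score(c, *candidates[c]))
print(best)  # standard deviation
      </preformat>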
      <p>
        Distractor generation. Our system utilizes a combination of syntactic and
semantic information for the generation of distractors. Rather than using an
external source to retrieve concepts that are semantically similar [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], our approach
uses as candidate distractors the concepts related to the target concept as defined
in the TEI/XML(+RDFa) model. Table 2 shows an overview of the feature set used to
score and select the most appropriate distractors. Similar to the sentence
selection module, the weighted average is taken to determine the overall distractor
score. Each distractor is ranked according to its score and is selected when it
meets a given threshold, which can vary depending on the number of distractors
required for the question type.
      </p>
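      <p>The following sketch illustrates the scoring; the feature names and weights stand in for the actual set in Table 2, and the candidate values are invented for illustration.</p>
      <preformat>
# A sketch of distractor selection: candidates are the target concept's
# related terms from the model; feature names/weights are placeholders
# for the set in Table 2.
def distractor_score(features, weights):
    total = sum(weights.values())
    return sum(features[k] * weights[k] for k in weights) / total

WEIGHTS = {"semantic_similarity": 0.5, "syntactic_similarity": 0.3,
           "frequency": 0.2}
CANDIDATES = {  # illustrative values for target "standard deviation"
    "standard error":   {"semantic_similarity": 0.9, "syntactic_similarity": 0.8, "frequency": 0.6},
    "mean":             {"semantic_similarity": 0.7, "syntactic_similarity": 0.2, "frequency": 0.9},
    "sample statistic": {"semantic_similarity": 0.6, "syntactic_similarity": 0.4, "frequency": 0.5},
}
THRESHOLD = 0.5  # may vary with the number of distractors required
chosen = [c for c, f in CANDIDATES.items()
          if distractor_score(f, WEIGHTS) >= THRESHOLD]
print(chosen)
      </preformat>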
      <p>Example distractors for standard deviation, one of the target concepts of the
sentence from Figure 3, are standard error, mean and sample statistic. Finally,
note that for MCQs, the selected target concept is replaced by only a single gap.
This is to avoid providing cues about the correct answer to the student.
</p>
      </sec>
    </sec>
    <sec id="sec-4-1">
      <title>5 Evaluation</title>
      <sec id="sec-4-1-1">
        <title>5.1 Procedure</title>
        <p>
          The developed AQG component has been evaluated in the domain of
introductory statistics. We have used the Intextbooks platform to extract models from
three university-level textbooks [
          <xref ref-type="bibr" rid="ref15 ref21 ref9">9, 15, 21</xref>
          ] and randomly selected ten core
concepts that co-occurred in all three models. Five of these concepts were used to
automatically generate questions of all five question types. For the other five
concepts, questions were created manually. The sentences for generated questions were selected
by the AQG component from all three textbooks based on the highest scores.
The sentences for manually created questions were selected by an expert who
located corresponding pages according to the textbooks' indices and chose the
candidate sentences knowing what the resulting questions should look like.
        </p>
        <p>The resulting set consisted of 25 generated and 25 crafted questions (ten
per question type and five per concept) and was given to three domain experts
to evaluate them based on several criteria: overall wording (i.e., whether a question
is both grammatically correct and naturally formulated), assessment value (i.e.,
whether a question is capable of assessing the target concept) and difficulty (i.e., how
challenging the question is). The experts had to rate all 50 questions according
to these 3 criteria on a 3-point scale (3 = max).</p>
        <p>Such a setup has allowed us to focus on two main research questions:
– Is our approach potentially sound? In other words, can such a form of AQG
potentially produce high-quality assessment questions of various difficulty?
– Is our approach already capable of producing high-quality assessment items
of various difficulty?
If the experts rank manually crafted questions low, this means the approach
needs a conceptual revamp, and these types of questions based on sentences
selected from textbooks simply cannot produce good assessment items. If the
experts rank generated questions low, but manually crafted questions high enough,
this means our approach is potentially sound and its quality can be improved
by fine-tuning the generation algorithm. If the experts rank generated questions
high, this means we have already achieved good results.</p>
      </sec>
      <sec id="sec-4-1-2">
        <title>5.2 Results</title>
        <p>Fleiss' kappa was computed for each criterion to determine the inter-rater
agreement. The results for wording and assessment value were 0.24 and 0.27, respectively, which
are rather low. The agreement for difficulty was -0.02. This was rather
expected, as difficulty of assessment items is a hard metric to estimate objectively.
It is usually calibrated based on data produced by real test takers.</p>
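        <p>For reference, the agreement statistic can be reproduced with statsmodels; the rating matrix below is a placeholder, not the study's actual expert ratings.</p>
        <preformat>
# Fleiss' kappa for three raters over 50 questions, sketched with
# statsmodels; `ratings` is placeholder data, not the study's ratings.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

ratings = np.random.randint(1, 4, size=(50, 3))  # 50 items x 3 raters, scores 1-3
table, _ = aggregate_raters(ratings)  # items x categories count table
print(fleiss_kappa(table))
        </preformat>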
      </sec>
    </sec>
      <sec id="sec-4-2">
      <title>6 Discussion and future work</title>
        <p>
          This paper has presented an approach towards automated generation of
assessment questions from digital textbooks processed by the Intextbooks system [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
This research shows the potential of textbooks enriched with linked data. The
results of the expert-based validation show that the approach
requires further work, yet it is potentially capable of generating good-quality
questions of various difficulty.
        </p>
        <p>There are a number of concerns that need to be resolved before more reliable
results can be obtained. Textbooks are different in nature from e.g. Wikipedia
pages or dictionaries, and the sentences selected from textbooks may require
more textual transformations to be useful as questions than initially
anticipated. Moreover, little to no domain-specific information is used in any of the
components of the system, as the goal is to be able to generate (multiple types of)
questions for any textbook of any domain. This might be too ambitious; rather
than aiming for an open-domain system, it could be more feasible to design the
system for a subset of domains, e.g. formal domains exclusively.</p>
        <p>Furthermore, the quality of the system very much relies on the quality of
two external components: the TEI/XML(+RDFa) input model and the NLP
annotation tool. Errors in either of the two components, e.g. missing or
incorrect input sentences or inaccurate annotations, propagate through the rest of
the system and can have a severe impact on the quality of the output. An
example of this is the Stanford CoreNLP coreference resolution, which we initially
used to detect references from input phrases to their context sentence and to
replace them by their referent. However, early experiments showed that it did
not offer a satisfying solution. For future work, it would be interesting to see its
performance when trained (https://stanfordnlp.github.io/CoreNLP/coref.html#training-new-models) on the specific domain of the input textbook.</p>
        <p>Figure 4 shows how the generated assessment questions could be used within
the Intextbooks system (see the top-right panel).</p>
      </sec>
    <sec id="sec-5">
      <title>7 https://stanfordnlp.github.io/CoreNLP/coref.html#training-new-models</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannem</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Automatic gap-fill question generation from text books</article-title>
          .
          <source>In: BEA@ACL</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Alpizar-Chacon</surname>
            , I., van der Hart,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiersma</surname>
            ,
            <given-names>Z.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Theunissen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sosnovsky</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Transformation of pdf textbooks into intelligent educational resources</article-title>
          .
          <source>In: Proceedings of the Second Workshop on Intelligent Textbooks</source>
          . vol.
          <volume>2674</volume>
          , pp.
          <volume>4</volume>
          –
          <fpage>16</fpage>
          .
          <string-name>
            <surname>CEUR-WS</surname>
          </string-name>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Alpizar-Chacon</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sosnovsky</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Expanding the web of knowledge: One textbook at a time</article-title>
          .
          <source>In: Proceedings of the 30th ACM Conference on Hypertext and Social Media</source>
          . p.
          <volume>9</volume>
          –
          <fpage>18</fpage>
          . HT '
          <volume>19</volume>
          ,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA (
          <year>2019</year>
          ). https://doi.org/10.1145/3342220.3343671
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Alpizar-Chacon</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sosnovsky</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Order out of chaos: Construction of knowledge models from pdf textbooks</article-title>
          .
          <source>In: Proceedings of the ACM Symposium on Document Engineering</source>
          <year>2020</year>
          . DocEng '20,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA (
          <year>2020</year>
          ). https://doi.org/10.1145/3395027.3419585
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Alpizar-Chacon</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sosnovsky</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Knowledge models from pdf textbooks</article-title>
          .
          <source>New Review of Hypermedia and Multimedia</source>
          pp.
          <volume>1</volume>
          –
          <issue>49</issue>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Alpizar-Chacon</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sosnovsky</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>What's in an index: Extracting domain models from digital textbooks</article-title>
          .
          <source>In: Proceedings of the 32nd ACM Conference on Hypertext and Social</source>
          Media (submitted).
          <source>HT '21</source>
          ,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Alsubait</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Ontology-based question generation</article-title>
          .
          <source>Ph.D. thesis</source>
          , University of Manchester (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frishkoff</surname>
            ,
            <given-names>G.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eskenazi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Automatic question generation for vocabulary assessment</article-title>
          .
          <source>In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing</source>
          . p.
          <volume>819</volume>
          –
          <fpage>826</fpage>
          . HLT '
          <volume>05</volume>
          ,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computational Linguistics, USA (
          <year>2005</year>
          ). https://doi.org/10.3115/1220575.1220678
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Diez</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barr</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Çetinkaya-Rundel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          : OpenIntro statistics. openintro.org (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ericson</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>An analysis of interactive feature use in two ebooks</article-title>
          . In: Sosnovsky,
          <string-name>
            <given-names>S.A.</given-names>
            ,
            <surname>Brusilovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Baraniuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.G.</given-names>
            ,
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Lan</surname>
          </string-name>
          , A.S. (eds.)
          <source>Proceedings of the First Workshop on Intelligent Textbooks co-located with 20th International Conference on Artificial Intelligence in Education (AIED</source>
          <year>2019</year>
          ), Chicago, IL, USA, June 25,
          <year>2019</year>
          .
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2384</volume>
          , pp.
          <volume>4</volume>
          –
          <fpage>17</fpage>
          .
          <string-name>
            <surname>CEUR-WS.org</surname>
          </string-name>
          (
          <year>2019</year>
          ), http://ceur-ws.org/Vol-2384/paper01.pdf
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Flor</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riordan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A semantic role-based approach to open-domain automatic question generation</article-title>
          .
          <source>In: Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications</source>
          . pp.
          <volume>254</volume>
          –
          <fpage>263</fpage>
          . Association for Computational Linguistics, New Orleans,
          <source>Louisiana (Jun</source>
          <year>2018</year>
          ). https://doi.org/10.18653/v1/W18-0530, https://www.aclweb.org/anthology/W18-0530
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Hake</surname>
            ,
            <given-names>R.R.</given-names>
          </string-name>
          :
          <article-title>Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses</article-title>
          .
          <source>American journal of Physics</source>
          <volume>66</volume>
          (
          <issue>1</issue>
          ),
          <volume>64</volume>
          –
          <fpage>74</fpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yudelson</surname>
          </string-name>
          , M.,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brusilovsky</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>A framework for dynamic knowledge modeling in textbook-based learning</article-title>
          .
          <source>In: Proceedings of the 2016 conference on user modeling adaptation and personalization</source>
          . pp.
          <volume>141</volume>
          –
          <issue>150</issue>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Distractor generation for Chinese fill-in-the-blank items</article-title>
          .
          <source>In: Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications</source>
          . pp.
          <volume>143</volume>
          –
          <fpage>148</fpage>
          . Association for Computational Linguistics, Copenhagen, Denmark (Sep
          <year>2017</year>
          ). https://doi.org/10.18653/v1/W17-5015, https://www.aclweb.org/anthology/W17-5015
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Kaltenbach</surname>
            ,
            <given-names>H.M.:</given-names>
          </string-name>
          <article-title>A concise guide to statistics</article-title>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Karamanis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ha</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitkov</surname>
          </string-name>
          , R.:
          <article-title>Generating multiple-choice test items from medical text: A pilot study</article-title>
          .
          <source>In: Proceedings of the Fourth International Natural Language Generation Conference</source>
          . pp.
          <volume>111</volume>
          –
          <fpage>113</fpage>
          . Association for Computational Linguistics, Sydney,
          <source>Australia (Jul</source>
          <year>2006</year>
          ), https://www.aclweb.org/anthology/W06-1416
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Killawala</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khokhlov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reznik</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Computational intelligence framework for automatic quiz question generation</article-title>
          .
          <source>In: 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)</source>
          . pp.
          <volume>1</volume>
          –
          <issue>8</issue>
          (
          <year>2018</year>
          ). https://doi.org/10.1109/FUZZ-IEEE.2018.8491624
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>D.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Winchell</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Waters</surname>
            ,
            <given-names>A.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grimaldi</surname>
            ,
            <given-names>P.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baraniuk</surname>
            ,
            <given-names>R.G.</given-names>
          </string-name>
          , Mozer,
          <string-name>
            <surname>M.C.</surname>
          </string-name>
          :
          <article-title>Inferring student comprehension from highlighting patterns in digital textbooks: An exploration of an authentic learning platform (</article-title>
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Krathwohl</surname>
            ,
            <given-names>D.R.:</given-names>
          </string-name>
          <article-title>A revision of Bloom's taxonomy: An overview</article-title>
          .
          <source>Theory into practice 41(4)</source>
          ,
          <volume>212</volume>
          {
          <fpage>218</fpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Kurdi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Al-Emari</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>A systematic review of automatic question generation for educational purposes</article-title>
          .
          <source>International Journal of Artificial Intelligence in Education</source>
          <volume>30</volume>
          (
          <issue>1</issue>
          ),
          <volume>121</volume>
          –204 (Mar
          <year>2020</year>
          ). https://doi.org/10.1007/s40593-019-00186-y
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Madsen</surname>
            ,
            <given-names>B.S.:</given-names>
          </string-name>
          <article-title>Statistics for non-statisticians</article-title>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Majumder</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saha</surname>
            ,
            <given-names>S.K.</given-names>
          </string-name>
          :
          <article-title>A system for generating multiple choice questions: With a novel approach for sentence selection</article-title>
          .
          <source>In: Proceedings of the 2nd Workshop on Natural Language Processing Techniques for Educational Applications</source>
          . pp.
          <volume>64</volume>
          –
          <fpage>72</fpage>
          . Association for Computational Linguistics, Beijing, China (Jul
          <year>2015</year>
          ). https://doi.org/10.18653/v1/W15-4410, https://www.aclweb.org/anthology/W15-4410
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Mitkov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ha</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karamanis</surname>
          </string-name>
          , N.:
          <article-title>A computer-aided environment for generating multiple-choice test items</article-title>
          .
          <source>Nat. Lang. Eng</source>
          .
          <volume>12</volume>
          ,
          <issue>177</issue>
          –
          <fpage>194</fpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Mouri</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suzuki</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shimada</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uosaki</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaneko</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ogata</surname>
          </string-name>
          , H.:
          <article-title>Educational data mining for discovering hidden browsing patterns using non-negative matrix factorization</article-title>
          .
          <source>Interactive Learning</source>
          Environments pp.
          <volume>1</volume>
          –
          <issue>13</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Pino</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heilman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eskenazi</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>A selection strategy to improve cloze question quality (05</article-title>
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heilman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Automatic factual question generation from text (</article-title>
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Sosnovsky</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsiao</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brusilovsky</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Adaptation "in the wild": ontology-based personalization of open-corpus learning material</article-title>
          .
          <source>In: European Conference on Technology Enhanced Learning</source>
          . pp.
          <volume>425</volume>
          –
          <fpage>431</fpage>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Tarrant</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knierim</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hayes</surname>
            ,
            <given-names>S.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ware</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The frequency of item writing flaws in multiple-choice questions used in high-stakes nursing assessments</article-title>
          .
          <source>Nurse Education Today</source>
          <volume>26</volume>
          (
          <issue>8</issue>
          ),
          <volume>662</volume>
          –
          <fpage>671</fpage>
          (
          <year>2006</year>
          ). https://doi.org/10.1016/j.nedt.2006.07.006, http://www.sciencedirect.com/science/article/pii/S0260691706001067, proceedings from the 1st Nurse Education International Conference
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Weber</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brusilovsky</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Elm-art: An adaptive versatile system for web-based instruction</article-title>
          .
          <source>International Journal of Artificial Intelligence in Education (IJAIED) 12</source>
          ,
          <fpage>351</fpage>
          –
          <fpage>384</fpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>