<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic Question Generation for Evidence-based Online Courseware Engineering*</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>North Carolina State University</institution>
          ,
          <addr-line>Raleigh NC</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The goal of the current study is to develop an algorithm for generating pedagogically valuable questions. We focus on verbatim questions whose answer, by definition, can be literally identified in a source text. We assume that an important keyphrase relative to a specific learning objective can be identified in a given source text. We then further hypothesize that a pedagogically valuable verbatim question can be generated by converting the source text into a question for which the keyphrase becomes an answer. We therefore propose a model that identifies a keyphrase in a given source text with a linked learning objective. The tagged source text is then converted into a question using an existing model of question generation, QG-Net. An evaluation study was conducted with existing authentic online course materials. Corresponding course instructors judged 66% of the predicted keyphrases were suitable for the given learning objective. The results also showed that 82% of the questions generated by pre-trained QG-Net were judged as pedagogically valuable.</p>
      </abstract>
      <kwd-group>
        <kwd>Question Generation</kwd>
        <kwd>Deep Neural Network</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Learning Engineering</kwd>
        <kwd>MOOC</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
Questions play important roles in learning and teaching. In Massive Open Online
Courses (MOOCs), formative questions are an essential component of effective courseware.
One study demonstrated, for example, that students learn better when they
practice skills by answering questions than when they only watch videos or read text [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
In a broader context, the benefit of answering questions for learning has been shown in
many studies, a phenomenon known as test-enhanced learning [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. However, creating questions that
effectively help students learn requires experience and extensive effort.
      </p>
      <p>
        Although there have been several studies on the automation of question generation in the
field of AI in education [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ], little has been said about the pedagogical value of
the generated questions. To fill this gap, we propose a method for generating questions
that ask about the key concepts students need to learn in order to attain given
learning objectives. As far as the authors are aware, few studies have attempted
to generate questions that align with learning objectives. We propose a
technique called QUADL (QUestion generation with an Application of Deep Learning)
that generates verbatim questions from a pair consisting of a learning objective and a sentence.
A verbatim question is a question whose answer can be literally identified in a
related instructional text (i.e., the source text).
      </p>
      <p>
        Our central hypothesis is that pedagogically valuable verbatim questions can be
generated if source texts are tagged with keyphrases relative to a given learning objective.
Once a source text is tagged, existing seq2seq technologies for question conversion
can be used (e.g., [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6-8</xref>
        ]). The technological contribution of the current research is
therefore to develop a deep neural-network model to identify a keyphrase given a pair of a
source text and a learning objective.
      </p>
      <p>Accordingly, QUADL consists of an Answer Prediction model and a Question
Conversion model. The Answer Prediction model identifies a keyphrase in a given
source text. The Question Conversion model then generates a question by converting the
source text into a question for which the keyphrase becomes the answer.</p>
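The two-model pipeline can be sketched as follows. This is an illustrative composition only: both functions are hypothetical stand-ins (a keyword-overlap heuristic and a blank-out template), not the neural models the paper actually uses.

```python
from typing import Optional, Tuple

def answer_prediction(learning_objective: str, source_text: str) -> Tuple[int, int]:
    """Stand-in for QUADL's BERT-based Answer Prediction model.

    Returns a 1-based (Is, Ie) word-index pair for the target token, or
    (0, 0) when the source text is judged unsuitable for the objective.
    This stub simply picks the first source-text word that also appears
    in the learning objective; the real model is a trained classifier.
    """
    lo_words = {w.lower().strip(".,;") for w in learning_objective.split()}
    for i, w in enumerate(source_text.split(), start=1):  # index 0 is reserved
        if w.lower().strip(".,;") in lo_words:
            return (i, i)
    return (0, 0)

def question_conversion(source_text: str, span: Tuple[int, int]) -> str:
    """Stand-in for QG-Net: blank out the target token to form a question."""
    words = source_text.split()
    i_s, i_e = span
    return " ".join("____" if i_s - 1 <= i < i_e else w
                    for i, w in enumerate(words))

def quadl(learning_objective: str, source_text: str) -> Optional[str]:
    """Compose the two models; None marks a non-target source text."""
    span = answer_prediction(learning_objective, source_text)
    if span == (0, 0):
        return None
    return question_conversion(source_text, span)
```

The key design point carried over from the paper is the reserved zero index: when the Answer Prediction step outputs (0, 0), the pipeline produces no question at all rather than a poorly grounded one.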
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Research on automatic question generation has been growing rapidly among
AIED researchers. Most early studies of question generation adopted
rule-based models that relied on templates constructed by experts [
        <xref ref-type="bibr" rid="ref10 ref11 ref9">9-11</xref>
        ]. Scalability is,
however, a concern for rule-based models: they often do not work for complex
sentences, and the linguistic diversity of the resulting questions is therefore limited.
      </p>
      <p>
        More recent work on question generation takes a data-driven approach using neural
networks. Many variants of RNN-based models have been proposed and have shown
considerable advances on the question generation task [
        <xref ref-type="bibr" rid="ref12 ref13 ref14 ref15 ref16 ref17">12-17</xref>
        ]. For general-purpose
question generation, large datasets collected from Wikipedia articles or news media, such
as SQuAD [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], NewsQA [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], and MS MARCO [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], have enabled the construction of neural-network-based
models. Wang et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] demonstrated that an LSTM-based model, called
QG-Net, trained on a general question generation dataset (SQuAD) can be used for
generating questions on educational content. For evaluation, questions were generated from textbooks on
biology, sociology, and history, and QG-Net achieved the highest BLEU score
among the state-of-the-art techniques. Yet the pedagogical value of the generated
questions has not been reported.
      </p>
      <p>
        Techniques for keyphrase extraction have been studied to suggest answer candidates
from a given paragraph (e.g., [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]). Since our model aims to select target tokens
that are aligned with a given learning objective, the proposed Answer Prediction model
is essentially different from those existing keyphrase extraction models.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Methods</title>
      <p>(Figure: overview of QUADL. The Answer Prediction model tags a target token in the source text, and the Question Conversion model turns the tagged text into a verbatim question Q.)</p>
      <p>Learning objective (LO): Describe metabolic pathways as stepwise chemical
transformations either requiring or releasing energy; and recognize conserved themes in these
pathways.</p>
      <p>Source Text (S): Among the main pathways of the cell are photosynthesis and cellular
respiration, although there are a variety of alternative pathways such as fermentation.
Question (Q): Along with photosynthesis, what are the main pathways of the cell?
Answer: cellular respiration
Notice that the answer is tagged in the source text S (underlined in the example above). We
call the tagged answer in the given source text S a target token hereafter. The target
token may contain multiple words, as shown in the example above.</p>
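For illustration, the target token index can be computed by the exact-match tagging scheme described later in the paper (answers extracted from assessment questions, tagged only when they appear literally in the source text). Word-level tokenization and 1-based inclusive indices are our assumptions for exposition, not the paper's specification.

```python
def target_token_index(source_text: str, answer: str) -> tuple:
    """Return 1-based (Is, Ie) word indices of `answer` inside `source_text`,
    or (0, 0) when the answer does not literally appear (exact match only)."""
    words = [w.strip(".,;").lower() for w in source_text.split()]
    target = [w.strip(".,;").lower() for w in answer.split()]
    n = len(target)
    for i in range(len(words) - n + 1):
        if words[i : i + n] == target:
            return (i + 1, i + n)  # 1-based, inclusive span
    return (0, 0)
```

On the metabolic-pathways example above, "cellular respiration" is found as words 11 through 12 of the source text, while an answer that never appears literally maps to the reserved (0, 0) index.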
      <p>
        The Answer Prediction model identifies the target token index &lt;Is, Ie&gt;, where Is and
Ie are the start and end indices of the target token within a given source text S
relative to the learning objective LO. For the Answer Prediction model, we adopted
BERT, Bidirectional Encoder Representations from Transformers [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. In our
application, the learning objective (LO) and the source text (S) were combined into a single input
&lt;LO, S&gt; to the model. The vector representation computed by the BERT model is given
to two classification heads: one predicting the start index (Is) and the
other the end index (Ie) of the target token. The model may output &lt;Is=0, Ie=0&gt;,
indicating that the given source text is not suitable for generating a question for the given
learning objective. For the rest of the paper, we call source texts with non-zero
indices (i.e., Is ≠ 0 and Ie ≠ 0) target source texts, whereas the others are referred to as
non-target source texts (i.e., those with the zero token index &lt;0, 0&gt;). The Answer
Prediction model was trained on data that we created from existing online courses
at the Open Learning Initiative† (OLI).
      </p>
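A minimal sketch of the span-prediction step, under our assumptions (random, untrained weights, and plain numpy in place of an actual BERT encoder): two independent linear heads score each token position of the encoded LO-plus-S input, and position 0 stands for the "no suitable target" decision.

```python
import numpy as np

def predict_span(hidden_states: np.ndarray,
                 w_start: np.ndarray,
                 w_end: np.ndarray) -> tuple:
    """Score every token position with two linear heads and return (Is, Ie).

    hidden_states: (seq_len, d) encoder output for the combined LO-and-S
    input, with position 0 reserved for the "no suitable target" decision
    (analogous to pointing at BERT's [CLS] token).
    """
    start_logits = hidden_states @ w_start  # one score per position
    end_logits = hidden_states @ w_end
    i_s = int(np.argmax(start_logits))
    i_e = int(np.argmax(end_logits))
    if i_s == 0 or i_e == 0 or i_e < i_s:
        return (0, 0)  # non-target source text
    return (i_s, i_e)

# Untrained random weights, purely to exercise the shapes.
rng = np.random.default_rng(0)
d, seq_len = 16, 12
H = rng.normal(size=(seq_len, d))
w_s, w_e = rng.normal(size=d), rng.normal(size=d)
span = predict_span(H, w_s, w_e)
```

The design choice worth noting is that inconsistent head outputs (end before start, or either head pointing at the reserved position) are collapsed into the zero index, so downstream question conversion only ever sees well-formed spans.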
      <p>
        The Question Conversion model generates a question for which the target token
becomes the answer, given a source text with a non-zero target token index. We use
QG-Net, a bidirectional-LSTM seq2seq model with attention and copy mechanisms
[
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. We used an existing, pre-trained QG-Net model that was trained on the SQuAD
dataset‡. We could have trained QG-Net on the OLI course data mentioned above;
however, the OLI courses we used for the current study do not contain a sufficient number
of verbatim questions: many of the questions are fill-in-the-blank or multiple-choice
questions, and hence not suitable for generating training data for QG-Net.
† https://oli.cmu.edu
‡ https://rajpurkar.github.io/SQuAD-explorer/
      </p>
      <p>Table 1. Example triplets with participant judgments.
(a) A participant judged the target token suitable, but the question not suitable.
LO: Explain how the cellular organization of fused skeletal muscle cells allows
muscle tissue to contract properly.
S: Myofibrils are connected to each other by intermediate, or desmin, filaments
that attach to the Z disc.
Q: What is connected to each other?
(b) A participant judged both the target token and the question suitable.
LO: Identify and discuss the functions of the large intestine and its structures.
S: The first part of the large intestine is the cecum, a small sac-like region that
is suspended inferior to the ileocecal valve.
Q: What is the first part of the large intestine?</p>
    </sec>
    <sec id="sec-4">
      <title>Evaluation Study</title>
      <p>We investigated the following research questions: RQ1: How well does the Answer
Prediction model identify target tokens (including zero token indices) in a given source
text relative to a given learning objective? RQ2: How well does the pre-trained
QG-Net generate questions for a given source text tagged with the target tokens?</p>
      <p>To answer these research questions, we conducted a survey on Amazon Mechanical
Turk (AMT). On AMT, the participants were shown triplets &lt;LO, S&lt;Is, Ie&gt;, Q&gt;. For
each triplet, the participants were asked whether they agreed or disagreed with the
following two statements: (1) To create a question that helps attain the learning
objective LO, it is adequate to convert the sentence S into a question whose answer is the
highlighted token &lt;Is, Ie&gt;. (2) The question Q is suitable for attaining the learning
objective LO. Each statement corresponds to one research question. Examples of
triplets are shown in Table 1.</p>
      <p>Majority votes were used to consolidate the evaluations from the participants. Table 2
summarizes the results for RQ1. The data showed that 49% (166/342) of all
target token index predictions from the Answer Prediction model were accepted by
the participants. For predictions with a non-zero target index, 88% (155/178)
were accepted, including ties. As for the non-target source text predictions
(i.e., where the Answer Prediction model output the zero &lt;0,0&gt; index), only 41% (68/164)
were accepted; the participants considered 55% (90/164) of the predicted non-target
source texts to in fact be target source texts. These results show that the Answer
Prediction model is rather conservative: when it outputs a "positive" prediction (i.e., treating a
given source text as a target source text), 70% of such predictions are appropriate,
but a large number of source texts that should have been predicted as
target source texts were missed. We argue that for educational purposes these results
are acceptable and pragmatic.</p>
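The consolidation step can be sketched as a per-triplet majority vote. The exact tie-handling rule is our assumption, chosen so that exact ties are reported as their own category, as in Table 2.

```python
from collections import Counter

def consolidate(votes: list) -> str:
    """Collapse per-participant judgments for one triplet (e.g. "accept",
    "reject", "nonsense") into a single label, reporting exact ties
    between the two most frequent labels as "tie"."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "tie"
    return ranked[0][0]
```

For example, two accepts against one reject consolidates to "accept", whereas a one-to-one split is reported as "tie" rather than being forced into either label.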
      <p>Table 2. Participant judgments of the target token indices predicted by the Answer Prediction model.
Non-zero target index (Is ≠ 0, Ie ≠ 0): Accepted 123 (70%), Tie 32 (18%), Not accepted 22 (12%), Nonsensical 1, Total 178 (100%).
Zero index (Is = 0, Ie = 0): Accepted 43 (26%), Tie 25 (15%), Not accepted 90 (55%), Nonsensical 6 (4%), Total 164 (100%).
Total: Accepted 166 (49%), Tie 57 (17%), Not accepted 112 (33%), Nonsensical 7 (2%), Total 342 (100%).</p>
      <p>Table 3-a shows the results for RQ2. The table shows that participants considered
73% (130/178) of the questions generated by QG-Net appropriate for
achieving the associated learning objective.</p>
      <p>Note that the result above is influenced by the performance of the Answer
Prediction model. To assess the capability of QG-Net separately from the performance
of the Answer Prediction model, we analyzed the performance of QG-Net given only
the "appropriate" inputs (according to the survey participants). Table 3-b shows the
evaluation of the questions when QG-Net was given only those source texts for which the
Answer Prediction model output an "appropriate" target token index according to the
survey participants. In total, 123 source texts satisfied this condition, and
82% (100/123) of the questions generated from these "appropriate" source texts were considered
suitable for achieving the associated learning objective. This indicates that the
pre-trained QG-Net can generate a fair number of suitable questions for domains other
than the one on which it was originally trained. Using QG-Net as a building block for QUADL
is therefore an acceptable design option.</p>
    </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>We proposed QUADL for generating questions that are aligned with a given learning
objective. As far as we are aware, no prior studies have aimed to generate
questions suited to attaining learning objectives. The current study showed
that when the Answer Prediction model output a non-zero index for the target token,
88% of such predictions were accepted as good predictions by the study participants.
Though we acknowledge that the performance should be improved, this is an encouraging
result showing the potential of the proposed model. The data also showed that the
majority of the participants believed that 55% of the source texts that the Answer
Prediction model identified as not being useful for the learning objective were actually
useful for creating questions. Lowering the number of such "false negative" predictions is
certainly a crucial next step.</p>
      <p>One of the challenges of the current study was the cost of creating the training data.
To train the Answer Prediction model, each target source text paired with a learning
objective has to be annotated to indicate the target token. For the current study, we used
existing courseware content taken from OLI. When the training data were created,
target source texts were tagged using answers (extracted from assessment questions) by
exact match; i.e., a non-zero token index was assigned only when the target answer
appeared literally in the source text. Source texts that included only part of the
answer, or that contained synonymous words equally plausible as the original
answer, were therefore not tagged with appropriate token indices. The current study relied on a
survey on Amazon Mechanical Turk; evaluating the effectiveness of the generated questions
with real students in an authentic context is an important next step.</p>
      <p>Acknowledgement. The research reported here was supported by the National Science
Foundation under Grant No. 2016966 to North Carolina State University.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Koedinger</surname>
            ,
            <given-names>K.R.</given-names>
          </string-name>
          , et al.
          <article-title>Learning is not a spectator sport: Doing is better than watching for learning from a MOOC</article-title>
          .
          <source>In Proceedings of the Second ACM Conference on Learning @ Scale</source>
          .
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Rivers</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <article-title>Metacognition about practice testing: A review of learners' beliefs, monitoring, and control of test-enhanced learning</article-title>
          .
          <source>Educational Psychology Review</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>S.C.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>T.C.</given-names>
            <surname>Rickard</surname>
          </string-name>
          ,
          <article-title>Transfer of test-enhanced learning: Meta-analytic review and synthesis</article-title>
          .
          <source>Psychological Bulletin</source>
          ,
          <year>2018</year>
          .
          <volume>144</volume>
          (
          <issue>7</issue>
          ): p.
          <fpage>710</fpage>
          -
          <lpage>756</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kurdi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , et al.,
          <article-title>A Systematic Review of Automatic Question Generation for Educational Purposes</article-title>
          .
          <source>International Journal of Artificial Intelligence in Education</source>
          ,
          <year>2020</year>
          .
          <volume>30</volume>
          (
          <issue>1</issue>
          ): p.
          <fpage>121</fpage>
          -
          <lpage>204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , et al.,
          <article-title>Recent advances in neural question generation</article-title>
          .
          <source>arXiv preprint arXiv:1905.08949</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , et al.
          <article-title>Improving neural question generation using answer separation</article-title>
          .
          <source>in Proceedings of the AAAI Conference on Artificial Intelligence</source>
          .
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Nema</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , et al.,
          <article-title>Let's Ask Again: Refine Network for Automatic Question Generation</article-title>
          .
          <source>arXiv preprint arXiv:1909.05355</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , et al.,
          <article-title>Machine comprehension by text-to-text neural question generation</article-title>
          .
          <source>arXiv preprint arXiv:1705</source>
          .
          <year>02012</year>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Mazidi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Tarau</surname>
          </string-name>
          ,
          <article-title>Automatic question generation: from NLU to NLG</article-title>
          .
          <source>International Conference on Intelligent Tutoring Systems</source>
          ,
          <year>2016</year>
          : p.
          <fpage>23</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Mitkov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Computer-aided generation of multiple-choice tests</article-title>
          .
          <source>in Proceedings of the HLT-NAACL 03 workshop on Building educational applications using natural language processing</source>
          .
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Heilman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>N.A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Question generation via overgenerating transformations and ranking</article-title>
          .
          <source>Carnegie Mellon University, Language Technologies Institute</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , et al.,
          <article-title>Paragraph-level neural question generation with maxout pointer and gated self-attention networks</article-title>
          ,
          <source>in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Blanco</surname>
          </string-name>
          and W. Lu, Editors.
          <year>2018</year>
          . p.
          <fpage>3901</fpage>
          -
          <lpage>3910</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , et al.,
          <article-title>A multi-agent communication framework for question-worthy phrase extraction and question generation</article-title>
          ,
          <source>in Proceedings of the AAAI Conference on Artificial Intelligence</source>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.V.</given-names>
            <surname>Hentenryck</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.-H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Editors.
          <year>2019</year>
          . p.
          <fpage>7168</fpage>
          -
          <lpage>7175</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , et al.,
          <article-title>Leveraging context information for natural question generation, in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</article-title>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Liu</surname>
          </string-name>
          and T. Solorio, Editors.
          <year>2018</year>
          . p.
          <fpage>569</fpage>
          -
          <lpage>574</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , et al.,
          <article-title>Improving question generation with sentence-level semantic matching and answer position inferring</article-title>
          ,
          <source>in Proceedings of the AAAI Conference on Artificial Intelligence</source>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Conitzer</surname>
          </string-name>
          , and F. Sha, Editors.
          <year>2020</year>
          . p.
          <fpage>8464</fpage>
          -
          <lpage>8471</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , et al.,
          <article-title>Improving neural question generation using answer separation</article-title>
          ,
          <source>in Proceedings of the AAAI Conference on Artificial Intelligence</source>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.V.</given-names>
            <surname>Hentenryck</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.-H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Editors.
          <year>2019</year>
          . p.
          <fpage>6602</fpage>
          -
          <lpage>6609</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , et al.,
          <article-title>Learning to collaborate for question answering and asking, in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</article-title>
          , T.S. Fei Liu, Editor.
          <year>2018</year>
          . p.
          <fpage>1564</fpage>
          -
          <lpage>1574</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Rajpurkar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Know what you don't know: Unanswerable questions for SQuAD</article-title>
          .
          <source>arXiv preprint arXiv:1806.03822</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Trischler</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.,
          <article-title>NewsQA: A machine comprehension dataset</article-title>
          .
          <source>arXiv preprint arXiv:1611.09830</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Bajaj</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , et al.,
          <article-title>Ms marco: A human generated machine reading comprehension dataset</article-title>
          .
          <source>arXiv preprint arXiv:1611.09268</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , et al.
          <article-title>QG-net: a data-driven question generation model for educational content</article-title>
          .
          <source>in Proceedings of the Fifth Annual ACM Conference on Learning at Scale</source>
          .
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Willis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.
          <article-title>Key phrase extraction for generating educational question-answer pairs</article-title>
          .
          <source>In Proceedings of the Sixth ACM Conference on Learning @ Scale</source>
          .
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.,
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>arXiv preprint arXiv:1810.04805</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>