<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic Evaluation of World History Essay Using Chronological and Geographical Measures</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kotaro Sakamoto</string-name>
          <email>sakamoto@forest.eis.ynu.ac.jp</email>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Akira Fujita</string-name>
          <email>fujita@ynu.ac.jp</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hideyuki Shibuki</string-name>
          <email>shib@forest.eis.ynu.ac.jp</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yoshinobu Kano</string-name>
          <email>kano@inf.shizuoka.ac.jp</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Madoka Ishioroshi</string-name>
          <email>ishioroshi@nii.ac.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Teruko Mitamura</string-name>
          <email>teruko@cs.cmu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tatsunori Mori</string-name>
          <email>mori@forest.eis.ynu.ac.jp</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Noriko Kando</string-name>
          <email>kando@nii.ac.jp</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Carnegie Mellon University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Institute of Informatics</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National Institute of Informatics</institution>
          ,
          <addr-line>SOKENDAI</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Shizuoka University</institution>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Yokohama National University</institution>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Yokohama National University, National Institute of Informatics</institution>
        </aff>
      </contrib-group>
      <fpage>20</fpage>
      <lpage>23</lpage>
      <abstract>
        <p>We propose a method for measuring the chronological and geographical consistency of world history essays in Japanese university entrance exams. The experimental result shows a weak positive correlation between the scores measured by the proposed method and the scores estimated by a human expert in world history.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Research on real-world complex question-answering (QA) has
flourished in recent years [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In the QA Lab tasks [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] at the NTCIR
workshop, the current problems and solutions in QA
technologies have been investigated using the world history questions in
Japanese university entrance exams and their English translations.
Japanese university entrance exams include various types of
questions, such as multiple-choice, fill-in-the-blank, true-or-false, and
essay questions. Above all, essay QA is the most challenging and
still has many open problems, such as how to evaluate the essays that
QA systems generate. Although essays can be evaluated by
human experts in world history, this takes considerable time and money.
In the case of the QA Lab, the evaluation of 46 essays by an expert who
teaches world history took around a month and cost about 500,000 yen
(4,500 USD). Therefore, a new evaluation method is required.
      </p>
      <p>
        Because essay generation is regarded as a kind of query-biased
summarization, the measures for evaluating summaries using
gold-standard data can be applied to essay evaluation. In the QA Lab,
the ROUGE family [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and the Pyramid method [
        <xref ref-type="bibr" rid="ref7 ref9">7, 9</xref>
        ] are used for
grading essays in addition to a human expert’s evaluation. The positive
correlation between these grades and those given by the human expert
was only weak to moderate, and the ranking order produced by the
measures was not always concordant with the ranking order given
by the human marks. Therefore, we investigated more appropriate
measures for evaluating world history essays in Japanese university
entrance exams.
      </p>
      <p>For evaluating summaries, linguistic well-formedness and
relative responsiveness were used in the DUC workshops.<sup>2</sup> The
content, readability/fluency, and overall responsiveness were
used in the Guided Summarization tasks<sup>3</sup> of the TAC workshops.
These measures are also important for evaluating world history essays in
university entrance exams. However, linguistic well-formedness
and readability/fluency were scored subjectively by human assessors,
while the content was scored methodically by the ROUGE
family and the Pyramid method, among others. We would like
to score merits other than the content in an equally methodical way.
For evaluating world history essays, chronological
and geographical consistency is important as a kind of semantic
consistency. However, how to evaluate it is not obvious. In
this paper, we propose a method for measuring the chronological and
geographical consistency of world history essays, and examine
the method using essays submitted to the QA Lab.</p>
      <p>e main contributions of this paper are as follows: (i) to clarify
the features of well-formed world history essays in terms of the
chronological information and the geographical information, (ii) to
introduce a new scoring method based on the features to evaluate
the well-formedness of world history essays.
2</p>
      <p><sup>2</sup> http://duc.nist.gov/duc2007/tasks.html</p>
      <p><sup>3</sup> http://www.nist.gov/tac/2011/Summarization/Guided-Summ.2011.guidelines.html</p>
    </sec>
    <sec id="sec-related-work">
      <title>RELATED WORK</title>
      <p>
        The linguistic well-formedness in the DUC workshop and the
readability/fluency in the TAC Guided Summarization tasks were
evaluated in terms of grammaticality, non-redundancy, referential clarity,
focus, and ‘structure and coherence’. Our measures relate to
the focus and the ‘structure and coherence’. Although Barzilay et al.
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and Okazaki et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] studied chronological ordering,
they did not take geographical information into account. Buscaldi
et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] found that geography is related to semantic similarity,
but they only aimed to measure semantic equivalence between two
text snippets. Because Madnani et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] only studied
sentence ordering, their research applies only to short,
domain-independent summarization. Bauer and Teufel [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposed
an extended Pyramid method for timeline summarization, but they
did not focus on well-formedness. Although Wagner et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
studied well-formedness, they focused only on grammatical
errors. Therefore, there is no research on a methodology for
measuring the focus and the structure and coherence of world history
essays in terms of chronological and geographical information.
      </p>
    </sec>
    <sec id="sec-2">
      <title>ESSAY QUESTION OF WORLD HISTORY</title>
    </sec>
    <sec id="sec-3">
      <title>WELL-FORMED WORLD HISTORY ESSAY</title>
    </sec>
    <sec id="sec-4">
      <title>Structure</title>
      <p>In general, a world history essay is a sequential description of
historical events (HEs). An HE has both chronological information
and geographical information. Let us consider how HEs are written.
While chronological information can easily be put in a linear
order from the past to the future, geographical information
cannot easily be put in a definite linear order because of its
spatial extent. Based on a study of several model answer essays
from past university entrance exam collections, the general
structure of the essays follows one of two approaches: (a) disregarding
geographical information, all HEs are described in chronological
order, or (b) HEs are grouped by geographical information. In
both, information is described in chronological order. If the former
is regarded as grouping by the geographical information “the
whole world,” there is no difference between the two approaches; that
is, both are descriptions in chronological order of HEs in a
particular area. We define a sequence of HEs with the same geographical
information as a geographical section (GS). GSs can be nested
hierarchically. For example, a GS of Europe may contain GSs such as
England, France, and Germany, and the GS of England may contain
GSs such as London, Birmingham, and Manchester.</p>
      <p>From the above, we built the following hypotheses for the
structure of a world history essay.</p>
      <p>(H1) An essay is a GS.
(H2) A GS can consist of more than one sub-GS, each of which lies within the
parent GS.
(H3) HEs in a GS are put in chronological order.</p>
      <p>GSs in a hierarchical structure are classified into terminal and
non-terminal sections. A terminal section is an HE sequence
without hierarchical structure, whereas a non-terminal section
can be divided into several GSs. We define the non-terminal section
corresponding to the essay as the root section. A GS s is defined
as a pair of an HE sequence <inline-formula><tex-math>E = (e_1, e_2, \ldots, e_m)</tex-math></inline-formula> and a GS sequence
<inline-formula><tex-math>SS = (s_1, s_2, \ldots, s_n)</tex-math></inline-formula>. If SS is an empty tuple, then the GS is a
terminal section. HEs in a sub-GS are shared with the superordinate
GS, and E of a non-terminal section is not empty. For a question,
the chronological condition CC is defined as a pair of the beginning
time bt and the ending time et, and the geographical condition GC
is defined as a set of geographical entities <inline-formula><tex-math>\{g_1, g_2, \ldots, g_k\}</tex-math></inline-formula>.</p>
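      <p>The definitions above can be sketched as a small data structure. The following is only an illustration under stated assumptions, not the implementation used in this paper: the class names, the field names, the encoding of time as integer years, and the naive check of (H3) are all introduced here for the example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class HistoricalEvent:
    """An HE with a time span and the places it mentions (hypothetical encoding)."""
    label: str
    begin: int                # beginning year
    end: int                  # ending year
    places: Tuple[str, ...]   # geographical entities evoked by the HE


@dataclass
class GeoSection:
    """A GS: a pair of an HE sequence E and a sub-GS sequence SS."""
    name: str
    events: List[HistoricalEvent] = field(default_factory=list)   # E
    sections: List["GeoSection"] = field(default_factory=list)    # SS

    def is_terminal(self) -> bool:
        # A GS whose SS is empty is a terminal section.
        return not self.sections


def chronologically_ordered(gs: GeoSection) -> bool:
    """Naive check of (H3): beginnings never go backwards within each GS."""
    begins = [e.begin for e in gs.events]
    ok_here = all(a <= b for a, b in zip(begins, begins[1:]))
    return ok_here and all(chronologically_ordered(s) for s in gs.sections)


# A root GS for Europe with two terminal sub-GSs; the sub-GS events are
# shared with the parent, so the parent's E is not empty.
england = GeoSection("England", [
    HistoricalEvent("Magna Carta", 1215, 1215, ("England",)),
    HistoricalEvent("Glorious Revolution", 1688, 1689, ("England",)),
])
france = GeoSection("France", [
    HistoricalEvent("French Revolution", 1789, 1799, ("France",)),
])
europe = GeoSection("Europe", england.events + france.events, [england, france])

print(europe.is_terminal(), england.is_terminal())
print(chronologically_ordered(europe))
```
]]></preformat>
      <p>Keeping E and SS as two separate lists mirrors the paired definition of a GS, so a terminal section is recognized simply by an empty SS.</p>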
    </sec>
    <sec id="sec-5">
      <title>Uniformity</title>
      <p>Let us consider the uniformity of GSs in a GS. If GSs of the East
Midlands, Paris, and Germany are placed on the same level in a GS
of Europe, they are incongruous even though they are all parts of
Europe. This is because they belong to different levels of geographical
category, such as country, region, and city. Therefore, a well-formed
essay requires uniformity of geographical category level. In
addition, if England is described with hundreds of words while
France and Germany are each described with a dozen words,
there is incongruity even though they are in the same geographical
category level. This is because their quantities of description are
imbalanced. Therefore, a well-formed essay also seems to require
uniformity of quantity.</p>
      <p>We built the following hypotheses for the uniformity of GSs.
(H4) GSs placed on the same level in a GS are in the same level
of geographical category.
(H5) GSs placed on the same level in a GS are described in the
same quantity.</p>
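      <p>Although several functions could implement these hypotheses,
we adopted simple functions for this experiment.
The geographical uniformity <inline-formula><tex-math>sc_{GU}(\cdot)</tex-math></inline-formula> and the quantity uniformity
<inline-formula><tex-math>sc_{QU}(\cdot)</tex-math></inline-formula> are calculated by the following functions:</p>
      <disp-formula><label>(1)</label><tex-math>sc_{GU}(SS) = 1 - \frac{sd_{GU}(SS)}{am_{GU}(SS)}</tex-math></disp-formula>
      <disp-formula><label>(2)</label><tex-math>sd_{GU}(SS) = \sqrt{\frac{1}{|SS|} \sum_{i=1}^{|SS|} \left(\mathrm{depth}(s_i) - am_{GU}(SS)\right)^2}</tex-math></disp-formula>
      <disp-formula><label>(3)</label><tex-math>am_{GU}(SS) = \frac{1}{|SS|} \sum_{i=1}^{|SS|} \mathrm{depth}(s_i)</tex-math></disp-formula>
      <disp-formula><label>(4)</label><tex-math>sc_{QU}(SS) = -\frac{\sum_{i=1}^{|SS|} p(s_i, SS) \log_2 p(s_i, SS)}{\log_2 |SS|}</tex-math></disp-formula>
      <disp-formula><label>(5)</label><tex-math>p(s, SS) = \frac{\mathrm{length}(s)}{\sum_{i=1}^{|SS|} \mathrm{length}(s_i)}</tex-math></disp-formula>
      <p>where depth(s) is a function that returns the distance between the
thesaurus root node and the node corresponding to the range of s, and
length(s) is a function that returns the number of characters described
in s. We designed the scoring functions to be normalized into the
range [0, 1].</p>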
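      <p>As a rough illustration of the two uniformity scores, the sketch below takes the thesaurus depths and the character counts of the sub-GSs as plain lists. The function names, the example depths and lengths, and the handling of degenerate inputs (a zero mean depth, fewer than two sub-GSs, or no text at all) are assumptions made for this example; the paper does not specify them.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from typing import Sequence


def sc_gu(depths: Sequence[int]) -> float:
    """Geographical uniformity: 1 - (standard deviation / mean) of thesaurus depths."""
    mean = sum(depths) / len(depths)
    if mean == 0:
        # Every sub-GS sits at the thesaurus root; treated as uniform (assumption).
        return 1.0
    variance = sum((d - mean) ** 2 for d in depths) / len(depths)
    return 1.0 - math.sqrt(variance) / mean


def sc_qu(lengths: Sequence[int]) -> float:
    """Quantity uniformity: entropy of the length shares divided by log2 |SS|."""
    total = sum(lengths)
    if len(lengths) < 2 or total == 0:
        # log2(1) = 0 would divide by zero; treated as uniform (assumption).
        return 1.0
    shares = [n / total for n in lengths if n > 0]
    entropy = -sum(p * math.log2(p) for p in shares)
    return entropy / math.log2(len(lengths))


# Hypothetical depths: three sub-GSs at the same category level,
# then three sub-GSs at mixed levels.
print(sc_gu([2, 2, 2]))
print(sc_gu([3, 4, 2]))

# Hypothetical character counts: balanced versus dominated by one sub-GS.
print(sc_qu([100, 100, 100]))
print(sc_qu([300, 12, 12]))
```
]]></preformat>
      <p>In this sketch, sub-GSs at the same depth and with equal lengths score 1, while mixing category levels or letting one sub-GS dominate the text lowers the respective score.</p>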
    </sec>
    <sec id="sec-6">
      <title>Ordering</title>
      <p>Let us consider the ordering of HEs in a GS. HEs in well-formed
essays are generally described in chronological order. Note that
the order in which HEs occurred does not always correspond to the
order in which an essay describes them. Since the chronological information
of an HE is a range with a beginning and an ending, the occurrence
order relation between two HEs is non-overlapping, partially
overlapping, or inclusive. In all of these relations, the beginning of the HE
e<sub>1</sub> precedes the beginning of the HE e<sub>2</sub>. However, in the
inclusion relation, e<sub>1</sub> may be described after e<sub>2</sub>, as in “The Treaty of
Nanking ended the First Opium War.” Therefore, we assume that
the describing order of HEs in the inclusion relation is free of the
chronological order. Next, let us consider the ordering of GSs in a
GS. The describing order of GSs is free relative to the chronological
order. However, for example, the describing order of Athens, Rome,
Cairo, Baghdad, Beijing, and Shanghai seems to be better than the
order of Athens, Baghdad, Beijing, Cairo, Rome, and Shanghai. This
is because GSs related to each other are placed close together. We assume
that this relatedness is approximated by geographical distance.</p>
      <p>We built the following hypotheses for the ordering in a GS.
(H6) As an exception to hypothesis (H3), an HE can be
described either before or after another HE if they are in
the inclusion relation.
(H7) GSs in a GS are described in an order that keeps the geographical
distance between consecutive GSs short.
Hypothesis (H6) complements hypothesis (H3).</p>
      <p>e chronological ordering scCO ¹º and the geographical ordering
scGO ¹º are calculated by the following functions:
scCO ¹Eº
scGO ¹Eº
=
=</p>
      <p>K L
K + L</p>
      <p>1
geochange¹Eº + 1
jE j
1
1
jE Õj1
geochange¹Eº =</p>
      <p>distance¹range¹ei º; range¹ei+1ºº(8)
i=1
where K is the number of concordant pairs of HEs in E, and L is
the number of discordant pairs. range¹eº is a function to return
a thesaurus node that is the nearest common node subsuming all
geographical entities included in the HE e, and distance¹ni ; nj º is
(6)
(7)
a function to return the shortest distance between the thesaurus
nodes ni and nj .
4.4</p>
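      <p>The two ordering scores can be sketched as follows. This is a minimal illustration, not the implementation used in this paper. Three points are assumptions of the sketch: pairs in the inclusion relation are skipped when counting K and L, which is one possible reading of (H6); a sequence with no countable pair scores 1; and the toy thesaurus PARENT with its tree distance is hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Period = Tuple[int, int]  # (beginning year, ending year) of an HE


def sc_co(periods: Sequence[Period]) -> float:
    """(K - L) / (K + L) over HE pairs, taken in the order the essay describes them."""
    concordant = discordant = 0
    for (b1, e1), (b2, e2) in combinations(periods, 2):
        inclusion = (b1 <= b2 and e2 <= e1) or (b2 <= b1 and e1 <= e2)
        if inclusion:
            continue  # (H6): either order is acceptable, so the pair is skipped (assumption)
        if b1 < b2:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    return 1.0 if total == 0 else (concordant - discordant) / total


def sc_go(ranges: Sequence[str], distance: Callable[[str, str], float]) -> float:
    """1 / (geochange + 1); geochange is the mean distance between consecutive ranges."""
    if len(ranges) < 2:
        return 1.0  # no transition to penalize (assumption)
    steps = [distance(a, b) for a, b in zip(ranges, ranges[1:])]
    return 1.0 / (sum(steps) / len(steps) + 1.0)


# Hypothetical toy thesaurus: each node points to its parent.
PARENT: Dict[str, str] = {
    "Europe": "World", "Asia": "World",
    "Athens": "Europe", "Rome": "Europe",
    "Baghdad": "Asia", "Beijing": "Asia",
}


def path_to_root(node: str) -> List[str]:
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def tree_distance(a: str, b: str) -> int:
    """Number of edges between two thesaurus nodes via their nearest common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = next(n for n in pa if n in pb)
    return pa.index(common) + pb.index(common)


# First Opium War, Treaty of Nanking, Second Opium War, in described order.
print(sc_co([(1839, 1842), (1842, 1842), (1856, 1860)]))
print(sc_go(["Athens", "Rome", "Baghdad", "Beijing"], tree_distance))
print(sc_go(["Athens", "Baghdad", "Rome", "Beijing"], tree_distance))
```
]]></preformat>
      <p>With this toy thesaurus, the order that keeps European and Asian cities together yields a smaller mean distance and therefore a higher geographical ordering score than the interleaved order.</p>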
    </sec>
    <sec id="sec-7">
      <title>Cooperability</title>
      <p>Let us consider the cooperability of a world history essay with
the question constraints in terms of chronological and geographical
information. As described in Section 3, world history essay
questions give chronological and geographical conditions such as “up to
and including the first half of the 18th century” and “West Europe,
West Asia and East Asia.” In this case, if an essay describes only the
ancient histories of West Europe, West Asia, and East Asia, the essay
satisfies the conditions logically. However, it does not reflect the
intention of the question. A cooperative essay should describe at least one
HE of the 18th century. The same holds for geographical information.
For example, an essay describing only “West Europe and West Asia”
violates the maxim of quantity; a cooperative essay should
describe at least one HE for each area in the geographical
condition. We assume that chronological cooperability is observed
in all GSs, while geographical cooperability is observed only in
the GS corresponding to the essay. For a GS, we define the period
from the beginning of the earliest HE to the end of the latest one
as the period of the GS. The smallest geographical range that includes
the places where the HEs in a GS occurred is defined as the range of the
GS. We assume that observance of the maxim of quantity is
approximated by the coverage of the period and the range of GSs.</p>
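      <p>We built the following hypotheses for cooperability with the
chronological and geographical conditions in questions.
(H8) The period of a GS covers the period of the chronological
condition as exactly as possible.
(H9) The range of the GS corresponding to the essay covers the
range of the geographical condition as exactly as possible.
The chronological cooperability <inline-formula><tex-math>sc_{CC}(\cdot)</tex-math></inline-formula> and the geographical
cooperability <inline-formula><tex-math>sc_{GC}(\cdot)</tex-math></inline-formula> are calculated by the following functions:</p>
      <disp-formula><label>(9)</label><tex-math>sc_{CC}(E, CC) = \frac{\mathrm{overlap}(\mathrm{period}(E), CC)}{\mathrm{extend}(\mathrm{period}(E), CC)}</tex-math></disp-formula>
      <disp-formula><label>(10)</label><tex-math>sc_{GC}(E, GC) = \frac{2 P(E, GC) R(E, GC)}{P(E, GC) + R(E, GC)}</tex-math></disp-formula>
      <disp-formula><label>(11)</label><tex-math>P(E, GC) = \frac{\mathrm{subsumed}(\mathrm{geoentities}(E), GC)}{|\mathrm{geoentities}(E)|}</tex-math></disp-formula>
      <disp-formula><label>(12)</label><tex-math>R(E, GC) = \frac{\mathrm{subsuming}(\mathrm{geoentities}(E), GC)}{|GC|}</tex-math></disp-formula>
      <p>where period(E) is a function that returns a pair of the earliest time
and the latest time in E, <inline-formula><tex-math>\mathrm{overlap}(P_1, P_2)</tex-math></inline-formula> is a function that returns the
length of the overlap between the periods <inline-formula><tex-math>P_1</tex-math></inline-formula> and <inline-formula><tex-math>P_2</tex-math></inline-formula>, and <inline-formula><tex-math>\mathrm{extend}(P_1, P_2)</tex-math></inline-formula>
is a function that returns the length of the period between the
earliest time and the latest time among <inline-formula><tex-math>P_1</tex-math></inline-formula> and <inline-formula><tex-math>P_2</tex-math></inline-formula>. geoentities(E) is
a function that returns the set of geographical entities included in
E, <inline-formula><tex-math>\mathrm{subsumed}(G_1, G_2)</tex-math></inline-formula> is a function that returns the number of
geographical entities of <inline-formula><tex-math>G_1</tex-math></inline-formula> subsumed by geographical entities of <inline-formula><tex-math>G_2</tex-math></inline-formula>,
and <inline-formula><tex-math>\mathrm{subsuming}(G_1, G_2)</tex-math></inline-formula> is a function that returns the number of
geographical entities of <inline-formula><tex-math>G_2</tex-math></inline-formula> subsuming geographical entities of <inline-formula><tex-math>G_1</tex-math></inline-formula>.</p>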
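      <p>The cooperability scores can be illustrated with the sketch below, which treats periods as pairs of integer years and reads the geographical score as the harmonic mean of P and R. The function names, the toy REGION_OF table, the example years, and the return values chosen for empty or degenerate inputs are assumptions of this example, not details given in the paper.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Callable, Dict, Set, Tuple

Period = Tuple[int, int]  # (earliest year, latest year)


def sc_cc(essay: Period, condition: Period) -> float:
    """Length of the overlap divided by the length of the joint extent."""
    overlap = max(0, min(essay[1], condition[1]) - max(essay[0], condition[0]))
    extend = max(essay[1], condition[1]) - min(essay[0], condition[0])
    # Two identical instants have zero extent; counted as a full match (assumption).
    return overlap / extend if extend > 0 else 1.0


def sc_gc(entities: Set[str], condition: Set[str],
          subsumes: Callable[[str, str], bool]) -> float:
    """Harmonic mean of P (essay entities inside GC) and R (GC entities that are covered)."""
    if not entities or not condition:
        return 0.0  # nothing to compare (assumption)
    subsumed = sum(1 for e in entities if any(subsumes(g, e) for g in condition))
    subsuming = sum(1 for g in condition if any(subsumes(g, e) for e in entities))
    p = subsumed / len(entities)
    r = subsuming / len(condition)
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


# Hypothetical toy thesaurus mapping each city to the area that subsumes it.
REGION_OF: Dict[str, str] = {
    "Paris": "West Europe", "London": "West Europe",
    "Baghdad": "West Asia", "Beijing": "East Asia",
}


def subsumes(region: str, entity: str) -> bool:
    return entity == region or REGION_OF.get(entity) == region


gc = {"West Europe", "West Asia", "East Asia"}
print(sc_gc({"Paris", "Baghdad"}, gc, subsumes))             # East Asia is not covered
print(sc_gc({"Paris", "Baghdad", "Beijing"}, gc, subsumes))  # every area is covered

# Hypothetical condition period of years 0 to 1750 and an essay covering years 0 to 500.
print(sc_cc((0, 500), (0, 1750)))
```
]]></preformat>
      <p>An essay that stays inside the condition but leaves part of it untouched keeps full precision and loses only recall, which matches the intuition behind the maxim of quantity.</p>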
    </sec>
    <sec id="sec-8">
      <title>PROPOSED METHOD</title>
      <p>named entities evoke the chronological and/or the geographical
information. Because exam cram books cover such information, we
constructed a database of world history terms based on the world
history glossary published by Yamakawa Shuppan-sha.<sup>4</sup> Using the
database, the named entities are converted into chronological and
geographical information. From the resulting chronological and
geographical information sets, the period and the range of the segment are
determined in the same way as those of the GS described
in Section 4.4. They are regarded as the chronological and geographical
information of the HE. Then, all hierarchical structures of GSs that
can be obtained from the essay are enumerated. After scoring the HEs
for each hierarchical structure, the maximum score is selected as
the final score for the essay, so that the most plausible
hierarchical structure is chosen.</p>
      <p>Based on the hypotheses described in Section 4, the score sc
of a GS for a question is calculated recursively by the following
functions:</p>
      <disp-formula><label>(13)</label><tex-math><![CDATA[sc(E, SS, CC, GC) = \begin{cases} sc_{T}(E, CC) & \text{if it is a terminal section} \\ sc_{N}(E, SS, CC)\, sc_{GC}(E, GC) & \text{if it is the root section} \\ sc_{N}(E, SS, CC) & \text{otherwise} \end{cases}]]></tex-math></disp-formula>
      <disp-formula><label>(14)</label><tex-math>sc_{T}(E, CC) = sc_{CO}(E)\, sc_{GO}(E)\, sc_{CC}(E, CC)</tex-math></disp-formula>
      <disp-formula><label>(15)</label><tex-math>sc_{N}(E, SS, CC) = sc_{GU}(SS)\, sc_{QU}(SS)\, \frac{1}{|SS|} \sum_{i=1}^{|SS|} sc(\mathrm{events}(s_i), \mathrm{sections}(s_i), CC, GC)</tex-math></disp-formula>
      <p>where events(s) and sections(s) are functions that return, respectively,
the HE sequence and the GS sequence included in a GS s.</p>
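      <p>The recursion above can be sketched as follows. To keep the example self-contained, the component scores are passed in as plain callables and replaced by constant stubs in the usage lines. The Section and Scorers classes, the is_root flag, and the stub values are assumptions made for illustration only.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Section:
    events: list                                              # HE sequence E
    sections: List["Section"] = field(default_factory=list)   # sub-GS sequence SS


@dataclass
class Scorers:
    """Component scores supplied by the caller (hypothetical interface)."""
    terminal: Callable[[list], float]             # sc_CO * sc_GO * sc_CC for E
    uniformity: Callable[[List[Section]], float]  # sc_GU * sc_QU for SS
    geo_coop: Callable[[list], float]             # sc_GC for E, applied at the root only


def score(gs: Section, f: Scorers, is_root: bool = True) -> float:
    """Recursive score of a GS, following the three cases in the same order."""
    if not gs.sections:                       # terminal section
        return f.terminal(gs.events)
    children = [score(s, f, is_root=False) for s in gs.sections]
    non_terminal = f.uniformity(gs.sections) * sum(children) / len(children)
    if is_root:                               # root section
        return non_terminal * f.geo_coop(gs.events)
    return non_terminal                       # any other non-terminal section


# Stub component scores, chosen only to make the arithmetic easy to follow.
stubs = Scorers(
    terminal=lambda events: 0.5 if len(events) == 1 else 1.0,
    uniformity=lambda sections: 0.9,
    geo_coop=lambda events: 0.8,
)
essay = Section(["a", "b", "c"], [Section(["a"]), Section(["b", "c"])])
print(score(essay, stubs))

# Selecting the most plausible structure among candidate hierarchies of one essay.
flat = Section(["a", "b", "c"])
print(max(score(candidate, stubs) for candidate in (essay, flat)))
```
]]></preformat>
      <p>Passing the component scores as callables keeps the recursion independent of how the thesaurus and the term database are implemented, and taking the maximum over candidate hierarchies corresponds to choosing the final score of the essay.</p>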
    </sec>
    <sec id="sec-9">
      <title>EXPERIMENTAL RESULT</title>
      <p>Using essays submitted to the QA Lab-2 and the QA Lab-3, we
compared the scores measured by the proposed method with the
scores given by a human expert. Although there are only 55
essays, each is annotated by the human expert with the marks awarded and the marks
deducted, in addition to the total score. Basically, the marks
awarded reflect the correctness of the content, and the
marks deducted reflect ill-formedness. We therefore compared
the scores from our method with the deducted marks. Note that
marks are deducted not only for chronological and geographical
inconsistencies. Figure 3 shows the scatter plot of the scores
from our method against the deducted marks. The correlation coefficient
was 0.21, which indicates a weak positive correlation. Considering
that the deducted marks also reflect causes other than
chronological and geographical problems, the value seems to be
fairly good.</p>
      <p><sup>4</sup> http://www.yamakawa.co.jp/ (in Japanese)</p>
    </sec>
    <sec id="sec-10">
      <title>CONCLUSION</title>
      <p>For world history essays in Japanese university entrance exams,
we proposed a method for measuring uniformity, ordering, and
cooperability in terms of chronological and geographical
information. The features of well-formedness were identified by
observing several model answer essays. In the experiment, we
found a weak positive correlation between the scores measured by
our method and the scores given by a human expert. We will
investigate more appropriate scoring functions in the future.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Eugene</given-names>
            <surname>Agichtein</surname>
          </string-name>
          , David Carmel,
          <string-name>
            <given-names>Donna</given-names>
            <surname>Harman</surname>
          </string-name>
          , Dan Pelleg, and
          <string-name>
            <given-names>Yuval</given-names>
            <surname>Pinter</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Overview of the TREC 2015 LiveQA Track</article-title>
          . In Proceedings of e TwentyFourth Text REtrieval Conference.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Regina</given-names>
            <surname>Barzilay</surname>
          </string-name>
          , Noemie Elhadad, and
          <string-name>
            <given-names>Kathleen R.</given-names>
            <surname>McKeown</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Inferring Strategies for Sentence Ordering in Multidocument News Summarization</article-title>
          .
          <source>Journal of Articial Intelligence Research</source>
          <volume>17</volume>
          ,
          <issue>1</issue>
          (
          <year>2002</year>
          ),
          <fpage>35</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Sandro</given-names>
            <surname>Bauer</surname>
          </string-name>
          and
          <string-name>
            <given-names>Simone</given-names>
            <surname>Teufel</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>A Methodology for Evaluating Timeline Generation Algorithms based on Deep Semantic Units</article-title>
          .
          <source>In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing</source>
          , Vol.
          <volume>2</volume>
          .
          <fpage>834</fpage>
          -
          <lpage>839</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Davide</given-names>
            <surname>Buscaldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jorge J.</given-names>
            <surname>Garcia Flores</surname>
          </string-name>
          , Joseph Le Roux, and
          <string-name>
            <given-names>Nadi</given-names>
            <surname>Tomeh</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>LIPN: Introducing a new Geographical Context Similarity Measure and a Statistical Similarity Measure Based on the Bhattacharyya Coefficient</article-title>
          .
          <source>In Proceedings of the 8th International Workshop on Semantic Evaluation</source>
          .
          <fpage>400</fpage>
          -
          <lpage>405</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Chin-Yew</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>ROUGE: A Package for Automatic Evaluation of Summaries</article-title>
          .
          <source>In Proceedings of Workshop on Text Summarization Branches Out</source>
          .
          <fpage>74</fpage>
          -
          <lpage>81</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Nitin</given-names>
            <surname>Madnani</surname>
          </string-name>
          , Rebecca Passonneau, Necip Fazil Ayan, John M. Conroy,
          <string-name>
            <given-names>Bonnie J.</given-names>
            <surname>Dorr</surname>
          </string-name>
          , Judith L. Klavans,
          <string-name>
            <given-names>Dianne P.</given-names>
            <surname>O'Leary</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Judith D.</given-names>
            <surname>Schlesinger</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Measuring Variability in Sentence Ordering for News Summarization</article-title>
          .
          <source>In Proceedings of the Eleventh European Workshop on Natural Language Generation</source>
          .
          <fpage>81</fpage>
          -
          <lpage>88</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Ani</given-names>
            <surname>Nenkova</surname>
          </string-name>
          and
          <string-name>
            <given-names>Rebecca J.</given-names>
            <surname>Passonneau</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Evaluating Content Selection in Summarization: The Pyramid Method</article-title>
          .
          <source>In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics</source>
          .
          <fpage>145</fpage>
          -
          <lpage>152</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Naoaki</given-names>
            <surname>Okazaki</surname>
          </string-name>
          , Yutaka Matsuo, and
          <string-name>
            <given-names>Mitsuru</given-names>
            <surname>Ishizuka</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Improving Chronological Sentence Ordering by Precedence Relation</article-title>
          .
          <source>In Proceedings of the 20th International Conference on Computational Linguistics</source>
          .
          <fpage>81</fpage>
          -
          <lpage>88</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Rebecca J.</given-names>
            <surname>Passonneau</surname>
          </string-name>
          , Emily Chen, Weiwei Guo, and
          <string-name>
            <given-names>Dolores</given-names>
            <surname>Perin</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Automated Pyramid Scoring of Summaries using Distributional Semantics</article-title>
          .
          <source>In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics</source>
          .
          <fpage>143</fpage>
          -
          <lpage>147</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Hideyuki</given-names>
            <surname>Shibuki</surname>
          </string-name>
          , Kotaro Sakamoto, Madoka Ishioroshi, Akira Fujita, Yoshinobu Kano, Teruko Mitamura, Tatsunori Mori, and
          <string-name>
            <given-names>Noriko</given-names>
            <surname>Kando</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Overview of the NTCIR-13 QA Lab-3 Task</article-title>
          .
          <source>In Proceedings of The NTCIR-13 Conference</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Joachim</given-names>
            <surname>Wagner</surname>
          </string-name>
          , Jennifer Foster, and Josef van Genabith.
          <year>2007</year>
          .
          <article-title>A Comparative Evaluation of Deep and Shallow Approaches to the Automatic Detection of Common Grammatical Errors</article-title>
          .
          <source>In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning</source>
          .
          <fpage>112</fpage>
          -
          <lpage>121</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>