<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Question Answering System for Entrance Exams in QA4MRE</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>National Institute of Informatics</institution>
          ,
          <addr-line>Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes our question answering system for Entrance Exams, a pilot task of Question Answering for Machine Reading Evaluation (QA4MRE) at the Conference and Labs of the Evaluation Forum (CLEF) 2013. In this task, participants were provided with documents and multiple-choice questions, and the goal was to select one answer for each question or leave it unanswered. In our system, we developed a component to detect all story characters in the documents and tag all personal pronouns using coreference resolution. For each question, we extracted related sentences and combined them with candidate answers to create inputs for a Recognizing Textual Entailment (RTE) component. The answers were then selected based on the confidence scores from the Recognizing Textual Entailment component. We submitted five runs in the task, and the highest-ranked run obtained a c@1 score of 0.35, which is higher than the baseline c@1 score of 0.25.</p>
      </abstract>
      <kwd-group>
        <kwd>Question Answering Systems</kwd>
        <kwd>Coreference Resolution</kwd>
        <kwd>Recognizing Textual Entailment</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Question Answering for Machine Reading Evaluation (QA4MRE) is a lab that
has been offered in Conference and Labs of the Evaluation Forum (CLEF) since
2011. It is an exercise to develop a methodology for evaluating machine reading
systems through Question Answering and Reading Comprehension Tests [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
In the lab, participants are provided with several documents and questions; the
answer to each question is stated or implied in the documents. The goal for
participants is to develop a system that extracts the relevant knowledge from the
documents and uses it to answer the questions. The main task is composed of
four topics: AIDS, Climate Change, Music and Society, and Alzheimer. Besides
the main task, QA4MRE at CLEF 2013 also offers two pilot tasks: Machine
Reading of Biomedical Texts about Alzheimer's Disease, and Entrance Exams.
This paper focuses on our participation in Entrance Exams.
      </p>
      <p>
        Entrance Exams is a new task that evaluates machine reading systems by
having them solve problems from Japanese university entrance exams. Similar tasks using
Japanese entrance exams were also held in NTCIR RITE [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The challenge of
this task is to test systems under the same conditions in which high school students are
evaluated. While a diverse range of background knowledge, such as Wikipedia
entries, is available to all participants in other tasks, no external sources are
provided in this task; systems are therefore expected to draw on a high-school
level of common sense. Only reading comprehension exercises were included in
QA4MRE at CLEF 2013; other types of exercises will be used in the future.
      </p>
      <p>The rest of this paper is organized as follows: Section 2 describes the details
of the system's architecture, Section 3 presents the results of our participation and
their evaluation, and Section 4 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>System Architecture</title>
      <p>
        In this section, we describe the detailed architecture of our system for the
Entrance Exams task. Although the task was offered for the first time in the lab,
the basic approaches do not differ significantly from those of other tasks. To develop
our system, we referred to studies that addressed other tasks in previous QA4MRE labs
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Our system consists of three components, as shown in Figure 1.
      </p>
      <p>
        The first component is the Character Resolver, which detects all story
characters appearing in the documents and applies coreference resolution to personal
pronouns. The second component is the Sentence Extractor, in which questions are
classified into several types and related sentences are then extracted
from the document for each question. The last component is Recognizing
Textual Entailment (RTE), in which we determine the most likely answer for each
question. The following subsections describe these three components in detail.
Besides these three main components, a component was also developed
to process documents and questions with a parser [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Since this is just a simple
preprocessor, we do not discuss it in this paper.
      </p>
      <sec id="sec-2-1">
        <title>Character Resolver</title>
        <p>Examining actual exam data from QA4MRE, we found that the documents for entrance
examinations consisted mainly of stories with related questions. We also
found that most of the questions concerned actions of story characters
in these documents. To answer each question correctly, it is therefore important
to detect which character is responsible for the action in focus. The
Character Resolver was developed to detect all story characters and their mentions
in the documents, including coreferential mentions. To achieve this, we divided
the task into two smaller processes, as shown in the following pseudo code. The
first detects all nouns that denote characters and merges them into groups
when they refer to the same character. The second assigns
personal pronouns to these person groups.</p>
        <sec id="sec-2-1-1">
          <title>Pseudo Code for Character Resolver</title>
          <preformat>Initialize personDict to an empty hash table
Initialize personCount to 0
Initialize personGroups to an empty list

Process 1:
  For each sentence in documents and questions
    For each word in sentence
      If word is detected by the Named Entity Recognizer or word matches the prepared list
        If word is in personDict
          wordID := personDict[word]
        Else
          personCount := personCount + 1
          personDict[word] := personCount
          wordID := personCount
          Create a newGroup for wordID
          Add the newGroup to personGroups
  For each personGroup1, personGroup2 in personGroups
    If personGroup1 and personGroup2 contain the same person
      Merge personGroup1 and personGroup2

Process 2:
  For each sentence in documents and questions
    For each word in sentence
      If word is a personal pronoun
        wordID := coreference_resolution(word)
        Find personGroup1 containing wordID
        Add word to personGroup1</preformat>
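          <p>As a rough illustration, Process 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the function is_person_name is a hypothetical placeholder for the Named Entity Recognizer, and CHARACTER_NOUNS and SYNONYM_SETS are small hypothetical stand-ins for the prepared word list and the synonym clusters.</p>

```python
# Hypothetical stand-ins for the prepared word list and synonym clusters.
CHARACTER_NOUNS = {"mother", "mom", "father", "dad", "teacher", "professor"}
SYNONYM_SETS = [{"mother", "mom"}, {"father", "dad"}]


def is_person_name(token, known_names):
    """Placeholder for a Named Entity Recognizer: a plain set lookup."""
    return token in known_names


def detect_characters(sentences, known_names):
    """Assign an ID to every character word, then merge synonymous groups."""
    person_dict = {}    # word -> person ID
    person_groups = []  # each group is a set of IDs denoting one character
    for sentence in sentences:
        for raw in sentence.split():
            token = raw.strip(".,!?\"'")
            if is_person_name(token, known_names):
                word = token
            elif token.lower() in CHARACTER_NOUNS:
                word = token.lower()
            else:
                continue
            if word not in person_dict:
                person_dict[word] = len(person_dict) + 1
                person_groups.append({person_dict[word]})
    # Merge the groups of words that are listed as synonyms of each other.
    for synonyms in SYNONYM_SETS:
        ids = {person_dict[w] for w in synonyms if w in person_dict}
        if not ids:
            continue
        merged = set(ids)
        remaining = []
        for group in person_groups:
            if group.isdisjoint(ids):
                remaining.append(group)
            else:
                merged.update(group)
        person_groups = remaining + [merged]
    return person_dict, person_groups
```

          <p>Applied to the three-sentence example about Christine and her mother, with "Christine" supplied as a known name, this sketch places "mother" and "mom" in one group and "Christine" in another.</p>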
          <p>
            In Process 1, we combine a prepared word list with the Stanford Named Entity
Recognizer [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. First, the documents are divided into sentences and
each sentence is divided into words. We then use the Named Entity Recognizer
to find proper names in the documents. Since the Named Entity Recognizer only
detects nouns that appear to be names, it fails to detect common nouns that
denote characters, such as "teachers" and "mother". To address this
problem, we prepare a list of character reference nouns and search the
documents for them. Each word or name found in the
text is then assigned a person ID for ease of management. In most cases, several
detected nouns denote the same person, which would adversely affect
the Recognizing Textual Entailment component. Therefore, we also cluster
words with the same meaning, such as "father" and "dad". For example,
"mother" and "mom" in the following text are tagged as the same person.
          </p>
          <p>Her &lt;coref id="2"&gt; mother &lt;/coref&gt; must have heard the front door
close. &lt;coref id="1"&gt; Christine &lt;/coref&gt; went in and sat on the sofa.
"How was your exam, dear?", her &lt;coref id="2"&gt; mom &lt;/coref&gt; asked.</p>
          <p>After character detection, we resolve the personal pronouns in the documents
and questions. We developed a tool for this task that follows simple rules
based on features such as number (singular/plural) and gender (male/female). The most
useful feature of this tool is that it can, to some extent, successfully tag words like
"I" and "you" in conversations. General coreference resolution tools do not perform
well in this respect, but this capability is important in the Entrance Exams task
because conversations may make up a large share of the text. After applying the tool,
we obtain tagged text like the following conversation.</p>
          <p>The &lt;coref id="3"&gt; professor &lt;/coref&gt; looked at &lt;coref id="2"&gt; me
&lt;/coref&gt; and smiled. "Oh! &lt;coref id="2"&gt; You &lt;/coref&gt; speak Japanese
very well."</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Sentence Extractor</title>
        <p>The final goal of our system is to select the correct answer based on the
output of the Recognizing Textual Entailment component, so both a Text and a
Hypothesis must be generated as its input. "Text entails Hypothesis" means
that a person who reads the Text would judge the Hypothesis to be most likely true.
In this component, we extract related sentences from the documents by assigning
each sentence a relevance score for each question. The highest-ranked sentences
are the ones we extract.</p>
        <p>Before the extraction process, all questions are classified into six types based
on the interrogative, as the following table shows. The classification indicates
which sentences we should extract: for each question type we can prepare a list of
keywords, and sentences that contain these expressions are more likely to be the
sentences we need. For instance, sentences related to "Why"
questions are likely to include a word such as "because". Preparing all expressions
is impossible, so we mainly depend on the Named Entity Recognizer to find
expressions for LOCATION, PERSON and DATE. We also do not prepare a list
for question types such as "What", because such questions are too vague to associate
with a list of relevant words.</p>
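        <p>The classification step can be illustrated with a short Python sketch. The six type labels, the mapping from question type to Named Entity label, and the single "because" cue are assumptions made for illustration only; the names QUESTION_TYPES, KEYWORDS, NER_LABELS, classify_question and cues_for are hypothetical and do not come from our system.</p>

```python
# Assumed labels: the text states only that six types are keyed on the interrogative.
QUESTION_TYPES = ("who", "where", "when", "why", "how", "what")

# Hypothetical cue lists. "what" deliberately has no keyword list, and
# who/where/when rely on Named Entity labels instead of fixed keywords.
KEYWORDS = {"why": ["because"]}
NER_LABELS = {"who": "PERSON", "where": "LOCATION", "when": "DATE"}


def classify_question(question):
    """Return the first interrogative found in the question, else 'what'."""
    for raw in question.split():
        token = raw.strip("?.,!\"'").lower()
        if token in QUESTION_TYPES:
            return token
    return "what"


def cues_for(question):
    """Return the question type with its keyword list and NER label, if any."""
    qtype = classify_question(question)
    return qtype, KEYWORDS.get(qtype, []), NER_LABELS.get(qtype)
```

        <p>Taking the first interrogative is a deliberate simplification: a question such as "Where did the author's mother sit when one of her children was away?" also contains "when", but is treated as a "where" question.</p>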
        <p>For each keyword, we add five points to the relevance score of the sentences
containing it. Then, we measure the similarity between questions and sentences.
First, each word in the documents and questions is assigned a tf-idf weight, which we
calculated from the Wikipedia corpus in advance. Second, the following features
are combined and added to the relevance score.</p>
        <p>(a) Word similarity. For each sentence, we calculate its similarity with the
question. This similarity is determined by word similarity using WordNet.
This feature adds at most 20 points to each sentence.</p>
        <p>
          (b) Dependency similarity. In the preprocessing of documents and questions,
dependencies are also generated by the parser [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. One point is added to
the relevance score if the same dependency appears in both a sentence and the
question.
(c) Character reference. Every story character in the documents is assigned
an ID number by the previous component. We add five points to a sentence if
it refers to the same person as the question.
        </p>
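        <p>The way these signals add up can be sketched as follows. This is an illustrative sketch, not our exact scoring code: it assumes that the WordNet-based similarity has been normalised to the range 0 to 1 before being scaled to 20 points, that one point is added per shared dependency, and that the character bonus is applied once per sentence. The function name relevance_score and its parameters are hypothetical.</p>

```python
def relevance_score(keyword_hits, word_similarity, shared_dependencies, shares_character):
    """Combine the keyword, similarity, dependency and character signals.

    keyword_hits: number of type-specific keywords found in the sentence
    word_similarity: WordNet-based similarity, assumed normalised to [0, 1]
    shared_dependencies: dependencies found in both sentence and question
    shares_character: True if sentence and question refer to the same person
    """
    score = 5 * keyword_hits                             # five points per keyword
    score += 20 * max(0.0, min(1.0, word_similarity))    # at most 20 points
    score += 1 * shared_dependencies                     # one point per dependency (assumed)
    score += 5 if shares_character else 0                # five points for a shared character
    return score
```

        <p>For example, a sentence with one keyword, a similarity of 0.5, three shared dependencies and a shared character would receive 5 + 10 + 3 + 5 = 23 points under these assumptions.</p>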
        <p>Once the scores are computed, all sentences are ranked for each question and the
sentence with the highest score is extracted. The sentences surrounding the
top-ranked sentence might also contain important information
for the question. Therefore, we extracted five sentences in our experiments: the
top-ranked sentence and four sentences surrounding it. These sentences were used as the
Text in our Recognizing Textual Entailment system.</p>
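        <p>A minimal sketch of this windowing step is given below. The text does not specify how the four surrounding sentences are distributed, so the sketch assumes two sentences before and two after the top-ranked one, with the window shifted when it reaches the beginning or end of the document. The function name extract_window is hypothetical.</p>

```python
def extract_window(scores, size=5):
    """Return the indices of the top-scoring sentence and its neighbours.

    scores: one relevance score per sentence, in document order (non-empty)
    size: total number of sentences to extract
    """
    best = max(range(len(scores)), key=scores.__getitem__)
    half = (size - 1) // 2
    start = max(0, best - half)
    end = min(len(scores), start + size)
    start = max(0, end - size)  # shift left when the window hits the end
    return list(range(start, end))
```

        <p>When the document has fewer sentences than the window size, all sentences are returned.</p>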
      </sec>
      <sec id="sec-2-3">
        <title>Recognizing Textual Entailment</title>
        <p>Our Recognizing Textual Entailment component uses semantically annotated
dependency parses of sentences as logical representations and applies an inference
engine to perform logical inference on these representations. The knowledge
used in the inference process is obtained from synonym, antonym and hypernym
relations in WordNet. The component also uses abduction to generate
alignments between small pieces of the Text/Hypothesis pair (T/H pair); these
alignments correspond to possibly missing knowledge that would let the logical
inference proceed further. The alignments are selected and evaluated by a rough
similarity measure, for which we chose the distributional similarity calculated
from the Google n-gram corpus. A final score for each T/H pair is given by a
classifier that uses the evaluation of the alignments and their contribution to the
inference process as features, together with shallow features such as word overlap.
The classifier is trained on the PASCAL RTE dataset.</p>
        <p>With the Text obtained in the previous component, we generated four T/H
pairs for each question, one per candidate answer. The Hypothesis can be extracted
easily from the candidate answers. These pairs are the inputs to our Recognizing
Textual Entailment component. We selected the answer whose pair received the
highest score as the output of our system.</p>
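        <p>The selection step can be sketched as follows. The optional threshold is an assumption: it illustrates one way a run could leave a question unanswered based on the RTE score, but the exact rule is not specified here. The function name select_answer is hypothetical.</p>

```python
def select_answer(confidences, threshold=None):
    """Pick the candidate whose T/H pair has the highest RTE confidence.

    confidences: one confidence score per candidate answer (non-empty)
    threshold: optional cut-off (assumed); if the best score is below it,
               the question is left unanswered

    Returns the 1-based answer number, or None for an unanswered question.
    """
    best = max(range(len(confidences)), key=confidences.__getitem__)
    if threshold is None or confidences[best] >= threshold:
        return best + 1
    return None
```

        <p>Calling the function without a threshold always returns an answer, while passing a threshold lets low-confidence questions go unanswered.</p>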
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results and Evaluations</title>
      <sec id="sec-3-1">
        <title>Test Set and Evaluation Measure</title>
        <p>The test data for evaluation in the Entrance Exams shared task were taken
from Japanese university entrance examinations. Nine test documents were
provided in the task. Eight contained five questions each, and the other contained
six questions, for a total of 46 questions. Four candidate answers were given for
each question. The following is an example question with its four answer options.</p>
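        <list list-type="bullet">
          <list-item><p>Question. Where did the author's mother sit when one of her children was away?</p></list-item>
          <list-item><p>Answer 1. She didn't change her chair.</p></list-item>
          <list-item><p>Answer 2. She moved her own chair next to Dad's.</p></list-item>
          <list-item><p>Answer 3. She moved to an empty chair on the side.</p></list-item>
          <list-item><p>Answer 4. She sat opposite to Dad.</p></list-item>
        </list>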
        <p>The task evaluates each participating system by assigning it a score between
0 and 1. The measure is called c@1 and was used in previous QA4MRE tasks.
Under this measure, a system can obtain a higher score by leaving a question
unanswered rather than answering it incorrectly. The measure is defined as follows.</p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>c@1 = \frac{1}{n}\left(n_r + n_u \frac{n_r}{n}\right)</tex-math>
        </disp-formula>
        <p>where:</p>
        <list list-type="bullet">
          <list-item><p>n: the total number of questions</p></list-item>
          <list-item><p>n_r: the number of correctly answered questions</p></list-item>
          <list-item><p>n_u: the number of unanswered questions</p></list-item>
        </list>
      </sec>
      <sec id="sec-3-2">
        <title>Results</title>
        <p>We submitted five runs, NII1 to NII5, in the task. NII1, NII3 and NII5 answered
all given questions, while NII2 and NII4 left several questions unanswered based
on the score from the Recognizing Textual Entailment component. NII1 and
NII2 generated the Text by using the Sentence Extractor, whereas NII3 and NII4
simply used the entire document as the Text. NII5 combined the results of
NII1 and NII3. The evaluation results are listed in Table 2.</p>
        <p>In Table 2, NII3 had the highest score: it answered all 46 questions,
16 of them correctly. Apart from NII3 and NII5, the runs had
scores similar to the random baseline, which has a c@1 score of 0.25. In addition to
the c@1 score, we applied McNemar's test to determine whether the difference
between the baseline and the NII3 or NII5 run was statistically significant,
but the p-value did not reach significance. Before the evaluation, we had expected NII1 or
NII5 to have the best score, since we expected the Sentence
Extractor to remove noise from the documents and thereby improve the accuracy of the
Recognizing Textual Entailment component. It turned out that the Sentence
Extractor was the bottleneck in NII1. Compared with NII1, NII3
could draw on detailed information from the entire document, which resulted in a
better c@1 score. We analyze the causes of errors in the Sentence Extractor in the next
subsection.</p>
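        <p>The reported score of NII3 can be checked directly from these counts. The sketch below uses the standard c@1 definition, c@1 = (n_r + n_u * n_r / n) / n, where n is the total number of questions, n_r the number answered correctly and n_u the number left unanswered. The function name c_at_1 is our own choice for illustration.</p>

```python
def c_at_1(n, n_r, n_u):
    """Compute c@1 from the total, correct and unanswered question counts."""
    if n == 0:
        raise ValueError("n must be positive")
    return (n_r + n_u * n_r / n) / n


# NII3 answered all 46 questions and got 16 right, so n_u = 0.
nii3 = c_at_1(n=46, n_r=16, n_u=0)
print(round(nii3, 2))  # 0.35
```

        <p>When no question is left unanswered, c@1 reduces to plain accuracy, which is why a random choice among four candidates gives the baseline of 0.25.</p>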
      </sec>
      <sec id="sec-3-3">
        <title>Error Analysis</title>
        <p>We carried out an error analysis of our system. As described in the previous
section, our system consists of three components, each of which
may have affected the final results. The Character Resolver performed well:
most mentions of story characters appeared to be tagged correctly.
The main reason is that there were few story characters in the documents, so
it was easy to tag them even with simple features. The major causes
of errors in this task were the Sentence Extractor and the
Recognizing Textual Entailment component. Two kinds of errors need to be taken
into account for the Sentence Extractor. In the first, the
component extracted the wrong sentences because the question did not contain
enough information. An example follows.</p>
        <list list-type="bullet">
          <list-item><p>Question: What was the purpose of the wooden cat?</p></list-item>
          <list-item><p>Extracted sentences: When the touch was repeated a moment later, she whispered to her husband about it, but he only replied that people do not bother you at the opera on purpose. (and four other sentences)</p></list-item>
          <list-item><p>Correct sentences: Thieves had stolen the jewels and were going to pass them over to a woman. In order to identify her, they placed the cat in the booth and told her to pick it up.</p></list-item>
        </list>
        <p>In this example, the question contains very few words. As stated in
the previous section, our approach extracts sentences mainly on the basis of the
similarity between sentences and questions, so the extracted sentences receive
a very high score when they contain a word that appears in the question, here
"purpose". The correct sentences also contain a word from
the question, namely "cat", but since each word is weighted by tf-idf, a common word
like "cat" contributes very little. Half of the Sentence Extractor errors are of
this type. If we could handle abstract words such as "purpose", we might
obtain better results.</p>
        <p>In the other error type of the Sentence Extractor, the extracted sentences
were not sufficient to answer the question, since we restricted the component
to extracting only five sentences. This error occurs when more than five
sentences are relevant. An example of such a case is shown below.</p>
        <list list-type="bullet">
          <list-item><p>Question: What happened when the author used a cash machine?</p></list-item>
          <list-item><p>Extracted sentences: When I first tried to use a cash machine in a bank, I had an unpleasant experience. (and four other sentences)</p></list-item>
          <list-item><p>Correct sentences: When I first tried to use a cash machine in a bank, I had an unpleasant experience. (and seven other sentences)</p></list-item>
        </list>
        <p>In this example, our system extracted some of the necessary sentences, but
the question asked about information spread over a wider range of the text.
We failed to acquire all the necessary information in such cases, which might have
reduced the accuracy of the Recognizing Textual Entailment component. These
two error types degraded the performance of NII1 and NII2.</p>
        <p>
          Errors in the Recognizing Textual Entailment component are a common
problem. Even in a recent workshop [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], a state-of-the-art system could not
solve difficult questions. As described in the introduction section, the Entrance
Exams task requires a high level of common sense, which high school students
are expected to have, but our component has not reached such a level. The
following is an example.
        </p>
        <list list-type="bullet">
          <list-item><p>Text: Our plane would have to fly hundreds of miles out of our way to get around it. If we flew through the cloud, the engines might get full of ash and stop.</p></list-item>
          <list-item><p>Hypothesis: The pilot had to fly off the regular course for the sake of safety.</p></list-item>
        </list>
        <p>A human would easily conclude that the hypothesis is correct,
but this kind of problem appears to be difficult for the Recognizing Textual
Entailment component. This also reduced the accuracy
of our system.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>We described the system that we used in the Entrance Exams task of QA4MRE
at CLEF 2013. Our system consists of three components: the Character Resolver, the
Sentence Extractor and Recognizing Textual Entailment. The documents are first
processed by the Character Resolver, which assigns an ID to each story
character. The Sentence Extractor then extracts related sentences for each
question and creates a Text and a Hypothesis. Finally, the resulting T/H pairs are
passed to the Recognizing Textual Entailment system to select an answer.</p>
      <p>The best run of our system was NII3, which obtained a c@1 score of 0.35. This
run used the entire document as the Text, which helped our Recognizing Textual
Entailment component collect useful information. Errors arising
in the Sentence Extractor and Recognizing Textual Entailment components
reduced the accuracy of our system. Mitigating the limitations of
these two components is left for future work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Peñas,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Hovy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Forner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Rodrigo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Sutcliffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Forascu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            , &amp;
            <surname>Sporleder</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <source>Overview of QA4MRE at CLEF</source>
          <year>2011</year>
          :
          <article-title>Question answering for machine reading evaluation</article-title>
          .
          <source>Working Notes of CLEF</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bhaskar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pakray</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banerjee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banerjee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bandyopadhyay</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Gelbukh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2012</year>
          ,
          <article-title>September)</article-title>
          .
          <article-title>Question answering system for qa4mre@ clef 2012</article-title>
          . In CLEF (Online Working Notes/Labs/Workshop).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Iftene</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Gînscă,
          <string-name>
            <given-names>A-L.</given-names>
            ,
            <surname>Moruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.A.</given-names>
            ,
            <surname>Trandabat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Husarciuc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Boros</surname>
          </string-name>
          , E. "
          <article-title>Enhancing a Question Answering System with Textual Entailment for Machine Reading Evaluation." CLEF (Online Working Notes</article-title>
          /Labs/Workshop)
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Shima</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanayama</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>C. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitamura</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miyao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , ... &amp;
          <string-name>
            <surname>Takeda</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2011</year>
          , December).
          <article-title>Overview of ntcir-9 rite: Recognizing inference in text</article-title>
          .
          <source>In Proceedings of the 9th NII Test Collection for Information Retrieval Workshop (NTCIR'11)</source>
          (pp.
          <fpage>291</fpage>
          -
          <lpage>301</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Klein</surname>
            , Dan, and
            <given-names>Christopher D.</given-names>
          </string-name>
          <string-name>
            <surname>Manning</surname>
          </string-name>
          . "
          <article-title>Accurate unlexicalized parsing." Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1</article-title>
          . Association for Computational Linguistics,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Finkel</surname>
          </string-name>
          , Jenny Rose, Trond Grenager, and Christopher Manning. "
          <article-title>Incorporating non-local information into information extraction systems by gibbs sampling." Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics</article-title>
          .
          <source>Association for Computational Linguistics</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peirsman</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chambers</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Surdeanu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Deterministic coreference resolution based on entity-centric, precision-ranked rules</article-title>
          .
          <source>Computational Linguistics</source>
          , (
          <issue>Just Accepted)</issue>
          ,
          <fpage>1</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>De</surname>
            <given-names>Marneffe</given-names>
          </string-name>
          , Marie-Catherine, Bill MacCartney, and
          <string-name>
            <surname>Christopher</surname>
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Manning</surname>
          </string-name>
          . "
          <article-title>Generating typed dependency parses from phrase structure parses</article-title>
          .
          <source>" Proceedings of LREC</source>
          . Vol.
          <volume>6</volume>
          .
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Dzikovska</surname>
            ,
            <given-names>M. O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nielsen</surname>
            ,
            <given-names>R. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brew</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leacock</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giampiccolo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bentivogli</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , ... &amp;
          <string-name>
            <surname>Dang</surname>
            ,
            <given-names>H. T.</given-names>
          </string-name>
          (
          <year>2013</year>
          , June). SemEval
          <article-title>-2013 task 7: The joint student response analysis and 8th recognizing textual entailment challenge</article-title>
          .
          <source>In Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval</source>
          <year>2013</year>
          ),
          <article-title>in conjunction with the Second Joint Conference on Lexical and Computational Semantics (*SEM 2013)</article-title>
          , Atlanta, Georgia, USA, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>