<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Can Models of Author Intention Support Quality Assessment of Content?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>A J Casey</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bonnie Webber</string-name>
          <email>bonnieg@inf.ed.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dorota Glowacka</string-name>
          <email>glowacka@cs.helsinki.fi</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Edinburgh</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Helsinki</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Academics seek to find, understand and critically review the work of other researchers through published scientific articles. In recent years, the volume of available information has increased significantly, partly due to technological advancements and partly due to pressure on academics to 'publish or perish'. This volume of papers presents a challenge not only for the peer-review process but also for readers, particularly inexperienced readers, who need to find high-quality publications. Whilst one might rely on citation counts or journal rankings to guide this decision, this approach may not be completely reliable, because peer-review processes can be biased and the citation count of an article does not per se indicate its quality. Here, we analyse how expected author intentions in a Related Work section can be used to indicate its quality. We show that author intentions can predict quality with reasonable accuracy, and propose that similar approaches could be applied to other sections to provide an overall picture of quality. This approach could support peer-review processes and help readers prioritise which articles to read.</p>
      </abstract>
      <kwd-group>
        <kwd>Article Quality</kwd>
        <kwd>Author Intentions</kwd>
        <kwd>Supporting peer-review</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Recent years have seen an increase in the volume of scientific publications. The
amount of published material poses a challenge for the reader, in particular an
inexperienced one, who must navigate this overwhelming wealth of material to
find relevant and high-quality content. It also poses a challenge for the peer-review
process: there is only a limited pool of experts to undertake peer-review, and
the high volume of submitted material puts pressure on this limited resource.
Automated ways of assessing quality could support the peer-review process
and help the overwhelmed reader prioritise an ever-growing reading list.</p>
      <p>
        Automating the judgement of research quality is challenging, as it requires
expert knowledge of the field. Bridges [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] describes this judgement of research quality as a form of
connoisseurship, which draws on one's own knowledge and experience of the field. This, in
turn, not only allows one to comment on specific features but also gives one the
ability to appreciate the overall composition of the text. It is recognised that
it would be difficult, if not impossible, to emulate this level of human
judgement in an automated fashion. We propose instead that considering how
argument intentions are represented linguistically, and quantifying the depth of this
representation, may help to build quality indicators that could prove useful in
supporting the peer-review process or in helping readers identify better reading
material. The intuition behind using argument elements to define quality has
support in the existing literature: essay scores have been shown to be linked to
argumentative elements identified through discourse analysis [
        <xref ref-type="bibr" rid="ref15 ref4">4, 15</xref>
        ].
      </p>
      <p>
        Based on this premise, we take Related Work sections from published
papers as a case study. We rate each section as Good (G),
Average (Avg) or Poor (P), using Related Work sections annotated with author
intentions that were designed to support content feedback [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We analyse the relationship between
these author intentions and the quality ratings, showing that quality and the occurrence
of author intentions are related, and that author intentions predict the quality
rating of a Related Work section with reasonable accuracy.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Peer-review, generally accepted as the gold standard for assessing quality, is not
without issues: these include bias, publication delays, difficulty in
detecting fraud and/or errors, and unethical practices [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Metrics such as
citation and download counts have also been considered as indicators of quality,
but these too have known issues: they depend on the size of the discipline, and
they take time to accumulate. Authors and research teams have also been known to
add unnecessary self-citations to increase their own citation counts [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Despite
these problems, we do not believe that peer-review or citation measures should
be replaced. Rather, we see our work as an additional tool. It could, for example,
be used for triage: if our tool rates a paper Poor or Good, perhaps only
one reviewer is needed to confirm the rating, with a second reviewer needed only if the first reviewer
disagrees with the automated assessment. Papers rated Average would always
have two reviewers. This indication of quality could also be used alongside
measures such as citation count to help a reader decide which papers to read
first.
      </p>
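      <p>The triage rule suggested above can be sketched as a small decision function. This is a minimal, hypothetical illustration rather than part of our system: the function name total_reviewers and the assumption that the first human reviewer rates on the same P/Avg/G scale as the tool are ours. The function simply counts how many human reviewers a paper ends up with.</p>
      <preformat preformat-type="code">
```python
# Hypothetical sketch of the triage rule; not the authors' implementation.
RATINGS = ("P", "Avg", "G")  # Poor, Average, Good


def total_reviewers(automated_rating: str, first_reviewer_rating: str) -> int:
    """Return how many human reviewers a paper ends up with.

    Assumes the first human reviewer rates on the same P/Avg/G scale
    as the automated tool.
    """
    if automated_rating not in RATINGS or first_reviewer_rating not in RATINGS:
        raise ValueError("ratings must be one of 'P', 'Avg', 'G'")
    if automated_rating == "Avg":
        # Papers rated Average always get two reviewers.
        return 2
    # P or G: one reviewer confirms; a second is needed only on disagreement.
    return 1 if first_reviewer_rating == automated_rating else 2


if __name__ == "__main__":
    print(total_reviewers("G", "G"))    # tool and reviewer agree
    print(total_reviewers("P", "Avg"))  # disagreement triggers a second review
    print(total_reviewers("Avg", "G"))  # Average always gets two
```
      </preformat>
      <p>Under this rule, reviewer effort is saved only on papers that the tool rates P or G and the first reviewer confirms, so the savings would depend on how reliable the tool is at the two extremes.</p>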
      <p>
        Automated recognition of author intentions in scientific publications
has been successful in the past, as in Argumentative Zoning (AZ) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Our idea that author intentions can be linked to better Related Work sections
is also supported by other recent work [
        <xref ref-type="bibr" rid="ref14 ref7">7, 14</xref>
        ], which shows that author goals (intentions)
identified within a text can be reliably linked to human essay scores. Burstein et
al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] take this a step further and use discourse analysis to label what they call
essay-specific goals, e.g. thesis aim or conclusion. They propose that missing labels
could help students identify aspects of their essay that need improvement.
This relates to our idea that missing author intentions may point to poorer-quality
material. Whilst these works use the individual labels within their schemas
to highlight specific missing intentions, our work can be seen as an extension,
using the combination of author intentions to give an overall indication of
quality.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Methods</title>
      <sec id="sec-3-1">
        <title>Author Intentions in Related Work</title>
        <p>
          The author-intention labelled data we use is from [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], which uses a data set from
[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] consisting of published scientific papers from the ACL Anthology [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The
labels are based on qualities that Kamler and Thomson [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] argue should be
present in a Related Work section. They distinguish neutral citations, which provide
mere description, from citations that highlight gaps or problems, and they also
identify where authors talk about their own work and how it relates to
the cited work or to the background in general.
        </p>
        <p>The author intention labels used are listed in Table 1. Certain labels
from the original schema were rare and were collapsed into more frequent categories.
These included sentences that are positive about a citation or field; works that the author's
work builds on, uses or is similar to; and comparisons of two cited works, as
described in the description field of Table 1.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Assessing Quality</title>
        <p>
          An experiment was set up to rate the quality of each Related Work section in
the data set of [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Participants were presented with the title, abstract and
Related Work section of each paper and asked to rate the section's quality as Poor (P),
Average (Avg) or Good (G). In addition, they were asked (i) whether there was enough material
on previous work; (ii) how well the authors related their work to the previous work;
and (iii) whether it was clear how the authors' work differed from previous work.
However, in this work we use only the quality rating given by the participants.
The guidance given to participants suggested that it was not enough to list previous
work: authors should demonstrate how the cited work relates to their own
work. The guidance also noted that conference papers are usually limited in
length, so an in-depth explanation of the state of the art is not expected.
        </p>
        <p>There were six assessors: four experts and two PhD students, all in
computational linguistics except one student in computer vision. One assessor (the main
assessor) rated all items; the other five rated ten each. Agreement was assessed by
comparing each of these five assessors with the main assessor, who looked at all
the articles. Four of the five assessors were in good agreement with the main
assessor: two were in complete agreement and two agreed on seven of the
ten papers. The remaining assessor agreed in only four instances, which is likely due
to their being a PhD student in another area with less experience of
ACL papers. All disagreements were discussed until agreement was reached, resulting
in 50 double-rated papers and 44 rated by one assessor only. The
final data set therefore contains 94 papers: P (36%), G (31%) and Avg (33%).</p>
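        <p>The agreement figures reported above can be checked with simple arithmetic. The sketch below computes raw percentage agreement with the main assessor for each of the five other assessors, using only the counts given in the text; the assessor names A1 to A5 are placeholders. Raw agreement does not correct for chance agreement, unlike Cohen's kappa, which would need the full pairs of ratings rather than these summary counts.</p>
        <preformat preformat-type="code">
```python
# Agreement counts with the main assessor, out of ten papers each (from the text).
# Assessor names are placeholders.
AGREED = {"A1": 10, "A2": 10, "A3": 7, "A4": 7, "A5": 4}
PAPERS_PER_ASSESSOR = 10
TOTAL_PAPERS = 94


def percent_agreement(agreed: int, rated: int) -> float:
    """Raw percentage agreement (no chance correction)."""
    return 100.0 * agreed / rated


per_assessor = {
    name: percent_agreement(n, PAPERS_PER_ASSESSOR) for name, n in AGREED.items()
}
# Pooled agreement over all double-rated papers, before disagreements were discussed.
pooled = percent_agreement(sum(AGREED.values()), PAPERS_PER_ASSESSOR * len(AGREED))
double_rated = PAPERS_PER_ASSESSOR * len(AGREED)  # papers seen by two assessors
single_rated = TOTAL_PAPERS - double_rated        # papers seen by the main assessor only

if __name__ == "__main__":
    print(per_assessor)
    print(pooled)
    print(double_rated, single_rated)
```
        </preformat>
        <p>Pooled over the 50 double-rated papers, the main assessor and the second assessors agreed on 38 papers (76%) before discussion, and the 94 - 50 = 44 remaining papers were rated by the main assessor alone.</p>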
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Mean Label Occurrence in Rated Sections</title>
      <p>Table 2 shows the mean number of times each label occurs per section,
grouped by quality rating, with variance in brackets. Our intuition is that the
occurrence of some labels will vary between the different ratings. We use
Welch's t-test, which corrects for unequal variances, to test whether differences between
the group means are significant. Groups are tested pairwise in the order P/Avg,
Avg/G and P/G, where * denotes a significant test (p &lt; 0.05).</p>
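      <p>For reference, Welch's t statistic and its Welch–Satterthwaite degrees of freedom can be computed directly from two groups of per-section label counts. This is a minimal sketch using only the Python standard library; the counts in the example are invented for illustration and are not taken from Table 2. The p-value then comes from Student's t distribution with the computed degrees of freedom; scipy.stats.ttest_ind(a, b, equal_var=False) performs the same test and returns the p-value directly.</p>
      <preformat preformat-type="code">
```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: per-section counts of one label in two rating groups (e.g. P vs Avg).
    Each group keeps its own sample variance (n - 1 denominator), so equal
    variances are not assumed.
    """
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


if __name__ == "__main__":
    # Invented counts of one label per section, for illustration only.
    poor = [0, 1, 0, 2, 1]
    good = [2, 3, 1, 4, 3]
    t, df = welch_t(poor, good)
    print(f"t = {t:.3f}, df = {df:.2f}")
```
      </preformat>
      <p>The degrees of freedom are generally not an integer and fall between the smaller group size minus one and the combined sample size minus two, which is what makes the test robust when one rating group's counts vary much more than another's.</p>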
      <p>
        The number of background statements with evidence (BG-EP) in P sections is found to
differ significantly from that in Avg or G rated sections. The number of background
statements that provide no evidence (BG-NE) differs significantly between Avg and G rated
sections. Work should not be cited merely because it is on the same topic as the citing work;
rather, it should be cited because it has implications for the author's study [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and the
author should say what these implications are. The findings in Table 2 support
this: there are significant differences in the mean number of sentences in G rated
sections that describe how the authors' work differs from a cited work (A-CW-D)
and how the authors' work fills a gap (A-GAP). Additionally, the number of
sentences that describe the authors' own work (A-DESC) differs significantly
between P rated sections and both Avg and G sections.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Predicting Quality from Annotated Data</title>
      <p>
        Related Work quality is classified as P, Avg or G. We trained classifiers,
experimenting with SVM (linear kernel), Decision Tree (C4.5) and Linear
Logistic Regression (LLR) [
        <xref ref-type="bibr" rid="ref12 ref16 ref6">6, 12, 16</xref>
        ]. We use only our annotated labels as features.
Whilst there are many other features that we could include, our focus here
is to understand how well our author intentions relate to quality ratings. We use
10-fold cross-validation, with a majority-class classifier as our baseline. We also report
how our features rank in terms of importance in our best performing classifier.
      </p>
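      <p>The evaluation protocol above, 10-fold cross-validation against a majority-class baseline, can be sketched in standard-library Python. This is an illustration under assumptions, not our experimental code: the shuffled, unstratified fold assignment is an assumption, and the fit_predict callback is a hypothetical stand-in for any of the three classifiers, which are not reimplemented here. Each section's feature vector would hold its counts of each author-intention label.</p>
      <preformat preformat-type="code">
```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_accuracy(X, y, fit_predict, k=10, seed=0):
    """Accuracy of fit_predict(X_train, y_train, X_test) under k-fold CV."""
    correct = 0
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(y)


def majority_fit_predict(X_train, y_train, X_test):
    """Baseline: predict the most frequent training rating for every test item."""
    majority = Counter(y_train).most_common(1)[0][0]
    return [majority] * len(X_test)


if __name__ == "__main__":
    # Invented data: 60 'P' and 40 'G' sections with dummy label-count features.
    y = ["P"] * 60 + ["G"] * 40
    X = [[0, 0, 0]] * len(y)  # shared rows are fine: features are never mutated
    print(cv_accuracy(X, y, majority_fit_predict))
```
      </preformat>
      <p>Calling cv_accuracy with different seed values is one way to obtain repeated runs whose mean and variance can then be reported; a real classifier would replace majority_fit_predict with a function that trains on the label-count vectors.</p>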
      <p>
        Table 3 shows precision, recall and accuracy for all three classifiers and
our majority-class baseline. To check the consistency of results, we ran our models
over 10 iterations and report mean performance (variance in brackets). We
test for differences between our classifiers using the corrected t-test (p &lt; 0.05)
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. All classifiers significantly outperform our baseline. Unsurprisingly, SVM
and LLR produce similar results; SVM displays marginally less
variation across runs, although there is no significant difference between SVM and LLR.
The accuracy of both SVM and LLR is significantly different from that of the decision
tree. One possible reason for the decision tree's poorer performance is that
no label feature is exclusive to a single rating. For example, although author gap and
difference sentences (A-GAP, A-CW-DIFF) are rare in P examples, they are not completely
absent. There is no directly comparable system, but e-rater 2.0 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
reports 97% agreement between system and human essay scores. e-rater is,
however, a commercial system built on multiple elements, not just author
intentions. Whilst we do not achieve this level of accuracy, our results are promising
as a first step, and adding other features could improve
accuracy. For example, when we experimented with adding sentence counts and citation
counts, accuracy consistently improved by 4%.
      </p>
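      <p>The corrected t-test [11] inflates the variance estimate because cross-validation training sets overlap, which makes the ordinary paired t-test too liberal. Below is a minimal sketch of the corrected statistic in the form commonly used for r repetitions of k-fold cross-validation: the n = k·r per-fold accuracy differences are pooled, and the variance factor 1/n is increased by n_test/n_train (1/9 for 10-fold cross-validation). The exact variant applied for Table 3 is not specified above, so this form is an assumption, and the differences in the example are invented.</p>
      <preformat preformat-type="code">
```python
import math
from statistics import mean, variance


def corrected_resampled_t(diffs, test_train_ratio):
    """Corrected resampled t statistic with Nadeau and Bengio's variance correction.

    diffs: per-fold differences in accuracy between two classifiers, pooled
           over every fold of every repetition (10 x 10 = 100 values for
           ten runs of 10-fold cross-validation).
    test_train_ratio: n_test / n_train within one fold (1/9 for 10-fold CV).
    Returns (t, degrees_of_freedom); compare t against Student's t with n - 1 df.
    """
    n = len(diffs)
    # A plain paired t-test would use variance(diffs) / n; the extra
    # test_train_ratio term accounts for the overlapping training sets.
    corrected_var = (1.0 / n + test_train_ratio) * variance(diffs)
    return mean(diffs) / math.sqrt(corrected_var), n - 1


if __name__ == "__main__":
    # Invented accuracy differences (classifier A minus classifier B).
    diffs = [0.02, 0.05, -0.01, 0.04, 0.03]
    t, df = corrected_resampled_t(diffs, test_train_ratio=1 / 9)
    print(f"t = {t:.3f} with {df} degrees of freedom")
```
      </preformat>
      <p>With 100 pooled differences, the correction term 1/9 dominates 1/100, so the corrected standard error is more than three times the uncorrected one; this is why small accuracy gaps between classifiers often fail to reach significance under this test.</p>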
      <p>
        Table 4 ranks labels by their importance in the SVM. The labels for an author
highlighting how their work differs from a cited work, or how their work addresses
a gap, are the most important for distinguishing between quality ratings.
This seems plausible, as we observe that these labels occur more often in "better" Related Work
sections. This supports the view of Maxwell [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] that cited work
needs to be shown to have implications for the study: it seems that if this type
of connection is missing, the section is rated lower.
      </p>
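      <p>How importance is derived from the SVM is not spelled out above. One common choice for a linear SVM, used here purely as an assumption, is to rank each feature by its squared weight, summed over the binary models that a multi-class SVM trains (for example the P/Avg, Avg/G and P/G models of a one-vs-one scheme). The sketch below does this; the feature names and weights are invented placeholders, not the values behind Table 4.</p>
      <preformat preformat-type="code">
```python
def rank_by_squared_weight(feature_names, weight_vectors):
    """Rank features by the sum of their squared linear-SVM weights.

    weight_vectors: one weight vector per binary sub-model, each aligned
    with feature_names. Larger totals mean the feature shifts the decision
    values more, assuming features are on comparable scales.
    """
    scores = {
        name: sum(w[j] ** 2 for w in weight_vectors)
        for j, name in enumerate(feature_names)
    }
    # Sort descending by score; Python's sort is stable, so ties keep input order.
    return sorted(feature_names, key=lambda name: scores[name], reverse=True)


if __name__ == "__main__":
    names = ["label_a", "label_b", "label_c"]  # placeholder feature names
    weights = [[0.9, -0.2, 1.4],               # invented weights, one row
               [0.3, 0.1, -1.1],               # per binary sub-model
               [-0.8, 0.4, 0.6]]
    print(rank_by_squared_weight(names, weights))
```
      </preformat>
      <p>Squaring discards the sign, so a label that strongly pushes sections towards P ranks as highly as one that pushes them towards G; the sign of each weight would still be needed to say which rating a label favours.</p>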
      <p>Finally, for our best performing model, SVM, we examined the confusion matrix
for all 10 iterations. We were interested to see whether misclassification occurred
only into the nearest group, i.e. whether G sections were misclassified as Avg rather than P. We observed
that this was not always the case: in two of the 10 iterations one P section was classified as
G, and in six iterations one G section was classified as P. We speculate that we could
improve performance by studying patterns of labels occurring together. When
we considered the mean occurrence and variance of labels in Table 2, we saw
that it is not simply a case of a P section having no sentences about the
author's work or never mentioning a gap. We believe there may be more to learn
from patterns of co-occurring labels that could support better
classification of the different ratings.</p>
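      <p>Because the ratings are ordinal, errors can be split into adjacent ones (one step away, such as G predicted as Avg) and non-adjacent ones (P predicted as G, or the reverse). A minimal sketch of that split for a single 3x3 confusion matrix follows; the matrix layout (rows are true ratings, columns are predictions, both in the order P, Avg, G) and the example counts are assumptions for illustration only.</p>
      <preformat preformat-type="code">
```python
ORDER = ("P", "Avg", "G")  # ordinal scale: Poor, then Average, then Good


def error_breakdown(matrix):
    """Split the off-diagonal counts of a 3x3 confusion matrix by distance.

    matrix[i][j] is the number of sections whose true rating is ORDER[i]
    and whose predicted rating is ORDER[j]. Returns (adjacent, non_adjacent):
    errors one step away on the scale versus two steps away (P/G swaps).
    """
    adjacent = non_adjacent = 0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            distance = abs(i - j)
            if distance == 1:
                adjacent += count
            elif distance == 2:
                non_adjacent += count
    return adjacent, non_adjacent


if __name__ == "__main__":
    # Invented counts for one cross-validation run (rows: true P, Avg, G).
    example = [[25, 8, 1],
               [6, 20, 5],
               [0, 7, 22]]
    print(error_breakdown(example))
```
      </preformat>
      <p>Summing non_adjacent over the ten runs gives the kind of count discussed above, namely P sections predicted as G and G sections predicted as P.</p>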
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>Using Related Work sections, we have shown that the occurrence of some author intentions differs
significantly across sections rated P, Avg and G. These author intentions show
promise as viable indicators of content quality. We speculate that
differently rated sections will show co-occurrence patterns of labels that may
provide stronger indications of differences between the quality ratings, an aspect
we intend to investigate in future work. Our study has limitations: the
sample is small (94 papers) and only one domain is considered. Moreover, a dedicated
Related Work section does not occur in every domain. Our
prediction of quality rating is consistently accurate at 70% with only author
intentions as features. Whilst this does not match the accuracy of commercial tools such
as e-rater (97%), it is a very promising result that could possibly be improved
with additional features. Reaching a human level of judgement for peer-review of
scientific papers is most likely impossible. For example, it is hard to tell what
is missing, such as what has not been addressed, or to identify something that
is incorrect; these aspects might still require a human expert. Nonetheless,
we believe that this type of quality rating, if developed at a section-specific
level, could prove useful in supporting peer-review, directing where reviewers'
time should be focused and on which papers. In addition, it could help a reader
prioritise their reading list of papers.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dale</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dorr</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gibson</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joseph</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kan</surname>
            ,
            <given-names>M.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Powley</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radev</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>Y.F.</given-names>
          </string-name>
          :
          <article-title>The ACL anthology reference corpus: A reference dataset for bibliographic research in computational linguistics</article-title>
          .
          <source>In: LREC</source>
          <year>2008</year>
          (
          <year>2008</year>
          ), http://www.lrec-conf.org/proceedings/lrec2008/pdf/445_paper.pdf
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bridges</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Research quality assessment in education: impossible science, possible art?</article-title>
          <source>British Educational Research Journal</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Burstein</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marcu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knight</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Finding the WRITE stuff: Automatic identification of discourse structure in student essays</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Burstein</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chodorow</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leacock</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Automated essay evaluation: The Criterion online writing service</article-title>
          .
          <source>AI Magazine</source>
          <volume>25</volume>
          ,
          <fpage>27</fpage>
          –
          <lpage>36</lpage>
          (Sep
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Casey</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Webber</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glowacka</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>A framework for annotating related works, to support feedback to novice writers</article-title>
          .
          <source>In: Proceedings of the 13th Linguistic Annotation Workshop held in conjunction with ACL 2019 (LAW-XIII 2019)</source>
          . Association for Computational Linguistics, Florence, Italy (Aug
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          :
          <article-title>LIBSVM: A library for support vector machines</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          <volume>2</volume>
          ,
          <fpage>27:1</fpage>
          –
          <lpage>27:27</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ghosh</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khanam</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muresan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Coarse-grained argumentation features for scoring persuasive essays</article-title>
          .
          <source>In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</source>
          . pp.
          <fpage>549</fpage>
          –
          <lpage>554</lpage>
          . Association for Computational Linguistics, Berlin, Germany (Aug
          <year>2016</year>
          ). https://doi.org/10.18653/v1/P16-2089
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Glänzel</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Debackere</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thijs</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schubert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A concise review on the role of author self-citations in information science, bibliometrics and science policy</article-title>
          .
          <source>Scientometrics</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kamler</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomson</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Helping doctoral students write: Pedagogies for supervision</article-title>
          .
          <source>Routledge</source>
          (
          <year>2006</year>
          ). https://doi.org/10.4324/9780203969816
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Maxwell</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          :
          <article-title>Literature reviews of, and for, educational research: A commentary on Boote and Beile's "Scholars before researchers"</article-title>
          .
          <source>Educational Researcher</source>
          <volume>35</volume>
          (
          <issue>9</issue>
          ),
          <fpage>28</fpage>
          –
          <lpage>31</lpage>
          (
          <year>2006</year>
          ). https://doi.org/10.3102/0013189X035009028
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Nadeau</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Inference for the generalization error</article-title>
          .
          <source>In: Proceedings of the 12th International Conference on Neural Information Processing Systems</source>
          . pp.
          <fpage>307</fpage>
          –
          <lpage>313</lpage>
          . NIPS'99, MIT Press, Cambridge, MA, USA (
          <year>1999</year>
          ), http://dl.acm.org/citation.cfm?id=3009657.3009701
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Quinlan</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          :
          <source>C4.5: Programs for Machine Learning</source>
          . Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. Schafer, U.,
          <string-name>
            <surname>Spurk</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ste</surname>
            <given-names>en</given-names>
          </string-name>
          , J.:
          <article-title>A fully coreference-annotated corpus of scholarly papers from the ACL anthology</article-title>
          .
          <source>In: Proceedings of COLING 2012: Posters</source>
          . pp.
          <volume>1059</volume>
          {
          <fpage>1070</fpage>
          .
          <string-name>
            <surname>The</surname>
            <given-names>COLING</given-names>
          </string-name>
          2012
          <string-name>
            <given-names>Organizing</given-names>
            <surname>Committee</surname>
          </string-name>
          , Mumbai, India (dec
          <year>2012</year>
          ), https://www.aclweb.org/anthology/C12-2103
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heilman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beigman Klebanov</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deane</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Applying argumentation schemes for essay scoring</article-title>
          .
          <source>In: Proceedings of the First Workshop on Argumentation Mining</source>
          . pp.
          <fpage>69</fpage>
          –
          <lpage>78</lpage>
          . Association for Computational Linguistics (
          <year>2014</year>
          ), http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.672.5185
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heilman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beigman Klebanov</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deane</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Applying argumentation schemes for essay scoring</article-title>
          .
          <source>In: Proceedings of the First Workshop on Argumentation Mining</source>
          . pp.
          <fpage>69</fpage>
          –
          <lpage>78</lpage>
          . Association for Computational Linguistics, Baltimore, Maryland (Jun
          <year>2014</year>
          ). https://doi.org/10.3115/v1/W14-2110, https://www.aclweb.org/anthology/W14-2110
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Sumner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Speeding up logistic model tree induction</article-title>
          .
          <source>In: PKDD, LNCS 3721</source>
          , pp.
          <fpage>675</fpage>
          –
          <lpage>683</lpage>
          (
          <year>2005</year>
          ), https://hdl.handle.net/10289/1446
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Teufel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Argumentative zoning: Information extraction from scientific text</article-title>
          .
          <source>Ph.D. thesis</source>
          , University of Edinburgh (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>da Silva</surname>
            ,
            <given-names>P.R.</given-names>
          </string-name>
          :
          <article-title>Emerging trends in peer review-a survey</article-title>
          .
          <source>Frontiers in Neuroscience</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>