<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Influence of Classifiers and Encoders on Argument Classification in Japanese Assembly Minutes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yasutomo Kimura</string-name>
          <email>kimura@res.otaru-uc.ac.jp</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hideyuki Shibuki</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hokuto Ototake</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuzu Uchida</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Keiichi Takamaru</string-name>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kotaro Sakamoto</string-name>
          <xref ref-type="aff" rid="aff7">7</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Madoka Ishioroshi</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Teruko Mitamura</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Noriko Kando</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Carnegie Mellon University</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fukuoka University</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Hokkai-Gakuen University</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>National Institute of Informatics</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Otaru University of Commerce</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>SOKENDAI</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>Utsunomiya Kyowa University</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff7">
          <label>7</label>
          <institution>Yokohama National University</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>1705</fpage>
      <lpage>1714</lpage>
      <abstract>
        <p>We performed a comparative study of the influence of seven types of classifiers and four types of encoders on argument classification in Japanese assembly minutes, using 45 sets of results from the Question Answering Lab for Political Information task at the NTCIR-14 workshop. In classifying argumentative relations between a speech sentence and a political topic, the highest accuracy was 0.942, obtained with a support vector machine classifier and one-hot encoding, while the highest accuracy obtained with a long short-term memory classifier and word embedding was 0.934.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Numerous arguments on various topics are conducted at
assemblies around the world. Although these arguments are
valuable to the general public, they are too numerous and
intertwined to be easily understood. Meanwhile, demand has
been growing for promptly providing users with the information
they need after checking facts, so that fake news can be
eliminated from such arguments. Advanced question answering (QA)
technologies, including argument mining and machine
comprehension, can help users obtain such information. Therefore,
argument mining from assembly minutes is becoming increasingly
important.</p>
      <p>In general, machine learning methods such as support
vector machines (SVMs) are used to recognize argumentative
relations, including support and attack relations [Stab and
Gurevych, 2017]. However, determining the most suitable
method and the appropriate design of argument vectors for argument
mining from assembly minutes remains a challenge. The QA
Lab-PoliInfo (Question Answering Lab for Political Information)
task [Kimura et al., 2019] at the NTCIR-14 workshop was
held from January 2018 to June 2019. It was a shared
task that focused on recognizing and summarizing the
opinions of assemblymen and their reasons in the Japanese
Regional Assembly Minutes Corpus [Kimura et al., 2016].
Fifteen teams participated and submitted a total of 119 results.
These teams employed different types of methods, such
as rule-based vs. machine learning classifiers,
one-hot encoding vs. word embedding, and SVM vs. long
short-term memory (LSTM).</p>
      <p>The QA Lab-PoliInfo task comprises the segmentation,
summarization, and classification tasks. The objective of the
classification task is to recognize the class of an assemblyman's
speech, such as “support”, “against”, or “other”, with respect to an
opinion, such as “The Tsukiji Market should move to Toyosu
area”. This is similar to recognizing argumentative
relations. Using the results of the classification task, we
investigated how differences between classifiers and encoders
influence the recognition of argumentative relations.</p>
      <p>The main contribution of this study is to clarify the
influence of the classifiers and encoders of machine learning
methods on argument classification in Japanese assembly
minutes, based on the results of various empirical systems.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>This section presents comparative studies on argument mining.
Aker et al. [Aker et al., 2017] comparatively
analyzed machine learning methods and feature sets using
persuasive essays and Wikipedia articles in English.
However, their results do not cover more recent methods, such as
the LSTM. Moreover, Japanese assembly minutes have different
characteristics from such essays and articles.</p>
      <p>The Fake News Challenge (http://www.fakenewschallenge.org/) and the
CLEF-2018 Fact Checking Lab (http://alt.qcri.org/clef2018-factcheck/)
[Nakov et al., 2018] are shared tasks that deal with
political information. The Fake News Challenge conducted a
stance detection task, which estimates the relative perspective
(or stance) of two pieces of text with respect to a topic, claim,
or issue. The CLEF-2018 Fact Checking Lab conducted two
tasks: check-worthiness and factuality
[Atanasova et al., 2018; Barrón-Cedeño et al., 2018]. As
Japanese arguments are generally more implicit than English ones,
it is uncertain how effective argument mining methods developed
for English are on Japanese texts.</p>
      <sec id="sec-2-1">
        <p>The Stanford Question Answering Dataset (SQuAD)
[Rajpurkar et al., 2016] is used for advanced QA purposes,
including machine comprehension [Wang et al., 2018; Wang
et al., 2017]. While SQuAD includes 100,000+
questions, the data set used in the QA Lab-PoliInfo task comprises
10,000+ questions. The latter, therefore, may not provide a
sufficient amount of training data for general machine
learning methods. However, consistently securing a sufficient
amount of training data is considered difficult in a specific
domain such as assembly minutes. Studying the results
obtained from a limited amount of data is therefore important
for real-world application.</p>
        <p>The Japanese Regional Assembly Minutes Corpus [Kimura et
al., 2016] collects the minutes of plenary assemblies in the
47 prefectures of Japan from April 2011 to March 2015. These
minutes resemble transcripts. In a
question-and-answer session, an assemblyman asks several questions at a
time, and the prefectural governor or a superintendent answers
the questions under his/her charge. Each speech is too
long for its contents to be understood at a glance; therefore,
information access technologies, such as advanced QA and
automated summarization, can aid understanding. A subset of the
corpus, narrowed down to the Tokyo Metropolitan
Assembly, was used for the QA Lab-PoliInfo task.</p>
        <p>For the gold standard data, 14 political topics, such as “The
Tsukiji Market should move to Toyosu area,” were selected
in advance. All the sentences containing the keywords of
a topic, such as “Tsukiji Market,” were extracted from the
corpus, and each sentence was then annotated by at least three
workers using cloud services. Finally, a total of 10,291
sentences were used as the training data, and 3,412 sentences
were used as the test data.</p>
        <sec id="sec-2-1-1">
          <title>3.2 Classification task</title>
          <p>The objective of the classification task in the QA Lab-PoliInfo
task is to find opinions that have
fact-checkable reasons in the Japanese assembly minutes. Figure
1 shows an example of the classification task. First, a
political topic is provided. Then, given a speech sentence in the
minutes, the three basic factors of classification,
namely relevance, fact-checkability, and stance agreeing, are
recognized. Relevance indicates whether the
sentence refers to the given topic. Fact-checkability
indicates whether the sentence contains
fact-checkable reasons. Stance agreeing indicates
whether the speaker of the sentence agrees with the topic;
in addition to agreement and disagreement, we prepared a third
stance, called “other”, to denote that a speaker is neutral or
has no relation to the topic. Finally, the sentence is classified into one of the
following three classes: support with fact-checkable reasons (S),
against with fact-checkable reasons (A), and other (O). All
the data were provided to the participants in JavaScript Object
Notation (JSON) format, as shown in Figure 2.</p>
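          <p>The way the three basic factors combine into the final class can be illustrated with a short sketch. It assumes, as one reading of the description above rather than the official task definition, that a sentence receives S or A only when it is relevant, fact-checkable, and has an explicit stance. The function name classify_sentence and the string labels are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def classify_sentence(relevant: bool, fact_checkable: bool, stance: str) -> str:
    """Map the three basic factors to a class label: "S", "A", or "O".

    Assumption (not spelled out in the task description): S and A require a
    relevant, fact-checkable sentence with an explicit stance; every other
    combination falls into O.
    """
    if stance not in ("agree", "disagree", "other"):
        raise ValueError(f"unknown stance: {stance!r}")
    if relevant and fact_checkable:
        if stance == "agree":
            return "S"  # support with fact-checkable reasons
        if stance == "disagree":
            return "A"  # against with fact-checkable reasons
    return "O"  # neutral, irrelevant, or lacking fact-checkable reasons


if __name__ == "__main__":
    print(classify_sentence(True, True, "agree"))
    print(classify_sentence(True, False, "disagree"))
```
]]></preformat>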
          <p>As the evaluation measure, the accuracy over all
classes, A, is defined as follows.</p>
          <p><disp-formula><label>(1)</label><tex-math><![CDATA[A = \frac{1}{|Q|} \sum_{q \in Q} \frac{\mathrm{num}(q)}{3}]]></tex-math></disp-formula>
where Q is the set of sentences provided, and num(q) is the
number of workers who annotated the sentence q with the classified class as the
gold standard class (maximum value = 3).</p>
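          <p>The accuracy measure defined above can be computed as in the following minimal sketch. The function name accuracy, the constant MAX_WORKERS, and the example counts are hypothetical and are not taken from the task data or its official scorer.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from typing import Sequence

MAX_WORKERS = 3  # num(q) is capped at three workers per sentence


def accuracy(num_per_sentence: Sequence[int]) -> float:
    """Compute A = (1 / |Q|) * sum over q in Q of num(q) / 3.

    num_per_sentence[i] is num(q) for the i-th sentence: the number of
    workers who annotated that sentence with the class the system output.
    """
    if len(num_per_sentence) == 0:
        raise ValueError("Q must contain at least one sentence")
    for n in num_per_sentence:
        if n < 0 or n > MAX_WORKERS:
            raise ValueError(f"num(q) must lie in 0..{MAX_WORKERS}, got {n}")
    return sum(num_per_sentence) / (MAX_WORKERS * len(num_per_sentence))


if __name__ == "__main__":
    # Hypothetical counts for four sentences, not taken from the task data.
    print(accuracy([3, 2, 0, 3]))
```
]]></preformat>
          <p>For the hypothetical counts 3, 2, 0, and 3, the measure is (3 + 2 + 0 + 3) / (3 × 4) = 8/12, or roughly 0.667. A system scores 1.0 only if all three workers agree with its output on every sentence.</p>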
          <p>Input: a political topic and a sentence in the minutes.</p>
          <p>Output: relevance (existence or absence), fact-checkability
(existence or absence), stance agreeing
(agree, disagree, or other), and class (support with
fact-checkable reasons, against with fact-checkable reasons,
or other).</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Evaluation: accuracy of all classes</title>
        <sec id="sec-2-2-1">
          <title>3.3 Grouping the methods</title>
          <p>For the classification task, the results of 45 methods from
11 teams were submitted. Because the methods varied widely,
devising a way to group them was difficult. However, the teams
also submitted their system descriptions, so we grouped the
methods according to viewpoints shared by many methods,
i.e., the type of machine learning classifier and the type of
encoding.</p>
          <p>Although most methods used a machine learning
classifier, there were two rule-based methods. Some methods
employed a combination of classifiers, such as an SVM and a
decision tree. Therefore, we defined the classifier groups as
follows: rule-based, MaxEnt, three-layered perceptron (3LP),
SVM, LSTM, a combination of SVM and other classifiers
(SVM+), and a combination of LSTM and other classifiers
(LSTM+). No method used a combination of
SVM and LSTM.</p>
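          <p>The grouping rule can be sketched as a small function. This is an illustration only: the function name classifier_group, the classifier name strings, and the convention that a rule-based method is represented by an empty set of learned classifiers are all assumptions, since the actual grouping was done by reading the system descriptions.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from typing import Iterable


def classifier_group(classifiers: Iterable[str]) -> str:
    """Assign a method to one of the seven classifier groups.

    `classifiers` holds the learned classifiers a method uses, e.g.
    {"SVM", "decision tree"}. An empty collection stands for a rule-based
    method (an assumption made for this sketch).
    """
    names = set(classifiers)
    if not names:
        return "rule-based"
    if "SVM" in names and "LSTM" in names:
        # No submitted method combined the two, so no group exists for it.
        raise ValueError("no group is defined for SVM combined with LSTM")
    for base in ("SVM", "LSTM"):
        if base in names:
            # A lone SVM/LSTM keeps its name; any combination gets a "+".
            return base if len(names) == 1 else base + "+"
    if len(names) == 1:
        (only,) = names
        if only in ("MaxEnt", "3LP"):
            return only
    raise ValueError(f"no group is defined for {sorted(names)}")


if __name__ == "__main__":
    print(classifier_group({"SVM", "decision tree"}))
```
]]></preformat>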
          <p>The methods using a machine learning
classifier encoded their input through either one-hot encoding or
word embedding. One method was an exception,
as its encoding folded a word and its
position of appearance into a vector element. The rule-based
classifiers used simple key-phrases without encoding. Therefore,
the encoding groups were defined as follows: key-phrase,
one-hot encoding, word embedding, and unique encoding.
Table 1 lists the number of methods in each classifier group and
each encoding group.</p>
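          <p>As a concrete illustration of the one-hot encoding group, the following sketch encodes a tokenized sentence as a binary vector over a vocabulary. It assumes that a sentence vector marks which vocabulary words occur in the sentence, which is one common way of applying one-hot encoding to sentences and not necessarily what any participating team did. The function names and the English example tokens are hypothetical, and Japanese word segmentation is assumed to have been done beforehand.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Sequence


def build_vocabulary(tokenized_sentences: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign each distinct token an index in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tokens in tokenized_sentences:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def one_hot_encode(tokens: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Return a binary vector with 1 at the index of every known token.

    Tokens missing from the vocabulary are ignored.
    """
    vec = [0] * len(vocab)
    for tok in tokens:
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] = 1
    return vec


if __name__ == "__main__":
    # Hypothetical, pre-tokenized training sentences.
    train = [["tsukiji", "market", "move"], ["market", "oppose"]]
    vocab = build_vocabulary(train)
    print(one_hot_encode(["market", "oppose", "toyosu"], vocab))
```
]]></preformat>
          <p>Unlike a word embedding, this representation carries no notion of similarity between words: each word simply occupies its own vector element.</p>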
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Result</title>
      <p>Figures 3–10 show box-and-whisker plots of
the accuracy of classification, relevance, fact-checkability,
and stance agreeing for the classifier and encoding groups,
respectively. Table 2 lists the highest accuracy values.
The machine learning classifiers were
more accurate than the rule-based classifiers.
The SVM classifier achieved the highest accuracy,
0.942, while the LSTM classifier achieved
0.934. The combinations of classifiers did not work as well
as expected. Among the encodings, one-hot encoding achieved the
best accuracy (0.942), although it was only marginally higher
than that of word embedding (0.934). Aker et al. [Aker et
al., 2017] reported that the difference between classifiers
was marginal, and the results observed in this study exhibited
a similar tendency.</p>
      <p>Comparing the basic factors of classification with
each other, we observed that the accuracy of fact-checkability
was relatively low. As fact-checkability is an important factor
for a well-grounded argument, improving it remains an issue for
future work.</p>
      <p>We performed a comparative study of the influence of seven
types of classifiers and four types of encoders on argument
classification in Japanese assembly minutes using 45 sets of
results from the QA Lab-PoliInfo task at the NTCIR-14
workshop. In the classification of argumentative relations
between a speech sentence and a political topic, the highest
accuracy, obtained using an SVM classifier and one-hot
encoding, was 0.942, whereas the highest accuracy of
the combination of an LSTM classifier and word embedding
was 0.934.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[Aker et al., 2017] <string-name>Ahmet Aker</string-name>, <string-name>Alfred Sliwa</string-name>, <string-name>Yuan Ma</string-name>, <string-name>Ruishen Lui</string-name>, <string-name>Niravkumar Borad</string-name>, <string-name>Seyedeh Ziyaei</string-name>, and <string-name>Mina Ghobadi</string-name>. <article-title>What works and what does not: Classifier and feature analysis for argument mining</article-title>. In <source>Proceedings of the 4th Workshop on Argument Mining</source>, pages <fpage>91</fpage>-<lpage>96</lpage>, Copenhagen, Denmark, September <year>2017</year>. <publisher-name>Association for Computational Linguistics</publisher-name>.</mixed-citation>
      </ref>
      <ref id="ref2">
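        <mixed-citation>[Atanasova et al., 2018] <string-name>Pepa Atanasova</string-name>, <string-name>Lluís Màrquez</string-name>, <string-name>Alberto Barrón-Cedeño</string-name>, <string-name>Tamer Elsayed</string-name>, <string-name>Reem Suwaileh</string-name>, <string-name>Wajdi Zaghouani</string-name>, <string-name>Spas Kyuchukov</string-name>, <string-name>Giovanni Da San Martino</string-name>, and <string-name>Preslav Nakov</string-name>. <article-title>Overview of the CLEF-2018 CheckThat! lab on automatic identification and verification of political claims, task 1: Check-worthiness</article-title>. In Linda Cappellato, Nicola Ferro, Jian-Yun Nie, and Laure Soulier, editors, <source>CLEF 2018 Working Notes. Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings</source>, Avignon, France, September <year>2018</year>. <publisher-name>CEUR-WS.org</publisher-name>.</mixed-citation>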
      </ref>
      <ref id="ref3">
        <mixed-citation>[Barrón-Cedeño et al., 2018] <string-name>Alberto Barrón-Cedeño</string-name>, <string-name>Tamer Elsayed</string-name>, <string-name>Reem Suwaileh</string-name>, <string-name>Lluís Màrquez</string-name>, <string-name>Pepa Atanasova</string-name>, <string-name>Wajdi Zaghouani</string-name>, <string-name>Spas Kyuchukov</string-name>, <string-name>Giovanni Da San Martino</string-name>, and <string-name>Preslav Nakov</string-name>. <article-title>Overview of the CLEF-2018 CheckThat! lab on automatic identification and verification of political claims, task 2: Factuality</article-title>. In Linda Cappellato, Nicola Ferro, Jian-Yun Nie, and Laure Soulier, editors, <source>CLEF 2018 Working Notes. Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings</source>, Avignon, France, September <year>2018</year>. <publisher-name>CEUR-WS.org</publisher-name>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[Kimura et al., 2016] <string-name>Yasutomo Kimura</string-name>, <string-name>Keiichi Takamaru</string-name>, <string-name>Takuma Tanaka</string-name>, <string-name>Akio Kobayashi</string-name>, <string-name>Hiroki Sakaji</string-name>, <string-name>Yuzu Uchida</string-name>, <string-name>Hokuto Ototake</string-name>, and <string-name>Shigeru Masuyama</string-name>. <article-title>Creating Japanese political corpus from local assembly minutes of 47 prefectures</article-title>. In <source>Proceedings of the 12th Workshop on Asian Language Resources (ALR12)</source>, pages <fpage>78</fpage>-<lpage>85</lpage>, Osaka, Japan, December <year>2016</year>. <publisher-name>The COLING 2016 Organizing Committee</publisher-name>.</mixed-citation>
      </ref>
      <ref id="ref5">
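        <mixed-citation>[Kimura et al., 2019] <string-name>Yasutomo Kimura</string-name>, <string-name>Hideyuki Shibuki</string-name>, <string-name>Hokuto Ototake</string-name>, <string-name>Yuzu Uchida</string-name>, <string-name>Keiichi Takamaru</string-name>, <string-name>Kotaro Sakamoto</string-name>, <string-name>Madoka Ishioroshi</string-name>, <string-name>Teruko Mitamura</string-name>, <string-name>Noriko Kando</string-name>, <string-name>Tatsunori Mori</string-name>, <string-name>Harumichi Yuasa</string-name>, <string-name>Satoshi Sekine</string-name>, and <string-name>Kentaro Inui</string-name>. <article-title>Overview of the NTCIR-14 QA Lab-PoliInfo task</article-title>. In <source>Proceedings of the 14th NTCIR Conference</source>, Tokyo, Japan, June <year>2019</year>.</mixed-citation>
      </ref>
      <ref id="ref5b">
        <mixed-citation>[Nakov et al., 2018] <string-name>Preslav Nakov</string-name>, <string-name>Alberto Barrón-Cedeño</string-name>, <string-name>Tamer Elsayed</string-name>, <string-name>Reem Suwaileh</string-name>, <string-name>Lluís Màrquez</string-name>, <string-name>Wajdi Zaghouani</string-name>, <string-name>Pepa Atanasova</string-name>, <string-name>Spas Kyuchukov</string-name>, and <string-name>Giovanni Da San Martino</string-name>. <article-title>Overview of the CLEF-2018 CheckThat! lab on automatic identification and verification of political claims</article-title>. In Patrice Bellot, Chiraz Trabelsi, Josiane Mothe, Fionn Murtagh, Jian Yun Nie, Laure Soulier, Eric Sanjuan, Linda Cappellato, and Nicola Ferro, editors, <source>Proceedings of the Ninth International Conference of the CLEF Association: Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>, Lecture Notes in Computer Science, Avignon, France, September <year>2018</year>. <publisher-name>Springer</publisher-name>.</mixed-citation>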
      </ref>
      <ref id="ref6">
        <mixed-citation>[Rajpurkar et al., 2016] <string-name>Pranav Rajpurkar</string-name>, <string-name>Jian Zhang</string-name>, <string-name>Konstantin Lopyrev</string-name>, and <string-name>Percy Liang</string-name>. <article-title>SQuAD: 100,000+ questions for machine comprehension of text</article-title>. In <source>Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>, pages <fpage>2383</fpage>-<lpage>2392</lpage>, Austin, Texas, November <year>2016</year>. <publisher-name>Association for Computational Linguistics</publisher-name>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[Stab and Gurevych, 2017] <string-name>Christian Stab</string-name> and <string-name>Iryna Gurevych</string-name>. <article-title>Parsing argumentation structures in persuasive essays</article-title>. <source>Computational Linguistics</source>, <volume>43</volume>(<issue>3</issue>):<fpage>619</fpage>-<lpage>659</lpage>, September <year>2017</year>.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[Wang et al., 2017] <string-name>Wenhui Wang</string-name>, <string-name>Nan Yang</string-name>, <string-name>Furu Wei</string-name>, <string-name>Baobao Chang</string-name>, and <string-name>Ming Zhou</string-name>. <article-title>Gated self-matching networks for reading comprehension and question answering</article-title>. In <source>Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>, pages <fpage>189</fpage>-<lpage>198</lpage>, Vancouver, Canada, July <year>2017</year>. <publisher-name>Association for Computational Linguistics</publisher-name>.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[Wang et al., <year>2018</year>] <string-name>Wei Wang</string-name>, <string-name>Ming Yan</string-name>, and <string-name>Chen Wu</string-name>. <article-title>Multi-granularity hierarchical attention fusion networks for reading comprehension and question answering</article-title>. In <source>Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Pa-</source></mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>