<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Arguments Extracted from Text in Argument Based Machine Learning: A Case Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Mozina</string-name>
          <email>martin.mozina@fri.uni-lj.si</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudio Giuliano</string-name>
          <email>giuliano@fbk.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivan Bratko</string-name>
          <email>ivan.bratko@fri.uni-lj.si</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FBK-irst</institution>
          ,
          <addr-line>Povo TN 38100</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Ljubljana</institution>
          ,
          <country country="SI">Slovenia</country>
        </aff>
      </contrib-group>
      <fpage>43</fpage>
      <lpage>50</lpage>
      <abstract>
        <p>We introduce a novel approach to cross-media learning based on argument based machine learning (ABML). ABML is a recent method that combines argumentation and machine learning from examples, and its main idea is to provide expert's arguments for some of the learning examples. In this paper, we present an alternative approach, where arguments used in ABML are automatically extracted from text with a technique for relation extraction. We demonstrate and evaluate the approach through a case study of learning to classify animals by using arguments extracted from Wikipedia.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Argument Based Machine Learning (ABML) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is a recently developed
approach that combines the ideas of argumentation and machine learning.
Argumentation [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] is a branch of logic that mimics, in several ways, human reasoning and
discussion between humans. In ABML, the idea is that an expert provides
arguments, or reasons, for some of the learning examples. We require that the
theory induced from the examples explains them in terms of the given
reasons. This makes learning easier, because it constrains the search space of
candidate hypotheses.
      </p>
      <p>
        In this paper, we will demonstrate a possible way of extracting arguments
from text and using them in ABML. We will begin with a short introduction
to ABML, then investigate how arguments can be extracted from text, and
demonstrate the approach on an animal classification problem. We will conclude the paper
with a summary of the main findings and pointers for further work.
      </p>
      <p>
        Argument based machine learning [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is machine learning extended with some
concepts from argumentation. Argumentation is a branch of artificial intelligence
that analyzes reasoning in which arguments for and against a certain claim are
produced and evaluated. A typical example of such reasoning is a legal dispute in
court, where the plaintiff and the defendant give arguments for their opposing claims,
and at the end of the process the party with the better arguments wins the case.
      </p>
      <p>Arguments are used in ABML to enhance learning examples. Each argument
is attached to a single learning example only, while one example can have several
arguments. There are two types of arguments: positive arguments are used to
explain (or argue) why a certain learning example is in the class as given, and
negative ones give reasons against the class as given.</p>
      <p>In ABML, arguments are usually provided by domain experts, who find it
much easier to articulate their knowledge in this manner. While it is generally
accepted that eliciting domain knowledge from experts is usually problematic, in ABML experts
need to focus on only one specific case at a time and provide knowledge that
seems relevant for this case and need not be valid for the whole domain. In
this paper, we suggest an approach where arguments are automatically extracted
from text, which eliminates the reliance on an expert. The expected
advantage is similar to that with experts: it should be much easier
to extract from text specific relations, in the form of arguments concerning
concrete examples, than to extract general theories from text.</p>
      <p>
        An ABML method is required to induce a theory that uses the given arguments
to explain the examples. If an ABML method is used on standard examples
only (without arguments), it should work the same as a standard machine
learning method. We will use the method ABCN2 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], an argument based extension
of the well-known method CN2 that learns a set of unordered probabilistic rules
from argumented examples. In ABCN2, a theory (a set of rules) is said to explain
the examples using the given arguments when, for each argumented example, there
exists at least one rule that contains at least one positive argument of that example
in its condition part.
      </p>
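      <p>The explanation criterion can be made concrete with a small check. The following Python sketch is illustrative only: the classes Example and Rule and the function explains are hypothetical names, not the ABCN2 implementation. It assumes, as in CN2, that the rule must also cover the example and predict its class, and it ignores negative arguments.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field

# Hypothetical sketch, not ABCN2 itself. A condition (reason) is an
# attribute/value pair, e.g. ("milk", "yes").
Condition = tuple[str, str]


@dataclass
class Example:
    values: dict[str, str]
    label: str
    # Each positive argument is a conjunction of reasons, stored as a frozenset.
    positive_args: list[frozenset[Condition]] = field(default_factory=list)


@dataclass
class Rule:
    conditions: frozenset[Condition]
    label: str

    def covers(self, ex: Example) -> bool:
        # The rule covers an example if every condition holds for it.
        return all(ex.values.get(a) == v for a, v in self.conditions)


def explains(rules: list[Rule], examples: list[Example]) -> bool:
    """True if each argumented example is covered by some rule of its class
    whose condition part contains at least one of its positive arguments.
    Assumption: covering and class agreement are required as well."""
    for ex in examples:
        if not ex.positive_args:
            continue  # examples without arguments impose no constraint
        if not any(
            r.label == ex.label
            and r.covers(ex)
            and any(arg <= r.conditions for arg in ex.positive_args)
            for r in rules
        ):
            return False
    return True
```
]]></preformat>
      <p>For an example of class mammal argued with milk=yes, the rule IF milk=yes THEN type=mammal explains it, while IF hair=yes THEN type=mammal does not, even though it covers the example.</p>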
      <p>
        When an expert provides arguments for ABML, it is crucial to present the
expert with only critical examples, as experts are unlikely to be willing to provide
arguments for the whole learning set; this would require too much time and
effort. Critical examples are examples that the learning method cannot explain
sufficiently well [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In this paper, however, arguments are extracted automatically,
so this constraint does not apply, and we assume that arguments are provided
for all learning examples.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Extracting Arguments from Text</title>
      <p>
        The extraction of arguments from text is based on relation extraction from text.
Relation extraction is an important task in natural language processing, with
many practical applications such as question answering, ontology population,
and information retrieval. It requires the analysis of textual documents, with the
aim of recognizing particular types of relations between named entities, nominals,
and pronouns. Reliably extracting relations in natural-language documents is
still a laborious and unsolved problem. Traditionally, relation extraction systems
have been trained to recognize relations between names of people, organizations,
locations, and proteins. In the last two decades, several evaluation campaigns
such as MUC [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], ACE [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and SemEval [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] have helped to understand and properly
formalize the problem and to provide comparative benchmarks.
      </p>
      <p>In this paper, we are interested in finding semantic relations between class
values and descriptive attributes (taken from the data), and in using them as
arguments. For example, given the class value reptile and the attribute eggs, we
are interested in relations such as “Most reptiles lay eggs” and “Reptiles hatch
eggs.” Specifically, the relations between classes and attributes
are extracted from the whole English Wikipedia (http://en.wikipedia.org), an online encyclopedia
written collaboratively by volunteers that has grown to become one of the largest
online repositories of encyclopedic knowledge, with millions of articles available
in a large number of languages.</p>
      <p>To extract such relations from textual documents, we have to deal with two
major problems. The first concerns the lack of information on the relation type
we are seeking. In relation extraction, we usually know in advance the type of
relations to be extracted; here we only know the class values and attributes, namely
the arguments of a possible relation. Thus, the task is restricted to discovering
whether or not a relation exists between the two arguments.</p>
      <p>
        The second problem is related to the lexicalization of the class values and
attribute descriptions. The names of attributes and classes should be
meaningful or, in other words, similar to those used in texts. Using their
background knowledge, humans can naturally interpret the concepts expressed
by class values and attributes; however, due to the variability of natural
language, it can be very difficult to find occurrences of the lexicalizations of such
concepts in the same sentence and, consequently, to determine whether or not
a relation exists. To address the first problem, we do not try to find specific
assertions of relations in text; instead, we exploit the simple idea that if many
different sentences mention both the class value and the attribute, then the class
value and the attribute are likely to be related. To deal with the
variability of natural language, we generated alternative lexical variants using
WordNet [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], a lexical database containing semantic relations among words.
Specifically, we generated variants for all class values and attributes using the
following relations in WordNet: synonyms (e.g., breathe → respire) and
morphological derivations (e.g., predator → predators).
      </p>
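      <p>The variant generation can be sketched as follows. This is a minimal illustration under stated assumptions: the SYNONYMS table is a hypothetical stand-in for WordNet lookups (in practice the lists could come from WordNet, for instance through NLTK's wordnet.synsets and Synset.lemma_names), and inflect is a deliberately naive derivation that only appends an “s”.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical synonym table standing in for WordNet lookups (illustrative only).
SYNONYMS: dict[str, list[str]] = {
    "breathe": ["respire"],
    "egg": ["spawn"],
    "feather": ["plumage"],
}


def inflect(word: str) -> str:
    """Naive morphological derivation: append an 's' (ignores irregular forms)."""
    return word if word.endswith("s") else word + "s"


def lexical_variants(term: str, synonyms: dict[str, list[str]] = SYNONYMS) -> set[str]:
    """Return the term, its synonyms, and a naive inflected form of each."""
    words = [term, *synonyms.get(term, [])]
    return {form for w in words for form in (w, inflect(w))}
```
]]></preformat>
      <p>For instance, lexical_variants("breathe") yields breathe, breathes, respire, and respires. The naive rule overgenerates for mass nouns, so a real pipeline would rely on WordNet's morphology instead.</p>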
      <p>
        Like most relation extraction systems [
        <xref ref-type="bibr" rid="ref10 ref5 ref9">10, 5, 9</xref>
        ], we identify relations mentioned
in text documents by considering only pairs that are mentioned in the same
sentence. Let $c_1, \ldots, c_k$ be the class values in the data and $a_1, \ldots, a_n$ the attributes. Then
$\#r(c_i, a_j)$ is defined as the number of sentences across the whole
English Wikipedia in which the class $c_i$ and the attribute $a_j$ co-occur.
      </p>
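      <p>The count $\#r(c_i, a_j)$ is a plain sentence-level co-occurrence count. The sketch below assumes that sentence splitting has already been done and that all lexical variants are single words. The function names are hypothetical, and matching is done on lower-cased alphabetic tokens.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
from collections import Counter

# Assumption: sentences are pre-split and lexical variants are single words.


def tokens(sentence: str) -> set[str]:
    """Lower-cased alphabetic tokens of a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def count_relations(
    sentences: list[str],
    class_variants: dict[str, set[str]],
    attr_variants: dict[str, set[str]],
) -> Counter:
    """Return #r(c, a): the number of sentences that mention some lexical
    variant of class c and some lexical variant of attribute a."""
    counts: Counter = Counter()
    for s in sentences:
        toks = tokens(s)
        classes = [c for c, vs in class_variants.items() if toks & vs]
        attrs = [a for a, vs in attr_variants.items() if toks & vs]
        for c in classes:
            for a in attrs:
                counts[(c, a)] += 1
    return counts
```
]]></preformat>
      <p>On the three sentences “Most reptiles lay eggs.”, “Reptiles hatch eggs.”, and “Birds have feathers and lay eggs.”, this gives $\#r(\mathit{reptile}, \mathit{egg}) = 2$, $\#r(\mathit{bird}, \mathit{egg}) = 1$, and $\#r(\mathit{bird}, \mathit{feather}) = 1$.</p>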
      <p>We shall now define the construction of an argument given the number of
relations between class and attribute values. An argument is a conjunction of a
set of reasons, where each reason is related to a single attribute in the domain.
To determine whether an attribute $a_j$ is a possible reason for class $c_i$, we
first evaluate whether $\#r(c_i, a_j)$ is statistically different from the expected value
$E(\#r(c_i, a_j))$, that is, whether the observed number could have been obtained purely by chance. A
possible method for this task is the standard $\chi^2$ test for $2 \times 2$ contingency tables.</p>
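      <p>The paper does not state how the $2 \times 2$ table is formed. The following sketch assumes that its margins come from the whole class × attribute table of relation counts, so that $E(\#r(c_i, a_j))$ is the row total of $c_i$ times the column total of $a_j$ divided by the grand total. It uses the critical value 3.841 (one degree of freedom, significance level 0.05) and no continuity correction. The result follows the same 1/-1/0 coding as Table 3. The function name reason_sign is hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
CHI2_CRIT_005 = 3.841  # chi-square critical value for 1 degree of freedom, alpha = 0.05


def reason_sign(counts: dict[tuple[str, str], int], c: str, a: str,
                crit: float = CHI2_CRIT_005) -> int:
    """Return 1 (positive reason), -1 (negative reason) or 0 (not significant).

    Assumption: the 2x2 table for (c, a) takes its margins from the whole
    class x attribute table of relation counts #r."""
    total = sum(counts.values())
    if total == 0:
        return 0
    row = sum(v for (ci, _), v in counts.items() if ci == c)  # all relations of c
    col = sum(v for (_, aj), v in counts.items() if aj == a)  # all relations of a
    o11 = counts.get((c, a), 0)
    observed = [o11, row - o11, col - o11, total - row - col + o11]
    expected = [row * col / total, row * (total - col) / total,
                (total - row) * col / total, (total - row) * (total - col) / total]
    if any(e == 0 for e in expected):
        return 0  # degenerate table, e.g. an attribute never found in text
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    if chi2 < crit:
        return 0
    return 1 if o11 > expected[0] else -1
```
]]></preformat>
      <p>With the illustrative counts bird/feather 30, bird/egg 10, fish/feather 0, and fish/egg 20, the expected count for bird/feather is 20 and $\chi^2 = 30$, so feather is a positive reason for bird and a negative reason for fish. An attribute with no relations at all, like catsize, yields a degenerate table and therefore 0.</p>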
      <p>When $\#r(c_i, a_j)$ is statistically different from $E(\#r(c_i, a_j))$, it can be either
higher or lower. If $\#r(c_i, a_j) &gt; E(\#r(c_i, a_j))$, then we say that $a_j$ is a positive
reason for $c_i$. Such a positive reason can be given to an example if the example belongs to
class $c_i$ and its value of $a_j$ is “positive”. The positiveness of attribute values
must be defined prior to learning; it is meant to single out the values
that should occur in class-attribute relations in text more frequently than
expected. Although it is impossible to say in advance which values will have this
property, we believe that a good heuristic is to select as positive those attribute values
that ascribe the presence of the property described by $a_j$ to the
example. For instance, if an animal has the value 1 for the attribute breathes, this
value states that the animal breathes (presence of this property), so
the value is positive. If the number of relations found is less than
expected, i.e., $\#r(c_i, a_j) &lt; E(\#r(c_i, a_j))$, then we can use such a reason only if the
example has a negative value of $a_j$.</p>
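      <p>The rule for turning significant relations into reasons for one example can be sketched as below. The names build_argument, signs, and positive_values are hypothetical. The signs dictionary is assumed to hold the 1/-1/0 outcome of the significance test, and attributes without a declared positive value are simply skipped, which is an assumption for attributes such as legs.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def build_argument(values: dict[str, str], label: str,
                   signs: dict[tuple[str, str], int],
                   positive_values: dict[str, str]) -> frozenset[tuple[str, str]]:
    """Collect the reasons consistent with one example (hypothetical helper).

    signs[(c, a)] is 1 for a positive reason, -1 for a negative one, 0 otherwise.
    positive_values[a] is the value of a regarded as "positive"; attributes
    without a declared positive value are skipped (an assumption).
    """
    reasons = set()
    for a, v in values.items():
        if a not in positive_values:
            continue
        sign = signs.get((label, a), 0)
        is_positive = v == positive_values[a]
        # Positive reasons need a positive value; negative reasons a negative one.
        if (sign == 1 and is_positive) or (sign == -1 and not is_positive):
            reasons.add((a, v))
    return frozenset(reasons)
```
]]></preformat>
      <p>For a bird with feathers=yes and milk=no, where feathers is a positive reason and milk a negative reason for bird, the argument is the conjunction feathers=yes AND milk=no.</p>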
      <p>
        An argument for a certain example is thus constructed from all positive and
negative reasons consistent with the values of this example. Such
an argument will sometimes be overly specific. To alleviate this problem, all arguments are
pruned according to the REP (reduced error pruning) principle [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] before they are appended
to the example.
      </p>
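      <p>The paper does not detail how REP is applied to arguments. One plausible reading, sketched here purely as an assumption, treats an argument as the rule IF reasons THEN class. The sketch then greedily drops the reason whose removal gives the best accuracy on a separate pruning set, stopping when accuracy would decrease. The names rule_accuracy and prune_argument are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical REP-style pruning of an argument; the paper does not give
# its exact procedure, so the accuracy measure below is an assumption.
Reason = tuple[str, str]
Data = list[tuple[dict[str, str], str]]  # (attribute values, class label) pairs


def rule_accuracy(reasons: frozenset[Reason], label: str, data: Data) -> float:
    """Accuracy of 'IF reasons THEN label' used as a yes/no classifier for label."""
    correct = 0
    for values, y in data:
        covered = all(values.get(a) == v for a, v in reasons)
        correct += int(covered == (y == label))
    return correct / len(data)


def prune_argument(reasons: frozenset[Reason], label: str, prune_set: Data) -> frozenset[Reason]:
    """Greedily drop the reason whose removal gives the best accuracy on
    prune_set, as long as accuracy does not decrease; keep at least one reason."""
    current = frozenset(reasons)
    best = rule_accuracy(current, label, prune_set)
    while len(current) > 1:
        candidate = max((current - {r} for r in current),
                        key=lambda c: rule_accuracy(c, label, prune_set))
        acc = rule_accuracy(candidate, label, prune_set)
        if acc < best:
            break
        current, best = candidate, acc
    return current
```
]]></preformat>
      <p>Accepting ties (equal accuracy) in favour of the shorter argument follows the usual REP preference for simpler rules.</p>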
    </sec>
    <sec id="sec-3">
      <title>Case study: Animal Classification</title>
      <p>
        The approach will be illustrated and evaluated on a learning domain named
ZOO taken from the UCI repository [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. It contains descriptions of 101 animals
(instances) with 17 attributes: hair, feathers, eggs, milk, predator, toothed,
domestic, backbone, fins, legs, tail, catsize, airborne, aquatic, breathes, venomous,
and type, which is the class attribute. Type has seven possible values: mammal,
bird, reptile, fish, amphibian, insect, and other.
      </p>
      <p>We began the experiment by learning rules with ABCN2 without any
arguments extracted from text. The induced rules were:</p>
      <list list-type="bullet">
        <list-item><p>IF milk=yes THEN type=mammal</p></list-item>
        <list-item><p>IF feathers=yes THEN type=bird</p></list-item>
        <list-item><p>IF eggs=yes AND fins=yes THEN type=fish</p></list-item>
        <list-item><p>IF aquatic=no AND legs=6 THEN type=insect</p></list-item>
        <list-item><p>IF feathers=no AND eggs=yes AND backbone=yes AND aquatic=no THEN type=reptile</p></list-item>
        <list-item><p>IF milk=no AND domestic=no AND hair=no AND tail=yes AND fins=no AND legs=0 THEN type=reptile</p></list-item>
        <list-item><p>IF toothed=yes AND legs=4 AND eggs=yes AND aquatic=yes THEN type=amphibian</p></list-item>
        <list-item><p>IF feathers=no AND hair=no AND airborne=no AND backbone=no AND predator=yes THEN type=other</p></list-item>
        <list-item><p>IF fins=no AND backbone=no AND legs=no THEN type=other</p></list-item>
      </list>
      <p>In the following step, we searched Wikipedia for relations between
class values (e.g., mammal) and positive attribute values (e.g., milk=yes).
Table 1 shows the alternative lexical variants for some attributes, generated using
WordNet and morphological derivations.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Alternative lexical variants of the attribute names.</p>
        </caption>
        <table>
          <thead>
            <tr><th>attribute</th><th>lexical variants</th></tr>
          </thead>
          <tbody>
            <tr><td>hair</td><td>hairs, fur, furs</td></tr>
            <tr><td>feather</td><td>feathers, plumage</td></tr>
            <tr><td>egg</td><td>eggs, spawn, spawns</td></tr>
            <tr><td>milk</td><td>milking</td></tr>
            <tr><td>airborne</td><td>winged</td></tr>
            <tr><td>aquatic</td><td>aquatics, marine</td></tr>
            <tr><td>predator</td><td>predators</td></tr>
            <tr><td>toothed</td><td>tooth, teeth, fang, fangs, fanged</td></tr>
            <tr><td>backbone</td><td>spine</td></tr>
            <tr><td>breathe</td><td>breathes, respire, respires</td></tr>
            <tr><td>venomous</td><td>poisonous</td></tr>
            <tr><td>fin</td><td>fins</td></tr>
            <tr><td>leg</td><td>legs, limb, limbs</td></tr>
            <tr><td>tail</td><td>tails</td></tr>
            <tr><td>domestic</td><td>domesticated, pet</td></tr>
            <tr><td>catsize</td><td>(none)</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>In this search, we did not use the class other, since it does not represent an actual
animal class. Table 2 shows the number of relations found for all class-attribute pairs in the
ZOO domain. For example, we found a strong correlation between the class bird and the attribute
feather simply by expanding the attribute with the lexical variants feathers and plumage. On the
other hand, although humans can easily answer whether a reptile is approximately the size of a cat,
it is almost impossible to find occurrences of the class reptile and the attribute “catsize” in the
same sentence, because this attribute name is an artificial lexicalization introduced for comparing
animals by size. The last row of Table 2 shows that the attribute catsize gets a score of
zero for all classes.</p>
      <p>The absolute values $\#r(c_i, a_j)$ are not strongly related to the correlation
between class $c_i$ and attribute $a_j$. For instance, aquatic seems to be the most
important feature of amphibians. But is it really, given that aquatic is frequently mentioned
together with all classes? On the other hand, the text extraction tool found only 6 relations
between amphibians and breathing. Nevertheless, there is a strong positive
relation between them, because the concept breathing and the animal type amphibian occur
much less frequently in text than the other attributes and animals.</p>
      <p>For this reason, we applied the standard $\chi^2$ test (significance level 0.05) to determine
whether $\#r(c_i, a_j)$ is statistically different from $E(\#r(c_i, a_j))$ (any appropriate
statistical test could be applied here). Table 3 shows the results of the $\chi^2$ test, where:</p>
      <list list-type="bullet">
        <list-item><p>value 0 means that the relation is not significant;</p></list-item>
        <list-item><p>value 1 denotes positive reasons; and</p></list-item>
        <list-item><p>value -1 denotes negative reasons.</p></list-item>
      </list>
      <p>The original rule for amphibians and the new one are very similar. In the new rule, the
attributes toothed and eggs were replaced with breathes and hair. From an expert's point of view,
the new rule is better, since the condition toothed=yes does not hold for all amphibians:
although most amphibian larvae have tiny teeth and most adult amphibians retain them,
the teeth can be reduced in size or absent altogether.</p>
      <p>The rules for reptiles also changed with the use of arguments:</p>
      <list list-type="bullet">
        <list-item><p>IF toothed=yes AND milk=no AND fins=no AND legs=0 THEN type=reptile</p></list-item>
        <list-item><p>IF feathers=no AND milk=no AND fins=no AND breathes=yes AND backbone=yes AND aquatic=no THEN type=reptile</p></list-item>
      </list>
      <p>The rules are again similar to the original ones, with some differences.
Specifically, the second reptile rule in the original set has domestic=no as a condition,
although many reptiles are kept as pets (e.g., turtles and snakes).</p>
      <p>
        We evaluated the method with 10-times repeated 10-fold cross-validation to
avoid the effects of randomness in a single split. In each iteration, all examples in the
learning set were argumented, and a model was then built from them and evaluated on
the test set. ABCN2 without arguments achieved 94.51% classification
accuracy, while ABCN2 with arguments achieved, on average, 96.75%
classification accuracy. For comparison, some standard machine learning methods (as
implemented in Orange [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]) scored 90% (SVM), 92.57% (C4.5), and 92.6% (naïve
Bayes).
      </p>
      <p>
        We developed and implemented a method for combining raw data and text
through the use of argument based machine learning and automatic extraction
of arguments from text. The method was demonstrated and evaluated on the
animal classification domain. Although we used the simplest method
for finding relations in text, namely counting relations (sentence co-occurrences), we still
obtained very promising results. However, the assumption that a class value/attribute pair
that occurs very frequently is evidence of a relationship does not always hold,
since high frequency can be accidental. In this paper, we used a simple
statistical test to validate the “true” relationship. For future work, we want to
use an information-theoretically motivated measure (e.g., pointwise mutual
information) for discovering interesting relations, which we expect to result in even
better arguments.
      </p>
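      <p>For reference, the standard pointwise mutual information mentioned as future work can be estimated from the same sentence counts. This is a minimal sketch, not part of the reported method. It assumes that $P(c, a)$, $P(c)$, and $P(a)$ are estimated as fractions of the $n$ sentences considered, and the function name pmi is hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def pmi(n_ca: int, n_c: int, n_a: int, n: int) -> float:
    """Pointwise mutual information (in bits) between class c and attribute a.

    Assumption: probabilities are estimated from sentence counts.
    n_ca: sentences mentioning both c and a (i.e. #r(c, a));
    n_c, n_a: sentences mentioning c, respectively a; n: all sentences.
    PMI = log2(P(c, a) / (P(c) P(a))) = log2(n_ca * n / (n_c * n_a)).
    """
    if n_ca == 0:
        return float("-inf")  # c and a never co-occur
    return math.log2(n_ca * n / (n_c * n_a))
```
]]></preformat>
      <p>A pair co-occurring exactly as often as chance predicts gets a PMI of 0 (e.g., pmi(10, 100, 100, 1000)), while four times the chance rate gives 2 bits (pmi(40, 100, 100, 1000)). Unlike raw counts, this measure discounts pairs whose members are simply frequent in text.</p>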
      <p>The described combination of data and text could also be used in other
domains. A possible example of such a domain is medicine, where we try to
provide a diagnosis for patients based on clinical values, and the arguments
could be extracted from scientific papers and other published material
on the particular issue. However, the approach described in this paper is not
the only possible way of extracting arguments from text. Sometimes learning
examples already come with commentaries written in natural language by domain experts.
In medicine, for instance, doctors usually provide their explanation of laboratory
results. Technical experiments (e.g., on the efficiency of jet engines), where experts
explain the obtained results, are another example. We believe that in all domains of this
type, sifting through several documents to find relations between class and attributes is
not the best option; a careful analysis of the arguments already provided would likely give
better results.</p>
      <p>Finally, X-Media is a European project that addresses the issue of
cross-media knowledge management in complex distributed environments. Part of
X-Media is concerned with the development of principles by which different types of
data (raw data, text, images) are used together to learn more accurate
models. We believe that ABML is a promising way of combining raw data with
textual data, whenever arguments can be extracted from text.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>The authors are partially supported by the X-Media project
(http://www.x-mediaproject.org), sponsored by the European Commission as part of the Information Society
Technologies (IST) program under EC grant number IST-FP6-026978.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. ACE - Automatic Content Extraction, http://www.nist.gov/speech/tests/ace.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. MUC - Message Understanding Conferences, http://cs.nyu.edu/faculty/grishman/m</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Semeval-2007, http://nlp.cs.swarthmore.edu/semeval/.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>A.</given-names>
            <surname>Asuncion</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.J.</given-names>
            <surname>Newman</surname>
          </string-name>
          .
          <article-title>UCI machine learning repository</article-title>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Razvan</given-names>
            <surname>Bunescu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Raymond J.</given-names>
            <surname>Mooney</surname>
          </string-name>
          .
          <article-title>Subsequence kernels for relation extraction</article-title>
          . In
          <source>Proceedings of the 19th Conference on Neural Information Processing Systems</source>
          , Vancouver, British Columbia,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J.</given-names>
            <surname>Demšar</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Zupan</surname>
          </string-name>
          .
          <article-title>Orange: From experimental machine learning to interactive data mining</article-title>
          . White Paper [http://www.ailab.si/orange],
          <source>Faculty of Computer and Information Science</source>
          , University of Ljubljana,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Christiane</given-names>
            <surname>Fellbaum</surname>
          </string-name>
          .
          <source>WordNet: An Electronic Lexical Database</source>
          . MIT Press,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Fürnkranz</surname>
          </string-name>
          .
          <article-title>Pruning algorithms for rule learning</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>27</volume>
          (
          <issue>2</issue>
          ):
          <fpage>139</fpage>
          -
          <lpage>171</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Claudio</given-names>
            <surname>Giuliano</surname>
          </string-name>
          , Alberto Lavelli, and
          <string-name>
            <given-names>Lorenza</given-names>
            <surname>Romano</surname>
          </string-name>
          .
          <article-title>Relation extraction and the influence of automatic named entity recognition</article-title>
          .
          <source>ACM Transactions on Speech and Language Processing</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ),
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Takaaki</given-names>
            <surname>Hasegawa</surname>
          </string-name>
          , Satoshi Sekine, and
          <string-name>
            <given-names>Ralph</given-names>
            <surname>Grishman</surname>
          </string-name>
          .
          <article-title>Discovering relations among named entities from large corpora</article-title>
          . In
          <source>Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004)</source>
          , Barcelona, Spain,
          <year>2004</year>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Martin</given-names>
            <surname>Možina</surname>
          </string-name>
          , Matej Guid, Jana Krivec, Aleksander Sadikov, and Ivan Bratko.
          <article-title>Fighting knowledge acquisition bottleneck with argument based machine learning</article-title>
          . In
          <source>Proceedings of the 18th European Conference on Artificial Intelligence (ECAI 2008)</source>
          , Patras, Greece,
          <year>2008</year>
          . IOS Press.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Martin</given-names>
            <surname>Možina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jure</given-names>
            <surname>Žabkar</surname>
          </string-name>
          , and Ivan Bratko.
          <article-title>Argument based machine learning</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>171</volume>
          (
          <issue>10</issue>
          /15):
          <fpage>922</fpage>
          -
          <lpage>937</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Henry</given-names>
            <surname>Prakken</surname>
          </string-name>
          and
          <string-name>
            <given-names>Gerard</given-names>
            <surname>Vreeswijk</surname>
          </string-name>
          .
          <article-title>Handbook of Philosophical Logic, second edition</article-title>
          , volume
          <volume>4</volume>
          , chapter Logics for Defeasible Argumentation, pages
          <fpage>218</fpage>
          -
          <lpage>319</lpage>
          . Kluwer Academic Publishers, Dordrecht etc,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>