<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Commonsense Reasoning meets Theorem Proving</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ulrich Furbach</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudia Schon</string-name>
          <email>schong@uni-koblenz.de</email>
        </contrib>
        <aff>Universität Koblenz-Landau</aff>
      </contrib-group>
      <abstract>
        <p>The area of commonsense reasoning aims at the creation of systems able to simulate the human way of rational thinking. This paper describes the use of automated reasoning methods for tackling commonsense reasoning benchmarks. For this we use a benchmark suite introduced in the literature. Our goal is to use general-purpose background knowledge without domain-specific hand-coding of axioms, such that the approach and the results can be used for other domains in mathematics and science as well. We furthermore report on a preliminary experiment for finding most plausible results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Figure 1 presents two example problems from the COPA benchmark suite:</p>
      <p>Problem no. 1 (cause category): a description of a person's body casting a shadow. What was the CAUSE? 1. The sun was rising. 2. The grass was cut.</p>
      <p>Problem no. 13 (result category): The pond froze over for the winter. What happened as a RESULT? 1. People skated on the pond. 2. People brought boats to the pond.</p>
    </sec>
    <sec id="sec-2">
      <title>Benchmarks for Commonsense Reasoning</title>
      <p>
        For a long time, no benchmarks in the field of commonsense reasoning were available, and most approaches were tested only on small toy examples. Recently, this problem was remedied with the proposal of various sets of benchmark problems. There is the Winograd Schema Challenge [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], whose problems have a clear focus on natural language processing, whereas background knowledge plays a subordinate role. Another example is the Choice Of Plausible Alternatives (COPA) challenge [21], consisting of 1000 problems equally split into a development and a test set. Each problem consists of a natural language sentence describing a scenario and a question. In addition, two answers are provided in natural language. The task is to determine which one of these alternatives is the more plausible one. Figure 1 presents two problems from this benchmark suite. As in the two presented examples, the questions always ask either for the cause or the result of an observation.
      </p>
      <p>Even though the COPA challenge requires capabilities for handling natural language, background knowledge and commonsense reasoning skills are crucial for tackling these problems as well, which makes them very interesting for evaluating cognitive systems. All existing systems tackling the COPA benchmarks focus on linguistic and statistical approaches, calculating correlational statistics on words.</p>
      <p>
        Another set of benchmarks is the Triangle-COPA challenge [15]. This is a suite of one hundred logic-based commonsense reasoning problems which was developed specifically for the purpose of advancing new logical reasoning approaches. The structure of the problems is the same as in the COPA challenge; however, the problems in the Triangle-COPA challenge are given not only in natural language but also in first-order logic. Approaches to tackling the Triangle-COPA challenge can be found in [15], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>In this paper we focus on the COPA challenge (the problem set is available at http://people.ict.usc.edu/~gordon/downloads/COPA-questions-dev.txt). However, most parts of our approach can be used for the problems of the Triangle-COPA challenge (available at https://github.com/asgordon/TriangleCOPA/) as well.</p>
      <p>Figure 2 (overview): starting from the formulae for the problem description and the two alternatives, synonyms and hypernyms are found in WordNet based on the problem signature; connecting formulae are created using WordNet and OpenCyc; axioms are selected from OpenCyc based on the connecting formulae, using SInE and k-NN; finally, the connecting formulae and the selected axioms are combined into the background knowledge.</p>
      <p>
        As described above, the creation of a system for commonsense reasoning requires the combination of techniques from different areas [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Even gathering appropriate background knowledge for a specific benchmark problem requires the use of different techniques. Figure 2 depicts the different steps necessary to gather suitable background knowledge for a given COPA problem. When combining an example problem with background knowledge, several problems have to be solved:
1. If the problem is given in natural language, it has to be transformed into a logical representation.
2. The predicate symbols used in the formalization of the example are unlikely to coincide with the predicate symbols used in the background knowledge.
3. The background knowledge is too large to be considered as a whole.
The first problem can be solved using the Boxer [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] system, which is able to transform natural language into first-order logic formulae. We assume that this is done before the techniques given in Figure 2 are applied. Please note that this step is not necessary when benchmarks given in first-order logic are considered, as is the case for the Triangle-COPA challenge.
      </p>
      <p>
        We address the second problem by using WordNet [16] to find synonyms and hypernyms of the predicate symbols used in the formalization of the example. Note that the formalization of the example consists both of the formulae describing the situation and of the formulae for the two alternatives. In the next step, predicate symbols used in OpenCyc [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] which are similar to these synonyms and hypernyms are determined. With the help of this information, a connecting set of formulae is created. In this step, it is also necessary to adjust the arity of predicate symbols, which is likely to differ since Boxer only creates formulae with unary or binary predicates.
      </p>
      <p>
        The third problem is addressed using selection methods. For this, all predicate symbols occurring in the formalization of the example and in the connecting set of formulae are used. As selection methods, SInE as well as k-NN, as implemented in the E.T. metasystem [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], are used. The selected axioms are combined with the connecting set of formulae, and the resulting set of formulae constitutes the background knowledge for the example at hand.
      </p>
      <p>The COPA challenge contains two different categories of problems. In the first category, a sentence describing an observation is given and it is asked for the cause of this observation. Problem no. 1 given in Figure 1 is an example of a question in this category. In this case, the task is to determine which of the two provided alternatives is more likely to be the cause of the observation described in the sentence. We call this category the cause category.</p>
      <p>In the other category, a sentence describing an observation is given and it is asked about the result of this observation. In this case, the task is to decide which of the two alternatives is more likely to result from the situation described in the sentence. We call this second category the result category. Even though the category does not influence the way the background knowledge is selected, it is necessary to use different approaches for the two categories when combining this background knowledge with automated reasoning methods.</p>
      <p>
        Figure 3 depicts how to tackle a problem from the result category in
order to determine the more plausible alternative result. Please note that the
selected background knowledge does not only consist of axioms stemming from
the knowledge base used as a source for background knowledge but also contains
the connecting formulae which were created as depicted in Figure 2. First, this
background knowledge is combined with the logical formulae representing the
description of the benchmark problem. The resulting set of formulae serves as
input for a theorem prover, in our case the Hyper prover [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Hyper constructs a
model for the set of formulae which, together with the logical representation of
the two alternatives, is used by machine learning techniques to determine which
of the two alternatives is more plausible.
      </p>
      <p>When dealing with problems in the cause category, we have to solve two reasoning problems. Figure 4 depicts the workflow for this category. In the first step, the background knowledge is combined with the logical formulae representing the first alternative. The resulting set of formulae serves as an input for the Hyper theorem prover, which constructs a model M1. In the second step, the background knowledge is combined with the logical formulae representing the second alternative, and the resulting set is passed to Hyper, which constructs a model M2. Then both models are inspected in order to determine which of the two models is closer to the formulae representing the description of the problem. This inspection of the models is still future work. We are planning to accomplish this with the help of machine learning.</p>
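      <p>Since this inspection is still future work, the following is only a naive sketch of how such a comparison might look, under the assumption that the models and the description are available as sets of ground atoms; the atom names are hypothetical, and the planned approach uses machine learning rather than this simple overlap score.</p>

```python
# Naive sketch: score each model by the fraction of description atoms
# it contains, and pick the alternative whose model fits better.

def closeness(model, description_atoms):
    return len(model & description_atoms) / len(description_atoms)

def more_plausible_cause(model1, model2, description_atoms):
    c1 = closeness(model1, description_atoms)
    c2 = closeness(model2, description_atoms)
    if c1 == c2:
        return 0          # undecided
    return 1 if c1 > c2 else 2

# Hypothetical ground atoms for a cause-category problem.
desc = {"cast(e)", "shadow(c)", "body(d)"}
m1 = {"sun_rise(a)", "cast(e)", "shadow(c)", "body(d)"}   # model for alternative 1
m2 = {"cut(b)", "grass(b)"}                               # model for alternative 2
print(more_plausible_cause(m1, m2, desc))
```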
    </sec>
    <sec id="sec-3">
      <title>Lessons Learnt so far</title>
      <p>We created a prototypical implementation of the workflow depicted in the previous section. Our implementation takes a problem from the COPA challenge, selects appropriate background knowledge, generates a connecting set of formulae, and feeds everything into Hyper. The machine learning component inspecting the generated model is in an experimental phase and is addressed in Section 5 below.</p>
      <sec id="sec-3-1">
        <title>Issues with Inconsistencies</title>
        <p>We performed a very preliminary experiment to test this workflow. From the COPA benchmark set we selected 100 problems. Feeding these examples into the workflow finally resulted in 100 proof tasks for Hyper, and we learned a lot about problems which still have to be solved. Hyper found 37 proofs and 57 models; the rest are time-outs. One problem we encountered is that some contradictions leading to a proof are introduced by selecting too general hypernyms from WordNet. For example, the problem description of example 1 given in Figure 1 is transformed by Boxer into a first-order logic formula containing, among others, the conjuncts
vcast(B) ∧ nshadow(C) ∧ nbody(D) ∧ rof(D, C) ∧ nperson(C).
From WordNet the system extracted the information that `individual' is a hypernym of `shadow' and `collection' is a hypernym of `person', leading to the two connecting formulae
∀X (nshadow(X) → individual(X))
∀X (nperson(X) → collection(X)).
The selection from OpenCyc resulted, among others, in the axiom
∀X ¬(collection(X) ∧ individual(X)).</p>
        <p>These formulae together lead to a closed tableau, i.e. a proof of unsatisfiability: the shadow stems from a person, WordNet yields that a shadow is an individual and that a person is a collection, and together with the Cyc axiom we obtain a contradiction. This has nothing to do with either of the alternatives, that the sun was rising or that the grass was cut.</p>
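        <p>The derivation of this contradiction can be replayed with a small forward-chaining sketch (the predicate and constant names follow the example above; the chaining procedure itself is our simplification for illustration, not Hyper):</p>

```python
# Sketch: unit forward chaining over monadic facts (pred, constant) and
# rules (premise_pred, conclusion_pred); the constraint names two
# predicates that must never both hold for the same constant.

def forward_chain(facts, rules, constraint):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for pre, post in rules:
            for pred, const in list(facts):
                if pred == pre and (post, const) not in facts:
                    facts.add((post, const))
                    changed = True
    p, q = constraint
    return any((p, c) in facts and (q, c) in facts for _, c in facts)

facts = [("nshadow", "c"), ("nperson", "c")]                    # from Boxer's output
rules = [("nshadow", "individual"), ("nperson", "collection")]  # WordNet bridges
print(forward_chain(facts, rules, ("individual", "collection")))  # contradiction found
```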
        <p>
          To remedy this problem, we use a tool called KNEWS [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] to disambiguate Boxer's output. (The tool is available at https://github.com/valeriobasile/learningbyreading; many thanks to Valerio Basile for being so kind as to share it.) This tool calls the Babelfy [17] service (http://babelfy.org) to link entities to BabelNet [18] (http://babelnet.org). Babelfy is a multilingual, graph-based approach to entity linking and word sense disambiguation. BabelNet is a multilingual encyclopedic dictionary and a semantic network. Since the BabelNet entries are linked to WordNet synsets, this tool provides the suitable WordNet synset for predicate names generated by Boxer. In a second run of the experiment, we only used the disambiguated results to construct a bridging set of formulae and to select background knowledge. It turned out that the selected background knowledge is much more focused on the problem under consideration. Furthermore, only one of the 100 COPA problems we tested was inconsistent. So we solved this first problem by disambiguating Boxer's output.
        </p>
        <p>The one contradiction which still occurred in the second experiment stems directly from inconsistencies in the knowledge base used as the source for background knowledge (in our case OpenCyc). For example, the two formulae
∀X speed(fqpquantityfnspeed(X))
∀X ¬speed(X)
were selected, immediately leading to a contradiction which, again, has nothing to do with the two alternatives about the sun rising or the grass being cut. This illustrates that we have to find a way to deal with inconsistent background knowledge.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Insufficient Background Knowledge</title>
        <p>Another challenge when combining problems with background knowledge is the lack of appropriate background knowledge. Consider example number 13 from the COPA challenge, which we presented in Figure 1. The background knowledge selected for this example contains formulae on skating, like for example
∀X (skateboard(X) → device_usercontrolled(X))
and even on ice skating:
∀X (iceskate(X) → isa(X, c_iceskate)).
However, the information that the freezing of a pond in winter results in a surface suitable for skating is missing. This explains why not enough inferences were performed and why, therefore, the constructed model does not contain information on ice skating.</p>
        <p>In general, there are two possible explanations for the lack of inferences:
1. The bridging between the vocabulary used in the benchmark problem and the vocabulary used in OpenCyc is still very prototypical. We heavily rely on Boxer's output to construct the connecting formulae. However, sometimes Boxer marks adjectives as nouns and vice versa, which misguides the search for synonyms and hypernyms in WordNet. We are planning to improve this by updating to the most recent Boxer version.
2. Currently, we are only using OpenCyc as a source of background knowledge. However, this is not sufficient. We are planning to remedy this situation by including different other sources of background knowledge like ConceptNet [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], the Suggested Upper Merged Ontology (SUMO) [19, 20], the Human Emotion Ontology (HEO) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and the Emotion Ontology (EMO) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
ConceptNet is a semantic network containing large amounts of commonsense knowledge. This graph consists of labeled nodes and edges. The nodes are also called concepts and represent words or word senses. The edges are relations between these concepts and represent commonsense knowledge connecting the concepts. Relating to the COPA problem described above, ConceptNet contains very helpful knowledge, like the fact that in winter one likes to skate:</p>
        <p>winter –CausesDesire→ skate
SUMO is a very large upper ontology containing knowledge which could be helpful as background knowledge. For example, SUMO contains the knowledge that icing is a subclass of freezing, which could be helpful for our benchmark problem. One very interesting point is that there is a mapping from SUMO to WordNet synsets. This will be very helpful during the creation of formulae bridging from the vocabulary of the benchmark problem and the synonyms and hypernyms to the vocabulary used in SUMO.
Due to the fact that the structure of the problems of the Triangle-COPA challenge is very similar to the structure of the problems of the COPA challenge, our approach can be used for the Triangle-COPA challenge without many changes. Since the problems given in the Triangle-COPA challenge consist of descriptions of small episodes on interpersonal relationships, the HEO and EMO ontologies, both containing information on human emotions, provide useful background knowledge.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Ranking of Proofs and Proof Attempts</title>
      <p>
        When using the automated reasoning system Hyper within the deep question answering system LogAnswer [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we already had to tackle the problem that the prover almost never found a complete proof of the given problem. Instead, we had to delete some subgoals of the proof task which could not be solved within a given time bound; we called this relaxation. In order to find a best answer of the system, we had to compare several proofs, or rather proof attempts, because of the relaxations. For this ranking we used machine learning to find the best proof resp. answer. We are planning to use a similar approach for the aforementioned benchmarks.
      </p>
      <p>As depicted in Figure 4, for the cause category two models are generated. Each of these models corresponds to knowledge which was derived using one of the two alternatives. In future work we are planning to use these two models, together with the formulae representing the description of the situation, in order to find out which alternative is the more plausible one. For this we are planning to use machine learning techniques to determine whether the formulae describing the situation are rather a logical consequence of the first or of the second model. Up to now, we have focused our research on the result category, which is why we cannot provide further results for the cause category.</p>
      <p>In the following we describe how to use machine learning techniques for problems of the result category and give results of a first experiment. The situation is as follows: we have a problem description P, in our COPA example above the logical representation of `The pond froze over for the winter.', together with two possible answers E1 and E2.</p>
      <p>In the workflow described so far, we constructed a tableau for P ∪ BG, where BG is the background knowledge as discussed in Section 3. This tableau may contain open and closed branches. The closed branches are parts of a proof, and the open branches either form a model (and hence no closed tableau exists) or are only open because of a time-out for this branch. With the help of this tableau, we try to decide which of the two alternatives E1 and E2 is `closer' to being a logical consequence of P ∪ BG. In other words, we try to decide whether the constructed tableau can rather be closed by adding alternative E1 or by adding alternative E2.</p>
      <p>In the LogAnswer system, we gave the two answers to humans to decide which is closer to a logical consequence, and we then used this information to train a machine learning system. For the scenario of the COPA and Triangle-COPA benchmarks we designed a preliminary study, which aims at using information about the tableau created for P ∪ BG, together with information from the formulae of the problem and the background knowledge, to generate examples for training.</p>
      <p>We restricted our preliminary study to propositional logic and analyzed tableaux created by the Hyper prover for randomly created sets of clauses. For each pair of propositional logic variables p and q occurring in a clause set, we were interested in the question whether p or q is `closer' to a logical consequence. We reduced this question to a classification problem: for each pair of variables p and q, the task is to learn whether p &lt; q, p &gt; q or p = q, where p &lt; q means that q is `closer' to a logical consequence than p, and p = q means that p's and q's `closeness' to a logical consequence is equal. Consider the following set of clauses:
p0
p4 → p2 ∨ p3 ∨ p7
p0 → p4
p3 ∧ p5 → p6
p3 ∧ p5 ∧ p8 → p1
p2 → ⊥
Clearly, p0 and p4 are logical consequences of this clause set. Therefore p0 = p4 and p0 &gt; q for all other variables q. On the other hand, from p2 it is possible to deduce a contradiction, which leads to p2 &lt; q for all other variables q. Comparing p6 and p1 is a little more complicated. Neither of these variables is a logical consequence. However, assuming p3 and p5 to be true allows us to deduce p6 but not p1. In order to deduce p1 it is necessary to assume not only p3 and p5 to be true but also p8. Therefore p1 &lt; p6.</p>
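      <p>The ordering in this example can be reproduced by a small sketch that measures, for each variable, how many other variables must be assumed true before forward chaining derives it (a deliberate simplification: only the definite clauses are used, so the disjunctive clause and the contradiction from p2 are ignored):</p>

```python
# Sketch of 'closeness to a logical consequence': the cost of a goal
# variable is the minimal number of assumed variables needed so that
# forward chaining over definite rules derives the goal.
from itertools import combinations

def derives(facts, rules, goal):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= facts and head not in facts:
                facts.add(head)
                changed = True
    return goal in facts

def cost(variables, facts, rules, goal):
    others = [v for v in variables if v != goal]   # the goal may not assume itself
    for k in range(len(others) + 1):
        for assumed in combinations(others, k):
            if derives(set(facts) | set(assumed), rules, goal):
                return k
    return None  # not derivable at all

vs = ["p0", "p1", "p3", "p4", "p5", "p6", "p7", "p8"]
facts = {"p0"}
rules = [({"p0"}, "p4"), ({"p3", "p5"}, "p6"), ({"p3", "p5", "p8"}, "p1")]
print({v: cost(vs, facts, rules, v) for v in ["p0", "p4", "p6", "p1"]})
```

<p>Smaller cost means `closer' to a logical consequence, so the sketch yields p0 = p4 &gt; p6 &gt; p1, as argued above.</p>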
      <p>To use machine learning techniques to classify this kind of example, we represent each pair of variables (p, q) as an instance of the training examples, and we provide the information which of the three relations &lt;, &gt;, = is correct for p and q. Each of these instances contains 22 attributes. Some of these attributes represent information on the clause set, like the proportion of clauses with p or q in the head, as well as rudimentary dependencies between the variables in the clause set. In addition, we determine attributes representing information on the hypertableau for the set of clauses, like the number of occurrences of p and q in open branches. Furthermore, we determine an attribute mimicking some aspects of abduction by estimating the number of variables which have to be assumed to be true in order to deduce p or q, respectively. This allows us to perform comparisons like the one between p1 and p6 in the above example. Of course we also take into account whether one of the two variables is indeed a logical consequence.</p>
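      <p>As an illustration, an extract of such an instance might be computed as follows; the attribute names and the two kinds of attributes shown are our own simplified stand-ins for the 22 attributes described above, not the actual feature set:</p>

```python
# Sketch: compute two of the described attribute kinds for a pair (p, q):
# the proportion of clauses with the variable in the head, and the
# proportion of open branches containing the variable.

def features(p, q, clauses, open_branches):
    """clauses: (body, head) pairs; open_branches: sets of variables."""
    def head_ratio(v):
        return sum(1 for _, head in clauses if head == v) / len(clauses)
    def open_ratio(v):
        return sum(1 for b in open_branches if v in b) / len(open_branches)
    return {
        "head_ratio_p": head_ratio(p), "head_ratio_q": head_ratio(q),
        "open_ratio_p": open_ratio(p), "open_ratio_q": open_ratio(q),
    }

clauses = [(set(), "p0"), ({"p0"}, "p4"), ({"p3", "p5"}, "p6")]
branches = [{"p0", "p4"}, {"p0", "p3"}]
print(features("p0", "p4", clauses, branches))
```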
      <p>
        For the first experiments, 1,000 sets of clauses, each consisting of about 10 clauses and containing about 12 variables, were randomly generated. These sets of clauses were analyzed and used to create a training set. For each pair of variables occurring in one of the clause sets, an instance was generated. All in all, this led to 123,246 examples for training purposes. In these examples, the classes &lt; and &gt; each consist of 57,983 examples and the class = of 7,280 examples. We used the J48 tree classifier implemented in the Weka [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] system to construct a decision tree for the training set. This classifier implements the C4.5 algorithm. The result was a decision tree for our training examples. We tested this decision tree on a test set which was generated from 100 randomly generated sets of clauses different from the clause sets used for the training examples. This resulted in a test set consisting of 12,198 instances. The learnt decision tree correctly classified 98.02% of the instances of our test set. Table 1 provides information on correctly and incorrectly classified instances of the different classes.
      </p>
      <p>We are aware that automatically classifying the test set might introduce errors into the test set and therefore distort the results. Since it is very labor-intensive to generate test data manually, we only created test instances from two clause sets by hand. For this much smaller test set, depending on the classifier used, we reached percentages of correctly classified instances of up to 80%.</p>
      <p>In the next step, we are planning to expand our experiments to clause sets P ∪ BG and alternatives E1 and E2 given in first-order logic. When creating the instances of the training examples for first-order logic, attributes different from the ones in the previous experiment have to be considered. For example, it is not sufficient to have an attribute indicating the proportion of open branches containing a certain subgoal of E1. Since both the open branch and the subgoal of E1 are given in first-order logic, unification has to be taken into account. In this case, the proportion of open branches containing an atom which can be unified with one of the subgoals of E1 constitutes an interesting attribute. Another interesting attribute would take relaxation into account: for an open branch not containing any atom which can be unified with one of the subgoals of E1, it is interesting whether this branch contains an atom which can be unified with a generalization of one of the subgoals of E1. This attribute allows us to mimic the relaxation as it is used in the LogAnswer system.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>We presented an approach to tackle benchmarks for commonsense reasoning. This approach relies on large existing ontologies as a source of background knowledge and combines different techniques like theorem proving and machine learning with tools for natural language processing. With the help of a prototypical implementation of our approach, we conducted some experiments with problems from the COPA challenge. We reported on the experiences made in these experiments, together with possible solutions for the problems occurring in the examples considered.</p>
      <p>Future work aims at the integration of additional sources of background knowledge as well as at improving the bridging between the vocabulary used in the benchmarks and the background knowledge.</p>
      <p>15. N. Maslan, M. Roemmele, and A. S. Gordon. One hundred challenge problems for logical formalizations of commonsense psychology. In Twelfth International Symposium on Logical Formalizations of Commonsense Reasoning, Stanford, CA, 2015.
16. G. A. Miller. WordNet: a lexical database for English. Commun. ACM, 38(11):39–41, 1995.
17. A. Moro, A. Raganato, and R. Navigli. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Transactions of the Association for Computational Linguistics (TACL), 2:231–244, 2014.
18. R. Navigli and S. P. Ponzetto. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193:217–250, 2012.
19. I. Niles and A. Pease. Towards a standard upper ontology. In Proceedings of the International Conference on Formal Ontology in Information Systems, Volume 2001, pages 2–9. ACM, 2001.
20. A. Pease. Ontology: A Practical Guide. Articulate Software Press, Angwin, CA, 2011.
21. M. Roemmele, C. A. Bejan, and A. S. Gordon. Choice of plausible alternatives: An evaluation of commonsense causal reasoning. In AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning, 2011.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cabrio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Gandon</surname>
          </string-name>
          .
          <article-title>Building a general knowledge base of physical objects for robots</article-title>
          .
          <source>In The Semantic Web. Latest Advances and New Domains</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>M.</given-names>
            <surname>Bender</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pelzer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Schon</surname>
          </string-name>
          .
          <article-title>System description: E-KRHyper 1.4 – extensions for unique names and description logic</article-title>
          . In M. P. Bonacina, editor,
          <source>CADE-24, LNCS 7898</source>
          , pages
          <fpage>126</fpage>
          –
          <lpage>134</lpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Curran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Clark</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Bos</surname>
          </string-name>
          .
          <article-title>Linguistically motivated large-scale NLP with C&amp;C and Boxer</article-title>
          .
          <source>In Proceedings of the ACL 2007 Demo and Poster Sessions</source>
          , pages
          <fpage>33</fpage>
          –
          <lpage>36</lpage>
          , Prague, Czech Republic,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>U.</given-names>
            <surname>Furbach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Glöckner</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Pelzer</surname>
          </string-name>
          .
          <article-title>An application of automated reasoning in natural language question answering</article-title>
          .
          <source>AI Commun</source>
          .,
          <volume>23</volume>
          (
          <issue>2-3</issue>
          ):
          <fpage>241</fpage>
          –
          <lpage>265</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>U.</given-names>
            <surname>Furbach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gordon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Schon</surname>
          </string-name>
          .
          <article-title>Tackling benchmark problems for commonsense reasoning</article-title>
          .
          <source>In Proceedings of Bridging - Workshop on Bridging the Gap between Human and Automated Reasoning</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>U.</given-names>
            <surname>Furbach</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Schon</surname>
          </string-name>
          .
          <article-title>Commonsense reasoning meets theorem proving</article-title>
          .
          <source>In Proceedings of the 1st Conference on Artificial Intelligence and Theorem Proving AITP'16</source>
          , Obergurgl, Austria,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Gordon</surname>
          </string-name>
          .
          <article-title>Commonsense interpretation of triangle behavior</article-title>
          . In
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Wellman</surname>
          </string-name>
          , editors,
          <source>Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence</source>
          , February 12-17, 2016, Phoenix, Arizona, USA, pages
          <fpage>3719</fpage>
          -
          <lpage>3725</lpage>
          . AAAI Press,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>M.</given-names>
            <surname>Grassi</surname>
          </string-name>
          .
          <source>Biometric ID Management and Multimodal Communication: Joint COST 2101 and 2102 International Conference, BioID MultiComm 2009</source>
          , Madrid, Spain, September 16-18,
          <year>2009</year>
          . Proceedings, chapter
          <source>Developing HEO Human Emotions Ontology</source>
          , pages
          <fpage>244</fpage>
          -
          <lpage>251</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Frank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Holmes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pfahringer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Reutemann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          .
          <article-title>The WEKA data mining software: an update</article-title>
          .
          <source>SIGKDD Explorations</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ):
          <fpage>10</fpage>
          -
          <lpage>18</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>J.</given-names>
            <surname>Hastings</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ceusters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Smith</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Mulligan</surname>
          </string-name>
          .
          <source>Modeling and Using Context: 7th International and Interdisciplinary Conference, CONTEXT 2011</source>
          , Karlsruhe, Germany, September 26-30,
          <year>2011</year>
          . Proceedings, chapter
          <article-title>The Emotion Ontology: Enabling Interdisciplinary Research in the Affective Sciences</article-title>
          , pages
          <fpage>119</fpage>
          -
          <lpage>123</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>C.</given-names>
            <surname>Kaliszyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schulz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Urban</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Vyskocil</surname>
          </string-name>
          .
          <article-title>System description: E.T. 0.1</article-title>
          . In
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Felty</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Middeldorp</surname>
          </string-name>
          , editors,
          <source>Proceedings of CADE-25</source>
          , Berlin, Germany,
          <year>2015</year>
          , volume
          <volume>9195</volume>
          of LNCS
          . Springer,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Lenat</surname>
          </string-name>
          .
          <article-title>Cyc: A large-scale investment in knowledge infrastructure</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>38</volume>
          (
          <issue>11</issue>
          ):
          <fpage>33</fpage>
          -
          <lpage>38</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Levesque</surname>
          </string-name>
          .
          <article-title>The Winograd Schema Challenge</article-title>
          .
          <source>In Logical Formalizations of Commonsense Reasoning, Papers from the 2011 AAAI Spring Symposium</source>
          ,
          <source>Technical Report SS-11-06</source>
          , Stanford, California, USA, March 21-23,
          <year>2011</year>
          . AAAI,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Singh</surname>
          </string-name>
          .
          <article-title>ConceptNet – a practical commonsense reasoning tool-kit</article-title>
          .
          <source>BT Technology Journal</source>
          ,
          <volume>22</volume>
          (
          <issue>4</issue>
          ):
          <fpage>211</fpage>
          -
          <lpage>226</lpage>
          , Oct.
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>