Tackling Benchmark Problems of Commonsense Reasoning

Ulrich Furbach1, Andrew S. Gordon2, and Claudia Schon1

1 Universität Koblenz-Landau, {uli,schon}@uni-koblenz.de
2 University of Southern California, gordon@ict.usc.edu

(Work supported by DFG FU 263/15-1 'Ratiolog,' and by the U.S. Office of Naval Research, grant N00014-13-1-0286.)

Abstract. There is increasing interest in the field of automated commonsense reasoning to find real-world benchmarks to challenge and to further develop reasoning systems. One interesting example is the Triangle Choice of Plausible Alternatives (Triangle-COPA), which is a set of problems presented in first-order logic. The setting of these problems stems from the famous Heider-Simmel film used in early experiments in social psychology. This paper illustrates with two logical approaches—abductive logic programming and deontic logic—how these problems can be solved. Furthermore, we propose an idea of how to use background knowledge to support the reasoning process.

1 Introduction

In his influential 1958 paper, entitled "Programs with Common Sense" [14], John McCarthy set in motion his research agenda for Artificial Intelligence. He proposed the use of logic and deduction to overcome the difficult challenges of commonsense reasoning. His own pursuits later led him to introduce the logic of circumscription [15] to handle the non-monotonic nature of human inference. In the intervening decades, numerous other approaches have been proposed by different researchers, e.g. based on probability theory or on argumentation frameworks. Progress on varied approaches was recently demonstrated, in dramatic fashion, in the success of IBM's Watson system in the Jeopardy challenge [3]. Subsequently, there has been considerable effort to investigate the varied techniques of the Watson system as a new programming paradigm, cognitive computing, and to apply these techniques to diverse research and commercial problems, including eHealth, cancer research, and even supporting culinary chefs.

Although the Jeopardy challenge served to demonstrate the potential of new technologies, it does not provide the most appropriate benchmark problems for testing and evaluating individual research methods and approaches. Watson's success required a large engineering team, integrating technologies across many different fields of computer science. Logic-based approaches to commonsense reasoning may increasingly play a role in future cognitive computing applications, but the Jeopardy challenge is too ambitious as a tool for benchmarking progress in this area. Over the years, logic-based approaches have been slow to move beyond the ubiquitous Tweety and Emu example problems to demonstrate their usefulness, although specialized benchmarking suites are increasingly being used in sub-disciplines of automated reasoning, e.g. in first-order theorem proving, answer set programming, and SAT solving. Recently, new sets of benchmark problems have been proposed for commonsense reasoning, such as the Winograd Schema Challenge [12] and the Choice Of Plausible Alternatives challenge [21]. Both of these challenges, however, require substantial capabilities for handling natural language (English), which complicates their use by researchers hoping to focus specifically on logic-based reasoning approaches.
The Triangle-COPA challenge [13], available at https://github.com/asgordon/TriangleCOPA/, provides a suite of one hundred logic-based commonsense reasoning problems, and was developed specifically for the purpose of advancing new logical reasoning approaches. Based on an influential psychology experiment from the 1940s, Triangle-COPA serves as a useful tool for studying the differences between human and logical reasoning. In the sections that follow, we describe the Triangle-COPA challenge problems and demonstrate that they can be solved using very different approaches to automated logical reasoning—first using a probabilistic form of logical abduction, and second using deontic logic—and discuss the challenges of authoring or acquiring the necessary background knowledge.

2 The Triangle-COPA Benchmarks

In an early and influential study of human social perception, psychologists Fritz Heider and Marianne Simmel [7] presented subjects with a short animated film depicting the movements of two triangles and a circle in and around a box with a hinged opening (Figure 1). Asked what they saw in the film, subjects each responded with similar narratives that anthropomorphized the moving shapes as intentional characters with beliefs, goals, and emotions. The simplicity of the film was in sharp contrast with the richness of the subjects' narratives, highlighting the role of knowledge and personal experience in the process of interpretation. Heider [6] later argued that the interpretation of intentional behavior was driven by commonsense theories of psychology and sociology, and was the basis of human social interaction.

Fig. 1: One frame of the film used by Heider and Simmel in their study.

How could we build a software system that was capable of interpreting the Heider-Simmel film in the same manner as the study's subjects? Researchers in artificial intelligence and cognitive science have sought to construct such a system. Thibadeau [23] takes a symbolic approach, representing the coordinates of each object in each frame of the original film, which are matched to defined action schemas, such as opening the door or going outside the box. Pautler et al. [18] follow a related approach, beginning with object trajectory information from an animated recreation of the Heider-Simmel film. An incremental chart parsing algorithm with a hand-authored action grammar is then applied to recognize character actions as well as their intentions.

These earlier attempts highlight several problems with the use of the original Heider-Simmel film as a challenge problem by automated reasoning researchers. First, any system must overcome the difficult challenge of recognizing actions in the visual scenes, e.g. by first extracting quantitative trajectory information from the image data. Contemporary gesture recognition methods may be suitable for this task, using models trained on copious amounts of annotated examples. However, the effort involved in applying these techniques shifts research attention away from the central automated reasoning task of interpretation. Second, the original Heider-Simmel film provides a compelling input as a challenge problem, but the correct output is unspecified. Precisely because the input is "open to interpretation," it is difficult to compare the relative performance of two competing approaches, or even of the same approach as it develops over time. The Triangle Choice of Plausible Alternatives (Triangle-COPA) set of one hundred challenge problems is a recent attempt to overcome these two problems with the original Heider-Simmel movie [13].
Each of the one hundred questions in this problem set describes, in English and in first-order logic, a short sequence of events involving the characters of the original Heider-Simmel film: two triangles and a circle moving around a box with a hinged opening. This description ends with a question that requires the interpretation of the action sequence, and provides a choice of two possible answers, also in both English and logical form. The task is to select which of the two options would be selected by a human, where the correctness of the choice has been validated by teams of human volunteers. Three examples of Triangle-COPA questions are as follows:

44: The triangle opened the door, stepped outside and started to shake. Why did the triangle start to shake?
(and (exit' E1 LT) (shake' E2 LT) (seq E1 E2))
a. The triangle is upset. (unhappy' e3 LT)
b. The triangle is cold. (cold' e4 LT)

58: A circle and a triangle are in the house and are arguing. The circle punches the triangle. The triangle runs out of the house. Why does the triangle leave the house?
(and (argueWith' E1 C LT) (inside' E2 C) (inside' E3 LT) (hit' E4 C LT) (exit' E5 LT) (seq E1 E4 E5))
a. The triangle leaves the house because it wants the circle to come fight it outside. (and (attack' e6 C LT) (goal' e7 e6 LT))
b. The triangle leaves the house because it is afraid of being further assaulted by the circle. (and (attack' e8 C LT) (fearThat' e9 LT e8))

83: A small triangle and big triangle are next to each other. A circle runs by and pushes the small triangle. The big triangle chases the circle. Why does the big triangle chase the circle?
(and (approach' E1 C LT) (push' E2 C LT) (chase' E3 BT C) (seq E1 E2 E3))
a. The big triangle is angry that the circle pushed the small triangle, so it tries to catch the circle. (angryAt' e4 BT C)
b. The big triangle and circle are friends. The big triangle wants to say hello to the circle. (and (friend' e5 BT C) (goal' e6 e7 BT) (greet' e7 BT C))

As a benchmark set of challenge problems for automated reasoning systems, Triangle-COPA has a number of attractive characteristics. By providing first-order logic representations as inputs and outputs, Triangle-COPA focuses the efforts of competitors specifically on the central interpretation problem. At the same time, it places no constraints on the particular reasoning methods that are actually used to select the correct answer, affording comparisons between systems that use radically different knowledge resources and reasoning algorithms. The relational vocabulary of Triangle-COPA literals is fixed [13], but the semantics of these predicates are not tied to any one ontology or theory. The correct answers of Triangle-COPA are randomly ordered, so the quality of any given system can be gauged between that of random guessing (50%) and human performance (near 100%).

Thus far, only Maslan et al. [13] have demonstrated an approach to solving Triangle-COPA problems. Using five axioms and an implementation of weighted abduction [10], the authors demonstrated that the least-cost proof of the observables in Question 83 (above) entailed answer "a", that the big triangle (BT) was angry at the circle (C). In the following two sections, we show two alternative approaches to solving the scenario described in Question 83.
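To make the problem format concrete, the following is a minimal sketch, entirely our own and not part of the Triangle-COPA distribution, of how a question such as Question 83 might be held in memory by a solver: each literal becomes a tuple of predicate and arguments, and a problem bundles the observations with its two candidate interpretations.

from dataclasses import dataclass
from typing import List, Tuple

Literal = Tuple[str, ...]  # e.g. ("push'", "E2", "C", "LT")

@dataclass
class Problem:
    observations: List[Literal]   # logical form of the narrative
    alternative_a: List[Literal]  # first candidate interpretation
    alternative_b: List[Literal]  # second candidate interpretation

question_83 = Problem(
    observations=[("approach'", "E1", "C", "LT"),
                  ("push'", "E2", "C", "LT"),
                  ("chase'", "E3", "BT", "C"),
                  ("seq", "E1", "E2", "E3")],
    alternative_a=[("angryAt'", "e4", "BT", "C")],
    alternative_b=[("friend'", "e5", "BT", "C"),
                   ("goal'", "e6", "e7", "BT"),
                   ("greet'", "e7", "BT", "C")])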
Our aim is to demonstrate that this benchmark set of questions can serve as grounds for comparison of different logical formalisms, algorithms, and knowledge bases, and help the larger automated reasoning community make progress on the difficult challenges of automated commonsense reasoning.

3 Probabilistic Abductive Reasoning

Triangle-COPA problems can be viewed as a choice between two alternative interpretations of a sequence of observable actions. Hobbs et al. [8] describe how interpretation of natural language can be cast as a problem of logical abduction, and solved using automated abductive reasoning technologies. Abduction, as distinct from logical deduction or induction, is a form of logical reasoning that identifies a hypothesis that, if it were true, would logically entail the given input. In classical logic, abduction is not a sound inference mechanism; asserting the truth of an antecedent given an observable consequent is the logical fallacy of "affirming the consequent." Still, automated abductive reasoning is a natural fit for many commonsense reasoning problems in artificial intelligence, including the interpretation problems in Triangle-COPA.

Automated abductive reasoning requires two mechanisms: a means of generating sets of hypotheses that entail the input, and a scoring function for preferentially ordering these hypotheses. Hobbs et al. [8] described "Weighted Abduction," where hypotheses are generated by backchaining from the given input using the implicature form of knowledge base axioms, unifying literals across different antecedents wherever possible. The process generates an and-or proof graph similar to that created when searching for first-order proofs by backchaining, but where every solution in the and-or graph identifies a set of assumptions that, if true, would logically entail the given observables. Weighted Abduction orders these hypotheses by computing the combined cost of all assumed literals (those without justification), through a mechanism of propagating initial costs to antecedents during backchaining. Maslan et al. [13] demonstrated how Weighted Abduction can be used to solve Triangle-COPA problems by searching for the least-cost set of assumptions that entailed the literals in one of the two alternatives.

Several researchers have pursued probabilistic reformulations of Weighted Abduction, eschewing the use of ad-hoc weights in favor of probabilities that might be learned from empirical data. Ovchinnikova et al. [17] and Blythe et al. [2] describe two recent probabilistic reformulations, each casting the and-or proof graph as a Bayesian network whose posterior probabilities can be calculated using belief propagation algorithms for graphical models. These efforts help to position abductive reasoning among current approaches to uncertain inference, and to take advantage of recent advances and tools for reasoning with Markov Logic Networks [20]. However, a simpler formulation of probabilistic abduction may be more appropriate when the task is only to rank possible hypotheses.

As in other probabilistic reasoning tasks, the calculation of the joint probability of a set of events is trivially easy if we assume that they are all conditionally independent: the joint probability of the conjunction is the product of their prior probabilities. If we know the prior probabilities of all assumed literals in an abductive proof (those without justification), then the naive estimate of their joint probability is simply their product [19].
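As a toy illustration of this naive estimate, the following sketch (with invented priors that carry no meaning beyond the example) scores each hypothesis by multiplying the prior probabilities of its assumed literals and prefers the hypothesis with the larger product.

from math import prod

def hypothesis_probability(assumed_priors):
    # Naive joint probability under conditional independence: the
    # product of the priors of all assumed (unjustified) literals.
    return prod(assumed_priors)

h1 = [0.9, 0.9, 0.01]   # two common assumptions, one rare: 0.0081
h2 = [0.9, 0.01, 0.01]  # two rare assumptions: 0.00009
best = max([h1, h2], key=hypothesis_probability)  # prefers h1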
This calculation can be applied to any solution in an and-or graph created by backchaining from the given input, giving us a convenient means of ranking hypotheses.

This approach allows us to use standard first-order logic and familiar technologies of lifted backchaining instead of belief propagation in graphical models. However, by using logical inference (rather than uncertain inference) we require that the consequent of an implication is always true when the antecedent holds, i.e. the probability of the consequent given the antecedent is always one. Hobbs et al. [8], building on McCarthy's [15] formulation of circumscription, describe how defeasible first-order axioms can be authored by the inclusion of a special etcetera literal (etc) as a conjunct in the antecedent. These literals are constructed with a unique predicate name that appears nowhere else in the knowledge base, and therefore can only be assumed (via abduction), never proved. The arguments of this predicate are all of the other variables that appear in the axiom, restricting its unification with other etcetera literals of the same predication that may be assumed in the proof.

The probabilities of etcetera literals can be quantified if we interpret them as an unspecified conjunction of all of the unknown factors of the world that must also be true for the antecedent to imply the consequent. Etcetera literals are true in exactly the cases where the remaining antecedent literals and the consequent are all true. As such, their prior probabilities are equal to the conditional probability of the consequent given the remaining conjuncts in the antecedent.

Using etcetera literals, we can author defeasible versions of the five axioms used by Maslan et al. [13] to correctly solve Triangle-COPA Question 83, above. Here the prior probability of each etcetera literal is encoded directly in the literal as its first argument, appearing as a numerical constant.

– Push: Maybe you are attacking
(implies (and (attack e1 x y) (goal e2 e1 x) (etc1_push 0.9 e1 e2 x y)) (push e3 x y))
– Approach: Maybe you want to attack
(implies (and (goal e1 e2 x) (attack e2 x y) (etc1_approach 0.9 e1 e2 x y)) (approach e4 x y))
– AngryAt: Maybe they attacked someone you like
(implies (and (attack e1 y z) (like e2 x z) (etc1_angryAt 0.9 e1 e2 x y z)) (angryAt e x y))
– Attack: Maybe you are angry at them
(implies (and (angryAt e1 x y) (etc1_attack 0.9 e1 x y)) (attack e x y))
– Chase: Maybe you want to attack
(implies (and (attack e1 x y) (goal e2 e1 x) (etc1_chase 0.9 e1 e2 x y)) (chase e3 x y))

Etcetera literals also afford a means of encoding the prior probabilities of other literals directly in the knowledge base. Below we provide eight additional axioms, one for each predicate used in either the Triangle-COPA question or in the axioms above, where an etcetera literal is the antecedent of each predicate form. By adding these axioms to the knowledge base, we can conduct our search for unique sets of assumptions by backchaining on all axioms to construct an and-or graph that terminates with etcetera literals. The probability of any solution in this and-or graph (assuming conditional independence) is simply the product of the priors of each etcetera literal.
(implies (etc0_push 0.01 e x y) (push' e x y))
(implies (etc0_approach 0.01 e x y) (approach' e x y))
(implies (etc0_angryAt 0.01 e x y) (angryAt' e x y))
(implies (etc0_attack 0.01 e x y) (attack' e x y))
(implies (etc0_chase 0.01 e x y) (chase' e x y))
(implies (etc0_goal 0.9 e x y) (goal' e x y))
(implies (etc0_like 0.9 e x y) (like' e x y))
(implies (etc0_seq3 1.0 x y z) (seq x y z))

Figure 2 shows a visual representation of the most probable proof (Pr = 0.0043) of the given observables of Triangle-COPA Question 83, identified among 6038 possible proofs found by backchaining on the axioms listed above. The approach happens because the circle (C) had the goal to attack the little triangle (LT). The push happens for this same reason, and these explanations are unified. The chase happens because the big triangle (BT) had the goal to attack the circle, because it was angry at the circle, because of the circle's attack on someone that the big triangle likes. The attacks are unified, and we infer that the big triangle likes the little triangle. Left unexplained are why the circle had the goal of attacking the little triangle, why the big triangle likes the little triangle, why attacking was the goal chosen by the big triangle, and why these eventualities happened in this sequence. The correct alternative appears in the most-probable proof, namely that the big triangle is angry at the circle.

Fig. 2: The most-probable proof of Triangle-COPA Question 83.

4 Standard Deontic Logic

Now we consider a different approach, namely deontic logic, to tackle the Triangle-COPA benchmarks. This approach appears to be promising, since in [4] it was demonstrated that deontic logic is very well suited for modelling different kinds of human reasoning. There are interesting examples from cognitive psychology, e.g. the Wason selection task or the suppression task, which can be formalized in a way that makes them accessible to automated reasoning systems. We use the tableau prover Hyper [1], a first-order refutational theorem prover that is also able to decide standard deontic logic.

Standard deontic logic (SDL) is obtained from the well-known modal logic K by adding the seriality axiom D:

D: □P → ♦P

In this logic, the □-operator is interpreted as "it is obligatory that" and the ♦-operator as "it is permitted that". The ♦-operator can be defined by the following equivalence:

♦P ≡ ¬□¬P

The additional axiom D: □P → ♦P in SDL states that if a formula has to hold in all reachable worlds, then there exists such a world. With the deontic reading of □ and ♦, this means: whenever the formula P ought to be, then there exists a world where it holds. In consequence, there is always a world that is ideal in the sense that all the norms formulated by the 'ought to be'-operator hold.

SDL can be used in a natural way to describe knowledge about norms or licenses. The use of conditionals for expressing rules which should be considered as norms seems natural, but harbors some subtle difficulties. If we want to express that "if P then Q" is a norm, an obvious solution would be to use □(P → Q), which reads "it is obligatory that Q holds if P holds". An alternative would be P → □Q, meaning "if P holds, it is obligatory that Q holds". In [24] there is a careful discussion of which of these two possibilities should be used for conditional norms. The first one has severe disadvantages. The most obvious disadvantage is that P together with □(P → Q) does not imply □Q.
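To see this concretely, consider a small Kripke countermodel (our own illustration): take two worlds w0 and w1, where w1 is accessible from w0 and from itself, so seriality is satisfied. Let P be true and Q false at w0, and let P and Q both be false at w1. Then P → Q holds at w1, the only world accessible from w0, so □(P → Q) holds at w0, and P holds at w0 as well; but since Q fails at w1, □Q fails at w0. The obligation □Q cannot be detached.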
This is why we prefer the latter method, where the □-operator is in the conclusion of the conditional. For a more detailed discussion of such aspects we refer to [5].

For the examples in Triangle-COPA we argue that one can understand norms as expectations—many emotions in everyday life can be explained by unmet expectations. The husband not bringing flowers on the wedding anniversary and the friend arriving late to a date are only two examples where unmet expectations cause negative feelings. On the other hand, met expectations can cause positive feelings. The husband helping with the dishes causes the wife to be content.

We consider the scenario described in Question 83 from Triangle-COPA to correspond to the following set of facts:

approach(e1, c, lt). (1)
push(e2, c, lt). (2)
chase(e3, bt, c). (3)
seq(e1, e2, e3). (4)

The last fact states that the eventualities e1, e2 and e3 constitute a sequence of events. The question we are asking is "How does the little triangle feel?". The two alternatives provided are as follows:

a. The little triangle feels relieved: relief(e4, lt, e3)
b. The little triangle is angry at the big triangle: angryAt(e5, lt, bt)

The notion of fulfilled expectations can be helpful to answer this question. The big triangle observes the circle attacking the little triangle. The little triangle expects the big triangle to defend it. The big triangle chases the circle away from the little triangle, which corresponds to defending it. The little triangle is relieved that the big triangle hurried to its defense. We need some background knowledge in this example:

– Pushing someone means attacking someone:

push(E, X, Y) → attack(E, X, Y). (5)

– Chasing an attacker means defending the person under attack:

attack(E, X, Y) ∧ chase(E', Z, X) ∧ after(E, E') → defend(E', Z, Y). (6)

Here after is a transitive predicate stating that one eventuality occurs after another: after(e1, e2) means that event e2 occurs after e1.

It is possible to model expectations with the help of deontic logic. Normative statements are used to model expected behavior. In our example, we use deontic logic to model the fact that one should defend someone who is attacked by someone else. This set of deontic formulae is the set of ground instances of the following formula:

attack(E, Z, X) → □defend(E, Y, X) ∨ Z = Y. (7)

Formula (7) is not an SDL formula. However, we use it as an abbreviation for its set of ground instances. The ground instance interesting for our example is:

attack(e2, c, lt) → □defend(e2, bt, lt) ∨ c = bt. (8)

With the help of formula (8), it is possible to derive that the big triangle ought to defend the little triangle in event e2. Formula (8) states that in the ideal world following eventuality e2, the big triangle defends the little triangle. Another possibility to express this would be to use the eventuality, which is part of every atom. We could state defend(e9, bt, lt) for some new eventuality e9 and add some information stating that eventuality e9 is the ideal successor of e2. For this it would be necessary to introduce a new relation connecting eventualities with their ideal successors. Since this is rather cumbersome, we use standard deontic logic instead.
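Since the prover works on the ground instances of (7), these can be generated mechanically. The following sketch is our own: the enumeration scheme, the O(...) notation standing in for the □-operator, and the filter excluding self-attacks are illustrative choices, not part of the Hyper input language.

from itertools import product

characters = ["bt", "lt", "c"]
events = ["e1", "e2", "e3"]

instances = []
for e, z, x, y in product(events, characters, characters, characters):
    if z == x:
        continue  # optional simplification: no character attacks itself
    # attack(E, Z, X) -> O(defend(E, Y, X)) v Z = Y, rendered as a string
    instances.append(
        f"attack({e}, {z}, {x}) -> O(defend({e}, {y}, {x})) | {z} = {y}")

# Among the generated instances is the one used as formula (8):
#   attack(e2, c, lt) -> O(defend(e2, bt, lt)) | c = bt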
Ground instances of the following formula can be used to deduce that someone is relieved if someone ought to be defended by someone and is actually defended:

(□defend(E, X, Y) ∧ defend(E', X, Y) ∧ after(E, E')) → ⋀_{E'' : after(E', E'')} relief(E'', Y, E')

A ground instance interesting for our example is:

(□defend(e2, bt, lt) ∧ defend(e3, bt, lt) ∧ after(e2, e3)) → (relief(e4, lt, e3) ∧ relief(e5, lt, e3)) (9)

We want to use a theorem prover in order to solve example 83 together with the question introduced above. To accomplish this, the following formulae are combined into one set of formulae S:

– Formulae (1)–(4) describing the scenario,
– the background knowledge given in (5) and (6),
– some additional formulae formalizing the after predicate,
– the deontic logic formulae (8) and (9) stating the information about expectations, and
– some formulae stating that bt, lt and c are pairwise different.

We use Hyper to solve example 83 with the question introduced above. It is possible to deduce that the little triangle is relieved in e4 by transforming this reasoning task into a satisfiability test. Hyper constructs a closed hyper tableau for S ∪ {¬relief(e4, lt, e3)}, which implies that relief(e4, lt, e3) is entailed by S. Referring to the question "How does the little triangle feel?" formulated before, we can use the derived relief(e4, lt, e3) to show that the first alternative given is the correct one.

Of course, it is not desirable to formalize all rules manually. Rules like (9) can be generated automatically by formalizing a metarule stating that whenever x and y are friends and y is obliged to do something for x and actually does it, x is relieved. This metarule can then be instantiated with the respective obligation.

5 Integration of Background Knowledge

In the previous section we used standard deontic logic to tackle one of the examples from Triangle-COPA. In addition to the formulae for normative statements, we used formulae (5) and (6) stating some essential background knowledge. In order to solve all Triangle-COPA benchmarks, extensive background knowledge of psychology is essential. It is labor-intensive and error-prone to state all of this background knowledge manually. Therefore it is desirable to use existing knowledge bases. There are several detailed ontologies, like Yago [22], Cyc [11], and Sumo [16], stating knowledge about common sense. The sheer size of these ontologies, however, forbids using them in their entirety. For example, Yago contains more than 10 million entities (like persons, organizations, cities, etc.) and more than 120 million facts about these entities. ResearchCyc contains more than 500,000 concepts, forming an ontology in the domain of human consensus reality. Nearly 5,000,000 assertions (facts and rules) using more than 26,000 relations interrelate, constrain, and, in effect, (partially) define the concepts. And even the smallest version of Cyc, OpenCyc, still contains more than 3 million formulae.

Therefore it is necessary to extract relevant parts from these ontologies. However, brute-force extraction by selecting, for example, all assertions from OpenCyc containing the word "attack" results in a set of 13,184 assertions. The vast majority of these assertions contain irrelevant information; for example, assertions about the movie "Mars Attacks!" are selected.
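A naive selection of this kind is easy to reproduce. The sketch below (the file name and the one-assertion-per-line layout are hypothetical) keeps every assertion whose text mentions the keyword, relevant or not, which is exactly how assertions about the movie end up in the result.

def select_assertions(path: str, keyword: str) -> list:
    # Brute-force keyword extraction: keep any assertion mentioning
    # the keyword, with no attempt to judge its relevance.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f
                if keyword.lower() in line.lower()]

# e.g. select_assertions("opencyc_assertions.txt", "attack")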
These irrelevant assertions potentially thwart the reasoning process, making it worthwhile to invest some effort into carefully selecting assertions suitable as background knowledge. Partitioning techniques used to handle large theories with theorem provers, like the SInE (Sumo Inference Engine) [9] metaprover, could be helpful to address this problem.

6 Discussion

Benchmark problems have helped to spur new ideas and compare technologies across many areas of computer science and beyond. For researchers interested in logical approaches to automated reasoning, as in other fields, the most useful benchmarks will be those that focus specifically on the core research challenge, but are not prejudiced for or against any one technical approach. In this paper we have argued that the Triangle-COPA set of challenge problems is a useful tool for exploring the relationship between human and logical reasoning. We described two different logical approaches for solving Triangle-COPA questions, a probabilistic form of logical abduction and deontic logic. In so doing, we demonstrate that the questions are agnostic to the particular logic framework that is used to solve them. By tackling the same questions with different approaches, we gain new insights into both the similarities and differences afforded by different techniques. We encourage other research groups in our community to apply their unique approaches to the same questions, to consider the similarities and differences among approaches that go beyond the shallow characteristics of various logical notations, and to focus their efforts on overcoming the enormous challenges of humanlike commonsense reasoning.

References

1. Peter Baumgartner, Ulrich Furbach, and Björn Pelzer. Hyper tableaux with equality. In Frank Pfenning, editor, CADE 21, volume 4603 of LNCS, 2007.
2. James Blythe, Jerry R. Hobbs, Pedro Domingos, Rohit J. Kate, and Raymond J. Mooney. Implementing weighted abduction in Markov logic. In Proceedings of the Ninth International Conference on Computational Semantics, IWCS '11, pages 55–64, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.
3. David Ferrucci, Anthony Levas, Sugato Bagchi, David Gondek, and Erik T. Mueller. Watson: Beyond Jeopardy! Artificial Intelligence, 199–200:93–105, 2013.
4. Ulrich Furbach and Claudia Schon. Deontic logic for human reasoning. In Thomas Eiter, Hannes Strass, Miroslaw Truszczynski, and Stefan Woltran, editors, Advances in Knowledge Representation, Logic Programming, and Abstract Argumentation - Essays Dedicated to Gerhard Brewka on the Occasion of His 60th Birthday, volume 9060 of Lecture Notes in Computer Science, pages 63–80. Springer, 2014.
5. D. Gabbay, J. Horty, X. Parent, R. van der Meyden, and L. van der Torre, editors. Handbook of Deontic Logic and Normative Systems. College Publications, 2013.
6. Fritz Heider. The Psychology of Interpersonal Relations. Hillsdale, NJ: Lawrence Erlbaum Associates, 1958.
7. Fritz Heider and Marianne Simmel. An experimental study of apparent behavior. American Journal of Psychology, 57(2):243–259, 1944.
8. Jerry R. Hobbs, Mark E. Stickel, Douglas E. Appelt, and Paul Martin. Interpretation as abduction. Artificial Intelligence, 63(1-2):69–142, October 1993.
9. Krystof Hoder and Andrei Voronkov. Sine qua non for large theory reasoning. In Nikolaj Bjørner and Viorica Sofronie-Stokkermans, editors, Automated Deduction - CADE-23 - 23rd International Conference on Automated Deduction, Wroclaw, Poland, July 31 - August 5, 2011.
Proceedings, volume 6803 of Lecture Notes in Computer Science, pages 299–314. Springer, 2011.
10. Naoya Inoue and Kentaro Inui. ILP-based inference for cost-based abduction on first-order predicate logic. Journal of Natural Language Processing, 20(5):629–656, 2013.
11. Douglas B. Lenat. Cyc: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11):33–38, 1995.
12. Hector J. Levesque. The Winograd Schema Challenge. In Logical Formalizations of Commonsense Reasoning, Papers from the 2011 AAAI Spring Symposium, Technical Report SS-11-06, Stanford, California, USA, March 21-23, 2011. AAAI, 2011.
13. Nicole Maslan, Melissa Roemmele, and Andrew S. Gordon. One hundred challenge problems for logical formalizations of commonsense psychology. In Twelfth International Symposium on Logical Formalizations of Commonsense Reasoning, Stanford, CA, 2015.
14. John McCarthy. Programs with common sense. In Semantic Information Processing, pages 403–418. MIT Press, 1968.
15. John McCarthy. Circumscription—a form of non-monotonic reasoning. Artificial Intelligence, 13:27–39, 1980.
16. Ian Niles and Adam Pease. Towards a standard upper ontology. In Proceedings of the International Conference on Formal Ontology in Information Systems - Volume 2001, pages 2–9. ACM, 2001.
17. Ekaterina Ovchinnikova, Andrew S. Gordon, and Jerry R. Hobbs. Abduction for discourse interpretation: A probabilistic framework. In Joint Symposium on Semantic Processing, pages 42–50, 2013.
18. David Pautler, Bryan L. Koenig, Boon-Kiat Quek, and Andrew Ortony. Using modified incremental chart parsing to ascribe intentions to animated geometric figures. Behavior Research Methods, 43(3):643–665, 2011.
19. David Poole. Representing Bayesian networks within probabilistic Horn abduction. In Bruce D'Ambrosio and Philippe Smets, editors, UAI '91: Proceedings of the Seventh Annual Conference on Uncertainty in Artificial Intelligence, University of California at Los Angeles, Los Angeles, CA, USA, July 13-15, 1991, pages 271–278. Morgan Kaufmann, 1991.
20. Matthew Richardson and Pedro M. Domingos. Markov logic networks. Machine Learning, 62(1-2):107–136, 2006.
21. Melissa Roemmele, Cosmin Adrian Bejan, and Andrew S. Gordon. Choice of plausible alternatives: An evaluation of commonsense causal reasoning. In AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning, 2011.
22. Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: A large ontology from Wikipedia and WordNet. Web Semantics, 6(3):203–217, September 2008.
23. Robert H. Thibadeau. Artificial perception of actions. Cognitive Science, 10(2):117–149, 1986.
24. Frank von Kutschera. Einführung in die Logik der Normen, Werte und Entscheidungen. Alber, 1973.