<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Ontologies in a Cognitive-Grounded System: Automatic Action Recognition in Video Surveillance</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessandro Oltramari</string-name>
          <email>aoltrama@andrew.cmu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Lebiere</string-name>
          <email>cl@cmu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Psychology, Carnegie Mellon University</institution>
          ,
          <addr-line>Pittsburgh, Pennsylvania 15217</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This article presents an integrated cognitive system for automatic video surveillance: in particular, we focus on the task of classifying the actions occurring in a scene. For this purpose, we developed a semantic infrastructure on top of a hybrid computational ontology of actions. The article outlines the core features of this infrastructure, illustrating how the processing mechanisms of the cognitive system benefit from knowledge capabilities in fulfilling the recognition goal. Ultimately, the paper shows that ontologies can enhance a cognitive architecture&#8217;s functionalities, allowing for high-level performance in complex task execution.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>I. INTRODUCTION</title>
      <p>
        The automatic detection of anomalous and threatening
behaviour has recently emerged as a new area of interest in video
surveillance: the aim of this technology is to disambiguate the
context of a scene, discriminate between different types of
human actions and, ultimately, predict their outcomes. In order
to achieve this level of complexity, state-of-the-art computer
vision algorithms [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] need to be complemented with
higher-level tools of analysis involving, in particular, knowledge
representation and reasoning (often under conditions of
uncertainty). The goal is to approximate human visual intelligence
in making effective and consistent detections: humans evolved
by learning to adapt and properly react to environmental
stimuli, becoming extremely skilled in filtering and generalizing
over perceptual data, taking decisions and acting on the basis
of acquired information and background knowledge.
In this paper we first discuss the core features of human
‘visual intelligence’ and then describe how we can simulate
and approximate this comprehensive faculty by means of an
integrated framework that augments the ACT-R cognitive
architecture (see figure 1) with background knowledge expressed by
suitable ontological resources (see section III-B2). ACT-R is
a modular framework whose components include perceptual,
motor and memory modules, synchronized by a procedural
module through limited capacity buffers (refer to [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for more
details). ACT-R has accounted for a broad range of cognitive
activities at a high level of fidelity, reproducing aspects of
human data such as learning, errors, latencies, eye movements
and patterns of brain activity. Although it is not our purpose
in this paper to present the details of the architecture, two
specific mechanisms need to be mentioned here to sketch how
the system works: i) partial matching - the probability that
two different knowledge units (or declarative chunks) can be
associated on the basis of an adequate measure of similarity
(this is what happens when we consider, for instance, that a
bag is more likely to resemble a basket than a wheel);
ii) spreading of activation - when the same knowledge unit
is part of multiple contexts, it contributes to distributionally
activate all of them (like a chemical catalyst may participate
in multiple chemical transformations). Section IV will show
in more detail how these two mechanisms are exploited by
the cognitive system to disambiguate action signals:
henceforth, we will refer to this system as the Cognitive Engine.
Just as humans understand their surroundings by coupling
perception with knowledge, the Cognitive Engine mimics
this capability by combining scene parsing and disambiguation
with suitable ontology patterns and models of actions, aiming
to identify relevant actions and spot the most anomalous
ones.
      </p>
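<p>To make the two mechanisms concrete, the following sketch illustrates partial matching over declarative chunks in the spirit described above. It is not ACT-R itself: the similarity values, the mismatch penalty, and all names are hypothetical, chosen only to mirror the bag/basket/wheel example.</p>

```python
# Illustrative sketch of partial matching (not the actual ACT-R code).
# Similarity values and the mismatch penalty below are hypothetical.
SIMILARITY = {
    ("bag", "bag"): 0.0,      # identical chunks: no mismatch
    ("bag", "basket"): -0.3,  # similar containers: small penalty
    ("bag", "wheel"): -0.9,   # dissimilar objects: large penalty
}
MISMATCH_PENALTY = 1.0  # scales how strongly mismatches lower activation

def match_score(probe, candidate, base_activation=0.0):
    """Activation of a candidate chunk given a probe, under partial matching."""
    sim = SIMILARITY.get((probe, candidate), -1.0)  # unknown pairs: maximal penalty
    return base_activation + MISMATCH_PENALTY * sim

# The candidate with the highest score is retrieved: a 'bag' probe
# retrieves 'basket' rather than 'wheel' when 'bag' itself is absent.
candidates = ["basket", "wheel"]
best = max(candidates, key=lambda c: match_score("bag", c))
```

<p>Spreading activation (mechanism ii) would add a further context-driven term to this score; the full combined equation is discussed in section IV.</p>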
      <p>In the next sections we present the different aspects of the
Cognitive Engine, discussing the general framework alongside
specific examples.</p>
    </sec>
    <sec id="sec-2">
<title>II. THE CONCEPTUAL FEATURES OF VISUAL INTELLIGENCE</title>
      <p>
        The territory of ‘visual intelligence’ needs to be explored
with an interdisciplinary eye, encompassing cognitive
psychology, linguistics and semantics: only under these conditions
can we aim at unfolding the variety of operations that visual
intelligence is responsible for, the main characteristics of
the emerging representations and, most importantly in the
present context, at reproducing them in an artificial agent.
As claimed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],“events are understood as action-object
couplets” (p. 456) and “segmenting [events as couplets] reduces
the amount of information into manageable chunks” (p. 457),
where the segment boundaries coincide with achievements
and accomplishments of goals (p. 460). Segmentation is a
key feature when the task of disambiguating complex scenarios is
considered: recognition does not correspond to making
an inventory of all the actions occurring in a scene; rather, a
selection process is performed by means of suitable &#8216;cognitive
schemas&#8217; (or gestalts, e.g. up/down, figure/ground, force, etc.),
which &#8220;carve visual presentations according to principles of
mental organization and optimize the perceptual effort&#8221; [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Besides cognitive schemas, conceptual primitives have also
been studied: in particular, [
        <xref ref-type="bibr" rid="ref5">5</xref>
] applied Hayes&#8217; na&#239;ve physics
theory [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to build an event logic. Within the adopted common
sense definitions, we can mention i) substantiality (objects
generally cannot pass through one another); ii) continuity
(objects that diachronically appear in two locations must have
moved along the connecting path); iii) ground plane (ground
acts as universal support for objects).
      </p>
      <p>
        As far as action-object pairs are central to characterize the
‘ontology of events’, verb-noun ‘frames’ are also relevant at
the linguistic level1; in particular, identifying roles played
by objects in a scene is necessary to disambiguate action
verbs and highlight the underlying goals. In this respect,
studies of event categorization revealed that events are always
packaged, that is distinctly equipped with suitable semantic
roles [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]: for example, the events which are exemplified by
motion verbs like walk, run, fly, jump, crawl, etc. are generally
accompanied with information about source, path, direction
and destination/goal, as in the proposition “John ran out of
the house (source), walking south (direction) along the river
(path), to reach Emily‘s house (destination/goal)”; conversely,
verbs of possession such as have, hold, carry, get, etc. require
different kind of semantic information, as in the proposition
&#8220;John (owner) carries Emily&#8217;s bag (possession)&#8221;. Note that it
is not always the case that all possible semantic roles are filled
by linguistic phrases: in particular, path and direction are not
necessarily specified when motion is considered, while source
and destination/goal are (we do not focus here on agent and
patient, which are the core semantic roles).
1We refer here to the very broad notion of &#8216;frame&#8217; introduced by Minsky:
&#8220;frames are data-structure for representing a stereotyped situation, like being
in a certain kind of living room, or going to a child&#8216;s birthday party&#8221; [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
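<p>The &#8216;packaging&#8217; of events with verb-class-specific semantic roles can be rendered as a small data structure. The sketch below uses the role names from the running examples; the layout and function names are our own illustration, not a component of the system.</p>

```python
# Sketch of conceptual packaging: each verb class carries its own role set.
# Role names follow the examples in the text; the layout is illustrative.
MOTION_ROLES = {"agent", "source", "path", "direction", "destination"}
POSSESSION_ROLES = {"owner", "possession"}

VERB_CLASSES = {
    "walk": MOTION_ROLES, "run": MOTION_ROLES, "fly": MOTION_ROLES,
    "have": POSSESSION_ROLES, "hold": POSSESSION_ROLES, "carry": POSSESSION_ROLES,
}

def package(verb, **fillers):
    """Keep only fillers that are legitimate roles for the verb's class.
    Roles may remain unfilled (e.g. path/direction for motion verbs)."""
    allowed = VERB_CLASSES[verb]
    return {role: value for role, value in fillers.items() if role in allowed}

# "John carries Emily's bag": owner and possession are filled, while a
# motion role like 'path' is rejected for a verb of possession.
event = package("carry", owner="John", possession="Emily's bag", path="river")
```

<p>The same filtering shows why &#8220;John ran out of the house&#8221; admits source, path, direction and destination fillers while &#8220;John carries Emily&#8217;s bag&#8221; does not.</p>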
      <p>As this overview suggests, there is an intimate connection
between linguistics, cognition and ontology both at the level of
scene parsing (mechanism-level) and representation
(content-level). In particular, in order to build a visual intelligent system
for action recognition, three basic functionalities are required:
Ontology pattern matching - comparing events on the
basis of the similarity between their respective pattern
components: e.g., a person’s burying an object and a
person’s digging a hole are similar because they both
include some basic body movements as well as the act
of removing the soil;
Conceptual packaging - eliciting the conceptual
structure of actions in a scene through the identification of
the roles played by the detected objects and trajectories:
e.g. if you watch McCutchen hitting a home run, the
Pittsburgh Pirates&#8217; player number 22 is the &#8216;agent&#8217;, the
ball is the &#8216;patient&#8217;, the baseball bat is the &#8216;instrument&#8217;,
toward the stands is the &#8216;direction&#8217;, etc.</p>
      <p>Causal selectivity: attentional mechanisms drive the
visual system in picking the causal aspects of a scene, i.e.
selecting the most distinctive actions and discarding
collateral or accidental events (e.g., in the above-mentioned
home-run scenario, focusing on the movements of the first
baseman is likely to be superfluous).</p>
      <p>In the next section we describe how the Cognitive Engine
realizes the first two functionalities by combining the
architectural features of ACT-R with ontological knowledge,
while Causal selectivity will be addressed in future work.</p>
    </sec>
    <sec id="sec-3">
      <title>III. BUILDING THE COGNITIVE ENGINE</title>
      <sec id="sec-3-1">
        <title>A. The Context</title>
        <p>
          The Cognitive Engine represents the core module of the
Extended Activity Reasoning system (EAR) in the CMU
Mind&#8217;s Eye architecture (see figure 2). Mind&#8217;s Eye is the name
of the DARPA program2 for building AI systems that can filter
surveillance footage to support human (remote) operators, and
automatically alert them whenever something suspicious is
recognized (such as someone leaving a package in a parking
lot and running away – see also [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]). In this framework,
visual intelligent systems play the role of filtering computer
vision data, suitably coupling relevant signals with background
knowledge and – when feasible – searching for a ‘script’
that ties together all the most salient actions in a scene.
This comprehensive capability requires intensive information
processing at interconnected levels: basic optical features
(low-level), object detection (mid-level) and event classification
(high-level). EAR has been conceived to deal with the last
one: in particular the Cognitive Engine receives outputs from
the Immediate Activity Recognition module (IAR), which
collects the results of different pre-processing algorithms and
adopts learning–based methods to output action probability
distributions [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>2http://www.darpa.mil/Our Work/I2O/Programs/Minds Eye.aspx</p>
        <p>Specific parsing functions are included in EAR to convert
the IAR output into sequences of quasi-propositional
descriptions of atomic events to be fed to the Cognitive Engine.
For example, the sample video strip in figure 3 can be
converted into (a):
(a) Person1 Holds Bag2 + Person1 Bends Over + Person1 Drags
Bag2 + Person1 Stops.</p>
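<p>A minimal parser for quasi-propositional sequences like (a) can be sketched as follows. The triple layout and the entity-name convention (a capitalized word with a trailing numeric identifier) follow the example above; the function itself is our own illustration of the EAR parsing step, not its actual code.</p>

```python
import re

# Entities carry trailing unique identifiers, e.g. Person1, Bag2.
ENTITY = re.compile(r"^[A-Z][a-z]+\d+$")

def parse_sequence(seq):
    """Split a quasi-propositional sequence into (subject, verb, object)
    triples; '+' marks temporal succession between atomic events."""
    events = []
    for atom in seq.split("+"):
        tokens = atom.split()
        subject = tokens[0]
        # A trailing token matching the entity pattern is the object;
        # everything in between is the (possibly multi-word) verb.
        if len(tokens) > 2 and ENTITY.match(tokens[-1]):
            verb, obj = " ".join(tokens[1:-1]), tokens[-1]
        else:
            verb, obj = " ".join(tokens[1:]), None
        events.append((subject, verb.lower(), obj))
    return events

seq = "Person1 Holds Bag2 + Person1 Bends Over + Person1 Drags Bag2 + Person1 Stops"
events = parse_sequence(seq)
```

<p>On sequence (a) this yields four atomic events, two with an object (&#8216;holds&#8217;, &#8216;drags&#8217; on Bag2) and two without (&#8216;bends over&#8217;, &#8216;stops&#8217;), ready to be matched against ontology patterns.</p>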
        <p>
          These sequences reflect the most likely atomic events
(so-called &#8216;micro-actions&#8217;, &#8216;micro-states&#8217; and &#8216;micro-poses&#8217;)
occurring in the environment, detected and thresholded by
machine vision algorithms. The addition symbol exemplifies
temporal succession while numbers stand for entity unique
identifiers. For the sake of readability, we omit here the
temporal information about start and end frames of the single
atomic-events, as well as spatial coordinates of the positions
of objects. Leveraging the semantic properties of sequences
like (a), the Cognitive Engine aims at generalizing over action
components and distilling the most likely &#8216;unifying story&#8217;: for
instance, figure 3 depicts a person hauling an object to the top
left side of the scene. Ontology patterns [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] of action play
a key-role in the process of sequence disambiguation: in this
regard, III-B reviews some of the core patterns we adopted
in the recognition mechanisms of the Cognitive Engine and
outlines the basic classes and properties of the ontology of
actions used for high-level reasoning. The benefits of using
ontologies for event recognition in the context of the Mind’s
Eye program have been also discussed in [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], although our
two approaches differ both in the theoretical underpinnings
(as the next sections will show, we propose a hybridization of
linguistic and ontological distinctions rather than embracing
ontological realism) and in the general system design (in
[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] the authors outline a framework in which ontological
knowledge is directly plugged into visual algorithms, while in
our proposal ACT-R is exploited as an intermediate module
to bridge the vision and the knowledge levels, stressing the
role of cognitive mechanisms in action understanding).
        </p>
      </sec>
      <sec id="sec-3-2">
<title>B. The Knowledge Infrastructure</title>
        <p>
          1) Ontology patterns of actions: In recent years, &#8216;Ontology
Design Patterns&#8217; (or just &#8216;ontology patterns&#8217;) have become
an important resource in the areas of Conceptual Modeling
and Ontology Engineering: the rationale is to identify some
minimal conceptual structures to be used as the building
blocks for designing ontologies [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Ontology patterns are
small models of entities and their basic properties: the notion
originates in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], where the author argues that a good
(architectural) design can be achieved by means of a set of rules
that are packaged in the form of patterns, such as ‘windows
place’, or ‘entrance room’. Design patterns are then assumed
as archetypal solutions to design problems in a certain context.
Ontology patterns are built and formalized on the basis of a
preliminary requirement analysis, which can be driven either
by applications tasks or by specific problems in the domain of
interest. In our context, ontology patterns enable the
classification of actions by means of pinpointing the basic semantic
roles and constituent atomic events of relevant actions. In this
regard, table I shows the composition of the core ontology
patterns used in the Cognitive Engine: e.g. an instance of the
action-type &#8216;pick-up&#8217; depends on the occurrence of at least
four basic components (C1-C4), namely &#8216;bend-over&#8217;,
&#8216;lower-arm&#8217;, &#8216;stand-up&#8217; (necessary body movements) and &#8216;holding&#8217;
(referring to the interaction between a person and an object);
moreover, those action verbs require specific conceptual roles
to be exemplified: protagonist for the first and
the third component, agent for the second and the fourth
(which also includes &#8216;patient&#8217; as object-role). But what
inspired our modeling choices? How did we identify those
roles and atomic events? Which rules and principles allowed us
to assemble them in that very fashion? To answer
these questions, in the next section we introduce HOMINE,
the Hybrid Ontology for the Mind&#8217;s Eye project.
        </p>
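<p>The &#8216;pick-up&#8217; pattern just described can be rendered as a small data structure, together with a coverage check of the kind the pattern-matching step relies on. Field names and the coverage function are our own rendering of table I, not the system&#8217;s internal format.</p>

```python
# A sketch of an ontology pattern: the 'pick-up' action lists its atomic
# components (C1-C4) and the conceptual roles each must exemplify.
PICK_UP = {
    "action": "pick-up",
    "components": [
        {"event": "bend-over", "roles": {"protagonist"}},        # C1
        {"event": "lower-arm", "roles": {"agent"}},              # C2
        {"event": "stand-up",  "roles": {"protagonist"}},        # C3
        {"event": "holding",   "roles": {"agent", "patient"}},   # C4
    ],
}

def pattern_coverage(pattern, observed_events):
    """Fraction of a pattern's components found among observed atomic events."""
    observed = set(observed_events)
    hits = [c for c in pattern["components"] if c["event"] in observed]
    return len(hits) / len(pattern["components"])

# Three of four components detected: partial evidence for 'pick-up'.
cov = pattern_coverage(PICK_UP, ["bend-over", "lower-arm", "holding"])
```

<p>A coverage below 1.0 is exactly the situation where the architecture&#8217;s partial matching and spreading activation (section IV) decide whether the pattern is still the best explanation of the scene.</p>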
        <p>
          2) Ontology of actions: Ontologies play the role of
‘semantic specifications of declarative knowledge’ in the framework
of cognitive architectures [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. As [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]
demonstrate, most research efforts have focused on designing
methods for mapping large knowledge bases to the ACT-R
declarative module. Here we take a different
approach: instead of tying the architecture to a single monolithic
knowledge base, we built a hybrid resource that combines different
semantic modules, allowing for high scalability and
interoperability. Our proposal consists in suitably linking distinctive
lexical databases, i.e. WordNet [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] and FrameNet [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] with
a computational ontology of actions, plugging the obtained
semantic resource in the dynamic mechanisms of the
ACT-R cognitive architecture (see section IV). Accordingly, HOMINE is
built on the top-level of DOLCE-SPRAY [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], a simplified
version of DOLCE [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]: we used DOLCE-SPRAY as a general
model for aligning WordNet (WN) and FrameNet (FN) –
following the line of research of [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]: figure 4 shows some
selected nodes of DOLCE backbone taxonomy. The root of the
hierarchy of DOLCE-SPRAY is ENTITY, which is defined as
anything which is identifiable by humans as an object of
experience or thought. The first distinction is between
CONCRETE-ENTITY, i.e. objects located in definite spatial regions, and
ABSTRACT-ENTITY, whose instances don’t have spatial
properties. In the line of [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], CONCRETE-ENTITY is further split
into CONTINUANT and OCCURRENT, namely entities without
inherent temporal parts (e.g. artifacts, animals, substances)
and entities with inherent temporal parts (e.g. events, actions,
states) respectively. The basic ontological distinctions are
maintained: DOLCE’s ENDURANT and PERDURANT match
DOLCE-SPRAY’s CONTINUANT and OCCURRENT. The main
difference of DOLCE-SPRAY’s top level with respect to DOLCE,
is the merging of DOLCE’s ABSTRACT and
NON-PHYSICAL-ENDURANT categories into the DOLCE-SPRAY category of
ABSTRACT-ENTITY. Among abstract entities, DOLCE-SPRAY’s
top level distinguishes CHARACTERIZATION, defined as
mapping n-tuples of individuals to truth values. Individuals
belonging to CHARACTERIZATION can be regarded as
&#8216;reified concepts&#8217;, and the irreflexive, antisymmetric relation
CHARACTERIZE associates them with the objects they denote.
Whether CHARACTERIZATION is formally a metaclass, and
whether CHARACTERIZE bears the meaning of set membership
is left opaque in this ontology.
        </p>
        <p>HOMINE’s linguistic-semantic layer is based on a partition
of WN related to verbs of action, such as ‘haul’, ‘pick-up’,
‘carry’, ‘arrive’, ‘bury’ etc. WN is a semantic network whose
nodes and arcs are, respectively, synsets (“sets of synonym
terms”) and semantic relations. Over the years, there has
been an incremental growth of the lexicon (the latest version,
WordNet 3.0, contains about 120K synsets), and substantial
enhancements aimed at facilitating computational tractability.
In order to find the targeted group of relevant synsets, we
started from two pertinent top nodes, also known as
&#8216;Unique Beginners&#8217; (Fellbaum 1998): move#1 and move#2.
As one can easily notice, the former synset denotes
a change of position accomplished by an agent or by an
object (with a sufficient level of autonomy), while the latter is
about causing someone or something to move (both literally
and figuratively). After extracting the sub&#8211;hierarchy of synsets
related to these generic verbs of action, we introduced a
topmost category &#8216;movement-generic&#8217;, abstracting from the two
senses of &#8216;move&#8217; (refer to figure 5 for the resulting taxonomy
of actions).</p>
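<p>The extraction step can be sketched with a toy hypernym table: collect everything below the two top nodes, then root the result under a new &#8216;movement-generic&#8217; category. The miniature taxonomy below is illustrative; the real extraction runs over the full WordNet verb hierarchy.</p>

```python
# Toy hyponym table standing in for the WordNet verb hierarchy
# (illustrative entries only).
HYPONYMS = {
    "move#1": ["walk#1", "run#1"],
    "move#2": ["haul#1", "carry#1"],
    "walk#1": ["march#1"],
}

def sub_hierarchy(root):
    """Transitively collect a root synset plus all of its hyponyms."""
    nodes, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in nodes:
            nodes.add(node)
            stack.extend(HYPONYMS.get(node, []))
    return nodes

# Abstract over the two senses of 'move' with a single topmost category.
taxonomy = {"movement-generic": ["move#1", "move#2"], **HYPONYMS}
actions = sub_hierarchy("move#1") | sub_hierarchy("move#2")
```
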
        <p>
          FrameNet (FN) is the additional conceptual layer of
HOMINE. Besides wordnet-like databases, a computational
lexicon can be designed from a different perspective, for
example focusing on frames, to be conceived as orthogonal
to domains. (The two WordNet glosses are: 01835496 move#1, travel#1, go#1, locomote#1 - change location; move,
travel, or proceed: &#8220;How fast does your new car go?&#8221;, &#8220;The soldiers moved
towards the city in an attempt to take it before night fell&#8221;; 01850315 move#2,
displace#4 - cause to move or shift into a new position or place, both in a
concrete and in an abstract sense: &#8220;Move those boxes into the corner, please&#8221;,
&#8220;The director moved more responsibilities onto his new assistant&#8221;.)
Inspired by frame semantics [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], FN aims at
documenting “the range of semantic and syntactic combinatory
possibilities (valences) of each word in each of its senses”
through corpus-based annotation. Different frames are evoked
by the same word depending on different contexts of use: the
notion of ‘evocation’ helps in capturing the multi-dimensional
character of knowledge structures underlying verbal forms.
For instance, if you consider the bringing frame, namely
an abstraction of a state of affairs where sentient agents
(e.g., persons) or generic carriers (e.g. ships) bring something
somewhere along a given path, you will find several ‘lexical
units’ (LUs) evoking different roles (or frame elements - FEs):
e.g., the noun &#8216;truck&#8217; instantiates the &#8216;carrier&#8217; role. In principle,
the same lexical unit may evoke distinct frames, thus
playing different roles: &#8216;truck&#8217;, for example, can also be
associated with the vehicle frame (&#8216;the vehicles that human
beings use for the purpose of transportation&#8217;). FN contains
about 12K LUs for 1K frames annotated in about 150,000 sentences.
WN and FN are based on distinct models, but one can benefit
from the other in terms of coverage and type of information
conveyed. Accordingly, we have analyzed the evocation-links
between the action verbs we have extracted from WN and the
related FN frames: those links can be generated through ‘FN
Data search’, an on–line navigation interface used to access
and query FN5. Using a specific algorithm [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], WordNet
synsets can be associated with FrameNet frames, ranking the
results by assigning weights to the discovered connections
[
          <xref ref-type="bibr" rid="ref28">28</xref>
]. The core mechanism can be summarized by the following
procedure: first the user chooses a term and looks for
the corresponding sense in WordNet; once the correct synset
is selected, the tool searches for the corresponding lexical
units (LUs) and frames of FrameNet. Afterwards, all candidate
frames are weighted according to three important factors:
the similarity between the target word (the LU having some
correspondence to the term typed at the beginning) and the
wordnet relative (which can be the term itself - if any - and/or
its synonyms, hypernyms and antonyms); a variable boost
factor that rewards words that correspond to LU as opposed
to those that match only the frame name; and the spreading factor,
namely the number of frames evoked by that word:

weight = similarity(wordnet relative, target word) &#215; BoostFactor / spreading factor(wordnet relative)
        </p>
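<p>As a function, the ranking score combines the three factors just listed: similarity between target word and wordnet relative, a boost for matches on lexical units, and division by the number of frames the relative evokes. The numeric values below are hypothetical, chosen only to show the effect of the spreading factor.</p>

```python
def frame_weight(similarity, boost_factor, spreading_factor):
    """weight = similarity * boost_factor / spreading_factor:
    candidate frames evoked by a 'spread-out' relative are penalized."""
    return similarity * boost_factor / spreading_factor

# All else being equal, a wordnet relative evoking few frames (specific)
# outranks one evoking many frames (generic).
w_specific = frame_weight(similarity=0.8, boost_factor=2.0, spreading_factor=2)
w_generic  = frame_weight(similarity=0.8, boost_factor=2.0, spreading_factor=8)
```

<p>Dividing by the spreading factor is what keeps highly polysemous relatives from dominating the ranking.</p>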
        <p>
          If DOLCE-SPRAY provides the axiomatic basis for the formal
characterization of HOMINE6, and WN and FN computational
lexicons populate the ontology with linguistic knowledge,
SCONE is the selected framework of implementation7.
SCONE is an open–source knowledge-base system intended
for use as a component in many different software
applications: it provides a LISP-based framework to represent and
reason over symbolic common&#8211;sense knowledge. Unlike most
widespread KB systems, SCONE is not based on OWL (the Web
Ontology Language8) or Description Logics in general [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]: its
inference engine adopts marker–passing algorithms [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]
(originally designed for massive parallel computing) to perform
fast queries at the price of losing logical completeness and
decidability. In particular, SCONE represents knowledge as a
semantic network whose nodes are locally weighted (marked)
and associated to arcs (wires9) in order to optimize basic
reasoning tasks (e.g. class membership, transitivity, inheritance
of properties, etc.).
5https://framenet.icsi.berkeley.edu/fndrupal/index.php?q=luIndex
6For instance, DOLCE adapts Allen&#8217;s temporal axioms [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ], which are
considered state of the art in temporal representation and reasoning.
7http://www.cs.cmu.edu/~sef/scone/
8http://www.w3.org/TR/owl-features/
9In general, a wire can be conceived as a binary relation whose domain
and range are referred to, respectively, as A-node and B-node.
The philosophy that inspired SCONE is
straightforward: from vision to speech, humans exploit the
brain’s massive parallelism to fulfill all recognition tasks; if
we want to build an AI system which is able to deal with
the large amount of knowledge required in common-sense
reasoning, we need to rely on a mechanism which is fast
and effective enough to simulate parallel search. Accordingly,
SCONE&#8217;s implementation of marker&#8211;passing algorithms aims
at simulating a pseudo-parallel search by assigning specific
marker bits to each knowledge unit. For example, if we want
to query a KB to get all the parts of cars, SCONE would
assign a marker M1 to the A-node CAR and search for all
the statements in the knowledge base where M1 is the A-wire
(domain) of the relation PART-OF , returning all the classes
in the range of the relation (also called ‘B-nodes’). SCONE
would finally assign the marker bit M2 to all B-nodes, also
retrieving all the inherited subclasses10. The modularization
and implementation of HOMINE with SCONE allows for
an effective formal representation and inferencing of core
ontological properties of events, such as: i) participation of
actors and objects in actions; ii) temporal features based on the
notions of ‘instant’ and ‘interval’; iii) common-sense spatial
information.
        </p>
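<p>The parts-of-cars query described above can be sketched as a simplified marker-passing pass over a tiny semantic network. The KB contents and the marker bookkeeping below are our own illustration of the M1/M2 scheme, not SCONE&#8217;s API.</p>

```python
# Simplified marker passing in the style of the example above.
IS_A = {"sedan": "car", "car": "vehicle"}           # subclass (is-a) wires
PART_OF = [("wheel", "car"), ("engine", "car"),     # (part, whole) statements
           ("sunroof", "sedan")]

def ancestors(cls):
    """Walk is-a wires upward from cls, including cls itself."""
    while cls is not None:
        yield cls
        cls = IS_A.get(cls)

def parts_of(cls):
    # Step 1: mark cls and its superclasses with M1 (A-node set).
    m1 = set(ancestors(cls))
    # Step 2: mark with M2 every B-node of a PART-OF statement whose
    # whole carries M1; subclasses thereby inherit superclass parts.
    m2 = {part for part, whole in PART_OF if whole in m1}
    return m2

# A sedan inherits 'wheel' and 'engine' from car and adds 'sunroof'.
sedan_parts = parts_of("sedan")
```

<p>Both marking steps are single sweeps over the statements, which is why the scheme approximates a parallel search without backtracking.</p>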
        <p>
          The Cognitive Engine is the result of augmenting ACT-R
with HOMINE: in general we refer to ACT-R including the
SCONE extra-module as ACT-RK, meaning ‘ACT-R with
improved Knowledge capabilities’ (the reader can easily notice
the evolution from the original ACT-R architecture – figure 1
– to the knowledge-enabled one – figure 6). We engineered a
SCONE-MODULE as a bridging component between the
cognitive architecture and the knowledge resource: this integration
allows for dynamic queries to be automatically submitted to
HOMINE by ACT-RK whenever the visual information is
incomplete or corrupted, or when reasoning with common-sense
knowledge is needed to generalize over actors and actions in a
scene. In this way, the Cognitive Engine is able to overcome
situations with missing input: ACT-R mechanisms of partial
matching and spreading activation [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] can fill the gap(s) left
by the missing atomic events and retrieve the best–matching
ontology pattern. In the last section of the paper we describe
how the Cognitive Engine performs the action-recognition task for the
example originally sketched in figure 3.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
<title>IV. USING THE Cognitive Engine FOR ACTION RECOGNITION: AN EXAMPLE</title>
      <p>
In the context of the Mind&#8217;s Eye program, a visual
intelligent system is considered to be successful if it is able to
process a video-dataset of actions11 and output the probability
distribution (per video) of a pre-defined list of verbs,
including ‘walk’, ‘run’, ‘carry’, ‘pick-up’, ‘haul’, ‘follow’, ‘chase’,
etc.12 Performance is measured in terms of consistency with
human responses to stimuli (Ground-Truth): subjects have
to acknowledge the presence/absence of every verb in each
video.
10For details concerning marker&#8211;passing algorithms, a topic out of scope for this
manuscript, we refer the reader to [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ].
11http://www.visint.org/datasets.html
12This list has been provided in advance by DARPA.
</p>
<p>
In order to meet these requirements, we devised the
Cognitive Engine to work in a human-like fashion (see section
II), trying to disambiguate the scene in terms of the most
reliable conceptual structures. Because of space limitations,
we can’t provide here the details of a large-scale evaluation:
nevertheless, we can discuss the example depicted earlier in
the paper (figure 3) in light of the core mechanisms of the
Cognitive Engine. Considering figure 7, the Cognitive Engine
parses the atomic events extracted by IAR, namely ‘hold’
(micro-state) and ‘bend-over’, ‘drag’, ‘stop’ (micro-actions),
associating frames and roles to visual input from the videos.
This specific information is retrieved from the FrameNet
module of HOMINE: frames and frame roles are assembled
in suitable knowledge units and encoded in the declarative
memory of ACT-RK. As with human annotators performing
semantic role labeling [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], the Cognitive Engine associates
verbs denoting atomic events with corresponding frames. When
related mechanisms are activated, the Cognitive Engine
retrieves the roles played by the entities in the scene, for each
atomic event: for example, ‘hold’ evokes the manipulation
frame, whose core role agent can be associated with &#8216;person1&#8217;
(as shown in the light-green box of the figure). In order to prompt
a choice within the available ontology patterns of action (see
table I), sub-symbolic computations for spreading activation
are executed [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Spreading of activation from the contents
of frames and roles triggers the evocation of related ontology
patterns. As mentioned in the introduction, partial matching
based on similarity measures and spreading of activation based
on compositionality are the main mechanisms used by
Cognitive Engine: in particular, we constrained semantic similarity
within verbs to the ‘gloss-vector’ measure computed over
WordNet synsets [
        <xref ref-type="bibr" rid="ref33">33</xref>
]. Base-level activations of action verbs
have been derived from a frequency analysis of the American
National Corpus: this choice reflects the fact that
the more frequent a verb is, the more likely it is to be activated
by a recognition system. Additionally, strengths of associations
are set (or learned) by the architecture to reflect the number
of patterns to which each atomic event is associated, the
so-called &#8216;fan effect&#8217; controlling information retrieval in many
real-world domains [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ].
      </p>
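To illustrate the idea behind the ‘gloss-vector’ measure, here is a deliberately simplified Python sketch that scores two verbs by the cosine similarity of bag-of-words vectors built from their glosses; the actual measure in [33] uses second-order co-occurrence vectors over WordNet glosses, and the gloss strings below are hypothetical:

```python
import math
from collections import Counter

def gloss_vector(gloss):
    # Simplification: represent a gloss by its raw word counts.
    # The real gloss-vector measure builds second-order co-occurrence vectors.
    return Counter(gloss.lower().split())

def gloss_similarity(gloss1, gloss2):
    # Cosine similarity between the two gloss vectors.
    v1, v2 = gloss_vector(gloss1), gloss_vector(gloss2)
    dot = sum(v1[w] * v2[w] for w in v1)
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

# Hypothetical glosses for two action verbs:
print(gloss_similarity("pull along with effort", "pull behind with force"))  # prints 0.5
```

In the Cognitive Engine, a score of this kind constrains which verb chunks count as acceptable partial matches during retrieval.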
      <p>Fig. 7. A Diagram of the Recognition Task performed by the Cognitive
Engine. The horizontal black arrow represents the time framing of the sequence, while
the vertical one represents the interconnected levels of information processing.
The light-green box displays the results of semantic disambiguation of the
scene elements, while the gray box contains the schema of the output, where
importance reflects the number of components in a detected pattern (1-4) and
observed is a boolean parameter whose value is 1 when a verb matches an
IAR detection and 0 when the verb is an actual result of EAR processing.</p>
      <p>The core sub-symbolic computations performed by the
Cognitive Engine through ACT-RK can be expressed by the
equation in figure 8:</p>
      <p>Fig. 8. Equation for Bayesian Activation Pattern Matching.
1st term: the more recently and frequently a chunk i has
been retrieved, the higher its activation and its chances
of being retrieved. In our context, i can be conceived as
a pattern of action (e.g., the pattern of HAUL), where tj
is the time elapsed since the jth reference to chunk i and
d represents the memory decay rate.
2nd term: the contextual activation of a chunk i is determined by
the attentional weight given to an element k and the strength of
association Ski between the element k and the chunk i. In our
context, k can be interpreted as the value
BEND-OVER of the pattern HAUL in figure 7.
3rd term: under partial matching, ACT-RK can retrieve
the chunk l that matches the retrieval constraints i to the
greatest degree, computing the similarity Simli between l
and i weighted by the mismatch penalty MP (a negative score that
is assigned to discriminate the ‘distance’ between two
terms). In our context, for example, the value PULL could
have been retrieved instead of DRAG. This mechanism is
particularly useful when verbs are continuously changing,
as in the case of a complex visual input stream.
4th term: randomness is added to the retrieval process in the form of
Gaussian noise.</p>
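For concreteness, the four terms can be combined in a short Python sketch; the parameter names are ours rather than ACT-RK's, and mapping similarities to penalties via (1 - sim) in the third term is one simple choice among several:

```python
import math
import random

def activation(ref_times, now, d, weights, assoc, sims, mp, noise_sd):
    """Sketch of the activation equation in Fig. 8, term by term."""
    # 1st term: base-level activation from recency/frequency of past retrievals
    base_level = math.log(sum((now - t) ** (-d) for t in ref_times))
    # 2nd term: spreading activation from context elements k
    spreading = sum(weights[k] * assoc[k] for k in weights)
    # 3rd term: partial matching; MP is negative and scales the dissimilarity
    partial_match = sum(mp * (1.0 - sim) for sim in sims)
    # 4th term: Gaussian noise makes retrieval stochastic
    noise = random.gauss(0.0, noise_sd)
    return base_level + spreading + partial_match + noise
```

Evaluating this score for every candidate pattern and retrieving the maximum reproduces, in miniature, the choice the Cognitive Engine makes among the ontology patterns of table I.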
      <p>Last but not least, the Cognitive Engine can output the
results of extra-reasoning functions by means of suitable queries
submitted to HOMINE via the SCONE-MODULE. In the
example in figure 7, object classifiers and tracking algorithms
could not detect that ‘person1’ is dragging ‘bag2’ by pulling
a rope: this failure of the visual algorithms is explained by
the fact that the rope is a very thin and morphologically
unstable artifact, hence difficult to spot with
state-of-the-art machine vision. Nevertheless, HOMINE contains an axiom
stating that:
“For every x,y,e,z such that P(x) is a person, GB(y) is a
Bag and DRAG(e,x,y,T) is an event e of type DRAG (whose
participants are x and y) occurring in the closed interval of
time T, there is at least one z which is a proper part of y and
that participates in e”13.</p>
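In first-order Davidsonian notation, the paraphrased axiom can be rendered roughly as follows, where PP stands for proper parthood and PC for participation; the predicate names are our shorthand, not HOMINE's actual vocabulary:

```latex
\forall x\, y\, e\, T\;
  \big( P(x) \wedge GB(y) \wedge \mathit{DRAG}(e, x, y, T)
    \rightarrow \exists z\, ( PP(z, y) \wedge PC(z, e) ) \big)
```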
      <p>Moreover, suppose that in a continuation of the video, the
same person drops the bag, gets in a car and leaves the
scene. The visual algorithms would have serious difficulties
in tracking the person while he or she is driving, since the person
would become partially occluded, assume an irregular shape
and would no longer be properly lit. Again, the Cognitive
Engine could overcome these problems in the visual system
by using SCONE to call HOMINE and automatically perform
the following schematized inferences:</p>
      <p>Cars move;
Every car needs exactly one driver to move14;
Drivers are persons;
A driver is located inside a car;
If a car moves then the person driving the car also moves
in the same direction.</p>
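The effect of these schematized axioms can be illustrated with a minimal forward-chaining sketch in Python; the facts, rule encoding, and names are hypothetical stand-ins for HOMINE axioms queried through SCONE:

```python
# Minimal forward-chaining sketch of the schematized car/driver inferences.
def infer(facts):
    derived = set(facts)
    while True:
        new = set()
        for fact in derived:
            if fact[0] == "drives":
                _, person, car = fact
                new.add(("person", person))       # drivers are persons
                new.add(("inside", person, car))  # a driver is located inside the car
                if ("moves", car) in derived:
                    # if the car moves, the driving person moves with it
                    new.add(("moves", person))
        if new <= derived:  # fixpoint reached: nothing new can be derived
            return derived
        derived |= new

facts = {("moves", "car1"), ("drives", "person1", "car1")}
print(sorted(infer(facts)))
```

Even when the tracker loses ‘person1’, the fact that the person is moving with ‘car1’ is derivable from the driving relation alone, which is the kind of gap-filling the Cognitive Engine relies on.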
      <p>Thanks to the inferential mechanisms embedded in its
knowledge infrastructure, the Cognitive Engine is not bound
to visual input as an exclusive source of information: in a
human-like fashion, the Cognitive Engine has the capability of
coupling visual signals with background knowledge,
performing high-level reasoning and disambiguating the original input
perceived from the environment. In this respect, the Cognitive
Engine can be seen as exemplifying a general perspective on
artificial intelligence, where data-driven learning mechanisms
are integrated in a knowledge-centered reasoning framework.</p>
    </sec>
    <sec id="sec-5">
      <title>V. CONCLUSION</title>
      <p>
        In this paper we presented the knowledge infrastructure of
a high-level artificial visual intelligent system, the Cognitive
Engine. In particular, we described how the conceptual
specifications of basic action types can be driven by a hybrid
semantic resource, i.e. HOMINE and its derived ontology
patterns: for each considered action verb, the Cognitive
Engine can identify typical FrameNet roles and corresponding
lexical fillers (WordNet synsets), logically constraining them
to a computational ontology of actions encoded in
ACT-RK through the SCONE Knowledge-Base system. Future work
will be devoted to improving the Cognitive Engine and to addressing
causal selectivity (see II) using (1) reasoning and statistical
inference to derive and predict the goals of agents and (2)
mechanisms of abduction to focus on the most salient information
in complex visual streams. We also plan to extend the
system functionalities in order to support a wider range of
action verbs and to run tests on a large video dataset.
      </p>
      <p>13Note that here we are paraphrasing an axiom that exploits Davidsonian
event semantics [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] and basic principles of formal mereology (see [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] and [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ]). Also, this axiom is valid only if every bag has a rope: this is generally true
for garbage bags like the one depicted in figure 7, but exceptions
would need to be addressed in a more comprehensive scenario.</p>
      <p>14With some exceptions, especially in California, around Mountain View!</p>
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGMENTS</title>
      <p>This research was sponsored by the Army Research
Laboratory and was accomplished under Cooperative Agreement
Number W911NF-10-2-0061. The views and conclusions
contained in this document are those of the authors and should
not be interpreted as representing the official policies, either
expressed or implied, of the Army Research Laboratory or
the U.S. Government. The U.S. Government is authorized
to reproduce and distribute reprints for Government purposes
notwithstanding any copyright notation herein.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Forsyth</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Ponce</surname>
          </string-name>
          ,
          <source>Computer Vision: A Modern Approach</source>
          . Prentice Hall,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Anderson</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lebiere</surname>
          </string-name>
          ,
          <source>The Atomic Components of Thought</source>
          . Erlbaum,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Tversky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zachs</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Martin</surname>
          </string-name>
          , “
          <article-title>The structure of experience,” in Understanding events: From Perception to Action, T</article-title>
          . Shipley and T. Zacks, Eds.,
          <year>2008</year>
          , pp.
          <fpage>436</fpage>
          -
          <lpage>464</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Albertazzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Van Tonder</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Vishwanath</surname>
          </string-name>
          , Eds.,
          <source>Perception Beyond Inference. The Information Content of Visual Processes</source>
          . The MIT Press,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Siskind</surname>
          </string-name>
          , “
          <article-title>Grounding language in perception</article-title>
          ,
          <source>” Artificial Intelligence Review</source>
          , vol.
          <volume>8</volume>
          , pp.
          <fpage>371</fpage>
          -
          <lpage>391</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Hayes</surname>
          </string-name>
          , “
          <article-title>The second na¨ıve physics manifesto,” in Formal Theories of the Common Sense World</article-title>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hobbes</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Moore</surname>
          </string-name>
          , Eds. Ablex Publishing Corporation,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Minsky</surname>
          </string-name>
          , “
          <article-title>A framework for representing knowledge,” in Mind Design</article-title>
          , P. Winston, Ed. MIT Press,
          <year>1997</year>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Majid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Boster</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Bowerman</surname>
          </string-name>
          , “
          <article-title>The cross-linguistic categorization of everyday events: a study of cutting and breaking</article-title>
          ,” Cognition, vol.
          <volume>109</volume>
          , pp.
          <fpage>235</fpage>
          -
          <lpage>250</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Singer</surname>
          </string-name>
          , Wired for War. The Penguin Press,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Maitikanen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sukthankar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Hebert</surname>
          </string-name>
          , “
          <article-title>Feature seeding for action recognition</article-title>
          ,”
          <source>in Proceedings of International Conference on Computer Vision</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Poveda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Suarez-Figueroa</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Gomez-Perez</surname>
          </string-name>
          , “
          <article-title>Ontology analysis based on ontology design patterns</article-title>
          ,”
          <source>in Proceedings of the WOP 2009 Workshop on Ontology Patterns at the 8th International Semantic Web Conference (ISWC 2009)</source>
          ,
          <year>2009</year>
          . [Online]. Available: http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-516/pap05.pdf
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>W.</given-names>
            <surname>Ceusters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Corso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Petropoulos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Krovi</surname>
          </string-name>
          , “
          <article-title>Introducing ontological realism for semi-supervised detection and annotation of operationally significant activity in surveillance videos,” in the 5th</article-title>
          <source>International Conference on Semantic Technologies for Intelligence</source>
          , Defense,and
          <string-name>
            <surname>Security</surname>
          </string-name>
          (STIDS
          <year>2010</year>
          ),
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          , “
          <article-title>Ontology design patterns</article-title>
          ,” in
          <source>Handbook on Ontologies</source>
          , 2nd ed.,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Studer</surname>
          </string-name>
          , Eds. Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C.</given-names>
            <surname>Alexander</surname>
          </string-name>
          ,
          <source>The Timeless Way of Building</source>
          . Oxford Press,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Oltramari</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lebiere</surname>
          </string-name>
          , “
          <article-title>Mechanism meet content: Integrating cognitive architectures and ontologies</article-title>
          ,”
          <source>in Proceedings of AAAI 2011 Fall Symposium of ”Advances in Cognitive Systems”</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ball</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rodgers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Gluck</surname>
          </string-name>
          , “
          <article-title>Integrating act-r and cyc in a largescale model of language comprehension for use in intelligent systems,” in Papers from the AAAI workshop</article-title>
          . AAAI Press,
          <year>2004</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>25</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Douglas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ball</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Rodgers</surname>
          </string-name>
          , “
          <article-title>Large declarative memories in act-</article-title>
          r,”
          <source>in Proceedings of the 9th International Conference of Cognitive Modeling</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>B.</given-names>
            <surname>Best</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gerhart</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lebiere</surname>
          </string-name>
          ,
          <article-title>“Extracting the ontological structure of cyc in a large-scale model of language comprehension for use in intelligent agents</article-title>
          ,”
          <source>in Proceedings of the 17th Conference on Behavioral Representation in Modeling and Simulation</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>B.</given-names>
            <surname>Edmond</surname>
          </string-name>
          , “
          <article-title>Wn-lexical: An act-r module built from the wordnet lexical database</article-title>
          ,”
          <source>in Proceedings of the 7th International Conference of Cognitive Modeling</source>
          ,
          <year>2006</year>
          , pp.
          <fpage>359</fpage>
          -
          <lpage>360</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Fellbaum</surname>
          </string-name>
          , Ed.,
          <source>WordNet, An Electronic Lexical Database</source>
          . MIT Press, Boston,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ruppenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ellsworth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Petruck</surname>
          </string-name>
          , and C. Johnson, “Framenet:
          <article-title>Theory and practice</article-title>
          ,”
          <year>June 2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>G.</given-names>
            <surname>Vetere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oltramari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Chiari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jezek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vieu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Zanzotto</surname>
          </string-name>
          , “
          <article-title>Senso comune, an open knowledge base for italian,” TAL - Traitement Automatique des Langues</article-title>
          , vol.
          <volume>39</volume>
          , no.
          <source>Forthcoming</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>C.</given-names>
            <surname>Masolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Guarino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oltramari</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Schneider</surname>
          </string-name>
          , “
          <article-title>WonderWeb Deliverable D17: The WonderWeb Library of Foundational Ontologies,”</article-title>
          Tech. Rep.
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Guarino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Masolo</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Oltramari</surname>
          </string-name>
          , “
          <article-title>Sweetening wordnet with dolce</article-title>
          ,
          <source>” AI Magazine</source>
          , vol.
          <volume>3</volume>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>24</lpage>
          ,
          <year>Fall 2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>P.</given-names>
            <surname>Simons</surname>
          </string-name>
          , Ed.,
          <source>Parts: A Study in Ontology</source>
          . Clarendon Press, Oxford,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Fillmore</surname>
          </string-name>
          , “
          <article-title>The case for case,” in Universals in Linguistic Theory</article-title>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bach</surname>
          </string-name>
          and T. Harms, Eds. New York: Rinehart and Wiston,
          <year>1968</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>A.</given-names>
            <surname>Burchardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Erk</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Frank</surname>
          </string-name>
          , “
          <article-title>A wordnet detour to framenet,” in Sprachtechnologie, mobile Kommunikation und linguistische Resourcen., ser</article-title>
          .
          <source>Computer Studies in Language and Speech</source>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Fisseni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-C.</given-names>
            <surname>Schmitz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Wagner</surname>
          </string-name>
          , Eds.
          <source>Frankfurt am Main: Peter Lang</source>
          ,
          <year>2005</year>
          , vol.
          <volume>8</volume>
          , pp.
          <fpage>408</fpage>
          -
          <lpage>421</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>A.</given-names>
            <surname>Oltramari</surname>
          </string-name>
          , “
          <article-title>Lexipass methodology: a conceptual path from frames to senses and back,” in LREC 2006 (Fifth International Conference on Language Resources and Evaluation)</article-title>
          .
          <source>Genoa (Italy): ELDA</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Allen</surname>
          </string-name>
          , “
          <article-title>An interval based representation of temporal knowledge,”</article-title>
          <source>in 7th International Joint Conference on Artificial Intelligence</source>
          . Vancouver: IJCAI, Morgan Kaufmann,
          <year>1983</year>
          , pp.
          <fpage>221</fpage>
          -
          <lpage>226</lpage>
          , vol.
          <volume>1</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Mcguinness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nardi</surname>
          </string-name>
          , and P. F. Patel-Schneider, Eds.,
          <source>The Description Logic Handbook : Theory, Implementation and Applications</source>
          . Cambridge University Press,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>S.</given-names>
            <surname>Fahlman</surname>
          </string-name>
          , “
          <article-title>Using scones multiple-context mechanism to emulate human-like reasoning</article-title>
          ,” in
          <source>First International Conference on Knowledge Science, Engineering and Management (KSEM'06)</source>
          . Guilin,
          <source>China: Springer-Verlag (Lecture Notes in AI)</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gildea</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          , “
          <article-title>Automatic labelling of semantic roles</article-title>
          ,”
          <source>in Proceedings of 38 th Annual Conference of the Association for Computational Linguistics (ACL-00)</source>
          ,
          <year>2000</year>
          , pp.
          <fpage>512</fpage>
          -
          <lpage>520</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>T.</given-names>
            <surname>Pedersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Patwardhan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Michelizzi</surname>
          </string-name>
          , “
          <article-title>WordNet::Similarity: Measuring the relatedness of concepts</article-title>
          ,” in Demonstration Papers at HLT-NAACL,
          <year>2004</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>L.</given-names>
            <surname>Schooler</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Anderson</surname>
          </string-name>
          , “
          <article-title>The disruptive potential of immediate feedback</article-title>
          ,”
          <source>in Proceedings of the Twelfth Annual Conference of The Cognitive Science Society</source>
          ,
          <year>1990</year>
          , pp.
          <fpage>702</fpage>
          -
          <lpage>708</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>R.</given-names>
            <surname>Casati</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Varzi</surname>
          </string-name>
          , Eds.,
          <source>Events</source>
          . Aldershot, USA: Dartmouth,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>R.</given-names>
            <surname>Casati</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Varzi</surname>
          </string-name>
          ,
          <source>Parts and Places: The Structure of Spatial Representation</source>
          . Cambridge, MA: MIT Press,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>