<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Goal-Based Person Tracking Using a First-Order Probabilistic Model</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Thomas Geier, Stephan Reuter, Klaus Dietmayer and Susanne Biundo Faculty of Engineering and Computer Science Ulm University</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This work addresses the problem of person tracking using additional background information. We augment a particle lter-based tracking algorithm with a rst-order probabilistic model expressed through Markov Logic Networks to tackle the data association problem in domains with a high occlusion rate. Using a high-level model description allows us to easily integrate additional information like a oor plan or goal information into a joint model and resolve occlusion situations that would otherwise result in the loss of association. We discuss the engineered model in detail and give an empirical evaluation using an indoor setting.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        This work concerns the problem of providing
background-knowledge to the specialized application
of person tracking using a high-level probabilistic
model. We demonstrate that a hand-crafted model,
built using Markov Logic Networks (MLN)
[Ri
        <xref ref-type="bibr" rid="ref1">chardson and Domingos, 2006</xref>
        ], can help in solving the
data association problem in tracking during situations
where high occlusion prevents the correct association
between past and new tracks. By leveraging additional
information, like a oor plan or knowledge about goals
of single persons, we can resolve otherwise opaque
situations.
      </p>
      <p>
        The usage of a rst-order probabilistic model like
MLNs allows for an easier modeling task, because
dependencies are represented by weighted rst-order
logical formulas instead of, e.g., conditional probability
tables for the case of directed models like Bayesian
Networks. In addition, the model is formulated in
a lifted form and can be instantiated for the desired
number of concurrent tracks or persons within the
scene, which is not possible when using completely
propositional models or specialized template models
like dynamic Bayesian networ
        <xref ref-type="bibr" rid="ref8">ks [Murphy, 2002</xref>
        ](which
can scale only along the time axis). A model given in
a lifted representation also makes it possible to
leverage structure information contained within the lifted
formulation for more e cient inference [Gogate and
Domingo
        <xref ref-type="bibr" rid="ref10">s, 2011</xref>
        ]; although this approach is not
investigated here.
      </p>
      <p>We motivate our work in the context of an indoor
situation, where multiple persons move in a two-room
ofce, containing the laser range nder and several areas
of interest like a printer or a co ee maker. We
measure the quality of our model by the extent to which
it is able to correctly associate object tracks emerging
from the tracking algorithm with the correct persons
inside the scene.</p>
      <p>The rest of the paper is laid out as follows. After we
discuss related work, we give an overview of the
applied tracking algorithm and introduce the concept of
Markov Logic Networks. Then, we describe the
investigated problem in detail and discuss the used MLN
model. We give an empirical evaluation of the
described setup and conclude with some possible
extensions to the model and an overall discussion.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Related</title>
    </sec>
    <sec id="sec-3">
      <title>Work</title>
      <p>
        In multi-object tracking, state dependent detection
probabilities of objects are disregarded in most
applications. Thus, a track disappears shortly after
entering an occluded area and a new track is created
when the object leaves the occluded area again.
Consequently, one object is represented by di erent track
IDs. Especially in scenarios where persons interact
several times with a system, changed track IDs lead
to the loss of the objects history. The multi-object
Bayes lte
        <xref ref-type="bibr" rid="ref7">r [Mahler, 2007</xref>
        ] allows to integrate state
dependent detection probabilities even if the scenario
is characterized by a high object den
        <xref ref-type="bibr" rid="ref10">sity [Reuter and
Dietmayer, 2011</xref>
        ]. In case of short term occlusions, the
usage of state dependent detection probabilities leads
to an improved track continuity. A direct integration
of goals into the prediction of a persons' state is
crucial, since the persons' action may be contradictory to
the assigned goal.
      </p>
      <p>
        Markov Logic Networks have been u
        <xref ref-type="bibr" rid="ref9">sed by Sadilek and
Kautz [2010</xref>
        ] for multi-agent activity recognition based
on GPS data in a game of capture the ag. While their
work can leverage more expert knowledge (the rules of
the game), they do not encounter the data association
problem present in the tracking scenario, since each
person was carrying a personal GPS
        <xref ref-type="bibr" rid="ref3">receiver. Tran
and Davis [2008</xref>
        ] apply Markov Logic Networks to a
parking lot surveillance scene using video data to
recognize which person enters which car. They also track
pedestrians across a scene and face the problem of data
association. Their sensory information emerges from
image data and their focus lies in integrating di
erent information sources that are all extracted from
the video stream. Markov Logic Networks are also
used by Singla and Do
        <xref ref-type="bibr" rid="ref11">mingos [2006</xref>
        ] for entity
resolution in text mining. This is the problem of inferring
which references refer to the same entity and it is
similar to the data association problem in tracking. The
two latter works use an equals predicate for identity
maintenance, whereas we approach the problem using
an association mapping to underlying entities. When
grounding the model our approach only creates
associations between currently instantiated track IDs and
their corresponding entity, wheres using an equals
predicate will introduce relations between all objects,
which does not seem reasonable in a dynamic domain.
3
      </p>
    </sec>
    <sec id="sec-4">
      <title>Multi-Object Tracking</title>
      <p>
        Standard multi-object tracking algorithms often use
object individual single-object trackers like the
Kalman lter. The drawback of this multi-object
tracking approach is the need of a data association
step which assigns the received measurements to the
trackers using hard decisions or probabilistic
method
        <xref ref-type="bibr" rid="ref2">s [Blackman and Popoli, 1999</xref>
        ]. Especially in
scenarios characterized by a high object density, the data
association is error-prone and degrades the performance
of the tracking system, since false associations are
irreversible.
      </p>
      <p>
        A rigorous approach to multi-object tracking is the
multi-object Bayes lte
        <xref ref-type="bibr" rid="ref7">r proposed by Mahler [2007</xref>
        ].
The multi-object Bayes lter uses the random nite set
statistics to represent the complete environment by a
single lter state. In the innovation step of the
multiobject Bayes lter, a multi-object likelihood function
calculates the a nity between the predicted state set
and the received measurement set. Thus, no data
association is necessary.
      </p>
      <p>Further, the multi-object Bayes lter allows to
integrate state dependent detection probabilities into the
ltering algorithm. In Reuter and Dietmayer [2011],
an approach to calculate state dependent detection
probabilities based on the occupancy grid mapping
approach [Thrun et al., 2005] is proposed. Thus, it is
possible to keep track of an object which is occluded
for the sensor for a short period of time. Using
constant detection probabilities would lead to a track loss,
if an object is not visible to the sensor for a few
measurement cycles. We use the state dependent tracking
algorithm as a comparison for our nal results. But
for input into the high-level model, we use state
independent object tracking. This produces more track
IDs for association, and in particular is less prone to
false association, which we cannot correct in the upper
stage.</p>
      <p>
        An implementation of the multi-object Bayes lter is
possible using Sequential Monte Carlo (SMC) method
        <xref ref-type="bibr" rid="ref10">s
[Reuter and Dietmayer, 2011</xref>
        , Sidenblad
        <xref ref-type="bibr" rid="ref13">h and
Wirkander, 2003</xref>
        , Mahle
        <xref ref-type="bibr" rid="ref7">r, 2007</xref>
        ]. In di erence to well known
SMC implementations of the standard Bayes lter a
particle set, which represents a random nite set
using a nite number of state vectors, is used instead of
a standard particle. Further, the number of state
vectors in the particle set may change at each time step.
In case of a SMC implementation, the integration of
the mentioned constraints is possible by reducing the
weight of a particle set.
      </p>
      <p>
        Since the multi-object Bayes lter does not perform
a measurement to track association, an extraction of
the individual objects out of the multi-object posterior
density function is necessary, e.g. using the k-
        <xref ref-type="bibr" rid="ref11">means
algorithm [Bishop, 2006</xref>
        ].
4
      </p>
    </sec>
    <sec id="sec-5">
      <title>Markov Logic Networks</title>
      <p>
        Markov Logic Networks [Ri
        <xref ref-type="bibr" rid="ref1">chardson and Domingos,
2006</xref>
        ] are a member of the family of rst-order
probabilistic languages [de Salvo B
        <xref ref-type="bibr" rid="ref3">raz et al., 2008</xref>
        ] and their
semantics are based on undirected graphical
models (Markov networks). In contrast to propositional
models like Bayesian networks and Markov Networks,
where every random variable has to be speci ed
explicitly, in rst-order models the random variables are
relations over objects and the model can be scaled
by providing the appropriate number of object
constants. Moreover, MLNs allow the speci cation of
dependencies as weighted rst-order logical formulas.
Higher weights make those interpretations more likely,
in which more groundings of the formula evaluate to
true. We will now brie y cover the formal semantics
of MLNs.
      </p>
      <p>A Markov Logic Network L = f(f1; w1); : : : ; (fn; wn)g
for n 2 N is a set of rst-order formulas f1; : : : ; fn
with given weights w1; : : : ; wn 2 R. Together with
a nite set of constants C, they de ne a
probability distribution over all interpretations (or possible
worlds). An interpretation maps each grounding of
each predicate to a truth value. The interpretation
of functions must be xed. Probabilistic functions
can be emulated using predicates. Let gC (f ) be the
set of groundings of formula f obtained by
replacing the free variables in f by all combinations of
constants from C. Given an interpretation x, then
nC;i(x) d=ef jfg j g 2 gC (fi) and x j= ggj is the
number of groundings of formula fi that are true under x.
Then, the probability distribution PL;C that is de ned
by the MLN L with constants C is given as
PL;C (X = x) d=ef 1 Y exp winC;i(x) ;</p>
      <p>Z i
(1)
where i ranges over all formulas in L, and Z is a
normalizing constant.</p>
      <p>Given a set of constants, a MLN can be converted to
a Markov network, where nodes correspond to atoms
and each ground formula induces a clique over all
nodes whose atoms appear inside this formula. For
practical reasons, a sorted (or typed) logical language
is used to describe MLNs. Using sorted terms, we
can limit the size of the grounded network. Also, in
their basic form, MLNs do only allow restricted usage
of logical functions. Usually functions are simulated
by specially marked predicates, which enforce a
functional dependency of one or more arguments on the
remaining arguments. We notate functional arguments
of predicates by underlining them. Such a predicate
can be translated to a multi-valued random variable.
5</p>
    </sec>
    <sec id="sec-6">
      <title>Problem Description</title>
      <p>
        We consider an indoor scene which resembles an o ce
setting. The corresponding oor plan is depicted in
Figure 1a. A laser range nder is placed in one corner
of the main room and provides distance information
in a plane about one meter above ground. The beam
almost completely covers the main room, but there
exists a second room that has virtually no sensor
coverage. There is only one entrance to the room complex
and the separate room has only a single exit, which is
the door to the main room. The setting contains
several features that may serve as goal destinations for
persons navigating inside the scene; like a printer or
a co ee maker. Knowledge about destinations of
persons is taken as given; although it is easy to motivate
the existence of such information depending on the
application. Issued print jobs could be recognized by a
special program installed on the PC, or visits to the
co ee maker could be predicted from personal habits.
There are one to three persons inside the scene
simultaneously. Major occlusion caused by static objects,
like walls, occurs when people enter the second room.
Minor static occlusion can occur near the co ee maker.
During the scenes with more than one person, dynamic
occlusion occurs when persons are covered by other
persons standing between them and the sensor.
Using only the particle lter-based tracking algorithm
to process the output of the laser range nder,
problems arise when people produce no measures for an
extended period of time because they are inside the
separate room or because they are hidden by another
person. For shorter occlusion durations it is possible to
keep the track of a single occluded person alive for long
enough for the person to reappear and re-association is
completely handled by the tracking algorithm. If two
persons enter the same occlusion area, their estimated
positions begin to mix spatially and once they emerge
again, re-association becomes more and more
arbitrary with increasing occlusion duration. In these
scenarios, a direct integration of the Social Force model
into the prediction of the persons state of the tracking
algorithm may increase the performance of the
        <xref ref-type="bibr" rid="ref9">system [Reuter and Dietmayer, 2010</xref>
        ]. The Social Force
model aims at describing pedestrian movement by
virtual forces exerted by other persons and environmental
features [Helbing an
        <xref ref-type="bibr" rid="ref5">d Molnar, 1995</xref>
        ]. Since the model
heavily depends on the destinations of the person, a
tight integration with a high-level knowledge base, as
described in this work, seems promising for such an
approach.
      </p>
      <p>Figure 1b shows an example of the tracking results for
one of the sequences with three persons. The
trajectories are illustrated by solid lines. Since the results
are generated without the usage of the state dependent
detection probability, the trajectories are interrupted
quite often in the area corresponding to region RA,
where a dynamic occlusion occurs.
6</p>
    </sec>
    <sec id="sec-7">
      <title>Description of the Model</title>
      <p>In this section, we describe the used MLN model and
discuss some of the di culties and design choices we
have encountered in its engineering. The complete
model is factored into four modules. We begin with
a discussion of two concepts that cannot be associated
with distinct model parts but in uence nearly every
aspect { the representation of space and time.
Printer RC</p>
      <p>-4
RB
RA
-3
-4
0</p>
      <p>Co ee
Living</p>
      <p>Exit
Laser Range Finder</p>
      <p>
        -5
(a) Floor plan
-1
-1
0
(b) Example trajectories
From continuous to discrete space. The basic
MLN can only represent discrete random variables.
There exists an extension of MLNs to continuous
va
        <xref ref-type="bibr" rid="ref3">riables [Wang and Domingos, 2008</xref>
        ], but no working
implementation is available. For this work, we reduce the
continuous spatial estimates obtained from the
tracking algorithm to a few discrete regions. For ease of
modeling and processing of data, we choose a
rectangular shape. We do not create a uniform grid, but try
to respect functional aspects of the environment
concerning the problem. For example, it does not make
sense to further split the o ce into smaller areas if
there is no distinction for the sensor (everything is one
connected occluded area) and there is only a single goal
inside. Even having two separate goals like two work
stations might not justify the introduction of separate
regions for each, as long as the rest of the model can
not discriminate between them. We have de ned a
total of eight regions, which are depicted in Figure 1a.
Sticking with a low number of regions also made an
exact evaluation of the nal model feasible. Depending
on the inference approach, there might be no signi
cant overhead when using a larger state space for the
spatial component. Although, the model engineering
may become more intricate when opting for more ne
grained regions. Taking the characteristics of the
sensor into account, a radial layout for regions seems like
a promising approach, but this was not investigated.
Representation of time. In order to model
dynamic domains, we assign a dedicated time sort, whose
constants are elements from the natural numbers. One
usually aims to construct a model that ful lls the
Markov property, i.e., the state at time t + 1 only
depends on the state at time t. This means that formulas
may only contain predicates of at most two di erent
times, which then must be successive. But in the
presented model, the predicates that represent the
association of tracks to persons are not time-indexed, which
makes them static. This makes it di cult to apply
standard dynamic inference algorithms, which usually
assume the Markov property. But a static variable
can be considered as a dynamic variable, for which
the same value is enforced in every time step.
Fortunately, these static association are only referenced
over a limited period of time | the life-time of a track
| so they do not pile up over the course of the
complete sequence. For time resolution we have settled
for the duration of about one second, which seems like
a good compromise between inference complexity and
accuracy for the given problem.
      </p>
      <p>The tracking model. The basic functionality for
interfacing with the tracking algorithm is provided
by a MLN module that contains objects of the sorts
Track and Person. To notate variables of some sort,
we use the initial letter of the sort name in lower
case. For the sort Track, the letter 'm' is used
because of the ambiguity with sort Time. For both
sorts Track and Time, there exist time-dependent
predicates atT : Time Track Region and atP :
Time Person Region that give the current
location of a track or person, respectively. The
timeindependent predicate a : Track ! Person associates
Track objects to Person objects. The correspondence
of tracks to persons inside the MLN is similar to the
correspondence of measurements to tracks inside the
tracking algorithm. The output of the tracking
algorithm is converted to observations of the atT
function and an additional completely observed predicate
act : Time Track, whose purpose is mainly to be
able to prune groundings of formulas that depend on
inactive track IDs. The usage of this predicate is
omitted for clarity. Information about goals of persons can
attach to the location of the person objects. The core
tracking model then consists of the following two
formulas.</p>
      <p>5
3
atT (t; m; r) ^ a(m; p) ) atP (t; p; l)
a(m1; p) ^ a(m2; p) ^ m1 6= m2
(2)
(3)
Formula 2 probabilistically forces a person to be in
the same region as its associated track. By design a
track can only be associated to one person at a time
because the last parameter of the predicate a is
declared functional. Formula 3 probabilistically enforces
the association to also be a one-to-one relation. The
engineering of the formula weights is explained at the
end of this section. This formula is limited to
concurrently instantiated tracks using the act predicate (not
listed).</p>
      <p>The oor plan. We use the static predicate adj :
Region Region to encode the connectedness of the
regions. All instances are fully observed and adhere
to the oor plan given in Figure 1a. A single formula
forces persons to move between regions only according
to the given layout:
1</p>
      <p>atP (t; p; l1) ^ atP (t + 1; p; l2) ) adj(l1; l2) (4)
This formula is deterministic to prevent persons from
\teleporting" through the scene. If regions allow for a
traversal in less than a second (one time step), this rule
becomes invalid. But since the association of tracks to
persons allows for some slack, a person in the model
can \catch up" to the location of its real counterpart
after some time steps, only violating Formula 2.
The occlusion model. To prevent persons without
an associated track from wandering across the scene
(since no track in uences their current location), we
need to express that persons usually have a track
unless they are indeed occluded. In our setting both
static and dynamic occlusions occur, being caused by
walls or other persons, respectively. For our
experiments, only static occlusion information is modeled.
This is done by assigning a certain probability to each
region that it may contain untracked persons. The
probability is larger for areas of high static occlusion,
like the separate room. We also assign a higher
occlusion to regions that are more likely to be dynamically
covered, like the region around the co ee maker. By
assigning low occlusion probabilities to central regions
that have a good sensor coverage we penalize persons
silently slipping past the sensor. Formula 5 is provided
once for each region r. The weight wr is the occlusion
value, which is given in Figure 1a.</p>
      <p>wr</p>
      <p>atP (t; p; r) ^ :9m : (act(t; m) ^ a(m; p)) (5)
Goals and their dynamics. The last model part
handles the goals of persons. We associate goals
with regions and add another time-dependent
function goal : Time Person Region, where goals are
also allowed to assume the special location NULL to
signal that a person currently has no goal. The following
formulas describe the dynamics of goals:
goal(t; p; l) ^ :atP (t; p; l) ) goal(t + 1; p; l)(6)
goal(t; p; l) ^ atP (t; p; l)
) goal(t + 1; p; l) _ goal(t + 1; p; NULL) (7)
goal(t; p; NULL)
(8)
1
1
0:1
Formulas 6 and 7 achieve that goals can only be cleared
when a person reaches their associated region.
Formula 8 encodes the urge of people to clear their goal.
Stating this rule in this particular form results in
persons trying to reach their goal as soon as possible, since
otherwise the penalty accumulates over time. Another
way to make people reach their goals is to make it more
likely for a person to be inside the region of their goal.
Both rules work equally well for our dataset but might
make a di erence when applied to longer sequences or
under a di erent setting.</p>
      <p>
        Elicitation of weights. There exist two major ways
to determine the weights of the probabilistic formulas:
Learning from data and elicitation from experts; where
for common sense domains, like the one we are dealing
with, everyone is usually an expert. Both the
learning of weights and the direct speci cation approaches
have been followed in the literature. For the case of
our related work,
        <xref ref-type="bibr" rid="ref9">Sadilek and Kautz [2010</xref>
        ] and Singla
and Do
        <xref ref-type="bibr" rid="ref11">mingos [2006</xref>
        ] are employing lea
        <xref ref-type="bibr" rid="ref3">rning and Tran
and Davis [2008</xref>
        ] specify the weights by hand. Due to
the limited size and the common sense nature of our
dataset we decided to specify the weights ourself.
The approximation to consider the weight as the
logarithmic odds of the formula being true [Ri
        <xref ref-type="bibr" rid="ref1">chardson and
Domingos, 2006</xref>
        ] can serve as a good starting point,
but it only holds as long as formulas do not share
predicates. After assigning some reasonable initial values,
we iteratively looked at predictions of the model for
selected sequences and adjusted the weights if the
predictions did not conform with our expectations. We
began by adjusting the weights for sequences with only
one persons and switched to larger test sequences once
the model made sensible predictions for the training
data at hand. Since MLNs are based on undirected
graphical models (which means they are locally
unnormalized), there are no absolute correct values, but
the weights of di erent formulas have to be balanced
against each other.
7
      </p>
    </sec>
    <sec id="sec-8">
      <title>Preprocessing of Tracking Information</title>
      <p>We go on and describe how tracking data is processed
for input to the MLN model. After extraction of the
individual objects in the multi-object Bayes lter, we
obtain a set of single object particles Xmt for each track
ID m and time step t. We then apply two data
reduction steps. First, the MLN model works on a coarser
time scale of 1.25 steps per second, while the tracking
algorithm runs with 12.5 steps per second. We drop
the intermediate steps without further processing. A
di erent approach might aggregate them, e.g., by
averaging, but this would also distort the meaning of the
data, because it cannot be considered a snapshot of
the situation anymore.</p>
      <p>Depending on the quality of the tracking algorithm
and the used object model, there can be many false
positive tracks, e.g., when people spread their arms
away from their body, crossing the plane of laser
beams. To reduce these false tracks, we use the
existence probability to eliminate insigni cant tracks. It
is given by jXmtj=N ; the number of particles for track
ID m divided by the total number of particles N . We
drop all tracks from a time step whose existence
probability is below 0:5. For our test sequences, the output
of the tracking algorithm usually contains about thirty
tracks per sequence, but only less than ten remain
after applying both reduction processes.</p>
      <p>For each time step t and each track ID m that survive
the described process we add the track as active to our
MLN model via observation of the act predicate. We
then bin the single object particles into the discrete
regions. Most of the time all particles are contained
in a single region and we create an observation of the
atT function. In cases where the particles of a track
m spread over several regions we re ect this as
probabilistic evidence by adding a formula (wl; atT (t; m; l))
for each location l and calculating the weight as the
logarithmic odds wl = log 1 plpl , where pl is the relative
frequency of a particle of track m being in region l.
8</p>
    </sec>
    <sec id="sec-9">
      <title>The Inference Problem</title>
      <p>
        Markov Logic Networks can be seen as template
models for undirected graphical mo
        <xref ref-type="bibr" rid="ref6">dels [Koller and
Friedman, 2009</xref>
        ]. Their semantics are de ned using the
ground version of these networks. As such the
described MLN represents an undirected version of a
dynamic Bayesian networ
        <xref ref-type="bibr" rid="ref8">k [Murphy, 2002</xref>
        ]. The e ort for
exact inference in such models is usually exponential in
the number of variables within one time slice, because
most variables within one time step become dependent
on each other after some steps in most models.
The model described in this work also su ers from this
problem. The cause that all variables of a time slice
become dependent lies in the probabilistic data
association; which is a hard problem at its core. In our
case the problem of exact inference is exponential in
mT + nP , where mT is the maximum number of
simultaneous tracks and nP is the total number of persons
in the model. Here mT stems from the association
predicates and nP are the instantiations of the atP
predicate for one time step.
      </p>
      <p>
        For our evaluation we perform exact inference on
the model by exploiting context-speci c
indepen
        <xref ref-type="bibr" rid="ref6">dence [Koller and Friedman, 2009</xref>
        , pp. 171]. Given
an assignment to all association variables, the model
factorizes into components for each person and thus
becomes tractable. Our largest sequence contained
only 10 tracks, which results in 37 possible
associations to three persons after observing the correct
association for three initial tracks. After conditioning
on the association variables we calculate the partition
function for each association using variable elimination
along a min-degree variable ordering. This approach is
not suited for online ltering. For this purpose a
raoblackwellized particle lter, which collapses all but the
association variables, seems like a good solution [Koller
an
        <xref ref-type="bibr" rid="ref6">d Friedman, 2009</xref>
        , pp. 526]. Evaluating the
performance of this inference approach on the presented
model is open for future work.
9
      </p>
    </sec>
    <sec id="sec-10">
      <title>Evaluation</title>
      <p>We recorded 9 sequences in total; three sequences with
one, two and three persons, respectively. The duration
of each sequence is about one minute. The course of
events is the same among sequences with the same
number of persons; the sequences vary during the part
where multiple persons wander around the main room.
The setups with one person only feature static
occluSequence</p>
      <p>Persons</p>
      <p>Unassigned Tracks
#pD = c #pD(x)
S1-1532
S1-1640
S1-1737
S2-2056
S2-2207
S2-2329
S3-4628
S3-4734
S3-5306
sion caused by walls, caused by the single person
staying inside the o ce for several seconds. This results
in its track being reinvented upon entering the main
room again. With two persons there is dynamic
occlusion, where one person covers the other person. Both
persons enter the o ce together and thus cannot be
distinguished once they reappear. Goal information
for one person can resolve this issue and we can
obtain a good association again. In the scenes with three
persons, one person enters the o ce while the two
remaining persons stay inside the main room. Dynamic
occlusion occurs while all three persons are walking
around in front of the sensor. Tracks are lost and
recreated often, which can also be observed in Figure 1b
inside the area corresponding to region RA. When two
persons are simultaneously shadowed by the third one,
it is not possible to associate the reappearing tracks to
the correct persons just by means of the sensor. In this
case goal information can be used to identify the
correct association.</p>
      <p>We have evaluated three models that incorporate an
increasing amount of domain knowledge. The rst
model M uses only the basic tracking model and the
oor plan. The second model MO comprises all of the
rst model and the static occlusion model. The third
model MOG adds information about personal goals. One
person visits the co ee maker in each sequence. We
assign the Coffee region as goal for this person in every
sequence. All other persons have no goal assigned.
The goal information is able to resolve confusions that
happen before the designated person visits its goal
region. This does not happen in every sequence.
The MLN is instantiated for three persons in every
setup, regardless of the number of persons appearing
in the scene. For each model and each sequence, we
observe the correct association for the rst track of each
person and evaluate how well we can associate new
tracks. In our dataset, the number of tracks that
remain unassigned after labeling the starter tracks varies
between one and seven.</p>
      <p>
        The results of our evaluation are given in Table 1. For
each sequence, we give the number of false track
assignments of the most probably association. In
addition, we provide the probability of the correct joint
association. This is interesting for cases where no false
associations were made even with a simpler model, and
can show an improved signi cance of the correct
association when using a more sophisticated model.
To provide a baseline, the number of track confusions
and losses of a state-of-the-art multi-object Bayes
lter with state dependent detection probability (pD(x))
and the ones using the same lter with constant
detection probability (pD = c) are gi
        <xref ref-type="bibr" rid="ref4">ven [Reuter and
Dietmayer, 2011</xref>
        ]. For both lters, the number of persons
inside the scene is subtracted from the total number
of signi cant tracks. If the tracking algorithm works
perfectly, this number will be zero. The output of the
pD = c lter is used as input for the MLN stage; so
this number equals the number of associations made
by the high-level model and thus equals the possible
maximum number of false associations.
      </p>
      <p>We observe that the algorithm with the state
dependent detection probability reduces the number of
unassigned tracks dramatically in the scenarios with three
persons, where a lot of short term occlusions occur.
In the scenarios with one or two persons, where the
long-term occlusions due to static objects dominate,
the usage of the state dependent detection probability
has nearly no in uence on the number of unassigned
tracks. The high-level model MO using only the oor
plan and the static occlusion model delivers results
that are at least on par with the state dependent
tracking algorithm. Using goal information for one person
can further improve the results. By our judgment, this
is not possible by relying solely on the data obtained
from the laser range nder in general.</p>
      <p>One of the two person sequences (S2-2207) shows very
bad performance of the MLN model for all three cases.
Our investigation has shown that an outstretched arm
has caused a signi cant track that made it through the
data reduction process. The relatively high weight on
Formula 3 prevented the association of this track to the
owner of the arm. Thus, a third person was forced by
the model to appear at this spot and remained present
over the course of the scene.
10</p>
    </sec>
    <sec id="sec-11">
      <title>Conclusion</title>
      <p>We have described an approach to solving the data
association problem for person tracking with a
highlevel probabilistic model described using Markov Logic
Networks. We showed how to map the output of a
regular tracking algorithm into a discrete spatial
representation, which makes it easy to attach additional
information, e.g., personal goals, and allows the use of
inference techniques for discrete probabilistic models.
Especially in scenarios with long-term occlusions,
where even the multi-object Bayes lter is not able to
continue to track hidden objects, the association using
MLN outperforms the tracking-based approach when
using exact evaluation. On the other hand, a
sophisticated tracking algorithm is adequate in scenarios with
occlusions of no more than one second and might be
able to scale more easily to larger domains.
In future, we plan to integrate more information out
of the knowledge-base directly into the tracking
algorithms like, e.g., probabilistically modeled destinations
or goals of a person.</p>
    </sec>
    <sec id="sec-12">
      <title>Acknowledgements</title>
      <p>This work is done within the Transregional
Collaborative Research Centre SFB/TRR 62
\CompanionTechnology for Cognitive Technical Systems" funded
by the German Research Foundation (DFG).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>C. M. Bishop</surname>
          </string-name>
          .
          <source>Pattern Recognition and Machine Learning (Information Science and Statistics)</source>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Blackman</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Popoli</surname>
          </string-name>
          .
          <article-title>Design and Analysis of Modern Tracking Systems</article-title>
          . Artech House Publishers,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>R. de Salvo Braz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Amir</surname>
            , and
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Roth</surname>
          </string-name>
          .
          <article-title>A survey of rst-order probabilistic models</article-title>
          . In D. Holmes and L. Jain, editors,
          <source>Innovations in Bayesian Networks, Studies in Computational Intelligence</source>
          . Springer,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>V.</given-names>
            <surname>Gogate</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Domingos</surname>
          </string-name>
          .
          <article-title>Probabilistic theorem proving</article-title>
          .
          <source>In Proceedings of the 27th Conference on Uncertainty in Arti cial Intelligence</source>
          , pages
          <fpage>256</fpage>
          {
          <fpage>265</fpage>
          . AUAI Press,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>D.</given-names>
            <surname>Helbing</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Molnar</surname>
          </string-name>
          .
          <article-title>Social force model for pedestrian dynamics</article-title>
          .
          <source>PHYSICAL REVIEW E</source>
          ,
          <volume>51</volume>
          :
          <fpage>4282</fpage>
          {
          <fpage>4286</fpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>D.</given-names>
            <surname>Koller</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Friedman</surname>
          </string-name>
          .
          <article-title>Probabilistic Graphical Models: Principles and Techniques</article-title>
          . MIT Press,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>R. P.</given-names>
            <surname>Mahler. Statistical</surname>
          </string-name>
          Multisource-Multitarget Information Fusion. Artech House Inc.,
          <string-name>
            <surname>Norwood</surname>
          </string-name>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>K.</given-names>
            <surname>Murphy</surname>
          </string-name>
          .
          <article-title>Dynamic Bayesian Networks: Representation, Inference and Learning</article-title>
          .
          <source>PhD thesis</source>
          , University of California,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Reuter</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Dietmayer</surname>
          </string-name>
          .
          <article-title>Adapting the state uncertainties of tracks to environmental constraints</article-title>
          .
          <source>In Proceedings of the 13th International Conference on Information Fusion</source>
          , pages
          <fpage>1</fpage>
          <issue>{7</issue>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Reuter</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Dietmayer</surname>
          </string-name>
          .
          <article-title>Pedestrian tracking using random nite sets</article-title>
          .
          <source>In Proceedings of the 14th International Conference on Information Fusion</source>
          , pages
          <fpage>1</fpage>
          <issue>{8</issue>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Richardson</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Domingos</surname>
          </string-name>
          .
          <article-title>Markov logic networks</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>62</volume>
          (
          <issue>1-2</issue>
          ):
          <volume>107</volume>
          {
          <fpage>136</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Sadilek</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Kautz</surname>
          </string-name>
          .
          <article-title>Recognizing multi-agent activities from GPS data</article-title>
          .
          <source>In Proceedings of 24th AAAI Conference on Arti cial Intelligence</source>
          , pages
          <fpage>1134</fpage>
          {
          <fpage>1139</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>H.</given-names>
            <surname>Sidenbladh</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.-L.</given-names>
            <surname>Wirkander</surname>
          </string-name>
          .
          <article-title>Tracking random sets of vehicles in terrain</article-title>
          .
          <source>In Proceedings of the Conference on Computer Vision</source>
          and Pattern Recognition, page
          <volume>98</volume>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>P.</given-names>
            <surname>Singla</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Domingos</surname>
          </string-name>
          .
          <article-title>Entity resolution with markov logic</article-title>
          .
          <source>In Data Mining</source>
          ,
          <year>2006</year>
          . ICDM'
          <fpage>06</fpage>
          . Sixth International Conference on, pages
          <volume>572</volume>
          {
          <fpage>582</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Thrun</surname>
          </string-name>
          , W. Burgard, and
          <string-name>
            <given-names>D.</given-names>
            <surname>Fox</surname>
          </string-name>
          .
          <article-title>Probabilistic Robotics (Intelligent Robotics and Autonomous Agents)</article-title>
          . The MIT Press,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Tran</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Davis</surname>
          </string-name>
          .
          <article-title>Event modeling and recognition using markov logic networks</article-title>
          .
          <source>Computer Vision{ ECCV</source>
          <year>2008</year>
          , pages
          <fpage>610</fpage>
          {
          <fpage>623</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Domingos</surname>
          </string-name>
          .
          <article-title>Hybrid markov logic networks</article-title>
          .
          <source>In Proceedings of the 22th AAAI Conference on Arti cial Intelligence</source>
          , pages
          <fpage>1106</fpage>
          {
          <fpage>1111</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>