Goal-Based Person Tracking Using a First-Order Probabilistic Model


               Thomas Geier, Stephan Reuter, Klaus Dietmayer and Susanne Biundo
                             Faculty of Engineering and Computer Science
                                       Ulm University, Germany
                                    forename.surname@uni-ulm.de


                     Abstract                             scene, which is not possible when using completely
                                                          propositional models or specialized template models
    This work addresses the problem of person             like dynamic Bayesian networks [Murphy, 2002](which
    tracking using additional background infor-           can scale only along the time axis). A model given in
    mation. We augment a particle filter-based            a lifted representation also makes it possible to lever-
    tracking algorithm with a first-order prob-           age structure information contained within the lifted
    abilistic model expressed through Markov              formulation for more efficient inference [Gogate and
    Logic Networks to tackle the data associa-            Domingos, 2011]; although this approach is not inves-
    tion problem in domains with a high occlu-            tigated here.
    sion rate. Using a high-level model descrip-          We motivate our work in the context of an indoor sit-
    tion allows us to easily integrate additional         uation, where multiple persons move in a two-room of-
    information like a floor plan or goal informa-        fice, containing the laser range finder and several areas
    tion into a joint model and resolve occlusion         of interest like a printer or a coffee maker. We mea-
    situations that would otherwise result in the         sure the quality of our model by the extent to which
    loss of association. We discuss the engineered        it is able to correctly associate object tracks emerging
    model in detail and give an empirical evalu-          from the tracking algorithm with the correct persons
    ation using an indoor setting.                        inside the scene.
                                                          The rest of the paper is laid out as follows. After we
1   Introduction                                          discuss related work, we give an overview of the ap-
                                                          plied tracking algorithm and introduce the concept of
This work concerns the problem of providing               Markov Logic Networks. Then, we describe the inves-
background-knowledge to the specialized application       tigated problem in detail and discuss the used MLN
of person tracking using a high-level probabilistic       model. We give an empirical evaluation of the de-
model. We demonstrate that a hand-crafted model,          scribed setup and conclude with some possible exten-
built using Markov Logic Networks (MLN) [Richard-         sions to the model and an overall discussion.
son and Domingos, 2006], can help in solving the
data association problem in tracking during situations    2   Related Work
where high occlusion prevents the correct association
between past and new tracks. By leveraging additional
                                                          In multi-object tracking, state dependent detection
information, like a floor plan or knowledge about goals
                                                          probabilities of objects are disregarded in most appli-
of single persons, we can resolve otherwise opaque sit-
                                                          cations. Thus, a track disappears shortly after en-
uations.
                                                          tering an occluded area and a new track is created
The usage of a first-order probabilistic model like       when the object leaves the occluded area again. Con-
MLNs allows for an easier modeling task, because de-      sequently, one object is represented by different track
pendencies are represented by weighted first-order log-   IDs. Especially in scenarios where persons interact
ical formulas instead of, e.g., conditional probability   several times with a system, changed track IDs lead
tables for the case of directed models like Bayesian      to the loss of the objects history. The multi-object
Networks. In addition, the model is formulated in         Bayes filter [Mahler, 2007] allows to integrate state
a lifted form and can be instantiated for the desired     dependent detection probabilities even if the scenario
number of concurrent tracks or persons within the         is characterized by a high object density [Reuter and
Dietmayer, 2011]. In case of short term occlusions, the     and the received measurement set. Thus, no data as-
usage of state dependent detection probabilities leads      sociation is necessary.
to an improved track continuity. A direct integration
                                                            Further, the multi-object Bayes filter allows to inte-
of goals into the prediction of a persons’ state is cru-
                                                            grate state dependent detection probabilities into the
cial, since the persons’ action may be contradictory to
                                                            filtering algorithm. In Reuter and Dietmayer [2011],
the assigned goal.
                                                            an approach to calculate state dependent detection
Markov Logic Networks have been used by Sadilek and         probabilities based on the occupancy grid mapping ap-
Kautz [2010] for multi-agent activity recognition based     proach [Thrun et al., 2005] is proposed. Thus, it is
on GPS data in a game of capture the flag. While their      possible to keep track of an object which is occluded
work can leverage more expert knowledge (the rules of       for the sensor for a short period of time. Using con-
the game), they do not encounter the data association       stant detection probabilities would lead to a track loss,
problem present in the tracking scenario, since each        if an object is not visible to the sensor for a few mea-
person was carrying a personal GPS receiver. Tran           surement cycles. We use the state dependent tracking
and Davis [2008] apply Markov Logic Networks to a           algorithm as a comparison for our final results. But
parking lot surveillance scene using video data to rec-     for input into the high-level model, we use state in-
ognize which person enters which car. They also track       dependent object tracking. This produces more track
pedestrians across a scene and face the problem of data     IDs for association, and in particular is less prone to
association. Their sensory information emerges from         false association, which we cannot correct in the upper
image data and their focus lies in integrating differ-      stage.
ent information sources that are all extracted from
                                                            An implementation of the multi-object Bayes filter is
the video stream. Markov Logic Networks are also
                                                            possible using Sequential Monte Carlo (SMC) methods
used by Singla and Domingos [2006] for entity resolu-
                                                            [Reuter and Dietmayer, 2011, Sidenbladh and Wirkan-
tion in text mining. This is the problem of inferring
                                                            der, 2003, Mahler, 2007]. In difference to well known
which references refer to the same entity and it is sim-
                                                            SMC implementations of the standard Bayes filter a
ilar to the data association problem in tracking. The
                                                            particle set, which represents a random finite set us-
two latter works use an equals predicate for identity
                                                            ing a finite number of state vectors, is used instead of
maintenance, whereas we approach the problem using
                                                            a standard particle. Further, the number of state vec-
an association mapping to underlying entities. When
                                                            tors in the particle set may change at each time step.
grounding the model our approach only creates asso-
                                                            In case of a SMC implementation, the integration of
ciations between currently instantiated track IDs and
                                                            the mentioned constraints is possible by reducing the
their corresponding entity, wheres using an equals
                                                            weight of a particle set.
predicate will introduce relations between all objects,
which does not seem reasonable in a dynamic domain.         Since the multi-object Bayes filter does not perform
                                                            a measurement to track association, an extraction of
                                                            the individual objects out of the multi-object posterior
3   Multi-Object Tracking                                   density function is necessary, e.g. using the k-means
                                                            algorithm [Bishop, 2006].
Standard multi-object tracking algorithms often use
object individual single-object trackers like the
                                                            4   Markov Logic Networks
Kalman filter. The drawback of this multi-object
tracking approach is the need of a data association
                                                            Markov Logic Networks [Richardson and Domingos,
step which assigns the received measurements to the
                                                            2006] are a member of the family of first-order proba-
trackers using hard decisions or probabilistic meth-
                                                            bilistic languages [de Salvo Braz et al., 2008] and their
ods [Blackman and Popoli, 1999]. Especially in scenar-
                                                            semantics are based on undirected graphical mod-
ios characterized by a high object density, the data as-
                                                            els (Markov networks). In contrast to propositional
sociation is error-prone and degrades the performance
                                                            models like Bayesian networks and Markov Networks,
of the tracking system, since false associations are ir-
                                                            where every random variable has to be specified ex-
reversible.
                                                            plicitly, in first-order models the random variables are
A rigorous approach to multi-object tracking is the         relations over objects and the model can be scaled
multi-object Bayes filter proposed by Mahler [2007].        by providing the appropriate number of object con-
The multi-object Bayes filter uses the random finite set    stants. Moreover, MLNs allow the specification of
statistics to represent the complete environment by a       dependencies as weighted first-order logical formulas.
single filter state. In the innovation step of the multi-   Higher weights make those interpretations more likely,
object Bayes filter, a multi-object likelihood function     in which more groundings of the formula evaluate to
calculates the affinity between the predicted state set     true. We will now briefly cover the formal semantics
of MLNs.                                                      the existence of such information depending on the ap-
                                                              plication. Issued print jobs could be recognized by a
A Markov Logic Network L = {(f1 , w1 ), . . . , (fn , wn )}
                                                              special program installed on the PC, or visits to the
for n ∈ N is a set of first-order formulas f1 , . . . , fn
                                                              coffee maker could be predicted from personal habits.
with given weights w1 , . . . , wn ∈ R. Together with
a finite set of constants C, they define a probabil-          There are one to three persons inside the scene simul-
ity distribution over all interpretations (or possible        taneously. Major occlusion caused by static objects,
worlds). An interpretation maps each grounding of             like walls, occurs when people enter the second room.
each predicate to a truth value. The interpretation           Minor static occlusion can occur near the coffee maker.
of functions must be fixed. Probabilistic functions           During the scenes with more than one person, dynamic
can be emulated using predicates. Let gC (f ) be the          occlusion occurs when persons are covered by other
set of groundings of formula f obtained by replac-            persons standing between them and the sensor.
ing the free variables in f by all combinations of
                                                              Using only the particle filter-based tracking algorithm
constants from C. Given an interpretation x, then
         def                                                  to process the output of the laser range finder, prob-
nC,i (x) = |{g | g ∈ gC (fi ) and x |= g}| is the num-        lems arise when people produce no measures for an
ber of groundings of formula fi that are true under x.        extended period of time because they are inside the
Then, the probability distribution PL,C that is defined       separate room or because they are hidden by another
by the MLN L with constants C is given as                     person. For shorter occlusion durations it is possible to
                             1 Y                              keep the track of a single occluded person alive for long
                       def                      
        PL,C (X = x) =           exp wi nC,i (x) ,     (1)    enough for the person to reappear and re-association is
                             Z i
                                                              completely handled by the tracking algorithm. If two
                                                              persons enter the same occlusion area, their estimated
where i ranges over all formulas in L, and Z is a nor-        positions begin to mix spatially and once they emerge
malizing constant.                                            again, re-association becomes more and more arbi-
Given a set of constants, a MLN can be converted to           trary with increasing occlusion duration. In these sce-
a Markov network, where nodes correspond to atoms             narios, a direct integration of the Social Force model
and each ground formula induces a clique over all             into the prediction of the persons state of the tracking
nodes whose atoms appear inside this formula. For             algorithm may increase the performance of the sys-
practical reasons, a sorted (or typed) logical language       tem [Reuter and Dietmayer, 2010]. The Social Force
is used to describe MLNs. Using sorted terms, we              model aims at describing pedestrian movement by vir-
can limit the size of the grounded network. Also, in          tual forces exerted by other persons and environmental
their basic form, MLNs do only allow restricted usage         features [Helbing and Molnar, 1995]. Since the model
of logical functions. Usually functions are simulated         heavily depends on the destinations of the person, a
by specially marked predicates, which enforce a func-         tight integration with a high-level knowledge base, as
tional dependency of one or more arguments on the re-         described in this work, seems promising for such an
maining arguments. We notate functional arguments             approach.
of predicates by underlining them. Such a predicate           Figure 1b shows an example of the tracking results for
can be translated to a multi-valued random variable.          one of the sequences with three persons. The trajec-
                                                              tories are illustrated by solid lines. Since the results
5    Problem Description                                      are generated without the usage of the state dependent
                                                              detection probability, the trajectories are interrupted
We consider an indoor scene which resembles an office         quite often in the area corresponding to region RA,
setting. The corresponding floor plan is depicted in          where a dynamic occlusion occurs.
Figure 1a. A laser range finder is placed in one corner
of the main room and provides distance information
in a plane about one meter above ground. The beam             6   Description of the Model
almost completely covers the main room, but there
exists a second room that has virtually no sensor cov-
erage. There is only one entrance to the room complex         In this section, we describe the used MLN model and
and the separate room has only a single exit, which is        discuss some of the difficulties and design choices we
the door to the main room. The setting contains sev-          have encountered in its engineering. The complete
eral features that may serve as goal destinations for         model is factored into four modules. We begin with
persons navigating inside the scene; like a printer or        a discussion of two concepts that cannot be associated
a coffee maker. Knowledge about destinations of per-          with distinct model parts but influence nearly every
sons is taken as given; although it is easy to motivate       aspect – the representation of space and time.
                   Office


                                                  Coffee
                                              0            -1
                   Printer RC                     Living


                            -4           -3
                   RB
                                         -4                -1
                   RA                             Exit


                   Laser Range Finder

                                             -5            0
                                 (a) Floor plan                     (b) Example trajectories

Figure 1: (a) Floor plan of the location used for the experiments. In addition the picture shows the eight discrete
regions that are used for the MLN model, how they are connected via the handles and their static occlusion
value in their lower right. (b) Example trajectories of a three person sequence: trajectories are illustrated by
solid lines in different colors. The black stars are measurements of static objects like walls.


From continuous to discrete space. The basic                    Representation of time. In order to model dy-
MLN can only represent discrete random variables.               namic domains, we assign a dedicated time sort, whose
There exists an extension of MLNs to continuous vari-           constants are elements from the natural numbers. One
ables [Wang and Domingos, 2008], but no working im-             usually aims to construct a model that fulfills the
plementation is available. For this work, we reduce the         Markov property, i.e., the state at time t + 1 only de-
continuous spatial estimates obtained from the track-           pends on the state at time t. This means that formulas
ing algorithm to a few discrete regions. For ease of            may only contain predicates of at most two different
modeling and processing of data, we choose a rectan-            times, which then must be successive. But in the pre-
gular shape. We do not create a uniform grid, but try           sented model, the predicates that represent the associ-
to respect functional aspects of the environment con-           ation of tracks to persons are not time-indexed, which
cerning the problem. For example, it does not make              makes them static. This makes it difficult to apply
sense to further split the office into smaller areas if         standard dynamic inference algorithms, which usually
there is no distinction for the sensor (everything is one       assume the Markov property. But a static variable
connected occluded area) and there is only a single goal        can be considered as a dynamic variable, for which
inside. Even having two separate goals like two work            the same value is enforced in every time step. For-
stations might not justify the introduction of separate         tunately, these static association are only referenced
regions for each, as long as the rest of the model can          over a limited period of time — the life-time of a track
not discriminate between them. We have defined a to-            — so they do not pile up over the course of the com-
tal of eight regions, which are depicted in Figure 1a.          plete sequence. For time resolution we have settled
Sticking with a low number of regions also made an ex-          for the duration of about one second, which seems like
act evaluation of the final model feasible. Depending           a good compromise between inference complexity and
on the inference approach, there might be no signifi-           accuracy for the given problem.
cant overhead when using a larger state space for the
spatial component. Although, the model engineering
                                                                The tracking model. The basic functionality for
may become more intricate when opting for more fine
                                                                interfacing with the tracking algorithm is provided
grained regions. Taking the characteristics of the sen-
                                                                by a MLN module that contains objects of the sorts
sor into account, a radial layout for regions seems like
                                                                Track and Person. To notate variables of some sort,
a promising approach, but this was not investigated.
                                                                we use the initial letter of the sort name in lower
                                                                case. For the sort Track, the letter ’m’ is used be-
cause of the ambiguity with sort Time. For both                 static and dynamic occlusions occur, being caused by
sorts Track and Time, there exist time-dependent                walls or other persons, respectively. For our experi-
predicates atT : Time × Track × Region and atP :                ments, only static occlusion information is modeled.
Time × Person × Region that give the current loca-              This is done by assigning a certain probability to each
tion of a track or person, respectively. The time-              region that it may contain untracked persons. The
independent predicate a : Track → Person associates             probability is larger for areas of high static occlusion,
Track objects to Person objects. The correspondence             like the separate room. We also assign a higher occlu-
of tracks to persons inside the MLN is similar to the           sion to regions that are more likely to be dynamically
correspondence of measurements to tracks inside the             covered, like the region around the coffee maker. By
tracking algorithm. The output of the tracking algo-            assigning low occlusion probabilities to central regions
rithm is converted to observations of the atT func-             that have a good sensor coverage we penalize persons
tion and an additional completely observed predicate            silently slipping past the sensor. Formula 5 is provided
act : Time × Track, whose purpose is mainly to be               once for each region r. The weight wr is the occlusion
able to prune groundings of formulas that depend on             value, which is given in Figure 1a.
inactive track IDs. The usage of this predicate is omit-
                                                                   wr   atP (t, p, r) ∧ ¬∃m : (act(t, m) ∧ a(m, p)) (5)
ted for clarity. Information about goals of persons can
attach to the location of the person objects. The core
                                                                Goals and their dynamics. The last model part
tracking model then consists of the following two for-
                                                                handles the goals of persons. We associate goals
mulas.
                                                                with regions and add another time-dependent func-
       5   atT (t, m, r) ∧ a(m, p) ⇒ atP (t, p, l)       (2)    tion goal : Time × Person × Region, where goals are
                                                                also allowed to assume the special location NULL to sig-
     −3       a(m1 , p) ∧ a(m2 , p) ∧ m1 6= m2           (3)
                                                                nal that a person currently has no goal. The following
Formula 2 probabilistically forces a person to be in            formulas describe the dynamics of goals:
the same region as its associated track. By design a             ∞ goal(t, p, l) ∧ ¬atP (t, p, l) ⇒ goal(t + 1, p, l)(6)
track can only be associated to one person at a time
                                                                 ∞             goal(t, p, l) ∧ atP (t, p, l)
because the last parameter of the predicate a is de-
clared functional. Formula 3 probabilistically enforces                 ⇒ goal(t + 1, p, l) ∨ goal(t + 1, p, NULL) (7)
the association to also be a one-to-one relation. The           0.1                  goal(t, p, NULL)                 (8)
engineering of the formula weights is explained at the
end of this section. This formula is limited to concur-         Formulas 6 and 7 achieve that goals can only be cleared
rently instantiated tracks using the act predicate (not         when a person reaches their associated region. For-
listed).                                                        mula 8 encodes the urge of people to clear their goal.
                                                                Stating this rule in this particular form results in per-
                                                                sons trying to reach their goal as soon as possible, since
The floor plan. We use the static predicate adj :
                                                                otherwise the penalty accumulates over time. Another
Region × Region to encode the connectedness of the
                                                                way to make people reach their goals is to make it more
regions. All instances are fully observed and adhere
                                                                likely for a person to be inside the region of their goal.
to the floor plan given in Figure 1a. A single formula
                                                                Both rules work equally well for our dataset but might
forces persons to move between regions only according
                                                                make a difference when applied to longer sequences or
to the given layout:
                                                                under a different setting.
  ∞ atP (t, p, l1 ) ∧ atP (t + 1, p, l2 ) ⇒ adj(l1 , l2 ) (4)
                                                                Elicitation of weights. There exist two major ways
                                                                to determine the weights of the probabilistic formulas:
This formula is deterministic to prevent persons from
                                                                Learning from data and elicitation from experts; where
“teleporting” through the scene. If regions allow for a
                                                                for common sense domains, like the one we are dealing
traversal in less than a second (one time step), this rule
                                                                with, everyone is usually an expert. Both the learn-
becomes invalid. But since the association of tracks to
                                                                ing of weights and the direct specification approaches
persons allows for some slack, a person in the model
                                                                have been followed in the literature. For the case of
can “catch up” to the location of its real counterpart
                                                                our related work, Sadilek and Kautz [2010] and Singla
after some time steps, only violating Formula 2.
                                                                and Domingos [2006] are employing learning and Tran
                                                                and Davis [2008] specify the weights by hand. Due to
The occlusion model. To prevent persons without
                                                                the limited size and the common sense nature of our
an associated track from wandering across the scene
                                                                dataset we decided to specify the weights ourself.
(since no track influences their current location), we
need to express that persons usually have a track un-           The approximation to consider the weight as the loga-
less they are indeed occluded. In our setting both              rithmic odds of the formula being true [Richardson and
                                                                                          pl
Domingos, 2006] can serve as a good starting point,          logarithmic odds wl = log 1−p   l
                                                                                               , where pl is the relative
but it only holds as long as formulas do not share pred-     frequency of a particle of track m being in region l.
icates. After assigning some reasonable initial values,
we iteratively looked at predictions of the model for
selected sequences and adjusted the weights if the pre-
                                                             8    The Inference Problem
dictions did not conform with our expectations. We
began by adjusting the weights for sequences with only       Markov Logic Networks can be seen as template mod-
one persons and switched to larger test sequences once       els for undirected graphical models [Koller and Fried-
the model made sensible predictions for the training         man, 2009]. Their semantics are defined using the
data at hand. Since MLNs are based on undirected             ground version of these networks. As such the de-
graphical models (which means they are locally un-           scribed MLN represents an undirected version of a dy-
normalized), there are no absolute correct values, but       namic Bayesian network [Murphy, 2002]. The effort for
the weights of different formulas have to be balanced        exact inference in such models is usually exponential in
against each other.                                          the number of variables within one time slice, because
                                                             most variables within one time step become dependent
                                                             on each other after some steps in most models.
7    Preprocessing of Tracking
                                                             The model described in this work also suffers from this
     Information                                             problem. The cause that all variables of a time slice
                                                             become dependent lies in the probabilistic data asso-
We go on and describe how tracking data is processed         ciation; which is a hard problem at its core. In our
for input to the MLN model. After extraction of the          case the problem of exact inference is exponential in
individual objects in the multi-object Bayes filter, we      mT + nP , where mT is the maximum number of simul-
                                         t
obtain a set of single object particles Xm for each track    taneous tracks and nP is the total number of persons
ID m and time step t. We then apply two data reduc-          in the model. Here mT stems from the association
tion steps. First, the MLN model works on a coarser          predicates and nP are the instantiations of the atP
time scale of 1.25 steps per second, while the tracking      predicate for one time step.
algorithm runs with 12.5 steps per second. We drop
the intermediate steps without further processing. A         For our evaluation we perform exact inference on
different approach might aggregate them, e.g., by av-        the model by exploiting context-specific indepen-
eraging, but this would also distort the meaning of the      dence [Koller and Friedman, 2009, pp. 171]. Given
data, because it cannot be considered a snapshot of          an assignment to all association variables, the model
the situation anymore.                                       factorizes into components for each person and thus
                                                             becomes tractable. Our largest sequence contained
Depending on the quality of the tracking algorithm           only 10 tracks, which results in 37 possible associa-
and the used object model, there can be many false           tions to three persons after observing the correct as-
positive tracks, e.g., when people spread their arms         sociation for three initial tracks. After conditioning
away from their body, crossing the plane of laser            on the association variables we calculate the partition
beams. To reduce these false tracks, we use the ex-          function for each association using variable elimination
istence probability to eliminate insignificant tracks. It    along a min-degree variable ordering. This approach is
                t
is given by |Xm   |/N ; the number of particles for track    not suited for online filtering. For this purpose a rao-
ID m divided by the total number of particles N . We         blackwellized particle filter, which collapses all but the
drop all tracks from a time step whose existence prob-       association variables, seems like a good solution [Koller
ability is below 0.5. For our test sequences, the output     and Friedman, 2009, pp. 526]. Evaluating the per-
of the tracking algorithm usually contains about thirty      formance of this inference approach on the presented
tracks per sequence, but only less than ten remain af-       model is open for future work.
ter applying both reduction processes.
For each time step t and each track ID m that survive        9    Evaluation
the described process we add the track as active to our
MLN model via observation of the act predicate. We
                                                             We recorded 9 sequences in total; three sequences with
then bin the single object particles into the discrete
                                                             one, two and three persons, respectively. The duration
regions. Most of the time all particles are contained
                                                             of each sequence is about one minute. The course of
in a single region and we create an observation of the
                                                             events is the same among sequences with the same
atT function. In cases where the particles of a track
                                                             number of persons; the sequences vary during the part
m spread over several regions we reflect this as proba-
                                                             where multiple persons wander around the main room.
bilistic evidence by adding a formula (wl , atT (t, m, l))
for each location l and calculating the weight as the        The setups with one person only feature static occlu-
                                    Unassigned Tracks        Model M           Model MO       Model MOG
             Sequence    Persons    ↓pD = c ↓pD (x)         ↓a− ↑ptrue        ↓a− ↑ptrue      ↓a− ↑ptrue
             S1-1532     1          1           1           0        0.34     0      0.99     0      1.00
             S1-1640     1          1           1           0        0.34     0      0.99     0      1.00
             S1-1737     1          1           1           0        0.34     0      1.00     0      1.00
             S2-2056     2          3           2           2        0.10     2      0.27     0      0.54
             S2-2207     2          4           4           3        0.00     4      0.01     3      0.03
             S2-2329     2          3           3           2        0.09     2      0.26     0      0.51
             S3-4628     3          5           1           3        0.25     0      0.45     0      0.89
             S3-4734     3          5           3           2        0.15     0      0.52     0      0.55
             S3-5306     3          7           4           1        0.13     1      0.12     1      0.25

Table 1: For each of the nine sequences used for evaluation, we give the number of invented tracks minus the
number of persons for tracking with constant detection probability pD = c and with state dependent detection
probability pD (x). For the three MLN models, we give the number of false associations a− and the probability
of the true assignment ptrue .


sion caused by walls, caused by the single person stay-         person and evaluate how well we can associate new
ing inside the office for several seconds. This results         tracks. In our dataset, the number of tracks that re-
in its track being reinvented upon entering the main            main unassigned after labeling the starter tracks varies
room again. With two persons there is dynamic occlu-            between one and seven.
sion, where one person covers the other person. Both
                                                                The results of our evaluation are given in Table 1. For
persons enter the office together and thus cannot be
                                                                each sequence, we give the number of false track as-
distinguished once they reappear. Goal information
                                                                signments of the most probably association. In ad-
for one person can resolve this issue and we can ob-
                                                                dition, we provide the probability of the correct joint
tain a good association again. In the scenes with three
                                                                association. This is interesting for cases where no false
persons, one person enters the office while the two re-
                                                                associations were made even with a simpler model, and
maining persons stay inside the main room. Dynamic
                                                                can show an improved significance of the correct asso-
occlusion occurs while all three persons are walking
                                                                ciation when using a more sophisticated model.
around in front of the sensor. Tracks are lost and recre-
ated often, which can also be observed in Figure 1b in-         To provide a baseline, the number of track confusions
side the area corresponding to region RA. When two              and losses of a state-of-the-art multi-object Bayes fil-
persons are simultaneously shadowed by the third one,           ter with state dependent detection probability (pD (x))
it is not possible to associate the reappearing tracks to       and the ones using the same filter with constant detec-
the correct persons just by means of the sensor. In this        tion probability (pD = c) are given [Reuter and Diet-
case goal information can be used to identify the cor-          mayer, 2011]. For both filters, the number of persons
rect association.                                               inside the scene is subtracted from the total number
                                                                of significant tracks. If the tracking algorithm works
We have evaluated three models that incorporate an
                                                                perfectly, this number will be zero. The output of the
increasing amount of domain knowledge. The first
                                                                pD = c filter is used as input for the MLN stage; so
model M uses only the basic tracking model and the
                                                                this number equals the number of associations made
floor plan. The second model MO comprises all of the
                                                                by the high-level model and thus equals the possible
first model and the static occlusion model. The third
                                                                maximum number of false associations.
model MOG adds information about personal goals. One
person visits the coffee maker in each sequence. We as-         We observe that the algorithm with the state depen-
sign the Coffee region as goal for this person in every         dent detection probability reduces the number of unas-
sequence. All other persons have no goal assigned.              signed tracks dramatically in the scenarios with three
The goal information is able to resolve confusions that         persons, where a lot of short term occlusions occur.
happen before the designated person visits its goal re-         In the scenarios with one or two persons, where the
gion. This does not happen in every sequence.                   long-term occlusions due to static objects dominate,
                                                                the usage of the state dependent detection probability
The MLN is instantiated for three persons in every
                                                                has nearly no influence on the number of unassigned
setup, regardless of the number of persons appearing
                                                                tracks. The high-level model MO using only the floor
in the scene. For each model and each sequence, we ob-
                                                                plan and the static occlusion model delivers results
serve the correct association for the first track of each
that are at least on par with the state dependent track-    R. de Salvo Braz, E. Amir, and D. Roth. A survey of
ing algorithm. Using goal information for one person          first-order probabilistic models. In D. Holmes and
can further improve the results. By our judgment, this        L. Jain, editors, Innovations in Bayesian Networks,
is not possible by relying solely on the data obtained        Studies in Computational Intelligence. Springer,
from the laser range finder in general.                       2008.
One of the two person sequences (S2-2207) shows very        V. Gogate and P. Domingos. Probabilistic theorem
bad performance of the MLN model for all three cases.         proving. In Proceedings of the 27th Conference on
Our investigation has shown that an outstretched arm          Uncertainty in Artificial Intelligence, pages 256–
has caused a significant track that made it through the       265. AUAI Press, 2011.
data reduction process. The relatively high weight on       D. Helbing and P. Molnar. Social force model for
Formula 3 prevented the association of this track to the      pedestrian dynamics. PHYSICAL REVIEW E, 51:
owner of the arm. Thus, a third person was forced by          4282–4286, 1995.
the model to appear at this spot and remained present
                                                            D. Koller and N. Friedman. Probabilistic Graphical
over the course of the scene.
                                                              Models: Principles and Techniques. MIT Press,
                                                              2009.
10    Conclusion                                            R. P. Mahler. Statistical Multisource-Multitarget Infor-
                                                              mation Fusion. Artech House Inc., Norwood, 2007.
We have described an approach to solving the data
association problem for person tracking with a high-        K. Murphy. Dynamic Bayesian Networks: Representa-
level probabilistic model described using Markov Logic        tion, Inference and Learning. PhD thesis, University
Networks. We showed how to map the output of a                of California, 2002.
regular tracking algorithm into a discrete spatial rep-     S. Reuter and K. Dietmayer. Adapting the state un-
resentation, which makes it easy to attach additional         certainties of tracks to environmental constraints. In
information, e.g., personal goals, and allows the use of      Proceedings of the 13th International Conference on
inference techniques for discrete probabilistic models.       Information Fusion, pages 1–7, 2010.
Especially in scenarios with long-term occlusions,          S. Reuter and K. Dietmayer. Pedestrian tracking using
where even the multi-object Bayes filter is not able to        random finite sets. In Proceedings of the 14th Inter-
continue to track hidden objects, the association using        national Conference on Information Fusion, pages
MLN outperforms the tracking-based approach when               1–8, 2011.
using exact evaluation. On the other hand, a sophisti-      M. Richardson and P. Domingos. Markov logic net-
cated tracking algorithm is adequate in scenarios with       works. Machine Learning, 62(1-2):107–136, 2006.
occlusions of no more than one second and might be
                                                            A. Sadilek and H. Kautz. Recognizing multi-agent
able to scale more easily to larger domains.
                                                              activities from GPS data. In Proceedings of 24th
In future, we plan to integrate more information out          AAAI Conference on Artificial Intelligence, pages
of the knowledge-base directly into the tracking algo-        1134–1139, 2010.
rithms like, e.g., probabilistically modeled destinations   H. Sidenbladh and S.-L. Wirkander. Tracking random
or goals of a person.                                         sets of vehicles in terrain. In Proceedings of the Con-
                                                              ference on Computer Vision and Pattern Recogni-
Acknowledgements                                              tion, page 98, 2003.
                                                            P. Singla and P. Domingos. Entity resolution with
This work is done within the Transregional Collab-            markov logic. In Data Mining, 2006. ICDM’06.
orative Research Centre SFB/TRR 62 “Companion-                Sixth International Conference on, pages 572–582,
Technology for Cognitive Technical Systems” funded            2006.
by the German Research Foundation (DFG).
                                                            S. Thrun, W. Burgard, and D. Fox. Probabilis-
                                                              tic Robotics (Intelligent Robotics and Autonomous
References                                                    Agents). The MIT Press, 2005.
C. M. Bishop.        Pattern Recognition and Ma-            S. Tran and L. Davis. Event modeling and recogni-
  chine Learning (Information Science and Statistics).        tion using markov logic networks. Computer Vision–
  Springer, 2006.                                             ECCV 2008, pages 610–623, 2008.
S. Blackman and R. Popoli. Design and Analysis of           J. Wang and P. Domingos. Hybrid markov logic net-
  Modern Tracking Systems. Artech House Publishers,            works. In Proceedings of the 22th AAAI Conference
  1999.                                                        on Artificial Intelligence, pages 1106–1111, 2008.