Relations as Context to Improve Multi-Target Tracking and Activity Recognition Cristina Manfredotti12 , Enza Messina1 , David Fleet2 1 DISCo, University of Milano-Bicocca, {manfredotti, messina}@disco.unimib.it 2 Computer Science Dept, University of Toronto, {cristina, fleet}@cs.toronto.edu Abstract. The explicit recognition of the relationships between interacting ob- jects can improve the understanding of their dynamic model. In this work, we investigate the use of Relational Dynamic Bayesian Networks to represent the dependencies between objects behavior in the context of multi- target tracking. We propose a new formulation of the transition model that ac- commodates for First-Order Logic relations and we extend the Particle Filter al- gorithm in order to directly track relations between targets. Many applications can benefit from this work, as activities recognition, traffic monitoring, strategic analysis, sports coaching and others. We present some re- sults about activity recognition in monitoring Canadian costal borders. Key words: Multi Target tracking, Probabilistic Relational Models, Bayesian Filtering, Particle Filtering. 1 Introduction and Motivations Context interpretation and context-based reasoning have been shown to be key factors for Computer Vision in the development of algorithms for object recognition [3]. In this domain the context is the scene where objects are located and the knowledge about it is expressed by the beliefs over the scene [4]. In this paper we deal with moving objects and we refer the concept of context to “what is happening around the object we are tracking”. Knowing the scene can im- prove the objects recognition task and the knowledge about the identity of the objects improves the belief over the scene; knowing what is happening in the scene (which “re- lations” are believed to be true in the scene) can improve the tracking and the knowledge about the state of the objects can improve our knowledge about the relation between the objects in the scene (i.e. the context). Consider, for example, the situation in which we have a group of people walking in a park. If we know they are walking together (i.e. if we have a certain belief over their relation), we know they will have a similar behavior or a similar motion. This will help us in tracking them. Moreover, taking into account the relations between objects can also allow us to recognize complex activities like, for example, the activity of “going to a pub together”: single persons walking can be a simple fragment of a more com- plex activity that includes some people meeting, going in the same direction, waiting each other at different points and entering together into the pub. Dealing with relations between moving objects allows us to recognize a complex activity like this one from another similar one that can be the “catching the subway during rush hour”: this com- plex activity also includes a group of people walking together in the same direction but those people will not wait for each other. In the last years Computer Vision has mostly dealt with the recognition of activities composed by the repetition of simple movements [12], instead those are examples of more complex activities that involve relations be- tween objects and/or single actions during time. In our work we model the context as a set of First-Order Logic relations using them in two principal ways: – We will use relations to improve the efficiency of the tracking. The information con- tained in the relationships can improve prediction, resulting in a better estimation of objects trajectories. – We will monitor relations as a goal in itself. This is the case in many applications like traffic prediction or consumer monitoring, anomaly detection or activity recog- nition. In this work we consider Relational Dynamic Bayesian Networks (RDBNs)1 , an extension of Probabilistic Relational Model [5] to dynamic domains, as a formalism to monitor relations between moving objects. In our RDBN-based model, relationships are considered as random variables whose values may change over time. While tracking the objects in the domain, we also track the evolution of their relationships. For this purpose, in the next sections we propose a formalization of a new dynamic model able to predict the future state of the objects taking into account their relations and we introduce a new version of Particle Filter (that we call Relational Particle Filter) that adapts to these settings. After presenting some preliminary results obtained on the Intelligent System Challenge 2008-2009 data set 2 , and a brief review of the literature, we conclude with some final remarks. 2 Modeling and Inference A relational domain is a set of objects with relations between them. We will call the state s of a relational domain relational state, and we define it as the set of instantia- tions of all the objects and their relations in the domain. Therefore, we can divide the relational state in two parts: the state of the objects3 so and the state of the relations sr and we will write: s = [so , sr ]. A Relational Bayesian Network (RBN) is a directed acyclic graph whose nodes are First-Order Logic attributes or relations between objects in the relational domain and whose structure represents the causality between the nodes. 1 The authors are aware of the works of Sanghai, Weld and Domingos on RDBNs; however the paper presenting their work has been retracted. Refer to: http://www.aaai.org/Library/JAIR/Vol24/jair24-019.php 2 http://www.intelligent-systems-challenge.ca/home/index.html 3 We will use the terms state of objects and state of instantiations interchangeably. When we deal with dynamic, relational states evolve with time and RBNs has to be extended to RDBNs. A Relational Dynamic Bayesian Network is structured as a pair of RBNs (B0 , B→ ), where B0 represents the probability distribution over the state of the relational domain at time 0 and B→ is a RBN of nodes at time t whose parent are predicates at time t − 1 or predicates at time t and nodes at time t − 1 without their parents. In order to make inference in a multi-target setting, we need to extend the algorithms traditionally used in tracking to represent relations. As in classic tracking, the aim is to estimate the current posterior distribution of the state space st conditioned to the sequence of observations z1:t up to time t: p(st |z1:t ). This distribution is often called the target’s belief (bel(st )). The tracker predicts the probability distribution of the future state st , given the knowledge about the current state st−1 , by means of a state transition model p(st |st−1 ). Once measurements about the state at time t (zt ) are acquired, the state is filtered using the sensor model p(zt |st ) that relates (potentially noisy) measurements to the state. Fig. 1. Relational Transition Model. Arrows indicate probabilistic dependence between variables. To extend the traditional tracking algorithms to represent relations we introduce the following components: The relational transitional model p(st |st−1 ) = p(sot , srt |sot−1 , srt−1 ) is a joint proba- bility of the state of all instances and relations. We assume that the state of relations is not directly affected by the state of the objects at the previous time step (see Fig- ure 1). Therefore the transition model can be rewritten as: p(sot , srt |sot−1 , srt−1 ) = p(sot |sot−1 , srt−1 )p(srt |srt−1 , sot ). (1) The sensor model p(zt |st ) gives the probability of the state at time t given the mea- surements obtained at the same time. We assume the relations to be not directly measurable, so the observation zt is independent of the relations between objects: p(zt |st ) = p(zt |sot , srt ) = p(zt |sot ). (2) Under the Markov assumption and the conditional independence of the data given the state, we can use a Bayesian filter algorithm to compute the belief of the relational state: f t) bel(st ) = α p(zt |sot ) bel(s (3) where α is a normalization constant and bel(s f t ) is the prediction done over the system (p(sot , srt |z1:t−1 )) that can be computed as: Z f t) = bel(s p(sot , srt |sot−1 , srt−1 )bel(st−1 )dst−1 . (4) According to the state transition model (Equation 1), we can write Equation (4) as: Z f t ) = p(so |so , sr )p(sr |sr , so )bel(st−1 )dst−1 . bel(s (5) t t−1 t−1 t t−1 t In the most general case we can represent the two partial transition models of Equa- tion (1) by a First Order Logic Tree (FOPT)4 . We will introduce an example of FOPT when dealing with the experiments. 2.1 Relational Particle Filter The specific and complex probabilistic nature of the presented setting makes impossible to use filters that require a probabilistic function in closed form, such as the Kalman filter. To solve this issue we developed an extension of the Particle Filter (PF) algorithm whose properties are appealing for our case. The PF algorithm [1] is a Monte Carlo method that approximates the required poste- rior density function by a set of random samples with associated weights and computes estimates based on these samples and weights. As the number of samples becomes very large, the Monte Carlo approximation to the correct posterior improves and the PF ap- proaches the optimal Bayesian estimate. We integrate the relational transitional model introduced in Equation (1) in a new Relational Particle Filter (RPF), shown in Algorithm (1). Algorithm 1: Pseudo Code for the Relational Particle Filter algorithm bel(st ) = RP F (bel(st−1), zt ) for all m = 1 : M do 1. xot,(m) ∼ p(sot |sot−1 , srt−1 ); hypothesis for the state of instantiations 2. xrt,(m) ∼ p(srt |srt−1 , sot = xot,(m) ); hypothesis for the state of relations 3. ω(m) = p(zt |xot,(m) ); weights computation for all m = 1 : M do ω 4. e (m) = P M (m)ω ; weights normalization ω m=1 (m) 5. Resample bel(st ) from {[xot,(m) , xrt,(m) ]} according to weights {e ω(m) } with repetition. 4 A FOPT (also known as First Order Decision Diagram [2]) is Probabilistic Tree whose nodes are First Order Logic formulas. A particle (xt,(m) ) is a representation of the state. For this reason, in our setting, it is divided in two parts: the part of the objects xot,(m) and the part of the relations xrt,(m) . (see Figure 2(a)). The part of the particle relative to the instantiations is sampled according to p(sot |sot−1 , srt−1 ) (Line 1), subsequently the part of the particle relative to the relations is sampled according to the second part of the relational transition model (Line 2). When the measurement is acquired, particles are weighted according to the sensor model (Line 3). The sensor model takes into account only the part of the parti- cles relative to the objects, since the particles are composed by two parts, also the parts associated to the relations are weighted. After the weighting step, weights are normal- ized (Line 4) and the set of particles for the next iteration is extracted according to the normalized weights in the resampling step (Line 5). (a) Particle representation. (b) First step of hypothesis. (c) Second step of hypothesis. (d) Particle weighting. Fig. 2. Cartoon representation of the proposed algorithm. 3 Experiments We evaluated the proposed method on a synthetic data set that is based on the Intelli- gence System Challenge 2008-2009 data set. We chose to use this data set because it is easier than a real data set but still challenging. The data set contains the description of the events happened in the sea; each element of the data set reports the tracks of two boats participating in an event (i.e. Rendezvous, PickUp and Avoidance) together. At each time interval at most one event takes place. We are particularly interested in the case where there is uncertainty about the par- ticipants taking part in an event, in order to demonstrate the advantage of maintaining beliefs over the set of possible relations. In order to test our relational particle filter algorithm for activity recognition in a more challenging scenario with multiple targets, we use the original data set to build a new synthetic data set of 120 situations (either rendezvous or avoidance), obtained by pairing two encounters randomly sampled from the original data set. In this way, four ships are present at the same time in the scene. In the experiment, we consider the task of detecting a rendezvous between a yacht and a fisher ship. After describing the setting of our experiments, in the next subsection, we report some results. 3.1 Settings and Results Rendezvous between a yacht and a fisher 180 100 160 50 x position y position 140 0 120 −50 100 −100 0 0.5 1 1.5 2 2.5 3 3.5 4 0 0.5 1 1.5 2 2.5 3 3.5 4 time time 30 20 20 0 speed x speed y 10 −20 0 −40 −10 −20 −60 0 0.5 1 1.5 2 2.5 3 3.5 4 0 0.5 1 1.5 2 2.5 3 3.5 4 time time Fig. 3. Example of Rendezvous. Avoidance between a yacht and a fisher 250 20 200 0 x position y position 150 −20 100 −40 50 −60 0 0.5 1 1.5 2 2.5 3 3.5 4 0 0.5 1 1.5 2 2.5 3 3.5 4 time time 40 20 20 15 speed x speed y 0 10 −20 5 −40 0 −60 −5 0 0.5 1 1.5 2 2.5 3 3.5 4 0 0.5 1 1.5 2 2.5 3 3.5 4 time time Fig. 4. Example of no relation. We used the data set to estimate the prior for the event Rendezvous between a Fisher and a Yacht (33/80). Then, we examined the data relative to the encounters in order to acquire information about the two different events (rendezvous or avoidance) that can be used to predict the relation. In particular, we focused on the variation of speed of the two targets. Consider for example, the rendezvous in Figure (3): the two ships come closer and both progressively reduce their speed until a nearly-zero value. Different is the case of ships that are avoiding each other (thus not in relation according to our model), one maintains its speed and the other decelerates (Figure (4)). From the images it is also possible to notice the three-phases which characterize the event of rendezvous: ships approach each other reducing their speed in the first phase, they travel in the same direction with nearly-zero speed in the second phase and finally they go apart and at least one of them change its speed. Our relational transition model takes into account these three different phases allowing to detect when the event starts and when it finishes but also allowing to understand if two ships can be in relation (since a ship can be in relation only with another ship). An example of the relational transition model used in our experiments is given in Figure (5) and in Figure (6) . Fig. 5. FOPT for p(sot |sot−1 , srt−1 ). We ran the experiments on each of the 120 sets of four tracks in the data set. In table (1) we show the accuracy of our method for the rendezvous detection compared Fig. 6. FOPT for p(srt |srt−1 , sot ). to the accuracy of a method that randomly choses which boats are in relation. In the table it is also reported the average tracking error of the RPF algorithm compared to a PF algorithm that does not take into account relations. The tracking error is computed as the distance between the trajectories acquired by the particle filter (at each time-step, it averages the position considering the states of all particles) and the real trajectories. method TP ration TN ratio Tracking Error(km, mean) RPF 0.4545 0.7235 1.8379 PF 3.3906 random choice 0.4444 0.4841 Table 1. In columns TP ratio and TN ratio the ratio over the 120 set of tracks of true positive and true negative is reported for our method and a random choice method. In the last column the average tracking error for our method (RPF) and a method that does not take into account relations (PF) is reported. 3.2 Related Works Our work is at the intersection of work in Probabilistic Relational Models, that to our knowledge have never before considered applications in tracking, and Computer Vision, where often heuristics are used to improve tracking, but not with a systematic account of relationships between targets. Recently there has been increasing interest in models that extend probabilistic rea- soning to First Order Logic to exploit redundancies observed in the worlds ([5], [6]). In this setting, many relational inference algorithms proceed by first fully instantiat- ing the First-Order relations and then working at the propositional level. In [10] an inference algorithm that instantiates relations only as needed is presented, but this al- gorithm can deal only with static domains as the relations are not supposed to change over time. Moreover, our model is different from the one presented in [9], where the concept of class is used to develop an inference system able to deal with a large number of heterogenous objects. We use First-Order Logic to explicitly represent relationships between objects to improve the inference task. Our method is potentially applicable to situations with a large number of objects as well. Hybrid states models have been used to deal with complex tracking tasks [7]. They combined continuous-valued dynamic with a discrete state of the world (context) en- coding which switching dynamic is performed jointly with tracking. Our system uses relations as representations of the context of each object instead of the context of the entire world. The explicit recognition of the relations of each object allows us to deal with much more complex tracking tasks. Moreover, the use of First Order Logic (as opposed to predicative logic) generalizes our models to different domains. In [8] the recognition of complex activity (temporally extended activities that can be fragmented in simple ones) is based on context-free grammar. They decouple the recognition task in two levels: a lower level that detects single simple activities that are the inputs for the stochastic context-free grammar used as a “bag of words” for a parsing mechanism. Instead, our approach does not decouple the recognition task, but seek to take advantage from the tracking, that provides the detection of simple activities, to recognize the temporally extended activity and from the knowledge about the complex activity to improve the tracking. In [11], the authors address the problem of activity recognition using First Order Logic rules and Markov Logic Networks to represent common sense domain knowl- edge. Differently from the method we are proposing, the inference task is performed off-line: they perform probabilistic inference for input queries about events of interest already happened. We seek, instead, to perform an on-line probabilistic inference of both the state of the domain and the activities. 4 Conclusions In this paper we presented a technique based on relational Bayesian reasoning in order to address the problem of activity recognition and tracking. We presented an extension of particle filter, called relational particle filter, that can be used to make inference. From our preliminary results we can conclude that our method can help to identify the type of encounter that the targets are engaging. Moreover we have shown how using relations as context can improve the tracking task. Compared to hybrid state model techniques, we are able to model the problem with a single dynamic model and the state representation is much more compact. There are a number of possible applications of this approach in problems where there is the need of monitoring a situation from sensed data (video surveillance, homeland- security, etc.) that we are interested to consider for future works. References 1. S. Arulampalam, S. Maskell, and N. Gordon. A tutorial on particle filters for online nonlinear/non-gaussian bayesian tracking. IEEE Transactions on Signal Processing, 50:174– 188, 2002. 2. S. J. C. Wang and R. Khardon. First order decision diagrams for relational mdps. JAIR, 31:431–472, 2008. 3. A. A. E. Derek Hoiem and M. Hebert. Putting objects in perspective. In Proc. IEEE Com- puter Vision and Pattern Recognition (CVPR), volume 2, pages 2137 – 2144, June 2006. 4. G. Elidan, G. Heitz, and D. Koller. Learning object shape: From drawings to images. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2006. 5. N. Friedman, L. Getoor, D. Koller, and A. Pfeffer. Learning probabilistic relational models. In IJCAI, pages 1300–1309, 1999. 6. L. Getoor, N. Friedman, D. Koller, and B. Taskar. Learning probabilistic models of link structure. Journal of Machine Learning Research, 3:679–707, 2002. 7. M. Isard and A. Blake. A mixed-state condensation tracker with automatic model-switching. In ICCV, pages 107–112, 1998. 8. Y. A. Ivanov and A. F. Bobick. Recognition of visual activities and interactions by stochastic parsing. IEEE Trans. Pattern Anal. Mach. Intell., 22(8):852–872, 2000. 9. B. Milch and S. J. Russell. General-purpose mcmc inference over relational structures. In UAI, 2006. 10. H. Poon, P. Domingos, and M. Sumner. A general method for reducing the complexity of relational inference and its application to mcmc. In AAAI, pages 1075–1080, 2008. 11. S. D. Tran and L. S. Davis. Event modeling and recognition using markov logic networks. In ECCV (2), pages 610–623, 2008. 12. R. S. Yan Ke and M. Hebert. Event detection in crowded videos. In IEEE International Conference on Computer Vision, October 2007.