Efficient Hausdorff distance metric calculation algorithm for human position comparison to template and its performance and precision research Karolis Ryselis, Tautvydas Petkus Department of Software Engineering Kaunas University of Technology Kaunas, Lithuania e-mail: Karolis.Ryselis@ktu.edu, Tautvydas.Petkus@ktu.edu Here ‖𝑎 − 𝑏‖ could be any distance metric between points Abstract–This article presents efficient algorithm to calculate 𝑎 and 𝑏, e, g., Euclidean norm or 𝐿2 norm. Semantically this Hausdorff distance metric and heuristic algorithm that metric shows the distance to the most distance point from the maximizes amount of intersecting points in two point sets very other set corresponding to a chosen distance metric. quickly. Performance and precision research of those algorithms is presented. It is shown that algorithm is applicable for real time Looking at (2) formula it is trivial to see that direct position tracking using “Kinect” device. calculation of the metric has computational complexity of 𝑂(𝑛𝑚) because all points of both sets must be compared. If Keywords–Kinect; Hausdorff distance; pattern matching 𝑚 ≈ 𝑛, complexity approaches to 𝑂(𝑛2 ) and is not applicable for real time use. I. INTRODUCTION “Kinect” sensor is a motion tracking device presented by II. HAUSDORFF DISTANCE CALCULATION METHODS “Microsoft”. It is one of the most popular motion tracking A. Yanir Taflev’s method devices used at home. “Kinect’s” application programmable Yanir Taflev presents an open source solution to compute interface presents external applications with current human Hausdorff distance metric [4]. This algorithm does not position [1]: skeleton stream that consists of processed human calculate the distance itself, it produces Hausdorff distance joint position information; depth stream, which is a two- matrices instead. Each point of such matrix shows the distance dimensional matrix with data about every pixel’s distance to the closest point from set 𝐵 for each point of set 𝐴. from the sensor to the closest object in its direction; body index Hausdorff distance is then equal to the maximum value of both matrix which assigns every pixel to a tracked player’s body. distance matrices. Skeleton stream is processed by “Kinect” and presents external Hausdorff distance matrix is calculated by constructing application with information about human body parts. initial matrix first. A value of 0 is assigned to every point of Therefore, it is applicable when information about separate set 𝐵 and ∞ for all other points. For all points of set 𝐵 the body parts is needed. However, it is shown by some researches matrix is traversed in squares of increasing size and a value of [2] that skeleton stream is only accurate when the user is square edge length is assigned if the value of that point is not visible by “Kinect” comfortably and the user is standing. less that the length of square edge. Chebyshev distance is Skeleton is warped under different conditions. computed because of such matrix construction. Specialized algorithms are needed to process data from Algorithm is implemented in C# language and is used in other data streams and compare them to a template. All data C# framework “Shape Matching Framework”. streams present the user with large amount of data compared to preprocessed skeleton stream. Thus, they require efficient B. “Elastic Search” method algorithms to process them if real time or near real time “ElasticSearch” client created by “Vivid Solutions” is an tracking is required. open source Java project [5]. It used discrete Hausdorff The easiest and simplest to process data stream is body distance implementation. The computed distance is index stream. If only one human is tracked this stream is easily approximate. The algorithm is not given by “Vivid Solutions”. deduced to a binary matrix where 1 represents a pixel that The implementation is used to compare geometries in “Elastic belongs to human body, 0 – that is does not. Hausdorff distance Search” engine internally. metric is a good metric for this kind of data [3]. Say that 𝐴 = {𝑎1 , 𝑎2 , … , 𝑎𝑛 } and 𝐵 = {𝑏1 , 𝑏2 , … , 𝑏𝑚 } are finite sets of C. Princeton University’s method points. Then Hausdorff distance is defined as follows: Princeton University’s resolving library “RgResolve” uses 𝐻(𝐴, 𝐵) = max⁡(ℎ(𝐴, 𝐵), ℎ(𝐵, 𝐴)), kur (1) Hausdorff algorithm’s Java implementation [6]. Algorithm ℎ(𝐴, 𝐵) = max min‖𝑎 − 𝑏‖ (2) relies on assumption that all points are sorted either clockwise 𝑎∈𝐴 𝑏∈𝐵 or counterclockwise. Points are then analyzed in this order. Data is analyzed as a list of points instead of two-dimensional Copyright © 2017 held by the authors 33 matrix, which is different from Yanir Taflev’s implementation. 𝑏𝑖 = 𝐶𝐵 + (𝑏𝑖 −𝐶𝐵 +𝐶𝐵 −𝐶𝐴 ) (8) 𝑠 The distance is calculated using Euclidean norm. If silhouettes of template and human belong to the same III. PROPOSED HAUSDORFF DISTANCE CALCULATION METHOD sequence of movement, we can assume that these transformation parameters will stay constant during the The human could be in any part of “Kinect’s” frame. The process of matching. Thus, they can be calculated in the result of the algorithm must not depend on where the human is beginning and used later. This should speed up the tracking standing. A transformation of human’s silhouette is needed even more. Then before computing Hausdorff distance. It must move the 𝑡 = 𝐶𝐵1 − 𝐶𝐴1 (9) silhouette to the position where it would intersect with the ℎ2 template the most. 𝑠= (10) ℎ1 These requirements must be met by efficient Hausdorff (𝑏𝑖 −𝐶𝐵 +𝑡) 𝑏𝑖 = 𝐶𝐵 + (11) distance calculation algorithm: 𝑠 1. Execution time is close to 30 ms with “Kinect’s” body B. Proposed Hausdorff distance calculation algorithm index stream data Proposed algorithm is based on Yanir Taflev’s ideas, but is 2. Algorithm must find an optimal linear transformation better optimized for speed. The use of this algorithm that makes the template and user’s silhouette be eliminates the need to convert data from two-dimensional oriented in such a way that the amount of intersecting matrix to other formats, e.g., a list of points. Chebyshev points is maximum and the Hausdorff distance is distance is used as distance metric. Firstly, a two-dimensional minimum matrix is set up. All points with coordinates that belong to the A. Proposed algorithm to find the optimal silhouette are set to 0, others – with ∞. A point queue is transformation constructed. Initially it consists of all points that belong to the silhouette. A point is taken from the queue and 8 surrounding Say that we have two finite sets of points 𝐴 = equidistant points are analyzed. The distance from the original {𝑎1 , 𝑎2 , … , 𝑎𝑛 } and 𝐵 = {𝑏1 , 𝑏2 , . . , 𝑏𝑚 }. Let us assume that the points to those surrounding points is always 1, so the distance human will be oriented vertically in the frame, i.e., the normal from the other silhouette to those points cannot be larger than vector of the base that the user is standing on is vertical. This the original point’s distance increased by one. If this value is assumption eliminates the need of rotation transformation. less than already computed value for this point, this value is Two linear transformations need to be made: scale and shift. updated and the point is added to the queue. Distance is Say that the center point of set 𝐵 is 𝑏𝑚 . The shape will be calculated for all points this way. All points that do not belong scaled using this point as the center. Say that optimal scale to the other set are then removed because they do not coefficient is 𝑠, optimal shift transformation vector - 𝑡⃗. Then contribute to Hausdorff distance. Elements of this matrix show the transformed point set is the differences between the silhouettes. Two such matrices (𝑏1 −𝑏𝑚 +𝑡⃗) 𝐵𝑡 = {𝑏𝑡1 , 𝑏𝑡2 , … , 𝑏𝑡𝑛 } = 𝑏𝑚 + , 𝑏𝑚 + must be calculated, so the computation of them can be made 𝑠 (𝑏2 −𝑏𝑚 +𝑡⃗) (𝑏𝑛 −𝑏𝑚 +𝑡⃗) parallel – their computation is independent. , … , 𝑏𝑚 + (3) 𝑠 𝑠 To reduce the amount of calculation matrix edges without After the transformation set 𝐵𝑡 should maximize silhouette points are cut. The resulting cut matrix is minimum cardinality function rectangle matrix that contains all the points that belong to 𝑖(𝐴, 𝐵𝑡 ) = |𝐴 ∩ 𝐵𝑡 | (4) silhouette. Precise calculation of such function is computationally Common maximum of both matrices is Hausdorff distance. intensive, so heuristic algorithm is used instead. Algorithm pseudocode is given below. A fast and quite accurate heuristic for scale transformation matrices ← template matrix and user matrix is to compute scale coefficient as the ratio of heights of the two find common left upper corner of matrices shapes. If template and user are in similar positions, e.g., both find common right lower corner of matrices are standing, both silhouettes could be made the same height. ∀ matrix ∈ matrixes Then do initialize queue and ret ℎ𝐴 𝑠= , where (5) while queue ≠ ø ℎ𝐵 ℎ𝐴 = max 𝑎𝑦 − min 𝑎𝑦 , (6) do p ← take from queue 𝑎∈𝐴 𝑎∈𝐴 val ← get p value from ret where 𝑎𝑦 - coordinate in vertical axis. val ← val + 1 A fast to compute shift transformation algorithm is to shift ret ← min(r, val) ∀⁡adjacent point one of the two silhouettes using a vector that is equal to r to p difference between centroids of both shapes. Centroid is set points of res that do not belong to defined as ∑ 𝑎 ∑𝑎∈𝐴 𝑎𝑦 opposing matrix to 0 𝐴𝐶 = ( 𝑎∈𝐴 𝑥 ; ) (7) hausdorff distance = max(max(res[1]), 𝑛 𝑛 Then max(res2)) 34 IV. METHOD OF RESEARCH Two properties of the algorithm are evaluated: Algorithm performance performance and precision. comparison, distant silhouettes To evaluate performance different types of frames are set 14 up. The frames differ in size and distance between silhouettes. Used frame sizes are 10% to 300% “Kinect’s” frame size in 12 Relative execution time steps of 10%. For each frame size, real human silhouette is analyzed. Performance is measured when the human and the 10 template are in similar parts and in opposite parts of the frames. 8 Performance is compared to other existing algorithms’ performance. Each measure is run 10 times and execution time 6 average is found. Three modifications of the proposed algorithm are measured – Hausdorff distance only, Hausdorff 4 distance with transformation and Hausdorff distance with 2 cached transformation parameters. Algorithm should work in less than 30 ms, i.e., faster than “Kinect’ produces frames. 0 To evaluate precision different real human silhouettes are 0,1 0,3 0,5 0,7 0,9 1,1 1,3 1,5 1,7 1,9 2,1 2,3 2,5 2,7 2,9 used. Best possible transformation is found using brute force Frame edge size in Kinect's frames algorithm, Hausdorff distance is calculated using the optimal Yanir Taflev transformation, then Hausdorff distance is found using the Proposed algorithm transformation calculated using proposed algorithm and the Proposed algorithm with calculated transformation Proposed algorithm with cached transformation error is evaluated. RgResolve ElasticSearch V. RESEARCH RESULTS Figure 1. Algorithm performance comparison (distant Execution times of all algorithms and all frame sizes are silhouettes) shown in figure 1. Same silhouettes were used, but scaled. Silhouettes were around a third of the frame apart. The diagram reveals that “Elastic Search” algorithm is the Unit value in diagrams represents algorithm with cached slowest, “RgResolve” – better, Y. Taflev’s algorithm would be transformation performance when frame size is equal to applicable up to frame size of 0.6 “Kinect 2” frame size. This “Kinect’s” frame size. This time is 26 ms for distant silhouettes is close to “Kinect 1” frame size. Performance of proposed and 21 ms for close silhouettes on “Intel i7-3770K” processor algorithm varies and no clear dependency is visible, but (3.5 GHz). Maximum acceptable value on those scales are 1.15 execution times slowly increase with increasing frame size. It for distant silhouettes and 1.43 for close silhouettes. These could be caused by parallel nature of the algorithm. Threads values are hardware-dependent and presented for approximate are spawned by the runtime and operating system and evaluation. The processor used is a little more powerful than determination of their performance could be difficult to “Kinect’s” system requirements (i7 series processor, 3.1 GHz predict. Nevertheless, frame sizes of around 1.3 “Kinect’s” clock frequency). frame size and smaller are processed faster than in 1.15 relative units and 1.7 and less – in 2 relative units. Figure 2 shows the result of similar measurements, but the distance between the silhouettes was minimized. Other algorithms had little to no impact because of this change, but proposed algorithm performed better under those conditions. 1.5 relative units is reached at around 1.4-1.5 “Kinect’s” frame size, 2.5 – at around 1.8-2 times larger frames than “Kinect’s”. At round 3 “Kinect’s” frame sizes performance increase is around 1.5 times compared to distant silhouette test. 35 REFERENCES Algorithm performance [1]“Kinect API Overview,” [Online]. Available: comparison, close silhouettes https://msdn.microsoft.com/en-us/library/dn782033.aspx. [Accessed 10 03 2017]. [2] K. Ryselis, T. Petkus, “Nestandartinių žmogaus kūno pozicijų 14 atpažinimo tikslumo naudojant „Kinect 2.0“ jutiklius tyrimas,” Kaunas, 2015. 12 [3] D. P. Huttenlocher, G. A. Klanderman, W. J. Rucklidge, “Comparing Relative execution time Images using the Hausdorff Distance,” IEEE Transactions on Pattern 10 Analysis and Machine Intelligence, no. 15, pp. 850-863, 1993. [4] Y. Taflev, “Using the Hausdorff distance algorithm to point out 8 differences between two drawings,” 27 09 2009. [Online]. Available: https://www.codeproject.com/articles/42669/using-the-hausdorff- 6 distance-algorithm-to-point-ou. [Accessed 10 03 2017]. [5] V. Solutions, “elasticsearch-client/DiscreteHausdorffDistance,” [Online]. Available: https://github.com/jprante/elasticsearch- 4 client/blob/master/elasticsearch-client-jts- jdk5/src/main/java/com/vividsolutions/jts/algorithm/distance/Discre 2 teHausdorffDistance.java. [Accessed 10 03 2017]. [6] “Hausdorff Distance,” Princeton University, [Online]. Available: 0 http://www.princeton.edu/~rkatzwer/rgsolve/doc/edu/princeton/poly 0,1 0,3 0,5 0,7 0,9 1,1 1,3 1,5 1,7 1,9 2,1 2,3 2,5 2,7 2,9 gon/HausdorffDistance.html. [Accessed 12 03 2017]. Frame edge size in Kinect's frames Yanir Taflev Proposed algorithm Proposed algorithm with calculated transformation Proposed algorithm with cached transformation RgResolve ElasticSearch Figure 2. Algorithm performance comparison (close silhouettes) Algorithm precision was tested using different human silhouettes. Average error was 3.8%, minimum error – 0% (exact result), maximum error – 15.5%, median error – 2.7%. Data set size – 21 silhouettes. VI. CONCLUSIONS The proposed algorithm performs faster than existing open source alternatives to calculate Hausdorff distance, but high variation of execution speed is observed because of parallelism involved in the calculations. Despite that, the algorithm in the worst case works faster than in 30 ms with “Kinect 2” recommended hardware, therefore it is applicable for human motion tracking using “Kinect” sensors. Proposed algorithm is precise if applied for human motion tracking and proposed heuristic is worth applying, because it adds little to execution times and precision loss is low. Proposed algorithm works faster with silhouettes in the same part of frame than distant silhouettes. If this algorithm is applied for human motion tracking and recommended transformations are applied, this will always be the case and comparison will always be fast. Algorithm is well suited for human position tracking when the human does not change his position relative to sensor, only his pose changes, or the human moves together with the template silhouette. In this case proposed heuristics work well and near-precise value of minimum Hausdorff distance with Chebyshev metric is found in short time. 36