=Paper=
{{Paper
|id=Vol-2874/paper19
|storemode=property
|title=Tuning of Category Hierarchy Enhanced Classification Based Indoor Positioning
|pdfUrl=https://ceur-ws.org/Vol-2874/paper19.pdf
|volume=Vol-2874
|authors=Judit Tamás,Zsolt Tóth
}}
==Tuning of Category Hierarchy Enhanced Classification Based Indoor Positioning==
<pdf width="1500px">https://ceur-ws.org/Vol-2874/paper19.pdf</pdf>
<pre>
   Tuning of Category Hierarchy Enhanced
   Classification Based Indoor Positioning

                           Judit Tamás∗ , Zsolt Tóth

          Eszterházy Károly University, Faculty of Informatics, Eger, Hungary
                          tamas.judit@uni-eszterhazy.hu
                           toth.zsolt@uni-eszterhazy.hu

       Proceedings of the 1st Conference on Information Technology and Data Science
                           Debrecen, Hungary, November 6–8, 2020
                               published at http://ceur-ws.org


                                          Abstract

          The tuning of classification refinement using hierarchical grouping of
      categories is presented in this paper. The refinement can improve the
      accuracy of classifiers in the case of a low confidence level, and it takes
      a classifier, a threshold and a dendrogram as parameters. For the
      examination, the 𝑘–NN and the Naive Bayes classifiers are used, and the
      dendrogram is generated by applying a linkage method to the dissimilarity
      values of the gravitational force-based approach on the topology
      information. The topology of the environment is described by an IndoorGML
      (Indoor Geographic Markup Language) document. The data set for the
      classification is part of the Miskolc IIS (Institute of Information
      Science) Hybrid IPS (Indoor Positioning System) Data Set recorded with the
      ILONA (Indoor Localization and Navigation) System. Three properties of a
      setup are examined, namely hitRate, confidence and abstraction; however,
      they conflict with each other. A fitness function using these properties
      is introduced for the purpose of tuning, and different weight tuples are
      examined in the given test environment. The goal of the paper is to
      examine the weighting possibilities of the hitRate, confidence, and
      abstraction level features for indoor positioning purposes.
      Keywords: Classification, hierarchical clustering

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
   ∗ The first author’s research was supported by the grant EFOP-3.6.1-16-2016-00001 (“Complex
improvement of research capacities and services at Eszterhazy Karoly University”).


1. Introduction
These days people depend on technology; our lives have become unimaginable
without high-tech tools and gadgets. We rely heavily on navigation, which gives
us turn-by-turn directions, traffic congestion information, and alternative
routes to a given location. Demand has arisen for navigation in complex
buildings such as airports, railway stations or hospitals. However, the classic
Global Positioning System does not work in indoor spaces. As a result, Indoor
Positioning Systems (IPS) have been introduced.
    Indoor Positioning Systems can be used to determine the position of people or
objects in buildings and closed areas. Indoor positioning has been an active
research field since the early 1990s, and these systems are detailed in the
surveys [3, 6]. The existing indoor positioning solutions rely on different
technologies such as infrared [18], ultrasonic [19], magnetic field [9], mobile
communication [17], LED [5] or other radio frequency [8, 20, 21] signals.
    Indoor positioning is challenging due to the unique properties of the indoor
environment. Developers have to make trade-offs between accuracy and cost when
they choose a technology. Currently, indoor positioning is vital for smart environ-
ments. However, a sufficiently precise, easily accessible, and sustainable industrial
standard has not been created yet.
    Symbolic positions can be considered as categories, thus symbolic
positioning can be converted into a classification problem. Some well-known
classifiers select the predicted class based on confidence values. There are
cases when the confidence for each class is relatively small. Hence, the
accuracy of these classifiers can vary in a moderate range.
    For symbolic indoor positioning purposes, a classification refinement using
hierarchical grouping of categories has been proposed [12]. Three properties of
the proposed method can be examined, namely hitRate, confidence and abstraction.
However, these properties conflict: for example, maximizing the hitRate property
drives the method to return all of the rooms as the result, producing a low
abstraction level. Tuning is required to balance these properties and so improve
the enhanced classification based indoor positioning. The goal of the paper is
to examine the weighting possibilities of the hitRate, confidence, and
abstraction level features for indoor positioning purposes.


2. Enhanced Classification Concept
To boost the performance of the classification, a hierarchical grouping of class cat-
egories was introduced [12]. Using hierarchical clustering information of symbolic
positions, the accuracy of symbolic indoor positioning algorithms can be improved
in case of a low confidence level.
    The concept of enhanced classification requires three parameters, namely the
classifier, the threshold and the dendrogram. The classifier is a supervised
learning method based on the training set of the data set, where the target is a
discrete attribute. The threshold is a real value between 0 and 1, which
determines whether the prediction is accepted or the proposed concept is used.
If the confidence value of the predicted class is equal to or higher than the
threshold, the class predicted by the classifier is returned. The dendrogram can
be predefined by a linkage matrix, or it can be produced from the topology
information using linkage [1] and distance methods given as parameters.
    The tree structure generated by the hierarchical clustering can be seen in Fig-
ure 1. The leaf nodes are the rooms denoted by the uuid, while the root node is
the whole described environment.
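
A minimal sketch of how such a tree could be derived from a predefined linkage
matrix is given here. It assumes the row layout used by SciPy-style linkage
matrices (row k merges the clusters in its first two columns into a new cluster
with index n + k, where n is the number of leaves); the function names and the
room identifiers are hypothetical and not taken from the paper.

```python
def tree_from_linkage(linkage_rows, n_leaves):
    """Build parent/children maps from a SciPy-style linkage matrix.

    Row k merges clusters row[0] and row[1] into a new cluster with
    index n_leaves + k; indices below n_leaves are leaf nodes (rooms).
    Assumes at least one merge row.
    """
    parent = {}
    children = {}
    for k, row in enumerate(linkage_rows):
        left, right = int(row[0]), int(row[1])
        node = n_leaves + k
        children[node] = (left, right)
        parent[left] = node
        parent[right] = node
    root = n_leaves + len(linkage_rows) - 1  # the last merge is the root
    return parent, children, root


def leaves_under(node, children, leaf_ids):
    """Return the room identifiers contained in the subtree of `node`."""
    if node < len(leaf_ids):
        return [leaf_ids[node]]
    left, right = children[node]
    return (leaves_under(left, children, leaf_ids)
            + leaves_under(right, children, leaf_ids))
```

With three rooms and two merges, the root is node 4 and contains all three
rooms, while the first merged node contains only the two rooms joined first.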


                         Figure 1. Concept base structure.


The enhancement concept performs the following process.

  1. The prediction is performed with the classifier.

  2. If the confidence of the predicted class is equal to or higher than the threshold,
     the process terminates by returning the class as the result.

  3. The leaf node in the tree is located using the uuid.

  4. The following steps are repeated until the confidence of the current node
     reaches the threshold or the root node is reached.

       (a) The parent of this node is selected for examination.
       (b) Its confidence is calculated as the sum of the confidence values of its
           descendant leaf nodes.

  5. The process terminates by returning the zones contained in the last
     examined node.
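
The steps above can be sketched as follows. This is an illustrative sketch, not
the authors' implementation: the tree is assumed to be given as two plain
dictionaries (`parent` and `leaves_of`, both hypothetical names), and the
classifier output is assumed to be a mapping from zone uuid to confidence.

```python
def refine(probabilities, threshold, parent, leaves_of):
    """Return (zones, confidence) chosen by the refinement process.

    probabilities: dict mapping zone uuid -> confidence from the classifier
    parent:        dict mapping node -> parent node (the root has no entry)
    leaves_of:     dict mapping inner node -> set of zone uuids below it
    """
    # Steps 1-2: accept the prediction if it is confident enough.
    predicted = max(probabilities, key=probabilities.get)
    confidence = probabilities[predicted]
    if confidence >= threshold:
        return {predicted}, confidence

    # Steps 3-4: climb from the leaf until the threshold or the root is reached.
    node = predicted
    while confidence < threshold and node in parent:
        node = parent[node]
        confidence = sum(probabilities.get(zone, 0.0) for zone in leaves_of[node])

    # Step 5: return the zones contained in the last examined node.
    return set(leaves_of.get(node, {node})), confidence
```

For a toy tree in which two corridors share a parent and a hall joins them at
the root, confidences of 0.5, 0.25 and 0.25 with a threshold of 0.7 return both
corridors with an accumulated confidence of 0.75.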


3. Test and Environment
The concept of enhanced classification requires parameters, namely the classifier,
the threshold and the dendrogram. In the experiment, the 𝑘–NN and the Naive
Bayes classifiers are used because of their ability to return the class
probabilities. The 𝑘–NN is an instance-based classifier, and both classifiers
are well known and easy to parameterize. The 𝑘–NNW denotes the weighted vote
version of the 𝑘–NN classifier in this paper. The threshold is denoted by 𝑇𝐻,
and 𝑇𝐻 ∈ {0.6, 0.7, 0.8, 0.9, 1}. In the experiment the dendrograms are
generated by applying linkage methods to the dissimilarity values of the
gravitational force-based approach [10, 11, 14] on the topology information. The
linkage methods in the experiment are average, complete, single and weighted,
and each linkage method is applied for each classifier and threshold value. The
gravitational force-based approach was defined in our previous work; it is
designed to be used for indoor positioning.
    The Miskolc IIS Hybrid IPS Data Set [7, 16] was used to perform the
classification. The data set was recorded in the Miskolc IIS Building of the
University of Miskolc using the ILONA System [13, 15, 22]. Each measurement is
composed of three parts, namely the measurement information, the position
information and the measured sensor values. The ID and the timestamp of the
measurement are stored as the measurement information. The position information
contains both the absolute position with 𝑥, 𝑦, 𝑧 coordinates and the symbolic
position with uuid and name pairs. Sensor values from WiFi, Bluetooth and the
magnetometer are included in the measurements. For the classification process,
the measured sensor values are the features, while the uuid of the symbolic
position is the target.
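
The three-part measurement described above could be modelled roughly as follows.
All class, field and feature names here are hypothetical, and filling missing
sensor readings with a default value is an assumption of this sketch, not a
documented property of the data set.

```python
from dataclasses import dataclass, field


@dataclass
class Position:
    """Absolute (x, y, z) and symbolic (uuid, name) position."""
    x: float
    y: float
    z: float
    zone_uuid: str
    zone_name: str


@dataclass
class Measurement:
    """Measurement information, position information and sensor values."""
    measurement_id: str
    timestamp: str
    position: Position
    sensors: dict = field(default_factory=dict)  # sensor name -> measured value


def to_sample(measurement, feature_names, missing=0.0):
    """Turn a measurement into (features, target) for a classifier.

    The sensor values are the features; the zone uuid is the target.
    `missing` (an assumption) replaces sensors absent from the measurement.
    """
    features = [measurement.sensors.get(name, missing) for name in feature_names]
    return features, measurement.position.zone_uuid
```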
    The topology of the building has been described using IndoorGML [2, 4], and
this description is used to generate the dendrograms. IndoorGML is a standard
defined by the Open Geospatial Consortium (OGC) [4], and it represents the
indoor spaces as non-overlapping closed objects. The indoor spaces are bounded
by physical or fictional boundaries. For each indoor space, the identifier is
derived from the corresponding space of the Miskolc IIS Hybrid IPS Data Set.
    To narrow the scope of the experiment, the environment is chosen to be the
second floor of the Miskolc IIS Building. Hence the data set used is also
narrowed, to 431 measurements. From the narrowed data set, the training and the
test sets are constructed by stratified sampling with a 0.9 to 0.1 ratio. The
training and the test sets are fixed during the test. The environment contains
20 zones, and it can be seen in Figure 2. It can represent a general building
with narrow corridors, a huge room, which is a lecture hall in this environment,
and small office rooms. However, the Miskolc IIS Hybrid IPS Data Set contains
measurements taken in only 5 of these rooms, namely the East Corridor, West
Corridor and North Corridor, the Lobby and the Lecture Hall 205.
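
A stratified 0.9/0.1 split with a fixed outcome could be sketched as below. The
function name, the fixed seed and the rule that every class contributes at least
one test sample are assumptions made for illustration; the paper does not state
how the sampling was implemented.

```python
import random
from collections import defaultdict


def stratified_split(samples, label_of, test_ratio=0.1, seed=0):
    """Split samples so that each class keeps roughly the same train/test ratio.

    samples:  list of arbitrary items
    label_of: function returning the class label (e.g. zone uuid) of an item
    seed:     fixed seed so the same split is reproduced on every run
    """
    by_label = defaultdict(list)
    for sample in samples:
        by_label[label_of(sample)].append(sample)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Assumption: every class contributes at least one test sample.
        n_test = max(1, round(len(group) * test_ratio))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Because the seed is fixed, repeated calls on the same input return the same
training and test sets, which mirrors the fixed sets used during the test.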
    Three properties of a setup are examined, namely hitRate, confidence and
abstraction. The hitRate is the ratio of the correctly classified cases to all
the cases, and it represents the accuracy. Hence, the hitRate is a real number
in the [0, 1] interval. The goal is to maximize the hitRate.
    Confidence is a real value between the threshold and 1, inclusive, which
represents the accepted confidence of the result. The goal is to maximize the
confidence values.
                 Figure 2. Second floor of the Miskolc IIS Building.

    To minimize the size of the resulting list, the abstraction feature is
introduced. However, to be consistent with the goals of the hitRate and the
confidence, the goal for the abstraction should also be a maximization. To make
the property independent of the number of rooms, the level of abstraction is
designed to be a real number in the [0, 1] range. Equation (3.1) shows the
calculation of the abstraction level based on the set size, where 𝑎 is the set
size, 𝑛 is the number of classes and \hat{a} is the normalized abstraction
level. If the set size is 1, the abstraction level is 1, while the highest
possible set size, 𝑛, results in an abstraction level of 0.

                    \hat{a} = 1 - \frac{a - 1}{n - 1}                        (3.1)
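
Equation (3.1) can be checked numerically with a short sketch. The function name
and the input validation are additions for illustration only.

```python
def abstraction_level(set_size, n_classes):
    """Normalized abstraction level from Equation (3.1).

    set_size:  number of zones in the returned set (a)
    n_classes: number of classes (n), at least 2
    """
    if n_classes < 2:
        raise ValueError("at least two classes are required")
    if not 1 <= set_size <= n_classes:
        raise ValueError("set size must be between 1 and the number of classes")
    return 1.0 - (set_size - 1) / (n_classes - 1)
```

With five classes, a single returned zone gives 1.0, three zones give 0.5, and
returning all five zones gives 0.0.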
    However, when the focus is on increasing the hitRate, the method can return
all of the rooms as the result, producing a low abstraction level. In addition,
when higher confidence values are sought by increasing the threshold, the
abstraction level can decrease. Therefore, the goal of the method cannot be
based on only one of these properties. Tuning is required to balance these
properties and so improve the enhanced classification based indoor positioning.

    A fitness function is introduced using these properties for the purpose of
tuning. The fitness function assigns a non-negative weight to each property,
where the sum of the weights is 1. The goal is to maximize the fitness function,
where w_h, w_c and w_a are the weights of the hitRate, the confidence and the
abstraction, respectively.

      fitness = w_h \cdot hitRate + w_c \cdot confidence + w_a \cdot abstraction
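
The weighted sum above can be sketched as follows, together with one possible
way of enumerating weight tuples. The grid with a step of 0.1 is an assumption
of this sketch; the paper does not state how the examined weight tuples were
generated.

```python
def fitness(hit_rate, confidence, abstraction, w_h, w_c, w_a):
    """Linear fitness: weighted sum of the three properties."""
    if min(w_h, w_c, w_a) < 0 or abs(w_h + w_c + w_a - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return w_h * hit_rate + w_c * confidence + w_a * abstraction


def weight_tuples(steps=10):
    """All (w_h, w_c, w_a) on a grid of 1/steps whose sum is 1 (assumed grid)."""
    return [
        (h / steps, c / steps, (steps - h - c) / steps)
        for h in range(steps + 1)
        for c in range(steps + 1 - h)
    ]
```

As a usage example, a setup with hitRate 1, confidence 1 and abstraction 0
reaches a fitness of 1 only when the abstraction weight is 0, for instance with
the weights (0.5, 0.5, 0).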


4. Results
The results are stored in a CSV file for further processing; its schema can be
seen in Table 1. The result contains 1688 rows, where the method, the 𝑇𝐻, the
linkage method and the weights define a setup.

                       Table 1. Classification results schema.

   Method     TH    Hit    Abstraction    Confidence     𝑤ℎ      𝑤𝑎   𝑤𝑐   Fitness

    Among the 1688 setups, 756 reached the highest fitness value in the
experiment, and the focus is on these setups. Statistics of these setups for
each classifier can be seen in Table 2.

                           Table 2. Best Fitness Setups.

        Method          Avg. Hit    Avg. Confidence    Avg. Threshold    Count
        1nn             0.89        1                  0.8               180
        1nnW            0.89        1                  0.8               180
        5nn             1           1                  0.95              72
        5nnW            1           1                  0.95              72
        9nn             1           1                  0.95              72
        9nnW            1           1                  1                 36
        11nn            1           1                  1                 36
        11nnW           1           1                  1                 36
        13nn            1           1                  1                 36
        13nnW           1           1                  1                 36
        Total Result    0.95        1                  0.89              756
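
The selection of the best-fitness setups summarised in Table 2 could be
reproduced along these lines. The column names follow Table 1, but the actual
header of the CSV file is not given in the paper, so `Method` and `Fitness` are
assumptions, as is the tolerance used to compare floating-point fitness values.

```python
import csv
import io
from collections import Counter


def best_setups(csv_text, tol=1e-9):
    """Return the rows whose fitness equals the maximum fitness (within tol)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    best = max(float(row["Fitness"]) for row in rows)
    return [row for row in rows if abs(float(row["Fitness"]) - best) <= tol]


def count_by_method(rows):
    """Count how often each classifier appears among the given rows."""
    return Counter(row["Method"] for row in rows)
```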

    As can be seen in Table 2, the 1𝑛𝑛 and the 1𝑛𝑛𝑊 classifiers are the most
frequent, while setups using the Naive Bayes classifier are not present. The
5𝑛𝑛, the 5𝑛𝑛𝑊 and the 9𝑛𝑛 classifiers are the next most frequent after the 1𝑛𝑛.
The average hitRate is 0.95, the average confidence is 1 and the average
threshold is 0.89, and these values are not affected by the linkage method used.
However, the average abstraction varies with the linkage method, which is shown
in Table 3.

                           Table 3. Average abstraction.

                      average    complete       single   weighted    Total
            1nn       1.00       1.00           1.00     1.00        1.00
            1nnW      1.00       1.00           1.00     1.00        1.00
            5nn       0.78       0.74           0.80     0.82        0.78
            5nnW      0.78       0.74           0.80     0.82        0.78
            9nn       0.75       0.71           0.77     0.78        0.75
            9nnW      0.75       0.71           0.77     0.78        0.75
            11nn      0.73       0.68           0.75     0.77        0.73
            11nnW     0.73       0.68           0.75     0.77        0.73
            13nn      0.73       0.67           0.75     0.77        0.73
            13nnW     0.73       0.67           0.75     0.77        0.73
            Total     0.87       0.85           0.88     0.89        0.87


    As Table 3 shows, the average abstraction of the 1𝑛𝑛 and 1𝑛𝑛𝑊 classifiers
is 1, while the second best value, 0.78, belongs to 5𝑛𝑛 and 5𝑛𝑛𝑊. From the
point of view of the linkage method, the weighted linkage gave the highest
average abstraction with 0.89, while complete linkage gave the lowest with 0.85.
The overall average abstraction of the cases with the highest fitness value is
0.87.
    The distribution of the average hit among the best cases according to the
threshold values can be seen in Table 4.

                Table 4. Best cases: average hit based on threshold.

                                    0.6     0.7    0.8   0.9   1.0
                     1nn            0.9     0.9    0.9   0.9   0.9
                     1nnW           0.9     0.9    0.9   0.9   0.9
                     5nn                                 1.0   1.0
                     5nnW                                1.0   1.0
                     9nn                                 1.0   1.0
                     9nnW                                      1.0
                     11nn                                      1.0
                     11nnW                                     1.0
                     13nn                                      1.0
                     13nnW                                     1.0
                     Total Result   0.9     0.9    0.9   1.0   1.0

    It can be seen from the data in Table 4 that below the 0.9 threshold no
classifier could reach the highest fitness value, with the exception of 1𝑛𝑛 and
1𝑛𝑛𝑊. With the 0.9 threshold, the 5𝑛𝑛, the 5𝑛𝑛𝑊 and the 9𝑛𝑛 classifiers reached
an average hit value of 1. With the threshold of 1, only the 1𝑛𝑛 and the 1𝑛𝑛𝑊
classifiers could not reach an average hit value of 1.
    From the point of view of the weights, statistics of the cases with the
best fitness value are shown in Table 5.


                            Table 5. Statistic of weights.

               Hit weight             Confidence weight      Abstraction weight
               AVG    Min    Max      AVG    Min    Max      AVG    Min    Max
    1nn        0      0      0        0.5    0.1    0.9      0.5    0.1    0.9
    1nnW       0      0      0        0.5    0.1    0.9      0.5    0.1    0.9
    5nn        0.5    0.1    0.9      0.5    0.1    0.9      0      0      0
    5nnW       0.5    0.1    0.9      0.5    0.1    0.9      0      0      0
    9nn        0.5    0.1    0.9      0.5    0.1    0.9      0      0      0
    9nnW       0.5    0.1    0.9      0.5    0.1    0.9      0      0      0
    11nn       0.5    0.1    0.9      0.5    0.1    0.9      0      0      0
    11nnW      0.5    0.1    0.9      0.5    0.1    0.9      0      0      0
    13nn       0.5    0.1    0.9      0.5    0.1    0.9      0      0      0
    13nnW      0.5    0.1    0.9      0.5    0.1    0.9      0      0      0
    Total      0.26   0      0.9      0.5    0.1    0.9      0.24   0      0.9


    From Table 5 we can see that the average weights of the hit and the
abstraction are similar, with values of 0.26 and 0.24, while the average weight
of the confidence is about twice as large, at 0.5. While both the hit and the
abstraction are eliminated from the fitness value calculation in some cases, the
weight of the confidence is at least 0.1. In the case of 1𝑛𝑛 and 1𝑛𝑛𝑊, the hit
is completely eliminated, while for the other classifiers the setups with the
best fitness value eliminate the abstraction property.
    Figure 3 shows the fitness value for the most frequent weights according to
the threshold and the classifier, using single linkage. The weight for the hit
and for the confidence is 0.5, while the abstraction is eliminated.
    As shown in Figure 3, using the 0.6 threshold, the Naive Bayes, the 1𝑛𝑛 and
the 1𝑛𝑛𝑊 classifiers have the highest fitness value. When the threshold is
increased by 0.1, the 3𝑛𝑛 and the 3𝑛𝑛𝑊 take the lead, and the 9𝑛𝑛 and 9𝑛𝑛𝑊
classifiers also surpass the previous fitness value. However, when the threshold
is increased further, the 1𝑛𝑛, the 1𝑛𝑛𝑊, the 3𝑛𝑛, the 3𝑛𝑛𝑊 and the Naive Bayes
classifiers cannot reach the highest fitness value. The fitness values of the
other classifiers increase as the threshold is increased. With the 0.9
threshold, the 5𝑛𝑛, the 5𝑛𝑛𝑊 and the 9𝑛𝑛 classifiers reach the highest fitness
value, while the other classifiers reach this fitness value using 1 as the
threshold.

                Figure 3. Single linkage with most frequent weights.


4.1. Discussion
Observations can be made from three points of view, namely the classifiers, the
threshold and the weights.
    The 1𝑛𝑛 and the 1𝑛𝑛𝑊 classifiers occurred most frequently among the cases
with the best fitness value. In contrast, the Naive Bayes classifier could not
reach this value with any of its setups. The 5𝑛𝑛, the 5𝑛𝑛𝑊 and the 9𝑛𝑛
classifiers were the second most frequent in the narrowed result set.
    With the given weights, the 5𝑛𝑛, the 5𝑛𝑛𝑊 and the 9𝑛𝑛 could reach the
highest fitness value with the 0.9 threshold. With lower thresholds, there were
cases when the 1𝑛𝑛 and the 1𝑛𝑛𝑊 classifiers could reach the highest fitness
value; however, the average hit rate in these cases is 0.9. Using 1 as the
threshold, every other classifier present in the setups with the best fitness
value reached an average hit of 1.
    In most of the cases, only two of the three properties are considered when
calculating the fitness value. The 1𝑛𝑛 and the 1𝑛𝑛𝑊 classifiers neglect the hit
property, while the other classifiers neglect the abstraction property. The
average weights of the other two properties are equal (Table 5), thus they are
equally important.


5. Conclusion
The tuning of classification refinement using hierarchical grouping of categories
is presented in this paper. For the examination, the 𝑘–NN and the Naive Bayes
classifiers were used, and the dendrogram was generated by applying a linkage
method to the dissimilarity values of the gravitational force-based approach on
the topology information. A linear fitness function was introduced using the
hitRate, confidence and abstraction properties for the purpose of tuning.
    The investigation of the fitness function shows that, instead of using all
three properties, the setups with the highest fitness value neglect one of them.
The other two properties proved equally important in these cases. However, one
tested classifier, the Naive Bayes, could not reach the highest fitness value
with any of its setups. This research has raised many questions in need of
further investigation.
    In the future, the category hierarchy enhanced classification based indoor
positioning concept is planned to be examined in two ways. The first is the
expansion of the test environment into three dimensions, which helps to test the
concept in a multi-floor environment. The second is the modification of the
fitness function by using non-linear elements.


References
 [1] R. K. Blashfield, M. S. Aldenderfer: The literature on cluster analysis, Multivariate
     Behavioral Research 13.3 (1978), pp. 271–295.
 [2] K. Ilku, J. Tamas: IndoorGML Modeling: A Case Study, in: Carpathian Control Conference
     (ICCC), 2018 19th International, IEEE, 2018, pp. 633–638.
 [3] H. Koyuncu, S. H. Yang: A survey of indoor positioning and object locating systems, IJC-
     SNS International Journal of Computer Science and Network Security 10.5 (2010), pp. 121–
     128.
 [4] J. Lee, K.-J. Li, S. Zlatanova, T. Kolbe, C. Nagel, T. Becker: OGC® IndoorGML,
     Open Geospatial Consortium standard (2014).
 [5] L. Li, P. Hu, C. Peng, G. Shen, F. Zhao: Epsilon: A Visible Light Based Positioning
     System. In: NSDI, 2014, pp. 331–343.
 [6] H. Liu, H. Darabi, P. Banerjee, J. Liu: Survey of wireless indoor positioning techniques
     and systems, Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE
     Transactions on 37.6 (2007), pp. 1067–1080.
 [7] Miskolc IIS Hybrid IPS Data Set,
     http://archive.ics.uci.edu/ml/datasets/Miskolc+IIS+Hybrid+IPS,
     [Online; Date donated 04-July-2016].
 [8] L. M. Ni, Y. Liu, Y. C. Lau, A. P. Patil: LANDMARC: indoor location sensing using
     active RFID, Wireless networks 10.6 (2004), pp. 701–710.
 [9] S. Särkkä, V. Tolvanen, J. Kannala, E. Rahtu: Adaptive Kalman filtering and smooth-
     ing for gravitation tracking in mobile systems (Oct. 2015), pp. 1–7.
[10] J. Tamas, Z. Toth: Topology-based Evaluation for Symbolic Indoor Positioning Algorithms,
     IEEE Transactions on Industry Applications (2019), pp. 1–1, issn: 0093-9994,
     doi: 10.1109/TIA.2019.2928489.
[11] J. Tamas: Hierarchical Clustering based on IndoorGML Document, in: 2019 IEEE 15th In-
     ternational Scientific Conference on Informatics (INFORMATICS 2019), IEEE, 2019, pp. 411–
     416.
[12] J. Tamas, Z. Toth: Classification Refinement with Category Hierarchy, in: The 11th Inter-
     national Conference on Applied Informatics (ICAI 2020), published at http://ceur-ws.org,
     2020, pp. 358–369.
[13] J. Tamas, Z. Toth: Limitation of CRISP accuracy for evaluation of room-level indoor
     positioning methods, in: 2018 IEEE International Conference on Future IoT Technologies
     (Future IoT), Jan. 2018, pp. 1–6,
     doi: 10.1109/FIOT.2018.8325585.


[14] J. Tamas, Z. Toth: Topology-based Classification Error Calculation for Symbolic Indoor
     Positioning, in: Carpathian Control Conference (ICCC), 2018 19th International, IEEE,
     2018, pp. 643–648.
[15] Z. Toth: ILONA: indoor localization and navigation system, Journal of Location Based
     Services 10.4 (2016), pp. 285–302,
     doi: 10.1080/17489725.2017.1283453.
[16] Z. Toth, J. Tamas: Miskolc IIS hybrid IPS: Dataset for hybrid indoor positioning, in: 2016
     26th International Conference Radioelektronika (RADIOELEKTRONIKA), IEEE, Kosice,
     Slovakia, Apr. 2016, pp. 408–412.
[17] S. Wang, M. Green, M. Malkawa: E-911 location standards and location commercial
     services, in: Emerging Technologies Symposium: Broadband, Wireless Internet Access, 2000
     IEEE, IEEE, Richardson, TX, USA, Apr. 2000, 5 pp.
[18] R. Want, A. Hopper: Active badges and personal interactive computing objects, Consumer
     Electronics, IEEE Transactions on 38.1 (1992), pp. 10–20.
[19] A. Ward, A. Jones, A. Hopper: A new location technique for the active office, Personal
     Communications, IEEE 4.5 (1997), pp. 42–47.
[20] Z. Weissman: Indoor location, White paper, Tadlys Ltd (2004).
[21] M. Youssef, A. Agrawala: The Horus WLAN location determination system, in: Pro-
     ceedings of the 3rd international conference on Mobile systems, applications, and services,
     ACM, Seattle, WA, USA, June 2005, pp. 205–218.
[22] T. Zsolt, M. Peter, N. Richard, T. Judit: Data Model for Hybrid Indoor Position-
     ing Systems, PRODUCTION SYSTEMS AND INFORMATION ENGINEERING 7 (2015),
     pp. 67–80.



</pre>