A Simple Approach to Weather Predictions by using Naive
Bayes Classifiers
Agnieszka Lutecka1 , Zuzanna Radosz1
1 Silesian University of Technology, Faculty of Applied Mathematics, Kaszubska 23, 44-100 Gliwice, Poland


Abstract
This article presents and explains how we used a Naive Bayes Classifier and a weather database to predict the weather. We compare how the classifier's accuracy depends on the choice of probability distribution for various types of data.

Keywords
Naive Bayes Classifier, probability distribution, Python



SYSTEM 2022: 8th Scholar's Yearly Symposium of Technology, Engineering and Mathematics, Brunek, July 23, 2022
agnilut814@polsl.pl (A. Lutecka); zuzarad785@polsl.pl (Z. Radosz)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.


1. Introduction

Many IT systems use artificial intelligence in the widely understood sense [1, 2]. Artificial intelligence methods also apply to data processing [3, 4, 5]. The wide application of artificial intelligence extends to systems installed in cars, which are used, for example, to detect the quality of the road surface [6]. Many technical problems lead to optimization tasks [7, 8, 9], where the complexity of the functional [10, 11, 12] is a big challenge. Optimization processes concern many different areas of life and require a constant search for new, more effective optimization methods [13, 14] based on the observation of nature [15, 16, 17]. A very important branch of artificial intelligence is the application of broadly understood neural networks [18]. Interesting applications concern health protection [19] and adult care [20, 21]. Artificial intelligence methods are also used for weather forecasting [22, 23, 24], as well as for the detection of features [25, 26, 27].


2. Program description

The task of our program is to predict the weather using the naive Bayes classifier. It is especially suitable for problems with a lot of input data, so it is a good fit for our project. It uses the conditional probability formula:

$$ P(A|B) = \frac{P(B|A)\,P(A)}{P(B)} \tag{1} $$

where

• A is our hypothesis
• B is the observed data (attribute values)
• P(B|A) is the probability that we would observe such data if the hypothesis were true
• P(A) is the a priori probability that the hypothesis is true
• P(B) is the probability of the observed data

Importantly, the naive Bayes classifier assumes that the influences of the attributes are independent of each other, therefore P(B|A) can be written as

$$ P(B_1|A) \cdot P(B_2|A) \cdot \ldots \cdot P(B_n|A). \tag{2} $$

The naivety of this classifier follows from this assumption.
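To make equations (1) and (2) concrete, below is a minimal numeric illustration in Python; all probability values in it are invented for demonstration and do not come from the weather database.

    # Hedged illustration of eqs. (1)-(2); the probabilities are invented.
    p_b_given_a = [0.7, 0.5, 0.9]  # P(B1|A), P(B2|A), P(B3|A) for one hypothesis A
    p_a = 0.3                      # P(A), the a priori probability of the hypothesis
    p_b = 0.2                      # P(B), the probability of the observed data

    likelihood = 1.0
    for p in p_b_given_a:          # naive independence assumption, eq. (2)
        likelihood *= p

    posterior = likelihood * p_a / p_b  # Bayes' rule, eq. (1)
    print(posterior)                    # 0.4725

Note that in the classifier itself P(B) is the same for every hypothesis, so comparing the numerators is enough to pick the most probable class.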
3. Description of how the program works

We started the project with an analysis of the data from the database. Initially, we shuffled the data and normalized it to the 0-1 range to operate on smaller numbers, thus increasing the performance of our program. After dividing the data into validation and training sets, we move on to the main part of our program, i.e. the use of the naive Bayes classifier. Its task is to return the name of the most probable weather for a given sample.

This algorithm uses a probability distribution defined by a density function that describes how the probability of a random variable x is distributed. We implemented 5 different probability distributions to compare the algorithm's efficiency for different probability density formulas. We use the following distributions: Gauss, Laplace, log-normal, uniform, triangular.

3.1. Gaussian distribution

It is one of the most important probability distributions, playing a central role in statistics. The formula for the probability density is as follows:

$$ \frac{1}{\sigma\sqrt{2\pi}} \exp\left(\frac{-(x-\mu)^2}{2\sigma^2}\right) \tag{3} $$



The probability density plot of this distribution is a bell-shaped curve (the so-called bell curve).

Figure 1: Graph of the probability density function. Source: Wikipedia.org

Where μ is the expected (mean) value and σ the standard deviation. The red line corresponds to the standard normal distribution.

3.2. Laplace Distribution

It is a continuous probability distribution named after Pierre Laplace. The probability density is given by the formula:

$$ \frac{1}{2b} \exp\left(\frac{-|x-\mu|}{b}\right) \tag{4} $$

Where μ is the expected value, i.e. the mean, and b > 0 is the scale parameter. The function graph looks like this:

Figure 2: Graph of the probability density function. Source: Wikipedia.org

3.3. Log Normal Distribution

It is the continuous probability distribution of a positive random variable whose logarithm is normally distributed. The formula is:

$$ \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\left(\frac{-(\ln x-\mu)^2}{2\sigma^2}\right) \cdot \mathbf{1}_{(0,\infty)}(x) \tag{5} $$

Where μ is the mean value and σ is the standard deviation. The density function graph is as follows:

Figure 3: Graph of the probability density function. Source: Wikipedia.org

3.4. Triangular Distribution

It is a continuous probability distribution of a random variable. The probability density of a triangular distribution can be expressed as:

$$ f(x) = \begin{cases} 0 & \text{for } x < \mu-\sqrt{6}\,\sigma \\ \dfrac{x-\mu}{6\sigma^2}+\dfrac{1}{\sqrt{6}\,\sigma} & \text{for } \mu-\sqrt{6}\,\sigma \le x \le \mu \\ -\dfrac{x-\mu}{6\sigma^2}+\dfrac{1}{\sqrt{6}\,\sigma} & \text{for } \mu \le x \le \mu+\sqrt{6}\,\sigma \\ 0 & \text{for } x > \mu+\sqrt{6}\,\sigma \end{cases} \tag{6} $$

Where μ is the mean value and σ is the standard deviation. The function graph looks like this:

Figure 4: Graph of the probability density function. Source: Wikipedia.org

3.5. Uniform distribution

It is a continuous probability distribution for which the probability density in the range from a to b is constant and not equal to zero, and otherwise equal to zero. We can see it in the formula below:






$$ f(x) = \begin{cases} 0 & \text{for } x < \mu-\sqrt{3}\,\sigma \\ \dfrac{1}{2\sqrt{3}\,\sigma} & \text{for } \mu-\sqrt{3}\,\sigma \le x \le \mu+\sqrt{3}\,\sigma \\ 0 & \text{for } x > \mu+\sqrt{3}\,\sigma \end{cases} \tag{7} $$

Where μ is the mean value and σ is the standard deviation.

The function graph is as follows:

Figure 5: Graph of the probability density function. Source: Wikipedia.org

3.6. Select Distributions

As we can see, the formulas differ significantly, which will definitely have a big impact on the effectiveness of the program. We tried to choose the probability density formulas so that their values would not be very divergent, as we will present in the next paragraphs.
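To make the comparison concrete, the sketch below evaluates the density formulas (3)-(7) at a single point; the parameter values μ, σ, b and the point x are illustrative only and do not come from the weather data.

    import math

    mu, sigma, b, x = 0.5, 0.1, 2, 0.55   # illustrative values

    gauss = 1/(sigma*math.sqrt(2*math.pi)) * math.exp(-(x - mu)**2/(2*sigma**2))
    laplace = 1/(2*b) * math.exp(-abs(x - mu)/b)
    lognormal = (1/(math.sqrt(2*math.pi)*sigma*x)
                 * math.exp(-(math.log(x) - mu)**2/(2*sigma**2)) if x > 0 else 0)
    uniform = 1/(2*math.sqrt(3)*sigma) if abs(x - mu) <= math.sqrt(3)*sigma else 0
    half = math.sqrt(6)*sigma             # half-width of the triangular support
    triangular = max(0.0, (half - abs(x - mu))/half**2)

    print(gauss, laplace, lognormal, uniform, triangular)

Even at a single point the five densities can differ by an order of magnitude, which is exactly what affects the classifier's behaviour.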
                                                                               (sample[j],std[j-1],sr[j-1]));
4. Algorithm

The naive Bayes classifier uses probability density functions to compute the probability of a given start condition. The NaiveClassifier class has 6 static methods: "laplace", "logarytmiczny", "jednostajny", "trojkatny", "gauss" and "bayes", where the first 5 implement different probability distributions. We use as many as 5 to compare the algorithm's effectiveness for different probability density formulas. The "bayes" method accepts the following input data: data - the training set, sample - a set of values in the range 0-1 that represent the successive database columns, and name - the name of the probability distribution (Gauss, Laplace, log-normal, uniform, triangular). At the beginning, the algorithm extracts the records with the given weather. Then, using loops, the program goes through all the records of the training set, calculating the mean value and standard deviation of each column for each type of weather. Using the given "name", the algorithm calls the appropriate method, whose input data are: sample[j] - the sample value for the column j, sr[j] - the mean value of the column j, and std[j] - the standard deviation of the values from column j. These methods pass the input data through the formulas for the probability distribution and return the value of the probability density function at the point sample[j], that is, the probability of sample[j] occurring under the conditions sr[j] and std[j]. Finally, the algorithm returns the name of the weather most likely to occur for the sample input.

Each probability distribution has a differently defined density function. Therefore, the distributions may differ in their results. Below we present the pseudocode of the NaiveClassifier class methods, with an emphasis on processing the input data by the probability distributions.

Data: Input data, sample, name
Result: Weather Name
Extract weather records;
Enter the weather record sets into the list names;
i := 0;
for i < len(names) do
    tr = [];
    j := 1;
    for j < 7 do
        Calculate the mean value of the column j;
        Calculate the standard deviation of the column j;
        if name == laplace'a then
            tr.append(NaiveClassifier.laplace(sample[j], sr[j-1]));
        end
        if name == log-normalny then
            tr.append(NaiveClassifier.logarytmiczny(sample[j], std[j-1], sr[j-1]));
        end
        if name == jednostajny then
            tr.append(NaiveClassifier.jednostajny(sample[j], std[j-1], sr[j-1]));
        end
        if name == trojkatny then
            tr.append(NaiveClassifier.trojkatny(sample[j], std[j-1], sr[j-1]));
        end
        if name == gauss then
            tr.append(NaiveClassifier.gauss(sample[j], std[j-1], sr[j-1]));
        end
    end
    Return the probability of the given weather;
end
Return the name of the most likely weather;
Algorithm 1: Bayes algorithm



Data: Input x, mean
Result: The value of the density function at x
b := 2;
return (1/(2*b))*(math.exp(-(math.fabs(x-mean)/b)));
Algorithm 2: Laplace's algorithm

Data: Input x, std, sr
Result: The value of the density function at x
if x > 0 then
    return (1/((math.pi*2)**(1/2)*std*x))*math.exp(-((math.log(x)-sr)**2)/(2*std**2));
end
else
    return 0;
end
Algorithm 3: Logarithmic algorithm

Data: Input x, std, sr
Result: The value of the density function at x
if x < sr - 3**(1/2)*std then
    return 0;
end
if x >= sr - 3**(1/2)*std && x <= sr + 3**(1/2)*std then
    return 1/(2*(3**(1/2))*std);
end
if x > sr + 3**(1/2)*std then
    return 0;
end
Algorithm 4: Uniform algorithm

Data: Input x, std, sr
Result: The value of the density function at x
if x < sr - 6**(1/2)*std then
    return 0;
end
if x >= sr - 6**(1/2)*std && x <= sr then
    return (x-sr)/(6*std**2) + 1/(6**(1/2)*std);
end
if x > sr && x <= sr + 6**(1/2)*std then
    return -(x-sr)/(6*std**2) + 1/(6**(1/2)*std);
end
if x > sr + 6**(1/2)*std then
    return 0;
end
Algorithm 5: The triangle algorithm

Data: Input x, std, sr
Result: The value of the density function at x
return (1/(std*np.sqrt(2*np.pi)))*np.exp(-((x-sr)**2)/(2*std**2));
Algorithm 6: Gauss algorithm

5. Databases

5.1. Database Analysis

For our project, we use the Istanbul Weather Data database, downloaded from the Kaggle website. The database has 3896 records. It contains the following data columns: DateTime, Condition, Rain, MaxTemp, MinTemp, SunRise, SunSet, MoonRise, MoonSet, AvgWind, AvgHumidity, AvgPressure.
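As a hedged sketch, this first inspection can be reproduced with pandas; the file name follows Section 6.1, and the dtypes output corresponds to Figure 6.

    import pandas as pd

    data = pd.read_csv("Istanbul Weather Data.csv")
    print(len(data))     # 3896 records
    print(data.dtypes)   # the column types summarized in Figure 6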
Figure 6: Data types in each column

We analyzed the data using a matrix graph that shows the relationships between the weather variables.

The chart is dominated by warm colors, which means that most of the records in our database are sunny or slightly cloudy.

It can be seen that the data forms compact groups. This means that the parameters for different weather conditions do not differ much from one another. This can make our algorithm, which determines the weather based on these parameters, not very accurate. There may be situations where the algorithm returns the weather "Sunny" because it was the most probable, but the actual weather is different. In the Experiments section, we will test and analyze the obtained accuracy results of the algorithm.

We also analyzed the data using a violin graph for all weather conditions and the maximum temperature, as we can see below:








Figure 7: Matrix graph

In the attached figure we can see how the temperature value changes for a given weather; for example, for "Moderate rain" the maximum temperature ranges from 5 to 20 degrees.

5.2. Database modification

DateTime, SunRise, SunSet, MoonRise and MoonSet will not be used in our project, so we can get rid of them.

    data.drop('DateTime', axis=1, inplace=True)
    data.drop('SunRise', axis=1, inplace=True)
    data.drop('SunSet', axis=1, inplace=True)
    data.drop('MoonRise', axis=1, inplace=True)
    data.drop('MoonSet', axis=1, inplace=True)
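The same columns can also be removed with a single call; this is an equivalent pandas idiom, not the code used in the paper:

    data.drop(columns=['DateTime', 'SunRise', 'SunSet', 'MoonRise', 'MoonSet'], inplace=True)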
      d a t a . drop ( ’ MoonRise ’ , a x i s = 1 ,
                                                                6.2. Shuffle method
             i n p l a c e = True )
      d a t a . drop ( ’ MoonSet ’ , a x i s = 1 ,              It takes base as input, i.e. our database. We use for loop
6. Implementation

6.1. ProcessingData class

Our project consists of two files: a file containing the program code - "Pogoda.ipnb" - and the database - "Istanbul Weather Data.csv". After analyzing the data from the database, we moved on to the "ProcessingData" class, in which we created 3 static methods: shuffle, splitSet and normalize.

6.2. Shuffle method

It takes base as input, i.e. our database. We use a for loop to go through it, selecting records and swapping them.




Figure 8: Violin graph

Data: Input base
Result: Database with shuffled records
for i in range(len(base)-1, -1, -1) do
    take the record at index i and a record at a random index, and swap them;
end
return base;
Algorithm 7: The shuffle algorithm

Program code

    @staticmethod
    def shuffle(base):
        for i in range(len(base)-1, -1, -1):
            j = rd.randint(0, i)  # draw the random index once, then swap
            base.iloc[i], base.iloc[j] = base.iloc[j], base.iloc[i]
        return base

6.3. SplitSet method

It takes as input x - the database - and k - the split ratio of the set. In the variable n we store the length of the set x multiplied by k, so that we know where to divide the set. Then we write the data from the database up to index n into the variable xTrain, creating the training set, and all data following index n into the variable xVal, creating the validation set. Finally, we return both of these sets.

Data: Input x, k
Result: Training and validation set
n = int(len(x)*k)
xTrain = x[:n]
xVal = x[n:]
return xTrain, xVal;
Algorithm 8: SplitSet algorithm

Program code

    def splitSet(x, k):
        n = int(len(x)*k)
        xTrain = x[:n]
        xVal = x[n:]
        return xTrain, xVal
6.4. Normalize method

It takes x, which is a database whose records have been shuffled using the shuffle method. At the beginning, we put all data from the database into the variable values, except for the string values, and the column names into the variable columnNames. We loop through all the columns in columnNames, and for each column we take all the rows in that column and store them in the variable data. The variables max1 and min1 are assigned the maximum and minimum values from data. Using the next loop, we go through all the rows and assign to the variable val the result of the min-max normalization formula: we subtract the value min1 from the database record at coordinates [row, column], and then divide this difference by the difference between max1 and min1. Finally, we write the value after normalization back to the database. The method returns a normalized database.


Data: Input x
Result: Normalized database
values = x.select_dtypes(exclude="object")
columnNames = values.columns.tolist()
for column in columnNames do
    take all the rows from the column column;
    max1 = max(data); min1 = min(data);
    for row in range(0, len(x)) do
        val = (x.at[row, column] - min1)/(max1 - min1);
        x.at[row, column] = val;
    end
end
return x;
Algorithm 9: The normalize algorithm

Program code

    def normalize(x):
        values = x.select_dtypes(exclude="object")  # all data except object (string) columns
        columnNames = values.columns.tolist()

        for column in columnNames:
            data = x.loc[:, column]  # take all rows in column column

            max1 = max(data)
            min1 = min(data)

            for row in range(0, len(x)):  # we go through all the rows
                val = (x.at[row, column] - min1)/(max1 - min1)
                x.at[row, column] = val
        return x
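Putting the three methods together, the preprocessing stage can be sketched as follows. This composition is an assumption based on Section 3; the 0.7 ratio matches the 7:3 split used in the Tests section, and the unused columns from Section 5.2 are assumed to have been dropped already.

    import pandas as pd

    data = pd.read_csv("Istanbul Weather Data.csv")
    # ... drop the unused columns as in Section 5.2 ...
    data = ProcessingData.shuffle(data)
    data = ProcessingData.normalize(data)
    xTrain, xVal = ProcessingData.splitSet(data, 0.7)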


6.5. NaiveClassifier class and bayes method

The NaiveClassifier class has 6 static methods: "laplace", "logarithmic", "uniform", "triangle", "gauss" and "bayes". The first 5 are methods implementing the density functions of the different probability distributions. The "bayes" method has been described in general in the "Algorithm" section, so now we will look at the details. First, the method extracts the database records with the given weather name into separate lists. Then each of the lists created above is put into the "names" list. Additionally, we create a stringnames list with the string weather names and a values list that will store the calculated probabilities for each weather.




Data: Input data, sample, name
Result: Weather name
Extract weather records;
Enter the weather record sets into names;
Enter the weather names into the list stringnames;
values := []; i := 0;
for i < len(names) do
    tr = [];
    sr = [];
    std = [];
    j := 1;
    for j < 7 do
        Calculate the mean value of the column j;
        Calculate the standard deviation of the column j;
        if sr[j-1] == 0 then
            sr[j-1] = 0.0000001;
        end
        if std[j-1] == 0 then
            std[j-1] = 0.0000001;
        end
        if name == laplace'a then
            tr.append(NaiveClassifier.laplace(sample[j], sr[j-1]));
        end
        if name == log-normalny then
            tr.append(NaiveClassifier.logarytmiczny(sample[j], std[j-1], sr[j-1]));
        end
        if name == jednostajny then
            tr.append(NaiveClassifier.jednostajny(sample[j], std[j-1], sr[j-1]));
        end
        if name == trojkatny then
            tr.append(NaiveClassifier.trojkatny(sample[j], std[j-1], sr[j-1]));
        end
        if name == gauss then
            tr.append(NaiveClassifier.gauss(sample[j], std[j-1], sr[j-1]));
        end
    end
    values.append(np.prod(tr)*len(names[i])/len(names));
end
Index = values.index(max(values));
return value from stringnames at index Index;
Algorithm 10: Bayes algorithm



The next step is to loop through all the values of the names list in sequence. Then we create auxiliary lists: tr[] - to store the 6 probability values that correspond to the successive database columns, sr[] - to store the mean values of each column, and std[] - to store the standard deviations of each column. The next loop passes through all the columns one by one. For each column we calculate the mean and the standard deviation. The next step is the pair of conditions that prevent zero values in sr[] and std[]; a zero would break the density formulas, so it is replaced by a very small constant. Then come the distributions: depending on the input "name", we add the result of the given distribution's density function to the list tr[]. After going through the inner loop, we compute the value from Bayes' theorem. Based on the formula for conditional probability, we multiply the values in the tr list, then multiply that product by the length of the names[i] list, and divide the whole thing by the length of the "names" list. We add the obtained result to the "values" list. After going through both loops, we determine the index of the highest value in the values list. Finally, we return the name of the weather with that index from the stringnames list.
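For readability, here is a compact Python sketch of the "bayes" method following Algorithm 10. The data layout is assumed: names[i] holds the numeric attribute rows of the i-th weather, stringnames[i] is its label, and density is one of the five static methods above (note that the paper's Laplace variant takes only (x, sr)).

    import numpy as np

    def bayes(names, stringnames, sample, density):
        values = []
        for group in names:                       # one group of records per weather
            arr = np.asarray(group, dtype=float)  # rows x 6 numeric attributes
            sr = arr.mean(axis=0)
            std = arr.std(axis=0)
            sr[sr == 0] = 0.0000001               # zero guards, as in Algorithm 10
            std[std == 0] = 0.0000001
            tr = [density(sample[j], std[j], sr[j]) for j in range(6)]
            values.append(np.prod(tr)*len(group)/len(names))
        return stringnames[values.index(max(values))]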

6.6. AnalizingData class

Another class in our project is AnalizingData, with the analize method. This method measures the accuracy of the Bayes classifier as a percentage. The input data are: Train - the training set, Val - the validation set, and name - the name of the probability distribution. The algorithm first sets the value of the correct variable to 0. Then it iterates through all the records of the Val set. If the value returned by the Bayes classifier for the input Train, Val[i], name is the same as the weather name of the Val[i] record, it increases the variable correct by 1. Finally, the algorithm returns the accuracy, which we calculate by dividing correct by the length of the validation set and multiplying by 100.

Data: Input Train, Val, name
Result: Accuracy of the bayes algorithm
correct := 0;
i := 0;
for i < len(Val) do
    if NaiveClassifier.classify(Train, Val.iloc[i], name) == Val.iloc[i].Condition then
        correct += 1;
    end
end
accuracy = correct/len(Val)*100;
return accuracy;
Algorithm 11: Analyze algorithm

Program code

    class AnalizingData:

        @staticmethod
        def analize(Train, Val, name):
            correct = 0
            for i in range(len(Val)):
                if NaiveClassifier.classify(Train, Val.iloc[i], name) == Val.iloc[i].Condition:
                    correct += 1
            accuracy = correct/len(Val)*100
            return accuracy

7. Tests

We started our tests by checking the algorithm's operation on various samples.

Figure 9: Sample tests

The above test shows that, depending on the values in the sample, the algorithm returns different weather names, which confirms its correct operation.

The next step was to determine the accuracy of our algorithm for the various probability distributions. For this we used the AnalizingData class with the analize method. We called the method for each of the 5 types of probability distributions with a training-to-validation split ratio of 7:3, and then, using the plt package, we displayed the graph.
Figure 10: The graph of the accuracy of the algorithm depending on the probability distribution

As you can see in the attached picture, the algorithm has different accuracies depending on the probability distribution.



The Gaussian distribution definitely exceeds the other distributions with its accuracy, which is over 60%. This means that more than 60% of the returned weather names were correct. However, the Laplace and uniform distributions do not lag far behind: their values are in the 50-60% range. The log-normal and triangular distributions are the least efficient, because their accuracy is less than 20%. Additionally, we measured the execution time of the algorithm: it was almost 4.512 minutes.

8. Experiments

8.1. Analysis of algorithm results for normalized and non-normalized data

We tested the operation of our program for both normalized and non-normalized values, and we determined the algorithm execution time for both data sets. The graphs below show the dependence of the accuracy on the probability distribution used.

Figure 11: Bar chart for unnormalized data

The first plot (Figure 10) shows the Bayesian classifier's results for normalized data with a 7:3 split, while the second plot shows the results for unnormalized data with the same partition. The program execution time for the first graph was 4.512 minutes, while for the second graph it was 4.433 minutes. As we can see, the only significant difference appeared when using the Laplace distribution, whose accuracy decreased by almost 10%. The remaining results are similar for both types of data. The time difference is insignificant, as it is only 5.023 s.

8.2. Analysis of the algorithm's results for different data divisions

The following charts show the efficiency of the algorithm on normalized data for various divisions into training and validation sets:

Figure 12: Bar chart 1. for the training set 0.1

Figure 13: Bar chart 2. for the training set 0.3

Figure 14: Bar chart 3. for the training set 0.5

Figure 15: Bar chart 4. for the training set 0.9

The algorithm execution time decreases as the training set grows. For the last execution of the algorithm, where the division was in the ratio of 9:1, the time was only 1.467 minutes.







However, the first calculations, where the division was in the ratio of 1:9, took as long as 9.517 minutes. This is because by reducing the training set we increase the number of records in the validation set. As a result, the classifier is called more times, and the most time-consuming elements, such as extracting the records with a given weather or the loops, are performed many times.

Analyzing the above, we can see that the accuracy of the Gaussian distribution is superior to all the others, and its value is practically unchanged across the splits. The Laplace distribution is in second place, almost reaching 60%. The value of the uniform distribution ranges from 50% to 55%; it achieves the most on the last chart, where the training set is 0.9. The triangular and log-normal distributions reach much lower values than the previously mentioned distributions; the gap is quite big, around 30%. The log-normal distribution only slightly exceeds the triangular distribution once, in the third graph. Nevertheless, the accuracy values of both distributions never exceed 20%.

9. Conclusion

We can conclude from this that the Gaussian distribution is the best probability distribution for our database. The algorithm with this distribution, with each modification, correctly determines about 60% of the weather names, which is a good but unsatisfactory value. This is due to the way the data is distributed in the database. If the attribute values differed more between the different weather conditions, this algorithm could become much more accurate.

References

 [1] Y. Li, W. Dong, Q. Yang, S. Jiang, X. Ni, J. Liu, Automatic impedance matching method with adaptive network based fuzzy inference system for wpt, IEEE Transactions on Industrial Informatics 16 (2019) 1076–1085.
 [2] J. Yi, J. Bai, W. Zhou, H. He, L. Yao, Operating parameters optimization for the aluminum electrolysis process using an improved quantum-behaved particle swarm algorithm, IEEE Transactions on Industrial Informatics 14 (2017) 3405–3415.
 [3] W. Dong, M. Woźniak, et al., Denoising aggregation of graph neural networks by using principal component analysis, IEEE Transactions on Industrial Informatics (2022).
 [4] N. Brandizzi, V. Bianco, G. Castro, S. Russo, A. Wajda, Automatic rgb inference based on facial emotion recognition, in: CEUR Workshop Proceedings, volume 3092, CEUR-WS, 2021, pp. 66–74.
 [5] R. Avanzato, F. Beritelli, M. Russo, S. Russo, M. Vaccaro, Yolov3-based mask and face recognition algorithm for individual protection applications, in: CEUR Workshop Proceedings, volume 2768, CEUR-WS, 2020, pp. 41–45.
 [6] M. Woźniak, A. Zielonka, A. Sikora, Driving support by type-2 fuzzy logic control model, Expert Systems with Applications 207 (2022) 117798.
 [7] G. Borowik, M. Woźniak, A. Fornaia, R. Giunta, C. Napoli, G. Pappalardo, E. Tramontana, A software architecture assisting workflow executions on cloud resources, International Journal of Electronics and Telecommunications 61 (2015) 17–23. doi:10.1515/eletel-2015-0002.
 [8] T. Qiu, B. Li, X. Zhou, H. Song, I. Lee, J. Lloret, A novel shortcut addition algorithm with particle swarm for multisink internet of things, IEEE Transactions on Industrial Informatics 16 (2019) 3566–3577.
 [9] G. Capizzi, G. Lo Sciuto, C. Napoli, R. Shikler, M. Wozniak, Optimizing the organic solar cell manufacturing process by means of afm measurements and neural networks, Energies 11 (2018).
[10] M. Woźniak, A. Sikora, A. Zielonka, K. Kaur, M. S. Hossain, M. Shorfuzzaman, Heuristic optimization of multipulse rectifier for reduced energy consumption, IEEE Transactions on Industrial Informatics 18 (2021) 5515–5526.
[11] G. Capizzi, G. Lo Sciuto, C. Napoli, E. Tramontana, M. Woźniak, A novel neural networks-based texture image processing algorithm for orange defects classification, International Journal of Computer Science and Applications 13 (2016) 45–60.
[12] N. Brandizzi, S. Russo, R. Brociek, A. Wajda, First studies to apply the theory of mind theory to green and smart mobility by using gaussian area clustering, volume 3118, CEUR-WS, 2021, pp. 71–76.
[13] D. Yu, C. P. Chen, Smooth transition in communication for swarm control with formation change, IEEE Transactions on Industrial Informatics 16 (2020) 6962–6971.
[14] C. Napoli, G. Pappalardo, E. Tramontana, A hybrid neuro-wavelet predictor for qos control and stability, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8249 LNAI (2013) 527–538. doi:10.1007/978-3-319-03524-6_45.
[15] Y. Zhang, S. Cheng, Y. Shi, D.-w. Gong, X. Zhao, Cost-sensitive feature selection using two-archive multi-objective artificial bee colony algorithm, Expert Systems with Applications 137 (2019) 46–58.
[16] M. Ren, Y. Song, W. Chu, An improved locally weighted pls based on particle swarm optimization for industrial soft sensor modeling, Sensors 19 (2019) 4099.






[17] B. Nowak, R. Nowicki, M. WoΕΊniak, C. Napoli,
     Multi-class nearest neighbour classifier for
     incomplete data handling, in: Lecture Notes
     in Artificial Intelligence (Subseries of Lec-
     ture Notes in Computer Science), volume
     9119, Springer Verlag, 2015, pp. 469–480.
     doi:10.1007/978-3-319-19324-3_42.
[18] V. S. Dhaka, S. V. Meena, G. Rani, D. Sinwar, M. F.
     Ijaz, M. WoΕΊniak, A survey of deep convolutional
     neural networks applied for prediction of plant leaf
     diseases, Sensors 21 (2021) 4749.
[19] R. Brociek, G. Magistris, F. Cardia, F. Coppa,
     S. Russo, Contagion prevention of covid-19 by
     means of touch detection for retail stores, in: CEUR
     Workshop Proceedings, volume 3092, CEUR-WS,
     2021, pp. 89–94.
[20] N. Dat, V. Ponzi, S. Russo, F. Vincelli, Supporting
     impaired people with a following robotic assistant
     by means of end-to-end visual target navigation
     and reinforcement learning approaches, in: CEUR
     Workshop Proceedings, volume 3118, CEUR-WS,
     2021, pp. 51–63.
[21] M. WoΕΊniak, M. Wieczorek, J. SiΕ‚ka, D. PoΕ‚ap, Body
     pose prediction based on motion sensor data and
     recurrent neural network, IEEE Transactions on
     Industrial Informatics 17 (2020) 2101–2111.
[22] G. Capizzi, F. Bonanno, C. Napoli, A wavelet
     based prediction of wind and solar energy for long-
     term simulation of integrated generation systems,
     in: SPEEDAM 2010 - International Symposium
     on Power Electronics, Electrical Drives, Automa-
     tion and Motion, 2010, pp. 586–592. doi:10.1109/
     SPEEDAM.2010.5542259.
[23] G. Capizzi, G. Lo Sciuto, C. Napoli, M. WoΕΊniak,
     G. Susi, A spiking neural network-based long-term
     prediction system for biogas production, Neural
     Networks 129 (2020) 271 – 279.
[24] G. Capizzi, G. Lo Sciuto, C. Napoli, E. Tramontana,
     An advanced neural network based solution to en-
     force dispatch continuity in smart grids, Applied
     Soft Computing Journal 62 (2018) 768 – 775.
[25] O. Dehzangi, et al., Imu-based gait recognition
     using convolutional neural networks and multi-
     sensor fusion, Sensors 17 (2017) 2735.
[26] G. Capizzi, F. Bonanno, C. Napoli, Hybrid neu-
     ral networks architectures for soc and voltage pre-
     diction of new generation batteries storage, in:
     3rd International Conference on Clean Electrical
     Power: Renewable Energy Resources Impact, IC-
     CEP 2011, 2011, pp. 341–344. doi:10.1109/ICCEP.
     2011.6036301.
[27] H. G. Hong, M. B. Lee, K. R. Park, Convolutional
     neural network-based finger-vein recognition using
     nir image sensors, Sensors 17 (2017) 1297.


