Emotion Recognition from Tweets
Jakub Sydor1 , Szymon Cwynar1
1
    Faculty of Applied Mathematics, Silesian University of Technology, Kaszubska 23, 44100 Gliwice, POLAND


                                             Abstract
These days we face internet bullying more and more often. Our goal was to develop software that recognizes emotions from bare text. Our project is based on Twitter posts, but it could also be used on any platform in which users communicate via text messages. We use a few solutions to make our program as accurate as it possibly can be. Firstly, we picked a large database to get the broadest possible context, we used Word2Vec to represent words as vectors, and lastly we used a neural network to predict the output for sentences beyond our database. Our article is mostly about different versions of the algorithm and their comparison, in order to choose the best approach to the problem. As we learned, the biggest difference-makers were the number of hidden layers and the number of neurons inside each of them, the type of activation function, and the training algorithm. We attach a large number of plots to visualize each of our attempts. In this article we show our approaches and the data connected to them. We created functions to monitor our error: the accuracy function to sum up how efficient our algorithm is, the precision function to diagnose what proportion of identifications was correct, recall as the fraction of relevant instances that were retrieved, and F1, which combines precision and recall into an average valued from 0 to 1.

                                             Keywords
                                             Artificial neural network, Word2vec, emotion, tweets



1. Introduction

The assumption of our project was to create an algorithm based on an artificial neural network. Its main goal was to recognize whether an entry is neutral, negative, or positive. The algorithm learns on a base that contains 1.6M tweets, using the backpropagation algorithm. We decided to use neural networks as classifiers, as they have been reported in various interesting applications [1, 2, 3, 4].

In [5] neural networks are used in federated systems in which they share information with each other during training. Models of neural networks are also very efficient in detecting threats over the internet [6]. We can also find them as classifiers of images [7] and in IoT systems that detect the position of people [8, 9, 10].

We got our database from Kaggle, but it was full of unnecessary data such as the date or the user. We cleaned it and left only 2 columns, target and text, getting rid of the columns that contained information such as the date, the user, or the tweet id.

To make our algorithm work we needed to divide it into a few subsections. The first of them is the section connected to the database. Firstly, as mentioned before, we dropped most of the columns, but secondly, we needed to make sure that our data did not contain unused data or data which could possibly make our algorithm less reliable. Therefore we cleared it of things such as names, links, and mentions.

The next algorithm used in our program is Word2Vec, which is responsible for translating our sentences and words into numbers. Every word is represented by a 10-dimensional vector. The algorithm behind Word2Vec is nothing else than an artificial neural network, which will be explained later. Because of the length of our one-word vector and the maximum length of a Twitter post (280 words), we created an input layer whose size is simply the product of those 2 values, which is 2800 neurons.

Then we move to the heart of our program, the artificial neural network. The whole structure is handwritten by us; we don't use any libraries. Its main functions are run and addlayer, which are responsible for adding layers and running the whole algorithm. The run function returns 2 output neurons, which represent, through softmax, the probability of each label. The first neuron gives the probability of a positive output and the second one of a negative output. We also add a function that checks the absolute value of their difference; if it is small enough, then the output is considered neutral. The artificial network includes an input layer with 2800 neurons, a first hidden layer with 600 neurons, a second hidden layer with 200 neurons, a third hidden layer with 20 neurons, and an output layer which consists of 2 neurons.
SYSTEM 2021 @ Scholar's Yearly Symposium of Technology, Engineering and Mathematics. July 27–29, 2021, Catania, IT
jakusyd988@student.polsl.pl (J. Sydor); szymcwy664@student.polsl.pl (S. Cwynar)
https://github.com/Harasz/ (J. Sydor); https://github.com/SzymCwy/ (S. Cwynar)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)


2. Data Base

Our database consists of 1 600 000 tweets, each item represented by 5 columns.



One record includes the date of the tweet, the nickname of the posting user, the id of the tweet, the content of the tweet, and the label stating whether it is positive or negative. We needed to modify our database to contain only 2 of the 5 columns, because only the text and the labels will be used. Apart from that, we needed to adapt the database to our use and clear it of meaningless text.

So firstly we loaded the database using pandas and provided our data frame with labels to make access easier. We also implemented a function whose main task was to clear any irrelevant text, such as pronouns, conjunctions, links, and mentions, to make sure our algorithm will learn properly.
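A minimal sketch of this preprocessing step, assuming a Sentiment140-style CSV layout from Kaggle (the column names, file name, and regular expressions below are our illustrative choices, not the project's exact code):

    import re
    import pandas as pd

    # Assumed column layout of the raw CSV: target, id, date, flag, user, text.
    columns = ["target", "id", "date", "flag", "user", "text"]
    df = pd.read_csv("tweets.csv", encoding="latin-1", names=columns)

    # Keep only the two columns the model actually uses.
    df = df[["target", "text"]]

    def clean_tweet(text: str) -> str:
        """Remove mentions, links, and other irrelevant characters from a tweet."""
        text = re.sub(r"@\w+", " ", text)          # mentions
        text = re.sub(r"https?://\S+", " ", text)  # links
        text = re.sub(r"[^a-zA-Z\s]", " ", text)   # digits, punctuation
        return re.sub(r"\s+", " ", text).strip().lower()

    df["text"] = df["text"].astype(str).apply(clean_tweet)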
3. Algorithm overall

Our algorithm's inputs are sentences that in the next steps are converted into words. Using Word2Vec, each word in the sentence is converted into a ten-dimensional vector, and those vectors are inserted into an array, each vector as a separate element. Those words are easily available thanks to two mechanisms, word to id and id to word. Then, using the artificial network, a weighted sum, and the activation function, the algorithm fills every single neuron with proper values. In the output layer we have 2 neurons that, at the end of the algorithm, return two values between 0 and 1. Because of softmax, those values can be identified as the probability of each label. During learning those two values are compared with the expected outcomes. That way we get the distance between our result and the real label, and we use those values in the backward propagation algorithm to change all of the weights, so that our program gets more and more precise with each iteration.
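A small sketch of how a tokenized tweet could be packed into the fixed 2800-value input described in the introduction (the zero-padding and truncation below are our own assumptions about how the 280-word limit is handled):

    import numpy as np

    VECTOR_SIZE = 10                       # dimensionality of a single word vector
    MAX_WORDS = 280                        # assumed maximum number of words per tweet
    INPUT_SIZE = VECTOR_SIZE * MAX_WORDS   # 2800 input neurons

    def build_input(word_vectors):
        """Concatenate the word vectors and zero-pad them to the fixed input length."""
        flat = np.concatenate(word_vectors) if word_vectors else np.zeros(0)
        flat = flat[:INPUT_SIZE]           # truncate overly long tweets
        padded = np.zeros(INPUT_SIZE)
        padded[:flat.size] = flat
        return padded

    # Example: a 3-word tweet becomes 30 values followed by 2770 zeros.
    example = [np.random.rand(VECTOR_SIZE) for _ in range(3)]
    print(build_input(example).shape)      # (2800,)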
4. Word2Vec algorithm

We are using the gensim library to implement the Word2Vec algorithm. This part of our code lets us change the words in the database into vectors, so they can be used in our calculations. As input data the algorithm takes the whole data frame with all sentences, each row represented as one sentence. Firstly the sentences need to be divided into words. Next we count how many times each word occurs in the text and, based on that information, we create 2 dictionaries, word to id and id to word, which make the conversion from text to id and back easier. In the built-in function we need to specify the size of the vector, the minimal number of occurrences, the window, and the source of words. In our example we set the minimum occurrence to 1, the size of the vector to 10, and the window to 7, to make sure our dictionary would be big, to connect large numbers of words with each other, and also because we needed a 10-dimensional vector for every word so it would fit our input layer. Word2Vec is nothing else than an artificial neural network, and it allows us to make mathematical operations on words. Gensim's Word2Vec implements two approaches: Continuous Bag Of Words (CBOW) and Skip-Gram.

In the CBOW model the surrounding words are combined to predict the word they surround, while in Skip-Gram we use a word to predict the context.
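A minimal sketch of this step with gensim, using the settings quoted above (the snippet uses the gensim 4 parameter name vector_size; older gensim 3 releases call it size, and the project's exact code may differ):

    from gensim.models import Word2Vec

    # In the project, each cleaned tweet from the data frame becomes one list of words;
    # a tiny literal corpus is used here so the snippet is self-contained.
    sentences = [["i", "love", "to", "write", "scripts"],
                 ["i", "love", "tweets"]]

    # vector size 10, window 7, minimum occurrence 1, as described in the text.
    w2v = Word2Vec(sentences, vector_size=10, window=7, min_count=1)

    # The two dictionaries mentioned above: word -> id and id -> word.
    word_to_id = w2v.wv.key_to_index
    id_to_word = w2v.wv.index_to_key

    vector = w2v.wv["love"]   # the 10-dimensional vector of a word seen in training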
5. Mathematical representation of Skip-Gram model

Mathematically, we can describe an n-word sentence w_1, ..., w_n using skip-grams with the following formula:

    \mathrm{SkipGram} = \{ w_{i_1}, w_{i_2}, \ldots, w_{i_n} \mid \sum_{j=1}^{n} i_j - i_{j-1} < k \}    (1)

where k is the maximum skip distance and n the subsequence length.

For example, when we have the sentence "I love to write scripts" and k is equal to 1 and n to 2, that means we will connect 2 words which have a maximum of one word between them. Those connections would be: {I, love}, {I, to}, {love, to}, {love, write}, {to, write}, {to, scripts}, {write, scripts}.
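A small sketch that reproduces the worked example above for n = 2 (word pairs); the helper name is ours:

    from itertools import combinations

    def skip_gram_pairs(sentence, k):
        """All pairs of words separated by at most k intermediate words (n = 2)."""
        pairs = []
        for i, j in combinations(range(len(sentence)), 2):
            if j - i - 1 <= k:             # at most k words between the two picked words
                pairs.append((sentence[i], sentence[j]))
        return pairs

    print(skip_gram_pairs("I love to write scripts".split(), k=1))
    # [('I', 'love'), ('I', 'to'), ('love', 'to'), ('love', 'write'),
    #  ('to', 'write'), ('to', 'scripts'), ('write', 'scripts')]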
6. Backward propagation

The backward propagation algorithm is used in our program to modify the weights of each neuron to get the best results. Our neurons have pregenerated weights from 0 to 1. To make our algorithm more precise, the backward propagation algorithm corrects the weights by analysing the errors, starting from the end of our artificial neural network. As input it takes the probability of each label and the expected label. It calculates the error of each of the output neurons, and those errors are propagated to the previous layers. Each weight in our network is modified based on the value of the error. This algorithm has its limits, so you need to be careful while setting its number of iterations. After a few runs the values are modified to a lesser extent, so when those changes become minor, that is the sign to stop the algorithm.
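A compact sketch of one way such an update rule can be implemented for a toy network (our own simplification, assuming a cross-entropy loss on top of softmax; it is not the project's exact code):

    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    # Tiny illustrative network: 4 inputs -> 3 hidden (tanh) -> 2 outputs (softmax).
    W1 = rng.random((3, 4)); b1 = np.zeros(3)
    W2 = rng.random((2, 3)); b2 = np.zeros(2)
    lr = 0.1

    x = rng.random(4)
    target = np.array([1.0, 0.0])              # expected label, e.g. "positive"

    for step in range(200):
        h = np.tanh(W1 @ x + b1)               # forward pass
        y = softmax(W2 @ h + b2)

        delta_out = y - target                 # error at the output neurons
        delta_hid = (W2.T @ delta_out) * (1 - h ** 2)   # propagated through tanh

        W2 -= lr * np.outer(delta_out, h); b2 -= lr * delta_out
        W1 -= lr * np.outer(delta_hid, x); b1 -= lr * delta_hid

        if np.abs(delta_out).max() < 1e-3:     # stop once the corrections become minor
            break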







Figure 1: Graphical representation of the CBOW model and Skip-gram model [11].




Figure 2: Pseudo-code of the back-propagation algorithm in training ANN [12].



7. Activation function

The activation function is an inseparable element of every artificial neural network. There are lots of them available, but each of them is different. We use an s-shaped function, the hyperbolic tangent, as our activation function. It determines the output of the artificial neural network. All of the output values are between -1 and 1. The advantage of our activation function is its mapping: all strongly positive and negative values will be presented as strong values, and those which are close to 0 will stay close to 0 on the tanh graph. We also chose the tanh function because it is strongly advised when a neural network has only 2 outputs.


8. Maths behind activation function

Our activation function, the hyperbolic tangent, might be represented as:

    \tanh x = \frac{\sinh x}{\cosh x}    (2)

    \tanh x = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}    (3)




Applying the quotient rule to (3), its derivative is:

    \frac{\partial}{\partial x}\tanh x = \frac{(e^{x} + e^{-x})(e^{x} + e^{-x}) - (e^{x} - e^{-x})(e^{x} - e^{-x})}{(e^{x} + e^{-x})^{2}}    (4)



Figure 3: Comparison between sigmoid and tanh activation functions [13].

Its domain is the range from -1 to 1. It is a monotonic function whose derivative is non-monotonic. The derivative of tanh simplifies to:

    \frac{\partial}{\partial x}\tanh x = 1 - \tanh^{2} x    (5)
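As a quick numerical check of equations (2)-(5):

    import numpy as np

    def tanh(x):
        """Hyperbolic tangent, equations (2)-(3)."""
        return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

    def tanh_derivative(x):
        """Simplified derivative from equation (5)."""
        return 1.0 - np.tanh(x) ** 2

    x = np.array([-2.0, 0.0, 2.0])
    print(tanh(x))              # values stay strictly between -1 and 1
    print(tanh_derivative(x))   # largest at 0, vanishing for large |x|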
9. Artificial Neural Network

Our neural network algorithm is divided into two main classes: the NeuralNetwork class and the Neuron class.

NeuralNetwork has 2 variables, layers and weights, which store respectively the arrays of Neurons and their weights. The first function is addlayer, which was written to allow creating layers with a specific number of neurons inside, given as an argument. When used, it adds elements of the Neuron class into the array of layers. The get size function is used to return the number of neurons in the whole artificial neural network; thanks to that function we are able to properly use generate weights. With the result of the previous function we use generate weights to create an array of randomly generated numbers from 0 to 1. The load weights function is responsible for assigning weights to neurons. The last of them is run; it firstly checks whether the algorithm has the same number of input neurons as the inputs given by the user. If that check passes, it starts to assign values to neurons.

The next class, Neuron, is responsible for calculating the weighted sum and using the activation function to assign a value to each neuron.
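A simplified sketch of how this two-class design could look; the method names follow the description above, while the bodies are our own illustrative assumptions rather than the project's code:

    import math
    import random

    class Neuron:
        """Holds a weight vector and computes tanh(weighted sum) of its inputs."""
        def __init__(self):
            self.weights = []

        def activate(self, inputs):
            return math.tanh(sum(w * x for w, x in zip(self.weights, inputs)))

    class NeuralNetwork:
        def __init__(self):
            self.layers = []                  # list of lists of Neuron objects

        def addlayer(self, size):
            self.layers.append([Neuron() for _ in range(size)])

        def get_size(self):
            return sum(len(layer) for layer in self.layers)

        def generate_weights(self):
            # One weight per neuron of the previous layer, drawn from [0, 1).
            return [[[random.random() for _ in prev] for _ in layer]
                    for prev, layer in zip(self.layers, self.layers[1:])]

        def load_weights(self, weights):
            for layer, layer_weights in zip(self.layers[1:], weights):
                for neuron, w in zip(layer, layer_weights):
                    neuron.weights = w

        def run(self, inputs):
            if len(inputs) != len(self.layers[0]):
                return None                   # input size must match the input layer
            values = list(inputs)
            for layer in self.layers[1:]:
                values = [neuron.activate(values) for neuron in layer]
            return values                     # raw last-layer outputs (before softmax)

    net = NeuralNetwork()
    for size in (2800, 600, 200, 20, 2):      # the layer sizes described in the introduction
        net.addlayer(size)
    net.load_weights(net.generate_weights())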
10. Inference

Inference in our algorithm is simply choosing the option with the higher probability. Thanks to softmax we get on our output layer two neurons with probabilities for each label. Firstly we need to convert the output, as it comes in a form that cannot be compared directly to the label from our database. The output is a two-dimensional array, where the first element is the probability of a positive tweet and the second of a negative one. So we need to make a variable 'expected', so that it is also represented as a two-dimensional array. Next we check whether the absolute value of the difference between the two outputs is bigger than 0.1; if it is not, we mark the entry as neutral. If one of the values is big enough, we assign the respective label. As we compare these two values we also calculate the accuracy of our algorithm. Below we present the pseudocode of inference.

Input data: sentence label k, array of vectors j (the sentence represented as an array of vectors). Output: the label of the sentence.
    if k == 0: expected = [0, 1], otherwise expected = [1, 0]
    neo = artificial neural net output as the probability of each label
    absolute = absolute value of the difference between both output values
    if absolute < 0.1: Neutral; otherwise if neo[0] > neo[1]: Positive, else: Negative
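A minimal sketch of this decision rule (following the convention above that the first output is the positive probability):

    def infer(neo, threshold=0.1):
        """Map the two softmax outputs to a label, using the neutrality threshold."""
        if abs(neo[0] - neo[1]) < threshold:
            return "neutral"
        return "positive" if neo[0] > neo[1] else "negative"

    print(infer([0.53, 0.47]))   # neutral: the two probabilities are too close
    print(infer([0.85, 0.15]))   # positive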
11. SoftMax

Softmax is an exponential function which normalizes the values of our 2 output neurons so that they sum to 1. We use that function in our program to represent both of our output neuron values as the probability of getting a positive or a negative label:

    \mathrm{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_{j}\exp(x_j)}    (6)

Additionally, apart from calculating softmax, we need its derivative. It is used by the backpropagation function when calculating the difference between the expected values and the outputs from our net. We start by computing the derivatives separately, first for the first neuron:

    \frac{\partial S(z_1)}{\partial z_1} = \frac{\frac{\partial e^{z_1}}{\partial z_1}(e^{z_1} + e^{z_2}) - \frac{\partial}{\partial z_1}(e^{z_1} + e^{z_2})\, e^{z_1}}{(e^{z_1} + e^{z_2})^{2}}    (7)







so we have:

    \frac{\partial}{\partial z_1} S(z_1) = S(z_1) \times (1 - S(z_1))    (8)

Now for the second one:

    \frac{\partial S(z_2)}{\partial z_1} = \frac{\frac{\partial e^{z_2}}{\partial z_1}(e^{z_1} + e^{z_2}) - \frac{\partial}{\partial z_1}(e^{z_1} + e^{z_2})\, e^{z_2}}{(e^{z_1} + e^{z_2})^{2}}    (9)

so we have:

    \frac{\partial}{\partial z_1} S(z_2) = -S(z_1) \times S(z_2)    (10)

Generalizing to N outputs, where

    S(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{N} e^{z_j}}    (11)

the general formula for the softmax derivative is:

    \frac{\partial}{\partial z_j} S(z_i) = \begin{cases} S(z_i) \times (1 - S(z_i)) & \text{if } i = j \\ -S(z_i) \times S(z_j) & \text{if } i \neq j \end{cases}    (12)

If we are computing \frac{\partial}{\partial z_i} S(z_i), the output is always S(z_i) \times (1 - S(z_i)); however, when we are computing \frac{\partial}{\partial z_j} S(z_i) with i \neq j, the output changes to -S(z_i) \times S(z_j).
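A short numerical check of equations (6) and (12):

    import numpy as np

    def softmax(z):
        e = np.exp(z - np.max(z))            # shifted for numerical stability
        return e / e.sum()

    def softmax_jacobian(z):
        """Matrix of dS(z_i)/dz_j from equation (12)."""
        s = softmax(z)
        return np.diag(s) - np.outer(s, s)   # diagonal: s_i(1-s_i); off-diagonal: -s_i*s_j

    z = np.array([0.4, -1.2])
    print(softmax(z).sum())                  # 1.0
    print(softmax_jacobian(z))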
12. Precision function

Precision is a function which shows the proportion of true positive identifications. If we analyse information retrieval, precision is the fraction of correct results divided by all returned results. We calculate precision from two variables, TP and FP, which respectively stand for true positives and false positives. By the term true positive we mean an outcome where the model correctly predicted the positive class, and a false positive is an incorrect prediction of the positive class.

Precision is given by the following formula:

    Precision = \frac{TP}{TP + FP}    (13)

When the precision rate is equal to 1.0, it means that the model produces no false positives.

Example of calculating precision for the following data:

    True Positives: 10    False Positives: 3
    False Negatives: 2    True Negatives: 15

    Precision = \frac{TP}{TP + FP}    (14)

    Precision = \frac{10}{10 + 3}    (15)

    Precision = 10/13 \approx 0.769    (16)


13. Recall function

The recall function is very similar to the precision function. The only difference is that we compare the true positive values to the false negatives (incorrect predictions, where the model incorrectly predicts the negative class). Our model is most efficient when the recall factor is 1.0, which means there are no false negatives.

The equation is also very similar:

    Recall = \frac{TP}{TP + FN}    (17)

And when we calculate recall on the same set of data as precision, this is our outcome:

    True Positives: 10    False Positives: 3
    False Negatives: 2    True Negatives: 15

    Recall = \frac{TP}{TP + FN}    (18)

    Recall = \frac{10}{10 + 2}    (19)

    Recall = 10/12 \approx 0.833    (20)


14. Comparison of recall and precision

The comparison of those 2 functions is very difficult because of the tension between them: if you improve one of them, the other one deteriorates. Using the data above we got:

    Precision ≈ 0.769
    Recall ≈ 0.833

For the data:

    True Positives: 10    False Positives: 3
    False Negatives: 2    True Negatives: 15

when we decrease the number of FP and the number of FN increases, we get:

    True Positives: 10    False Positives: 1
    False Negatives: 4    True Negatives: 15

    Precision ≈ 0.91
    Recall ≈ 0.71







And when we do the opposite thing, decreasing the number of FN and increasing the number of FP:

    True Positives: 10    False Positives: 4
    False Negatives: 1    True Negatives: 15

we get:

    Precision ≈ 0.71
    Recall ≈ 0.91

So we came to the conclusion that the two are not directly comparable, but there is another method which uses both of them in its calculation, named the F1 score.

15. F1 score

The name of the F1 score, also known as the F-measure, is believed to refer to a different F function from Van Rijsbergen's book, under which name it was introduced at the Fourth Message Understanding Conference.

The F1 score is a measurement of a test's accuracy. It is calculated from recall and precision. The F-measure is the harmonic mean (the reciprocal of the arithmetic mean of the reciprocals of the given set of observations) of precision and recall. It can be modified by additional weights, valuing precision or recall more than the other.

The highest value of the F1 score is 1.0, which indicates the best precision and recall, while 0 indicates that one of precision or recall is equal to 0.

The F-measure is also known as the Sørensen–Dice coefficient or Dice similarity coefficient (DSC).

    F_1 = \frac{2}{recall^{-1} + precision^{-1}}    (21)

    F_1 = 2 \times \frac{precision \times recall}{recall + precision}    (22)

    F_1 = \frac{TP}{TP + \frac{1}{2}(FP + FN)}    (23)

Example of calculating the F1 score:

    True Positives: 10    False Positives: 3
    False Negatives: 2    True Negatives: 15

    F_1 = \frac{10}{10 + \frac{1}{2}(3 + 2)}    (24)

    F_1 = \frac{10}{12.5}    (25)

    F_1 = 0.8

The F_\beta score is used when we want recall to be considered \beta times more important than precision, where \beta is a positive real factor.

    F_\beta = (1 + \beta^{2}) \times \frac{precision \times recall}{(\beta^{2} \times precision) + recall}    (26)

    F_\beta = \frac{(1 + \beta^{2}) \times TP}{(1 + \beta^{2}) \times TP + \beta^{2} \times FN + FP}    (27)

When \beta is equal to 2, recall is weighted higher than precision; when it is equal to 0.5, precision is weighted higher than recall. Example of calculating the F_\beta score for:

    True Positives: 10    False Positives: 3
    False Negatives: 2    True Negatives: 15

with \beta = 2:

    F_\beta = \frac{(1 + 4) \times 10}{(1 + 4) \times 10 + 4 \times 2 + 3}    (28)

    F_\beta = \frac{50}{61}    (29)

    F_\beta ≈ 0.82
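A short sketch that recomputes the worked examples above from the confusion-matrix counts:

    def precision(tp, fp):
        return tp / (tp + fp)

    def recall(tp, fn):
        return tp / (tp + fn)

    def f_beta(tp, fp, fn, beta=1.0):
        """Equation (27); beta = 1 gives the ordinary F1 score."""
        b2 = beta ** 2
        return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)

    tp, fp, fn, tn = 10, 3, 2, 15             # the confusion matrix used in the examples
    print(round(precision(tp, fp), 3))        # 0.769
    print(round(recall(tp, fn), 3))           # 0.833
    print(round(f_beta(tp, fp, fn), 3))       # 0.8  (F1)
    print(round(f_beta(tp, fp, fn, 2), 3))    # 0.82 (F2, i.e. 50/61)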
Figure 4: Sigmoid activation function [13].


16. Experiments

We experimented with:

• Activation function
• Artificial neural network learning algorithms
• Structure of the artificial neural network







We chose the hyperbolic tangent as our main activation function. We decided on that function after a comparison of three functions, ReLU, Sigmoid, and Tanh, as we thought it would fit our algorithm best. We tested all of them with the accuracy and F1 score functions. We also read articles proving that this type of function is best for a neural network with 2 neurons in the output layer. The thing that settled the decision was its shape: thanks to the tanh function we are able to easily spot negative values and those which are close to 0.

Figure 5: ReLU activation function [14].

Figure 6: Comparison of activation functions [13].

We also used the softmax function to represent the values of our output neurons as the probability of each label. Apart from that, we used the derivative of softmax in the backpropagation algorithm to decrease the error.

We tried Particle Swarm Optimization and backward propagation as our learning algorithms. After reading articles and running some tests, we decided to use backward propagation, as it was easier to use with softmax and also more efficient than PSO.

After many tries we ended our tests with 3 hidden layers: the first with 600 neurons, the second with 200, and the third with 10; the numbers of neurons in the input and output layers are constant, 2800 inputs and 2 outputs.

Figure 7: Accuracy/Cost for Test Over Time [15].


17. Conclusions

In our work we have tested the application of neural networks for word-processing purposes. We used a special library to work with tweets. Our idea was tested, and the results show that we have a good model which is able to work with tweets. In future works we will try to develop our project further, to make it also compare tweets between various authors. We will also work on applying other models and ideas, to compare them with the presented neural network.


References

 [1] S. Brusca, G. Capizzi, G. Lo Sciuto, G. Susi, A new design methodology to predict wind farm energy production by means of a spiking neural network-based system, International Journal of Numerical Modelling: Electronic Networks, Devices and Fields 32 (2019). doi:10.1002/jnm.2267.
 [2] G. Capizzi, C. Napoli, L. Paternò, An innovative hybrid neuro-wavelet method for reconstruction of missing data in astronomical photometric surveys, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7267 LNAI (2012) 21–29. doi:10.1007/978-3-642-29347-4_3.
 [3] G. Capizzi, F. Bonanno, C. Napoli, Hybrid neural networks architectures for soc and voltage prediction of new generation batteries storage, 2011. doi:10.1109/ICCEP.2011.6036301.
 [4] C. Napoli, F. Bonanno, G. Capizzi, An hybrid neuro-wavelet approach for long-term prediction of solar wind, Proceedings of the International Astronomical Union 6 (2010) 153–155.
 [5] D. Połap, M. Woźniak, Meta-heuristic as manager







     in federated learning approaches for image process-
     ing purposes, Applied Soft Computing 113 (2021)
     107872.
 [6] M. Wozniak, J. Silka, M. Wieczorek, M. Alrashoud,
     Recurrent neural network model for iot and net-
     working malware threat detection, IEEE Transac-
     tions on Industrial Informatics 17 (2021) 5583–5594.
 [7] X. Liu, S. Chen, L. Song, M. Woźniak, S. Liu, Self-
     attention negative feedback network for real-time
     image super-resolution, Journal of King Saud
     University-Computer and Information Sciences
     (2021).
 [8] G. Capizzi, C. Napoli, S. Russo, M. Woźniak, Lessen-
     ing stress and anxiety-related behaviors by means
     of ai-driven drones for aromatherapy, volume 2594,
     2020, pp. 7–12.
 [9] M. Woźniak, M. Wieczorek, J. Siłka, D. Połap, Body
     pose prediction based on motion sensor data and
     recurrent neural network, IEEE Transactions on
     Industrial Informatics 17 (2020) 2101–2111.
[10] R. Avanzato, F. Beritelli, M. Russo, S. Russo, M. Vac-
     caro, Yolov3-based mask and face recognition al-
     gorithm for individual protection applications, in:
     CEUR Workshop Proc., 2020, pp. 41–45.
[11] T. Mikolov, Q. V. Le, I. Sutskever, Exploiting simi-
     larities among languages for machine translation,
     arXiv preprint arXiv:1309.4168 (2013).
[12] H. Guo, H. Nguyen, D.-A. Vu, X.-N. Bui, Forecast-
     ing mining capital cost for open-pit mining projects
     based on artificial neural network approach, Re-
     sources Policy (2019) 101474.
[13] S. Sharma, S. Sharma, A. Athaiya, Activation func-
     tions in neural networks, towards data science 6
     (2017) 310–316.
[14] K. Sarkar, Relu: Not a differentiable function: Why
     used in gradient based optimization? and other
     generalizations of relu, Data Science Group, IITR
     (2018).
[15] J. D. Seo, Unfair back propagation with tensorflow
     [manual back propagation with tf], 2018.



