Image classification with Feed-Forward Neural Networks
Bartłomiej Meller, Kamil Matula and Paweł Chłąd
Faculty of Applied Mathematics, Silesian University of Technology, Kaszubska 23, 44-100 Gliwice

SYSTEM 2020: Symposium for Young Scientists in Technology, Engineering and Mathematics, Online, May 20 2020
bartmel655@student.polsl.pl (B. Meller); kamimat133@student.polsl.pl (K. Matula); pawechl893@student.polsl.pl (P. Chłąd)



Abstract

Artificial neural networks, such as feed-forward networks (FFN), convolutional neural networks (CNN) and recursive neural networks (RNN), are becoming powerful tools that are starting to replace many classical algorithms. It is known that, for image recognition, CNNs are often the best choice in terms of accuracy. In this paper we show that feed-forward networks are capable of achieving comparable performance with a less complicated architecture than CNNs. After presenting the underlying theory of feed-forward networks, we describe the methods that allowed us to get past local minima of the network, and then we report the experiments and the conclusions that followed.

Keywords
Neural networks, Activation function, Images, Classification


1. Introduction

Artificial neural networks such as feed-forward networks (FFN), convolutional neural networks (CNN) and recursive neural networks (RNN) are becoming powerful tools that are starting to replace many classical algorithms. It is known that, for image recognition, CNNs are often the best choice in terms of accuracy. We chose FFNs because they are simpler in structure and, with a sufficient amount of data and a heuristic approach, they can achieve comparable performance while being easier to implement and manage.

One of the most effective learning methods is the back-propagation (BP) algorithm. BP uses gradient calculation to determine the "direction" in which the network should "go". The downside of this strictly mathematical approach is that gradient-based methods often get stuck in local minima of the loss function. Another common problem in many kinds of networks is knowledge generalization; we often train our models on large data sets to avoid fixation of the network on a small set of examples.

We can find many applications of neural networks in image processing. In [1] a CNN was adopted to recognize archaeological sites. We can also find many applications in medicine, where such systems extract bacteria [2] or detect respiratory malfunctions and other pathologies [3, 4, 5]. In the movie and advertisement field there are also many applications of such ideas. Movie scenes can be segmented using background information [6], and even predictions about such content can be made using neural networks [7]. We can also find applications of vision support in virtual reality entertainment systems [8], where neural networks are used to improve perception. There are also many examples in which neural networks are used for understanding context and emotions from movies. In [9] an adaptive attention model was used to recognize patterns, while emotions in clips were detected by neural networks [10] or complex neuro-fuzzy systems [11].

This paper addresses both of those issues. We show that combining a heuristic with the back-propagation algorithm allows the network to efficiently overcome "minima traps", and we present methods that help networks generalize their knowledge. First we introduce the basic theory of feed-forward networks, together with an explanation of the back-propagation algorithm. Then we describe our example model and the series of experiments that were performed on it. At the end of the paper we present the conclusions we have gathered and give performance metrics of our model.


2. Feed-Forward Networks Theory

An Artificial Neural Network (ANN) is a mathematical model of the biological neural networks that compose a human brain; similarly to them, an ANN is built of neurons which form layers. Each neuron of a layer is connected by synapses to all neurons of the previous layer and to all neurons of the next layer. These connections have randomly initialized weights, which are modified in the learning process.

The first layer, responsible for receiving input data, is called the input layer. Similarly, the last one, which returns output data, is called the output layer. There can be
zero, one or more hidden layers between them. The goal of the neural network architect is to find optimal sizes of the layers, to make the learning process much more efficient. The input neuron count depends on the number of features the analyzed object has, and the output neuron count on how many classes it can be classified into.

Every neuron receives a value on its input, transforms it using an activation function and sends the output signal to the next layer. The input signal of the $i$-th neuron of the $k$-th layer is

$$s_i^k = \sum_{j=1}^{n} w_{ij}^k y_j^{k-1},$$

where $w_{ij}^k$ is the weight of the connection between the $i$-th neuron of the $k$-th layer and the $j$-th neuron of the previous layer, and $y_j^{k-1}$ is the output signal value of the $j$-th neuron of the previous layer. The output signal of the $i$-th neuron is

$$y_i^k = f(s_i^k) = f\left(\sum_{j=1}^{n} w_{ij}^k y_j^{k-1}\right).$$

There are many activation functions (also known as transfer functions or threshold functions). One of the most commonly used is the bipolar linear function, shown in Figure 1. Its equation is

$$f(s_i^k) = \frac{2}{1+e^{-\alpha s_i^k}} - 1 = \frac{1-e^{-\alpha s_i^k}}{1+e^{-\alpha s_i^k}},$$

where $\alpha$ is a coefficient that influences the "width" (convergence rate) of the activation function.

Figure 1: Bipolar linear activation function.
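As an illustration, the forward pass and the bipolar activation defined above can be written in a few lines of NumPy. This is a minimal sketch rather than the authors' code; the layer sizes and the value of alpha used below are arbitrary assumptions.

import numpy as np

ALPHA = 1.0  # width coefficient of the bipolar activation (assumed value)

def bipolar(s, alpha=ALPHA):
    # f(s) = (1 - exp(-alpha*s)) / (1 + exp(-alpha*s))
    return (1.0 - np.exp(-alpha * s)) / (1.0 + np.exp(-alpha * s))

def forward_layer(y_prev, W):
    # one layer of the forward pass: s_i^k = sum_j w_ij^k * y_j^(k-1), y_i^k = f(s_i^k)
    s = W @ y_prev
    return s, bipolar(s)

# toy example: a 4-neuron layer fed by a 3-dimensional input vector
rng = np.random.default_rng(0)
W = rng.uniform(-1.0, 1.0, size=(4, 3))   # randomly initialised weights
y0 = np.array([0.2, -0.5, 0.9])
s1, y1 = forward_layer(y0, W)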
After the artificial neural network has been properly built, it is time to teach it object recognition. The ANN learns by adjusting synaptic weights in a strictly defined way. There are many methods of learning, but the most popular training technique used in feed-forward neural networks is back-propagation, in which the modification of connection weights proceeds sequentially from the output layer to the input layer. This direction is opposite to the way inserted information moves in all FFNs. The goal of the training process is to minimize the value of the loss function for all elements of the training set $T$. The training set consists of input vectors, which are inserted into the first layer through the input synapses, and expected output vectors, which are compared with the obtained outputs every time the neural network is fed. The loss function shows the distance between the predicted value and the actual one. To calculate it we can use the Mean Square Error (MSE) loss or the Cross-Entropy loss (also known as Log Loss). Using MSE, the total error over the training set can be described with the equation

$$E = \sum_{T} \sum_{i=1}^{n} (d_i - y_i)^2, \qquad (1)$$

where $n$ is the dimension of the output vector (and also the count of neurons in the output layer), $d_i$ is the predicted value at the $i$-th position of the output vector and $y_i$ is the actual value at the $i$-th position of the output vector. Correction of the synaptic weights starts in the last layer and goes backwards through all hidden layers until it reaches the input layer. A weight is changed according to the equation

$$w_{ij}^k = w_{ij}^k + \eta \nabla w_{ij}^k, \qquad (2)$$

where $\eta$ is a correcting coefficient commonly called the learning rate and $\nabla w_{ij}^k$ is the value of the gradient of the synapse weight's error:

$$\nabla w_{ij}^k = \frac{\partial E}{\partial w_{ij}^k} = \frac{1}{2} \cdot \frac{\partial E}{\partial s_i^k} \cdot 2 \cdot \frac{\partial s_i^k}{\partial w_{ij}^k} = 2 \delta_i^k y_j^{k-1}, \qquad (3)$$

where $\delta_i^k$ is the value of the change of the error function with respect to the input signal of the $i$-th neuron of the $k$-th layer and $y_j^{k-1}$ is the output signal value of the $j$-th neuron of the previous layer. On the last, $K$-th layer $\delta$ equals

$$\delta_i^K = \frac{1}{2} \cdot \frac{\partial E}{\partial s_i^K} = \frac{1}{2} \cdot \frac{\partial (d_i^K - y_i^K)^2}{\partial s_i^K} = f'(s_i^K) \cdot (d_i^K - y_i^K), \qquad (4)$$

where $f'(s_i^K)$ is the value of the derivative of the activation function at the input signal of the $i$-th output neuron. On this layer
the value depends mainly on the distance between the predicted and actual values, that is, on the error. The other layers' $\delta$ values use the numbers calculated in the previous steps:

$$\delta_i^k = \frac{1}{2} \cdot \frac{\partial E}{\partial s_i^k} = \frac{1}{2} \sum_{j=1}^{N_{k+1}} \frac{\partial E}{\partial s_j^{k+1}} \cdot \frac{\partial s_j^{k+1}}{\partial s_i^k} = f'(s_i^k) \sum_{j=1}^{N_{k+1}} \delta_j^{k+1} w_{ij}^{k+1}, \qquad (5)$$

where $N_{k+1}$ is the count of neurons in the $(k+1)$-th layer.

In Algorithm 1 you can see the full process of training a feed-forward neural network using the back-propagation algorithm. It uses an array called $D$, which holds the $\delta$ values. This jagged array has $L$ rows and as many columns in a row as the layer it corresponds to has neurons. All array elements are written as $A_{i,j}$, which is just an alternative form of $A[i][j]$. Moreover, the numerical intervals used in the for-loops are half-open (the numbers after "to" do not belong to the intervals). There are also the symbols $LR$, $s_i^k$, $y_i^k$ and $w_{ij}^k$. They denote, respectively, the learning rate ($\eta$), the input and output values of the $i$-th neuron of the $k$-th layer, and the weight of the synapse between the $i$-th neuron of the $k$-th layer and the $j$-th neuron of the previous layer. Finally, $f'(\cdot)$ denotes the derivative of the activation function; for the bipolar linear function

$$f(x) = \frac{1 - e^{-\alpha x}}{1 + e^{-\alpha x}}, \qquad (6)$$

the derivative is

$$f'(x) = \frac{2\alpha e^{-\alpha x}}{(1 + e^{-\alpha x})^2}. \qquad (7)$$

Data: EpochsCount, TrainingInputs, ExpectedOutputs (EOUT)
Result: Higher precision of the neural network
L := count of the network's layers;
D := empty jagged array for the δ values;
for i = 0 to EpochsCount do
    for j = 0 to Training Set's Length do
        Insert the j-th vector from TrainingInputs into the first layer's input synapses;
        for k = 0 to L do
            Calculate s^k for all neurons of the k-th layer by summing products of the synaptic
                weights and the output values of the previous layer's neurons
                (or of the input synapses if it is the input layer);
            Calculate y^k for all neurons of the k-th layer by applying the activation function;
        end
        Output (OUT) := vector made of the output values of the output neurons;
        for n = 0 to output neurons count do
            D_{L-1,n} = (EOUT_{j,n} - OUT_n) · f'(s_n^{L-1});
        end
        for k = L - 2 to 0 step -1 do
            for n = 0 to k-th layer's neurons count do
                D_{k,n} = 0;
                for m = 0 to (k+1)-th layer's neurons count do
                    D_{k,n} = D_{k,n} + D_{k+1,m} · w_{mn}^{k+1};
                end
                D_{k,n} = D_{k,n} · f'(s_n^k);
            end
        end
        for k = L - 1 to 0 step -1 do
            for n = 0 to k-th layer's neurons count do
                for m = 0 to (k-1)-th layer's neurons count do
                    w_{nm}^k = w_{nm}^k + 2 · LR · D_{k,n} · y_m^{k-1};
                end
            end
        end
    end
end

Algorithm 1: FFN training algorithm.
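For readers who prefer code to pseudocode, the sketch below mirrors Algorithm 1 and equations (2)-(7) in NumPy. It is a minimal illustration rather than the authors' implementation: the layer sizes, the learning rate and alpha are assumed values, and biases are omitted because the paper does not mention them.

import numpy as np

ALPHA, LR = 1.0, 0.01   # assumed activation width and learning rate

def f(s):
    # bipolar activation, equation (6)
    return (1.0 - np.exp(-ALPHA * s)) / (1.0 + np.exp(-ALPHA * s))

def f_prime(s):
    # derivative of the activation, equation (7)
    return 2.0 * ALPHA * np.exp(-ALPHA * s) / (1.0 + np.exp(-ALPHA * s)) ** 2

def train_step(weights, x, d):
    # one back-propagation step for a single sample (x, d), mirroring Algorithm 1
    ys, ss = [x], []
    for W in weights:                       # forward pass, keep s^k and y^k of every layer
        s = W @ ys[-1]
        ss.append(s)
        ys.append(f(s))
    deltas = [f_prime(ss[-1]) * (d - ys[-1])]          # output-layer delta, equation (4)
    for k in range(len(weights) - 2, -1, -1):          # hidden-layer deltas, equation (5)
        deltas.insert(0, f_prime(ss[k]) * (weights[k + 1].T @ deltas[0]))
    for k, W in enumerate(weights):                    # weight update, equations (2)-(3)
        W += LR * 2.0 * np.outer(deltas[k], ys[k])
    return weights

# toy usage with assumed layer sizes: 6 inputs -> 4 hidden -> 2 outputs
rng = np.random.default_rng(0)
sizes = [6, 4, 2]
weights = [rng.uniform(-1.0, 1.0, size=(sizes[i + 1], sizes[i])) for i in range(len(sizes) - 1)]
x, d = rng.uniform(-1.0, 1.0, 6), np.array([1.0, -1.0])
for _ in range(100):
    weights = train_step(weights, x, d)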

3. Example system

Our experimental FFN takes in a 50x25x3 image and outputs a six-dimensional vector of decision values. Each decision value represents one movie that is associated with the given frame.
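Using the train_step sketch above, the example system amounts to a network with 50 x 25 x 3 = 3750 inputs and 6 outputs; the hidden layer sizes 800, 200 and 50 below are the ones reported later in Section 4, and the construction itself is only a hypothetical illustration.

import numpy as np

rng = np.random.default_rng(0)
sizes = [50 * 25 * 3, 800, 200, 50, 6]   # 3750 inputs, three hidden layers, 6 movie classes
weights = [rng.uniform(-1.0, 1.0, size=(sizes[i + 1], sizes[i]))
           for i in range(len(sizes) - 1)]
# a flattened 50x25 RGB frame and its label vector are then fed to train_step(weights, frame, label)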
3.1. Data preparation

3.1.1. Image preparation

After loading the data set into memory, we cut each image to a 2:1 aspect ratio; this value was chosen because our input vector is an image with a 2:1 aspect ratio. If we took an image with an aspect ratio that does not match the input aspect ratio, we would need to stretch or shrink the image, which would in turn add some pixels and thus make the results more inaccurate. After cropping to the target aspect ratio, we crop another 1/12 of the width and height from each side, to eliminate possible letterboxes. Finally we shrink the image to 50x25 pixels. Such dimensions have been chosen to cut down on training times. Obviously, to eliminate possible artifacts introduced by the preprocessing steps described above, the image will be filtered [12].
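The preprocessing pipeline described above can be sketched with Pillow. The crop arithmetic follows the text (2:1 centre crop, a further 1/12 trim on every side, resize to 50x25); the function name and the scaling of pixel values to [0, 1] are our own assumptions.

from PIL import Image
import numpy as np

def prepare_frame(path):
    img = Image.open(path).convert("RGB")
    w, h = img.size
    # centre-crop to a 2:1 aspect ratio
    if w / h > 2.0:
        new_w = 2 * h
        left = (w - new_w) // 2
        img = img.crop((left, 0, left + new_w, h))
    else:
        new_h = w // 2
        top = (h - new_h) // 2
        img = img.crop((0, top, w, top + new_h))
    # trim a further 1/12 of the width and height from each side (letterbox removal)
    w, h = img.size
    dx, dy = w // 12, h // 12
    img = img.crop((dx, dy, w - dx, h - dy))
    # shrink to the 50x25 input resolution and flatten to a 3750-element vector
    img = img.resize((50, 25))
    return np.asarray(img, dtype=np.float32).flatten() / 255.0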
3.1.2. Data labeling

Each image is given a label signifying the movie it belongs to. Then 1/10 of the image samples are redirected into the testing set.
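One possible way to express the labelling and the 1/10 test split is sketched below; the bipolar (-1/1) target encoding is an assumption on our part, since the paper does not state the target range.

import random

def split_dataset(samples, test_fraction=0.1, seed=42):
    # samples: list of (feature_vector, movie_index) pairs; 1/10 goes to the testing set
    random.Random(seed).shuffle(samples)
    n_test = int(len(samples) * test_fraction)
    return samples[n_test:], samples[:n_test]   # training set, testing set

def one_hot(movie_index, n_movies):
    # assumed bipolar targets matching the activation's output range
    return [1.0 if i == movie_index else -1.0 for i in range(n_movies)]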
                                                               sions made by the network.
3.2. Training network                                             This phenomenon was observed, after few epochs
                                                               of learning, after class addition. Sometimes network
After preparation of training and testing datasets, all
                                                               needed randomization mentioned above for the phe-
225 per movie images, converted to the vectors, are in-
                                                               nomenon to occur.
serted to the neural network. As it was said in ”Feed-
                                                                  At the time of writing this article absolute accuracy
Forward Networks Theory” section, the information
                                                               of predictions made by the network, for six classes ex-
goes through all layers to the output layer with us-
                                                               ceeded maximal accuracy given on four classes. We
ing activation function in all encountered neurons and
                                                               suspect that this could be explained by growing gener-
then goes back in back-propagation process, modify-
                                                               ality of classifier contained in the model with increas-
ing synaptic weights. This sequence is repeated mul-
                                                               ing amount of known classes.
tiple times and results in improvement of network’s
                                                                  Methodology of learning was following. Model was
ability to recognizing objects.
                                                               trained until it’s accuracy hadn’t increased, for few
1 hidden layer with 50 neurons
Movies count    Highest precision
4               76.0 %
5               71.2 %
6               52.7 %

3 hidden layers: 800, 200 and 50 neurons
Movies count    Highest precision
4               82.0 %
5               88.8 %
6               84.6 %
4. Experiments

The main problem dealt with, in order to provide optimal learning accuracy and time, was selecting an appropriate architecture that grants relatively good accuracy but, on the other hand, contains as few neurons as required. A minimal neuron count results in minimal computational time.

Initially, when tests were carried out on 4 classes of data, a network that contained only one hidden layer consisting of as few as 50 neurons seemed to be the most efficient and accurate (76%). Nevertheless, it was not capacious enough to accommodate the amount of knowledge needed to recognize four or more movies. Eventually, a model that contained 3 hidden layers with the following neuron counts: 800, 200 and 50 turned out to be the optimal solution.

The back-propagation algorithm has one significant defect. It does not use any heuristics that could help it deal with local minima. The solution to this issue is quite simple and easy to implement. In this case, adding a random number from the range [-0.002, 0.002] to the weights of all connections turned out to be extraordinarily efficient. However, it is hard to give exact figures, because the method was combined with a gradual increase of the class count.
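The local-minimum escape heuristic amounts to adding small uniform noise to every weight. A short sketch, with the ±0.002 range taken from the text and the function name being our own:

import numpy as np

def nudge_weights(weights, radius=0.002, rng=np.random.default_rng()):
    # add a random value from [-radius, radius] to every synaptic weight
    for W in weights:
        W += rng.uniform(-radius, radius, size=W.shape)
    return weights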
Surprisingly, increasing the number of classes resulted in a leap in the quality of the decisions made by the network. This phenomenon was observed a few epochs of learning after a class addition. Sometimes the network needed the randomization mentioned above for the phenomenon to occur.

At the time of writing this article, the absolute accuracy of predictions made by the network for six classes exceeded the maximal accuracy obtained on four classes. We suspect that this can be explained by the growing generality of the classifier contained in the model as the number of known classes increases.

The methodology of learning was as follows. The model was trained until its accuracy had not increased for a few epochs. In the next stage the network's weights were randomized to obtain three different child networks. All four instances were trained in parallel. At the end, the best one of them was picked and reproduced by randomizing its weights, and then the whole process was repeated.

When the network had stalled in its advancement, one class was added to its possible outputs and learning was continued in the same way. One of the most interesting issues that we experienced in the initial part of the research was a significant drop in accuracy when the class set contained the "Shrek 2" movie. After preliminary learning on a dataset that did not contain this movie, this problem disappeared.
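The train-until-stall methodology described above (spawn three randomised children, train all instances, keep the best one and repeat) could be organised as in the following sketch. The helper names, the stall criterion and the sequential (rather than parallel) training are assumptions; train_epoch and accuracy stand for routines built on the earlier sketches, and nudge_weights is the perturbation shown above.

import copy

def evolve(weights, train_epoch, accuracy, nudge_weights, patience=3, rounds=10):
    # pick-the-best training with randomised child networks (sketch)
    for _ in range(rounds):
        # the parent plus three perturbed copies, four instances in total
        population = [weights] + [nudge_weights(copy.deepcopy(weights)) for _ in range(3)]
        for candidate in population:
            stalled, best = 0, accuracy(candidate)
            while stalled < patience:              # train until accuracy stops improving
                train_epoch(candidate)
                acc = accuracy(candidate)
                stalled, best = (0, acc) if acc > best else (stalled + 1, best)
        weights = max(population, key=accuracy)    # keep the best instance and repeat
    return weights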
We also experimented with different weight initialization techniques, described in [4, 13]. We tried the following methods (sketched in code after this list):

  • He weight initialization - multiplies a random value from [-1.0, 1.0] by $\sqrt{2/\mathit{size}}$, where $\mathit{size}$ is the size of the previous layer
  • Xavier weight initialization - similar to He initialization, but with $\sqrt{1/\mathit{size}}$
  • We also used the scaling $\sqrt{2/(\mathit{size}_{l-1}+\mathit{size}_l)}$, where $\mathit{size}_n$ is the size of the $n$-th layer

Unfortunately, none of those methods worked correctly. In testing, we used networks with a significant number of neurons in nearly every layer (3750 neurons in the input layer), which in turn set the initial weight values to minute values.
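A sketch of the three schemes from the list above. The uniform base range [-1, 1] follows the text; the exact scaling factors are the commonly used He and Xavier ones, and the function names are our own.

import numpy as np

def init_he(n_out, n_in, rng=np.random.default_rng()):
    # uniform [-1, 1] scaled by sqrt(2 / size of the previous layer)
    return rng.uniform(-1.0, 1.0, size=(n_out, n_in)) * np.sqrt(2.0 / n_in)

def init_xavier(n_out, n_in, rng=np.random.default_rng()):
    # like He initialization, but with sqrt(1 / size)
    return rng.uniform(-1.0, 1.0, size=(n_out, n_in)) * np.sqrt(1.0 / n_in)

def init_normalized(n_out, n_in, rng=np.random.default_rng()):
    # scaling that takes both neighbouring layer sizes into account
    return rng.uniform(-1.0, 1.0, size=(n_out, n_in)) * np.sqrt(2.0 / (n_in + n_out))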
Results of our research are shown in the tables above. The first table presents the most satisfactory accuracies of the neural network (with 1 hidden layer consisting of 50 neurons) before the experiment. The second one shows the effects of randomization. The percentages in the right-hand columns are calculated by dividing the number of correctly recognized movies by the size of the testing dataset (which is 25 frames per movie). As they show, the results are surprisingly high.


5. Example Predictions

The tables below contain example predictions made by the network. The provided images are screenshots from the Netflix platform that were taken independently from the learning and testing datasets.

As can be seen, the accuracy is satisfactorily high. The network makes a few mistakes when it comes to dark and distant frames. Fixing this issue will be our next goal.
6. Conclusions

After all the experiments done with this network, we can state the following conclusions:

1. Combining back-propagation and a heuristic approach gave an unprecedented leap in network accuracy. After a number of tests with weight randomization, we suspect that by giving a "nudge" to the weights, we push the network out of a local minimum, hence allowing it to learn further.
2. Changing the model in-flight allows for more generality. By changing the model topology (e.g. adding an additional output dimension), we have seen an increase in the generality of the model.
3. Training on diversified samples first results in increased generality.
References

[1] M. Woźniak, D. Połap, Soft trees with neural components as image-processing technique for archeological excavations, Personal and Ubiquitous Computing 24 (2020) 363–375.
[2] D. Połap, M. Woźniak, Bacteria shape classification by the use of region covariance and convolutional neural network, in: 2019 International Joint Conference on Neural Networks (IJCNN), IEEE, 2019, pp. 1–7.
[3] M. Woźniak, D. Połap, R. K. Nowicki, C. Napoli, G. Pappalardo, E. Tramontana, Novel approach toward medical signals classifier, in: 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, 2015, pp. 1–7.
[4] G. Capizzi, G. L. Sciuto, C. Napoli, D. Połap, M. Woźniak, Small lung nodules detection based on fuzzy-logic and probabilistic neural network with bio-inspired reinforcement learning, IEEE Transactions on Fuzzy Systems 28 (2019) 1178–1189.
[5] F. Beritelli, G. Capizzi, G. Lo Sciuto, C. Napoli, M. Woźniak, A novel training method to preserve generalization of RBPNN classifiers applied to ECG signals diagnosis, Neural Networks 108 (2018) 331–338.
[6] L.-H. Chen, Y.-C. Lai, H.-Y. M. Liao, Movie scene segmentation using background information, Pattern Recognition 41 (2008) 1056–1065.
[7] Y. Zhou, L. Zhang, Z. Yi, Predicting movie box-office revenues using deep neural networks, Neural Computing and Applications 31 (2019) 1855–1865.
[8] D. Połap, K. Kęsik, A. Winnicka, M. Woźniak, Strengthening the perception of the virtual worlds in a virtual reality environment, ISA Transactions 102 (2020) 397–406.
[9] J. Chen, J. Shao, C. He, Movie fill in the blank by joint learning from video and text with adaptive temporal attention, Pattern Recognition Letters 132 (2020) 62–68.
[10] S. Alghowinem, R. Goecke, M. Wagner, A. Alwabil, Evaluating and validating emotion elicitation using English and Arabic movie clips on a Saudi sample, Sensors 19 (2019) 2218.
[11] T.-L. Nguyen, S. Kavuri, M. Lee, A multimodal convolutional neuro-fuzzy network for emotion understanding of movie clips, Neural Networks 118 (2019) 208–219.
[12] G. Capizzi, S. Coco, G. Lo Sciuto, C. Napoli, A new iterative FIR filter design approach using a Gaussian approximation, IEEE Signal Processing Letters 25 (2018) 1615–1619.
[13] M. Matta, G. Cardarilli, L. Di Nunzio, R. Fazzolari, D. Giardino, A. Nannarelli, M. Re, S. Spanò, A reinforcement learning-based QAM/PSK symbol synchronizer, IEEE Access 7 (2019) 124147–124157.
Example predictions made by the network (each row corresponds to one screenshot frame):

Origin            Prediction        Matches   2nd prediction
Indiana Jones     Indiana Jones     yes       Shark Tale
Indiana Jones     Indiana Jones     yes       Shrek 2
Indiana Jones     Indiana Jones     yes       Shrek 2
Indiana Jones     Shrek 2           no        The Lego Movie
The Lego Movie    The Lego Movie    yes       Shark Tale
The Lego Movie    The Lego Movie    yes       Shark Tale
The Lego Movie    Shark Tale        no        The Lego Movie
The Lego Movie    The Lego Movie    yes       Shrek 2
Madagascar        Madagascar        yes       Shark Tale
Madagascar        Shrek 2           no        The Lego Movie
Madagascar        Madagascar        yes       Shrek 2
Madagascar        Madagascar        yes       Shrek 2
Shark Tale        Shark Tale        yes       The Lego Movie
Shark Tale        Shark Tale        yes       Indiana Jones
Shark Tale        Shark Tale        yes       The Lego Movie
Shark Tale        Shark Tale        yes       The Lego Movie
Shrek 2           Shrek 2           yes       Shark Tale
Shrek 2           Shrek 2           yes       Indiana Jones
Shrek 2           Shrek 2           yes       Madagascar
Shrek 2           Shrek 2           yes       Shark Tale
Loving Vincent    Loving Vincent    yes       Shrek 2
Loving Vincent    Loving Vincent    yes       The Lego Movie
Loving Vincent    Loving Vincent    yes       Shark Tale
Loving Vincent    Loving Vincent    yes       Shrek 2