<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>A neural network strategy for supervised classification via the Learning Under Privileged Information paradigm</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ludovica Sacco</string-name>
          <email>l.sacco@dimes.unical.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dino Ienco</string-name>
          <email>dino.ienco@inrae.fr</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Interdonato</string-name>
          <email>roberto.interdonato@cirad.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CIRAD, UMR TETIS</institution>
          ,
          <addr-line>Montpellier</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>DIMES - University of Calabria</institution>
          ,
          <addr-line>P. Bucci 41C, 87036 Rende (CS)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>INRAE, UMR TETIS, Univ. Montpellier</institution>
          ,
          <addr-line>Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>TETIS, Univ. of Montpellier</institution>
          ,
          <addr-line>APT, Cirad, CNRS, INRAE, Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>Devising new methodologies to handle and analyse Big Data has become a fundamental task in our increasingly service-oriented and interconnected society. One of the problems arising when handling such data is that, for a given set of entities, not all entities may be described at the same level of detail, i.e., the number of features describing each entity may vary. In general, to apply classic data science methods it is necessary to have a common feature set over the whole data set. This feature set then corresponds to the maximum set of features shared by all entities, resulting in a loss of information for the entities for which additional information is available. In order to exploit such additional information, the Learning Using Privileged Information (LUPI) paradigm has been proposed, based on the introduction of a teacher role in the learning process. In this schema the teacher acquires a strategic position by exploiting, at the training stage, some additional privileged information about the entities which will not be available at the test stage. In this work, we apply this paradigm in the context of neural networks, by proposing a LUPI-based deep learning architecture able to exploit a larger set of attributes at training time, with the aim of improving classification performance on a set of entities associated with a reduced attribute set. Experimental results show how the proposed approach improves upon approaches applying the same schema to classic machine learning methods (e.g., SVM).</p>
      </abstract>
      <kwd-group>
        <kwd>Big Data</kwd>
        <kwd>LUPI paradigm</kwd>
        <kwd>Neural Networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The developments achieved in recent years in machine learning, due both to theoretical studies
and to the constant increase in the computational power of processors, have led to several
advancements over traditional learning techniques. These years have also seen a sudden
increase in the quantity of available data. Massive quantities of information are produced,
collected and digitally stored every day, in contexts that touch different domains and aspects
of human life (human health and behavior, agriculture, biology, and so on). The availability
of these so-called big data, together with that of machines that can handle higher volumes of
data than ever before, has also led to an increase in the use of supervised learning techniques,
i.e., techniques that rely on labeled data in order to train a machine learning model. However,
in practical contexts, these data are generally noisy, and may include a diversity of features
that are not always easy to exploit for computational analyses. A typical scenario is one in
which entities of the same type are associated with a non-homogeneous set of attributes that
describe them. When using traditional machine learning methods, this requires reducing the
set of attributes exploited to learn the model to just the common ones, thus resulting in a
significant loss of information.</p>
      <p>
        One of the solutions that has been proposed to overcome this problem, and to allow
the use of such diverse sets of attributes, is the Learning Using Privileged Information (LUPI) paradigm [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The idea behind LUPI is to introduce a teacher-student model into the
learning process, which makes it possible to train a model on a larger set of attributes than that available at
testing time. More specifically, the idea is to select a subset of entities for which a larger set of
attributes is available (i.e., the privileged information), and then to train a model that can use
such information to discern among entities that are associated with a reduced set of attributes (i.e.,
entities for which the privileged information is not available, but just a limited set of regular
attributes). The original implementations of the LUPI paradigm were introduced on classic
machine learning techniques, e.g., Support Vector Machines (SVM) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, this paradigm
can also be integrated into more advanced frameworks, such as the ones based on deep learning
architectures [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which are nowadays the state of the art in several domains.
      </p>
      <p>In this paper, we present a LUPI-based deep learning approach built on a three-branch
architecture where a first model (M1) is trained on the full set of attributes (regular and privileged
information), a second one (M2) exploits the representation learnt by M1 in order to stretch
the representation learnt from the regular attributes towards the one obtained from the privileged
ones, and a third model (M3) is optimized on the regular attributes only. The final feature set
is then obtained by combining the outputs of M2 and M3. While the proposed architecture is
implemented by using a classic Multi-Layer Perceptron to instantiate the three models, it is
general enough to be extended to different neural network models (e.g., Convolutional Neural
Networks and Recurrent Neural Networks), and to be effectively applied to data coming from
different domains. Experimental evaluation on six real-world datasets shows how the proposed
methodology improves upon a competitor that applies the LUPI paradigm to an SVM approach
(SVM+). The paper is organized as follows: Section 2 discusses the related work, Section 3
introduces the proposed architecture, Section 4 discusses experimental results, while Section 5
concludes the work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The first algorithm implementing the LUPI paradigm is an extension of the classic SVM
algorithm, known as SVM+ [
        <xref ref-type="bibr" rid="ref1 ref3">1, 3</xref>
        ]. SVM+ proposes a modified formulation of SVM in which the
misclassification of the training examples is evaluated through a function that learns from the
privileged information, acting as slack variables. Subsequently, different improvements and alternatives
have been proposed for supervised learning problems. In [4] it has been remarked that the LUPI
paradigm had always been implemented with the L-2 support vector machine, with the result that
the number of tuning parameters is doubled, with a consequent increase in computational
cost. The decision to employ the L-1 norm, widely used for feature and variable selection [5],
therefore led to the creation of the L-1 SVM and to its use in LUPI. Improvements to the optimization
problem of SVM+ are provided with the presentation of two new SMO-style algorithms [6]:
aSMO and gSMO. When large training sets are used, both prove more efficient, in terms of the
generalization error/running time trade-off, than the generic optimizer LOQO, based on interior
point methods. Further similar algorithms can be found in [7], with some comparisons between
aSMO and caSMO. Considerations on differentiating easy examples from difficult ones are
discussed and formalized in [8], where a new approach for choosing the weights associated with
every training example is presented. More recent works [9, 10] discuss the importance of the
knowledge transfer concept in SVM+ and neural networks to improve their convergence
properties and therefore accelerate the student’s learning.
      </p>
      <p>Privileged information plays an important role in domains like computer vision, where several
studies have successfully introduced the LUPI paradigm [11, 12]. Other applications are based
on the use of bounding boxes, image captions and descriptions as additional information,
combining them with Rank Transfer techniques to overcome the limitations of SVM+ [13]. In
recent years, other studies have used privileged information, often in association with neural
networks, to solve various tasks. Classifying fine-grained images or recognizing images
from a data set containing annotation noise is possible thanks to the DeepLUPI framework [14].
In [15] a two-stream fully convolutional network, named MIML-FCN+, is proposed to solve the
multi-instance multi-label problem by using privileged bags of labels. A LUPI framework for
CNNs and RNNs is offered in [16], where a heteroscedastic dropout is used and the privileged
information is represented by the variance of the dropout. In this work, while we exploit a
simple Multi-Layer Perceptron to instantiate the models at the base of our architecture, we
propose a deep learning framework that is general enough to be extended to different neural
network models (e.g., Convolutional Neural Networks and Recurrent Neural Networks), and to
be effectively applied to data coming from different domains.</p>
    </sec>
    <sec id="sec-3">
      <title>3. The LUPI Architecture</title>
      <sec id="sec-3-1">
        <title>3.1. Learning Using Privileged Information Paradigm</title>
        <p>In the classic machine learning paradigm, the role of a teacher in the human learning process
has been underestimated. The supervised learning task follows a simple strategy: given a set
of pairs (x_1, y_1), ..., (x_n, y_n), x ∈ X, y ∈ {−1, 1}, where the vector x ∈ X is the description
of the example and y its classification, find the function y = f(x, α*) that minimizes the
probability of incorrect classifications. In the LUPI paradigm the goal is exactly the same, but
it also contemplates the knowledge provided by the teacher at the training stage, the so-called
Privileged Information. Triplets (x_1, x*_1, y_1), ..., (x_n, x*_n, y_n), x ∈ X, x* ∈ X*, y ∈ {−1, 1}
are now considered instead of pairs, each one independently generated by some underlying
unknown distribution P(x, x*, y).</p>
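The difference between the two settings can be sketched with synthetic data (an illustration of ours; the sizes and array names are assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

n, d_regular, d_privileged = 100, 8, 4

# Classic supervised learning: pairs (x_i, y_i), x in X, y in {-1, 1}.
X = rng.normal(size=(n, d_regular))
y = rng.choice([-1, 1], size=n)

# LUPI: triplets (x_i, x*_i, y_i); the privileged part x* exists
# only for training examples and is absent at test time.
X_star = rng.normal(size=(n, d_privileged))

train = list(zip(X, X_star, y))                 # teacher sees (x, x*, y)
test_inputs = rng.normal(size=(20, d_regular))  # student sees x only

print(len(train), test_inputs.shape)
```

The goal, identical in both settings, is a classifier that receives only the regular attributes at test time.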
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Implementation</title>
        <p>The proposed LUPI-based deep learning architecture (Figure 1) is an end-to-end framework
consisting of three neural networks (e.g., multi-layer perceptrons) that process the input data to
produce the final classification.</p>
        <p>The first stream analyzes the training examples x (i.e., the entities with the associated regular
information) with the addition of the privileged information x*, while the other two streams
work only on the training examples x. This choice makes it possible to exploit the knowledge acquired
when the privileged information is available. In particular, the distribution over the labels obtained
as output from the first model (M1) is channeled into the second neural
network. In this configuration, the second model (M2) uses as input a set of pairs (x, p),
where x ∈ X, y ∈ {−1, 1}, and p corresponds to the output obtained from M1. Moreover,
to train the second neural network we choose the Kullback-Leibler divergence [17], so that M2
behaves as similarly as possible to M1. Finally, the third stream manages the training examples
without any additional information to help during the classification. It should be noted that the
M1 model is executed separately from the other two, and that M2 and M3 are run in parallel, as
part of a single multi-output neural network.</p>
        <p>Regarding the loss function, the Categorical Cross-Entropy has been used for all models except M2,
where the Kullback-Leibler divergence is employed, with the aim of mimicking the behavior of the
M1 model trained with the privileged information.</p>
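The two losses can be written out explicitly; the following is a minimal numpy sketch of ours (the natural logarithm and batch averaging are our assumptions):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot targets; y_pred: softmax outputs. Used for M1 and M3.
    return -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))

def kl_divergence(p_teacher, q_student, eps=1e-12):
    # KL(P || Q): drives M2's output distribution towards M1's.
    ratio = (p_teacher + eps) / (q_student + eps)
    return np.mean(np.sum(p_teacher * np.log(ratio), axis=1))

# When M2 reproduces M1's distribution exactly, the KL term vanishes.
p_m1 = np.array([[0.7, 0.2, 0.1]])
p_m2 = np.array([[0.4, 0.4, 0.2]])
print(kl_divergence(p_m1, p_m1))  # 0.0
print(kl_divergence(p_m1, p_m2))
```

Minimizing the KL term is what "stretches" M2's representation towards the one M1 learns with privileged information.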
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>This section reports the experimental setup, the data used, and the results achieved in evaluating
our architecture. To evaluate the proposed LUPI architecture, we carried out experiments on
six datasets from different domains. We describe how we preprocessed them for use in
the experimental phase. We introduce setup details for both our implementation and competitors,
specifying the metrics adopted for the comparison. Finally, we discuss the results, aiming to
validate our architecture.</p>
      <sec id="sec-4-1">
        <title>4.1. Datasets</title>
        <p>Six well-known datasets were used in the experiments:
• COIL20 [18]: image dataset including 20 different objects. For each of them, 72 images of
size 128x128 are included. The background has been discarded.
• HAR [19]: sensor data, collected through a smartphone, from the recordings of 30 subjects
performing various activities of daily living, covering six different activities: walking,
walking upstairs, walking downstairs, sitting, standing and laying.
• Isolet [20]: 150 subjects spoke the name of each letter of the alphabet twice; hence, there
are 52 training examples from each speaker.
• landsat [21]: dataset of agricultural land images in Australia used to classify seven different
soil classes constituting dissimilar soil types.
• USPS [22]: 9298 images belonging to 10 different classes, with size 16x16.
• waveform [23]: a generator producing 3 classes of waves, each generated from a
combination of 2 to 3 base waves.</p>
        <p>Structural characteristics of these datasets are summarized in Table 1.</p>
      </sec>
      <sec id="sec-4-2-setup">
        <title>4.2. Setup</title>
        <p>
          In order to evaluate the significance of the proposed approach, we provide an experimental
evaluation including tests on different percentages of privileged information and a comparison
with the performance of the SVM+ algorithm [
          <xref ref-type="bibr" rid="ref1 ref3">1, 3</xref>
          ]. For training and evaluation purposes, the
datasets were divided into three parts: training, validation and test set, respectively representing
50%, 20% and 30% of the instances (i.e., rows of the dataset). In line with recent deep
learning literature, the model obtaining the best performance on the validation set has been used
for the evaluation on the test set. The privileged information is obtained by selecting a portion
of the available features of each dataset, i.e., a certain number of columns. In detail, the feature indices
are randomly shuffled and then 9 different percentages of attributes are taken into account:
5%, 10%, 25%, 30%, 50%, 60%, 70%, 80%, 90%. The indices are selected in an incremental fashion,
so that the higher percentages contain the indices included in the lower ones. Bear in mind
that the percentages of privileged and “regular” information are complementary, so that for a
certain percentage p of privileged information, the number of attributes available at testing time
will be (100 − p)% of the total attributes. In our setting, this means that higher percentages
of privileged information correspond to harder test stages, because lower quantities of
information are available at testing time. Obviously this assumption is only related to our
need to “artificially” partition the available attributes into privileged and regular, and does not
hold in real-world cases, where higher quantities of privileged information should correspond
to better performances.
        </p>
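The incremental selection of privileged feature indices described above can be sketched as follows (a toy illustration of ours, with 20 features as an assumed example):

```python
import numpy as np

rng = np.random.default_rng(42)

n_features = 20
percentages = [5, 10, 25, 30, 50, 60, 70, 80, 90]

# Shuffle the feature indices once, then take incremental prefixes, so that
# higher percentages always contain the indices of the lower ones.
shuffled = rng.permutation(n_features)

splits = {}
for p in percentages:
    k = round(n_features * p / 100)
    privileged = set(shuffled[:k])
    regular = set(shuffled[k:])   # complementary: (100 - p)% of features
    splits[p] = (privileged, regular)

# Nesting property: the 5% privileged set is contained in the 10% one, etc.
print(splits[5][0].issubset(splits[10][0]))
```

Because privileged and regular indices are complementary prefixes and suffixes of the same shuffled array, increasing the privileged share automatically shrinks what is available at test time.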
        <p>In order to avoid the bias induced by this selection process, these operations are executed
10 times, so as to obtain 10 different configurations of privileged indices for each considered
percentage of privileged information. As for the deep learning parameters, the number
of epochs is set to 100 and the learning rate of the Adam optimizer is set to 1 × 10^−4 [24].
The metrics adopted to evaluate our architecture are Accuracy and F-measure.</p>
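Both metrics can be computed directly; the sketch below is our own illustration, and assumes a macro-averaged F-measure, since the averaging scheme is not stated in the text:

```python
import numpy as np

def accuracy(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def macro_f1(y_true, y_pred):
    # Per-class F1, averaged with equal weight over the classes.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in np.unique(y_true):
        tp = np.sum(y_pred[y_true == c] == c)
        fn = np.sum(y_pred[y_true == c] != c)
        fp = np.sum(y_true[y_pred == c] != c)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        scores.append(2 * precision * recall / (precision + recall)
                      if (precision + recall) else 0.0)
    return float(np.mean(scores))

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(accuracy(y_true, y_pred))   # 5 of 6 correct
print(macro_f1(y_true, y_pred))
```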
        <p>Regarding the internal structure of the LUPI architecture, the three models M1, M2 and
M3 are built using multi-layer perceptron (MLP) neural networks. Each MLP has an input layer,
three internal layers and an output layer, with respectively 256, 192, 128 and 64 neural units, and
the number of classes of each dataset as the number of units of the output layer. As the activation
function, the Rectified Linear Unit (ReLU) is employed for each layer, apart from the output
one, which uses the SoftMax function to normalize the output vector values so that their sum
equals 1. The experiments are run 30 times for each configuration of the different privileged
indices; mean and standard deviation are then taken into account to evaluate the results. The
reported experiments are carried out on the following platform: Intel(R) Core(TM) i5-7300U
CPU @ 2.60GHz with 8 GB of RAM. The LUPI-based deep learning architecture is
implemented in Python with the TensorFlow 2.0 library. The source code for SVM+ is written in C and it
is available online.</p>
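The MLP structure described above can be sketched as a plain numpy forward pass (an illustration only: the weights are random, and n_features and n_classes are example values, not taken from the datasets):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Layer widths used by each MLP: 256, 192, 128, 64 hidden units,
# then one output unit per class.
n_features, n_classes = 100, 10
sizes = [n_features, 256, 192, 128, 64, n_classes]
weights = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(sizes, sizes[1:])]

def forward(x):
    for w in weights[:-1]:
        x = relu(x @ w)          # ReLU on every layer but the last
    return softmax(x @ weights[-1])  # SoftMax normalizes each row

probs = forward(rng.normal(size=(5, n_features)))
print(probs.shape)
```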
      </sec>
      <sec id="sec-4-2">
        <title>4.3. Experimental Results</title>
        <p>Table 2 and Table 3 report the average results in terms of Accuracy and F-Measure for the 6 different
datasets. It can be noted that the proposed approach performs better for percentages of
privileged information lower than or equal to 30%. This result is actually not surprising since, as
explained in Section 4.2, higher percentages of privileged information correspond to harder test
stages (i.e., lower quantities of information are available at testing time). For higher percentages
of privileged information, performances are slightly lower, but still comparable for most datasets,
showing the robustness of the proposed approach even in cases where only 10% of the total
attributes are available at testing time (i.e., 90% of privileged information).</p>
        <p>Actually, the only dataset showing a significant degradation in performance for higher
percentages of privileged information is waveform, which is also the one with the
lowest performance for all configurations. A reason for this behavior may be found in
its synthetic nature. On the contrary, the dataset that produces the best results is COIL20,
which is the dataset with the highest number of features, also corresponding to a relatively
low number of objects. This probably allows a better classification to be obtained, thanks to the
large amount of information available at training time over a relatively small number of
tuples. As a general observation, all the other datasets show good Accuracy and F-Measure
results, confirming the effectiveness of the proposed approach on datasets coming from different
application domains.</p>
        <p>Table 4 shows the average Accuracy results for the SVM+ algorithm. It can be noted how in
most cases the performances begin to degrade already for a percentage of privileged information
around 10%, making SVM+ significantly less robust than our approach. To ease the comparison
between the two approaches, Figure 2 presents a visual comparison between our architecture and
SVM+ in terms of Accuracy. It is evident how the LUPI architecture clearly outperforms
SVM+ on most datasets. The only two datasets on which the performances are not clearly
superior (while still remaining comparable) are USPS and Isolet.</p>
        <p>The scenario described so far reflects the Teacher-Student metaphor on which the LUPI
paradigm is based. Too high percentages of privileged information do not allow the student
to develop the skills needed to adequately face the testing phase, while percentages of up to 30%
represent the right measure for the student to develop its own method and acquire enough
knowledge to build precise classification models. Our architecture proves able to fully perform
this task, providing a valid structure for optimal learning up to 30% of privileged information,
while also improving performance in terms of Accuracy compared to the SVM+ algorithm.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this work, we introduced a new deep learning architecture based on the Learning Using
Privileged Information (LUPI) paradigm. The LUPI paradigm makes it possible to exploit extra
information that may be available only for a subset of the objects in a dataset, i.e., the so-called
privileged information. More specifically, a model is trained by exploiting such privileged
information, and is then able to classify test examples associated with regular information only.
Experimental results on six well-known datasets show the significance of our approach, proving its
robustness to different available quantities of privileged and regular information. Experimental
results also showed how the proposed deep learning architecture outperforms a competitor
integrating the LUPI paradigm into a classic machine learning method (SVM+).</p>
      <p>While in this work, in order to test the significance of our approach, we resorted to an
artificial partitioning of the available information between privileged and regular, in the
future we plan to test the architecture on real-world case studies corresponding to specific
application domains. A first example can be that of remote sensing, where the availability of
different sensors over specific areas may be exploited as privileged information (e.g., exploiting
drone images associated with a limited geographical area, combined with satellite images available
for larger portions of the scene). From a methodological point of view, we also plan to extend
our approach by instantiating the three models included in the architecture with different
neural networks (e.g., Recurrent and Convolutional ones).</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgments</title>
      <p>This work was supported by the Regione Calabria, as part of the Ph.D. scholarship of Ludovica Sacco.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] V. Vapnik, A. Vashist, A new learning paradigm: Learning using privileged information, Neural Networks 22 (2009) 544-557. doi:10.1016/j.neunet.2009.06.042.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Y. LeCun, Y. Bengio, G. E. Hinton, Deep learning, Nature 521 (2015) 436-444. URL: https://doi.org/10.1038/nature14539. doi:10.1038/nature14539.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] V. Vapnik, A. Vashist, N. Pavlovitch, Learning using hidden information (learning with teacher), in: Proceedings of the 2009 International Joint Conference on Neural Networks, IJCNN'09, IEEE Press, 2009, pp. 1252-1259.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] L. Niu, Y. Shi, J. Wu, Learning using privileged information with L-1 support vector machine, in: 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, IEEE, 2012. doi:10.1109/wi-iat.2012.52.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer New York, 2009. doi:10.1007/978-0-387-84858-7.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] D. Pechyony, R. Izmailov, A. Vashist, V. Vapnik, SMO-style algorithms for learning using privileged information, 2010, pp. 235-241.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] D. Pechyony, V. Vapnik, Fast optimization algorithms for solving SVM+, 2011, pp. 27-42. doi:10.1201/b11429-5.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] M. Lapin, M. Hein, B. Schiele, Learning using privileged information: SVM+ and weighted SVM, Neural Networks 53 (2014) 95-108. doi:10.1016/j.neunet.2014.02.002.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] V. Vapnik, R. Izmailov, Learning using privileged information: Similarity control and knowledge transfer, Journal of Machine Learning Research 16 (2015) 2023-2049. URL: http://jmlr.org/papers/v16/vapnik15b.html.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] V. Vapnik, R. Izmailov, Knowledge transfer in SVM and neural networks, Annals of Mathematics and Artificial Intelligence 81 (2017) 3-19. doi:10.1007/s10472-017-9538-x.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] J. Feyereisl, S. Kwak, J. Son, B. Han, Object localization based on structural SVM using privileged information, in: Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems, volume 27, Curran Associates, Inc., 2014. URL: https://proceedings.neurips.cc/paper/2014/file/6c4b761a28b734fe93831e3fb400ce87-Paper.pdf.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] W. Li, L. Niu, D. Xu, Exploiting privileged information from web data for image categorization, in: Computer Vision - ECCV 2014, Springer International Publishing, 2014, pp. 437-452. doi:10.1007/978-3-319-10602-1_29.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] V. Sharmanska, N. Quadrianto, C. H. Lampert, Learning to rank using privileged information, in: 2013 IEEE International Conference on Computer Vision, IEEE, 2013. doi:10.1109/iccv.2013.107.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] M. Chevalier, N. Thome, G. Hénaff, M. Cord, Classifying low-resolution images by integrating privileged information in deep CNNs, Pattern Recognition Letters 116 (2018) 29-35. doi:10.1016/j.patrec.2018.09.007.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] H. Yang, J. Zhou, J. Cai, Y. Ong, MIML-FCN+: Multi-instance multi-label learning via fully convolutional networks with privileged information, 2017, pp. 5996-6004. doi:10.1109/CVPR.2017.635.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] J. Lambert, O. Sener, S. Savarese, Deep learning under privileged information using heteroscedastic dropout (2018). arXiv:1805.11614.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] S. Kullback, R. A. Leibler, On information and sufficiency, The Annals of Mathematical Statistics 22 (1951) 79-86. doi:10.1214/aoms/1177729694.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] S. A. Nene, S. K. Nayar, H. Murase, Columbia University Image Library (COIL-20) (1996). URL: http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] J. L. Reyes-Ortiz, D. Anguita, A. Ghio, L. Oneto, X. Parra, A public domain dataset for human activity recognition using smartphones, 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (2013).</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] R. Cole, M. Fanty, Spoken letter recognition, in: Advances in Neural Information Processing Systems (1991) 220-226.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] Statlog (Landsat Satellite) Data Set, Machine Learning Repository, Univ. California, Irvine, Irvine, CA, USA (1993).</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] J. Hull, A database for handwritten text recognition research, IEEE Transactions on Pattern Analysis and Machine Intelligence 16 (1994) 550-554. doi:10.1109/34.291440.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] L. Breiman, J. Friedman, R. Olshen, C. Stone, Classification and regression trees (1984) 49-55.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2017. arXiv:1412.6980.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>