<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>CNN-based System for Low Resolution Face Recognition (Discussion Paper)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fabio Valerio Massoli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Amato</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabrizio Falchi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ISTI-CNR</institution>
          ,
          <addr-line>via G. Moruzzi 1, 56124 Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <month>June</month>
        <year>2019</year>
      </pub-date>
      <volume>1</volume>
      <fpage>6</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>Since the publication of AlexNet in 2012, Deep Convolutional Neural Network models have become the most promising and powerful technique for image representation. Specifically, the ability of their inner layers to extract high level abstractions of the input images, called deep feature vectors, has been employed. Such vectors live in a high dimensional space in which an inner product, and thus a metric, is defined. The latter allows us to carry out similarity measurements among them. This property is particularly useful for accomplishing tasks such as Face Recognition. Indeed, in order to identify a person it is possible to compare deep features, used as face descriptors, from different identities by means of their similarities. Surveillance systems, among others, utilize this technique: deep features extracted from probe images are matched against a database of descriptors from known identities. A critical point is that the database typically contains features extracted from high resolution images, while the probes, taken by surveillance cameras, can be at a very low resolution. Therefore, it is mandatory to have a neural network which is able to extract deep features that are robust with respect to resolution variations. In this paper we discuss a CNN-based pipeline that we built for the task of Face Recognition among images with different resolutions. The entire system relies on the ability of a CNN to extract deep features that can be used to perform a similarity search in order to fulfill the face recognition task.</p>
      </abstract>
      <kwd-group>
        <kwd>Convolutional Neural Networks</kwd>
        <kwd>Ensemble Methods</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Content based image retrieval (CBIR) is one of the most active research fields
in the computer vision community. In this context, commonly faced tasks are
instance-level retrieval and class retrieval [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. In the former, given a query
image, the goal is to retrieve images that contain the same object regardless of
image distortions such as different illumination, rotation or occlusion. Instead,
in the latter the purpose is to retrieve all the available images that belong to
the same class.
      </p>
      <p>
        Before the advent of Convolutional Neural Networks (CNNs), the
scale-invariant feature transform [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] (SIFT) based methods were among
the most frequently used in order to extract global descriptors from the given
images. A breakthrough occurred in 2012 when AlexNet [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] was created
        ] was created
and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC),
improving upon the state of the art by a noticeable margin. Since then,
CNN-based methods for image retrieval have received considerably more attention
from the scientific community [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>Under the hood, these methods rely on the ability of deep models to extract the
so-called deep features from given input images. From a theoretical perspective,
the inner layers of a CNN realize an abstraction of the input that describes
specific concepts contained inside the data. Moreover, due to the typical structure
of deep model architectures, inner layers combine the information available from
previous layers, thus achieving a higher level of abstraction that summarizes the
overall content of the input data. Based on this observation, deep features are
usually adopted as global descriptors for input images. Thus, the deeper the
layer from which we extract deep features, the more descriptive of the input
they are. It is common practice to extract them from the penultimate layer of a
CNN, as in the sketch below.</p>
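      <p>As an illustration, the following is a minimal PyTorch sketch (not the exact code used in this work) of extracting deep features from the layer that feeds the final classifier of a pretrained CNN via a forward hook; the backbone and the dummy input are placeholders.</p>
      <preformat>
# Minimal sketch: extract penultimate-layer deep features with a forward hook.
# The torchvision ResNet-50 and the random input are illustrative placeholders.
import torch
import torchvision.models as models

model = models.resnet50(weights="IMAGENET1K_V1").eval()  # torchvision >= 0.13 API
features = {}

def save_features(module, inputs, output):
    # store the pooled activations that feed the final classification layer
    features["deep"] = torch.flatten(output, 1)

model.avgpool.register_forward_hook(save_features)

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))  # a dummy 224x224 RGB image batch

descriptor = features["deep"]  # shape: (1, 2048), the global image descriptor
      </preformat>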
      <p>As previously said, deep features are high dimensional vectors living in a space
on which an inner product, and thus a metric, is defined. This property is
fundamental since it allows us to evaluate similarities among descriptors, extracted
from different images, that can be used as indicators of the similarity of the
content of the original data. An example of this principle is sketched in Figure 1.</p>
      <p>This concept is typically applied in the context of surveillance systems.
Indeed, in a scenario where an input face image is acquired by a camera, its
global descriptor is extracted and used in order to perform a similarity search
on a database (db) containing a gallery of feature vectors that belong to known
identities. For example, the search in the db can be accomplished by
evaluating the cosine similarity, or the Euclidean distance, between the probe and gallery
vectors. A minimal sketch of such a search is given below.</p>
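      <p>As a toy illustration (assuming L2-normalized descriptors, so that the inner product coincides with the cosine similarity), the search can be sketched as follows; all names and data here are hypothetical.</p>
      <preformat>
# Illustrative similarity search of a probe descriptor against a gallery.
import numpy as np

def identify(probe, gallery, identities):
    """Return the gallery identity most similar to the probe.

    probe:      (d,) L2-normalized descriptor of the query face
    gallery:    (n, d) L2-normalized descriptors of known identities
    identities: list of n identity labels
    """
    sims = gallery @ probe          # cosine similarities (unit-norm vectors)
    best = int(np.argmax(sims))
    return identities[best], float(sims[best])

# toy usage with random unit vectors standing in for deep features
rng = np.random.default_rng(0)
g = rng.normal(size=(3, 2048))
g /= np.linalg.norm(g, axis=1, keepdims=True)
p = g[1] + 0.1 * rng.normal(size=2048)
p /= np.linalg.norm(p)
print(identify(p, g, ["id_a", "id_b", "id_c"]))  # expected: ("id_b", ...)
      </preformat>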
      <p>The similarity search becomes even more challenging when gallery and probe
come from different resolution domains.</p>
      <p>
        This is the background from which we started the study we present in this
paper. Our final goal is to conceive a pipeline for face recognition based on neural
networks. In order to extract deep features we used a pre-trained ResNet-50 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
architecture, with Squeeze-and-Excitation blocks [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The extracted descriptors
are then used to perform similarity measurements.
      </p>
      <p>
        The performance of the deep model used for feature extraction has been
evaluated on the 1:1 verification protocol of the IARPA Janus Benchmark-B (IJB-B)
dataset [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
The remaining part of the paper is organized as follows. In Section 2 we briefly
review some related works. In Section 3 we describe in detail the pipeline that we
implemented. In Section 4 we present the experimental results. In Section 5 we
conclude the paper with a summary of the main results and future perspectives.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Works</title>
      <p>
        Deep learning techniques are currently experiencing a huge expansion in their
field of application, mainly as a result of the extremely high computational
power reached by modern GPUs. Moreover, the existence of big
datasets [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] has made it possible to train neural networks and
to let them nearly reach human levels of performance when tested against tasks
such as image classification [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], object detection [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and face recognition [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
Due to its wide range of applications, the task of Face Recognition (FR) is
among the hottest topics in the computer vision community. In particular,
FR plays a key role in the context of smart surveillance systems [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. In
such systems, the case is usually that a low resolution face image, taken by
a surveillance camera, has to be matched against a database containing deep
features extracted from high resolution images.
      </p>
      <p>To this end, several techniques have been developed in order to train deep
models to deal with low resolution images. Some examples are Super Resolution
and Common Space Projection techniques.</p>
      <p>
        Super Resolution is a technique based on the ability of a neural network to
synthesize a high resolution image starting from a low resolution one. The
recognition task is later fulfilled in the high resolution domain [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
One of the weaknesses of the super resolution technique is that the identity
information can be lost. In [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] the authors developed a neural network that, together
with the super resolution task, tries to recover the identity of the initial low
resolution image in the high resolution one.
      </p>
      <p>
        Instead, Common Space Projection techniques concern the ability of a neural
network to minimize the distance, in a common space, between deep features
extracted from a low resolution image and its high resolution counterpart.
For example, in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] the authors train a two-branch CNN to learn a mapping from the
high/low resolution domains to a common space. Specifically, given a low and a
high resolution image, the model extracts feature vectors of size 2048, and their
distance is evaluated and used as the loss in order to drive the training in the
desired direction. A minimal sketch of such a training signal is given below.
      </p>
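      <p>The following PyTorch sketch illustrates this kind of training signal under the assumption that the distance between the two 2048-dimensional embeddings is measured as a mean squared error; the actual loss used in [19] may differ in its details.</p>
      <preformat>
# Sketch of a common-space projection loss between the two branches.
import torch
import torch.nn.functional as F

def common_space_loss(f_lr, f_hr):
    """Mean squared distance between embeddings of the same identity.

    f_lr: (batch, 2048) features from the low resolution branch
    f_hr: (batch, 2048) features from the high resolution branch
    """
    return F.mse_loss(f_lr, f_hr)

# toy usage: random embeddings stand in for the two branches' outputs
f_lr = torch.randn(4, 2048, requires_grad=True)  # low resolution branch
f_hr = torch.randn(4, 2048)                      # high resolution branch
loss = common_space_loss(f_lr, f_hr)
loss.backward()  # gradients drive the LR branch towards the common space
      </preformat>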
    </sec>
    <sec id="sec-3">
      <title>Pipeline</title>
      <p>In this section we briefly describe the main modules of the pipeline we developed.
A schematic view is shown in Figure 2.</p>
      <p>
        There are three main components at the heart of the system: a face detector,
a features extractor and a classifier. We will be focusing on the first two.
The face detection task is accomplished by means of a multi-stage architecture
that, given an input image, delivers the coordinates of the bounding boxes that
are centred around each face visible in the image. Specifically, we used the
Multi-task Cascaded Convolutional Neural Networks (MTCNN) [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. This step
is performed once for each input frame. After all the faces have been identified
in the picture, they are cropped, preprocessed and then used as input for
the features extractor. The preprocessing step includes rescaling the image so that
its shortest side is 256 pixels, cropping the central square region with a side of
224 pixels, and normalizing the image (a sketch of this step is given after this
paragraph). The features extractor module is made of a ResNet-50 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] architecture, equipped
with Squeeze-and-Excitation [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] blocks, that has been pretrained on the
VGGFace2 dataset [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
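      <p>A sketch of the preprocessing step, written with torchvision's standard transforms; the normalization statistics shown are the common ImageNet-style values and are an assumption, not necessarily those of the pretrained model used in this work.</p>
      <preformat>
# Preprocessing sketch: shortest side to 256 px, central 224x224 crop, normalize.
from PIL import Image
import torchvision.transforms as T

preprocess = T.Compose([
    T.Resize(256),      # rescale so that the shortest side is 256 pixels
    T.CenterCrop(224),  # crop the central square region of side 224 pixels
    T.ToTensor(),       # HWC uint8 image to CHW float tensor in [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],  # assumed ImageNet-style stats
                std=[0.229, 0.224, 0.225]),
])

face = Image.open("face_crop.jpg")  # hypothetical crop from the detector
x = preprocess(face).unsqueeze(0)   # batched input of shape (1, 3, 224, 224)
      </preformat>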
      <p>Feature vectors are extracted before the classification layer; they have
dimensionality equal to 2048 and are L2-normalized before any metric is evaluated
on them, as in the sketch below.</p>
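      <p>The L2 normalization of the descriptors can be sketched as follows; after it, inner products between descriptors coincide with their cosine similarities.</p>
      <preformat>
# Sketch: L2-normalize a batch of 2048-dimensional deep features.
import torch
import torch.nn.functional as F

raw = torch.randn(5, 2048)                  # raw features (placeholder values)
descriptors = F.normalize(raw, p=2, dim=1)  # unit L2 norm along each row
print(descriptors.norm(dim=1))              # all (approximately) ones
      </preformat>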
    </sec>
    <sec id="sec-4">
      <title>Experimental Tests</title>
      <p>
        In order to test the performance of the features extractor we used the IJB-B
dataset [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. In particular, we tested the model against the 1:1 verification
protocol, aiming to estimate its ability to extract discriminative features. Due to
the low resolution requirements of a surveillance system, we conducted the
performance evaluation using the dataset at different resolutions. In Figure 3, we
show an example of the various resolution versions of the test images. The first
column contains the full resolution test images, while from the second to the last
column, down-sampled images in the [8, 256] pixel range are shown.
      </p>
      <p>The resulting Receiver Operating Characteristic (ROC) for the 1:1 verification
task is shown in Figure 4.</p>
      <p>In order to evaluate the similarity among the different feature vectors we
measured the cosine similarity between them. As is clear from Figure 4, the
performance of the features extractor degrades at lower resolutions, especially
below 32 pixels. Moreover, in Table 1 we report the True Acceptance Rate
(TAR) at a reference value of the False Acceptance Rate (FAR) of 10<sup>-3</sup>;
a sketch of this computation is given below.</p>
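      <p>For clarity, the TAR at a fixed FAR can be computed from the similarity scores of genuine (same identity) and impostor (different identity) pairs as sketched here; the scores below are synthetic placeholders.</p>
      <preformat>
# Illustrative computation of TAR at a fixed FAR from pair similarity scores.
import numpy as np

def tar_at_far(genuine, impostor, far=1e-3):
    """TAR at the threshold where the impostor acceptance rate equals `far`."""
    thr = np.quantile(impostor, 1.0 - far)  # threshold yielding the target FAR
    return float(np.mean(genuine >= thr))

rng = np.random.default_rng(0)
genuine = rng.normal(0.7, 0.1, 10_000)    # similarities of genuine pairs
impostor = rng.normal(0.2, 0.1, 100_000)  # similarities of impostor pairs
print(tar_at_far(genuine, impostor))      # TAR @ FAR = 1e-3
      </preformat>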
      <p>Up until now, we have considered the case in which the images had the
same resolution. In a real scenario it usually happens that probe and gallery
have different resolutions. For example, in the case of surveillance systems it
is common that the gallery database is populated with high resolution images
descriptors while the probe has a lower resolution.</p>
      <p>In Table 2 we report the value of the TAR @ FAR = 10<sup>-3</sup> for the
cross-resolution 1:1 verification task.</p>
      <p>It is clear from Table 2 that the cross resolution face recognition task is very
challenging for a deep neural network model, especially when the images have
very low resolutions.
</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Experiments</title>
      <p>Surveillance systems require high performance from CNN-based systems on the
face recognition task. Precisely, the deep models have to be robust with respect
to variations of the input image resolution since it is usually the case that low
resolution images, from surveillance cameras, have to be matched against a
database containing deep features extracted from high resolution images. In fact, the
descriptors extracted for each human face have to be robust with respect to
resolution variations, otherwise any kind of similarity search across the identities
database will fail. We have seen that, even though there are models that
perform well on the FR task, their performance drops sharply when low
resolution images are used.</p>
      <p>Although we have shown the feasibility of a pipeline for FR based on deep
models, we need to improve upon its performance especially in the case of mixed
resolutions. Thus, we are planning a new training campaign focused on the low
resolution domain below 32 pixels. Finally, we will pay particular attention to
the case of FR in which probe and gallery have different resolutions. What we
expect from such a campaign is that, even if we might obtain a small drop in the
performance at high resolution, the improvement at low and mixed resolutions
should outweigh that drop.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Babenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Slesarev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chigorin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lempitsky</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Neural codes for image retrieval</article-title>
          .
          <source>In: European conference on computer vision</source>
          . pp.
          <fpage>584</fpage>
          –
          <lpage>599</lpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bansal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nanduri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castillo</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ranjan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chellappa</surname>
          </string-name>
          , R.:
          <article-title>Umdfaces: An annotated face dataset for training deep networks</article-title>
          .
          <source>In: 2017 IEEE International Joint Conference on Biometrics (IJCB)</source>
          . pp.
          <fpage>464</fpage>
          –
          <lpage>473</lpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parkhi</surname>
            ,
            <given-names>O.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Vggface2: A dataset for recognising faces across pose and age</article-title>
          .
          <source>In: 2018 13th IEEE International Conference on Automatic Face &amp; Gesture Recognition (FG</source>
          <year>2018</year>
          ). pp.
          <fpage>67</fpage>
          –
          <lpage>74</lpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donahue</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darrell</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malik</surname>
          </string-name>
          , J.:
          <article-title>Region-based convolutional networks for accurate object detection and segmentation</article-title>
          .
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          <volume>38</volume>
          (
          <issue>1</issue>
          ),
          <fpage>142</fpage>
          –
          <lpage>158</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Grm</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pernus</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cluzel</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scheirer</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dobrisek</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Struc</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Face hallucination revisited: An exploratory study on dataset bias</article-title>
          .
          <source>arXiv preprint arXiv:1812.09010</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
          </string-name>
          , J.:
          <article-title>Ms-celeb-1m: A dataset and benchmark for large-scale face recognition</article-title>
          .
          <source>In: European Conference on Computer Vision</source>
          . pp.
          <fpage>87</fpage>
          –
          <lpage>102</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
          </string-name>
          , J.:
          <article-title>Deep residual learning for image recognition</article-title>
          .
          <source>In: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          . pp.
          <fpage>770</fpage>
          –
          <lpage>778</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
          </string-name>
          , G.:
          <article-title>Squeeze-and-excitation networks</article-title>
          .
          <source>arXiv</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
          </string-name>
          , G.E.:
          <article-title>Imagenet classi cation with deep convolutional neural networks</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <fpage>1097</fpage>
          –
          <lpage>1105</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lavi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Serj</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ullah</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Survey on deep learning techniques for person re-identification task</article-title>
          .
          <source>arXiv preprint arXiv:1807.05284</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Learned-Miller</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>G.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>RoyChowdhury</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hua</surname>
          </string-name>
          , G.:
          <article-title>Labeled faces in the wild: A survey</article-title>
          . In:
          <article-title>Advances in face detection and facial image analysis</article-title>
          , pp.
          <fpage>189</fpage>
          –
          <lpage>248</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          :
          <article-title>Distinctive image features from scale-invariant keypoints</article-title>
          .
          <source>International journal of computer vision 60(2)</source>
          ,
          <fpage>91</fpage>
          –
          <lpage>110</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Nikouei</surname>
            ,
            <given-names>S.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choi</surname>
            ,
            <given-names>B.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faughnan</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Smart surveillance as an edge network service: From harr-cascade, svm to a lightweight cnn</article-title>
          .
          <source>In: 2018 IEEE 4th International Conference on Collaboration and Internet Computing (CIC)</source>
          . pp.
          <fpage>256</fpage>
          –
          <lpage>265</lpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Parkhi</surname>
            ,
            <given-names>O.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vedaldi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.:
          <article-title>Deep face recognition</article-title>
          .
          <source>In: bmvc</source>
          . vol.
          <volume>1</volume>
          , p.
          <volume>6</volume>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Tolias</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sicre</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jegou</surname>
          </string-name>
          , H.:
          <article-title>Particular object retrieval with integral maxpooling of cnn activations</article-title>
          .
          <source>arXiv preprint arXiv:1511.05879</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Torresani</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szummer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fitzgibbon</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Efficient object category recognition using classemes</article-title>
          .
          <source>In: European conference on computer vision</source>
          . pp.
          <fpage>776</fpage>
          –
          <lpage>789</lpage>
          . Springer (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Whitelam</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taborsky</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blanton</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maze</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adams</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalka</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duncan</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , et al.:
          <article-title>IARPA Janus Benchmark-B face dataset</article-title>
          .
          <source>In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops</source>
          . pp.
          <fpage>90</fpage>
          –
          <lpage>98</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Wolf</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassner</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maoz</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Face recognition in unconstrained videos with matched background similarity</article-title>
          .
          <source>IEEE</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Zangeneh</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahmati</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohsenzadeh</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Low resolution face recognition using a two-branch deep convolutional neural network architecture</article-title>
          .
          <source>arXiv preprint arXiv:1706.06247</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheng</surname>
            ,
            <given-names>C.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsu</surname>
            ,
            <given-names>W.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qiao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Super-identity convolutional neural network for face hallucination</article-title>
          .
          <source>In: Proceedings of the European Conference on Computer Vision (ECCV)</source>
          . pp.
          <fpage>183</fpage>
          –
          <lpage>198</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qiao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Joint face detection and alignment using multitask cascaded convolutional networks</article-title>
          .
          <source>IEEE Signal Processing Letters</source>
          <volume>23</volume>
          (
          <issue>10</issue>
          ),
          <fpage>1499</fpage>
          –
          <lpage>1503</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>