<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Neural Network Model for Face Recognition from Dynamic Vision Sensor</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Lomonosov Moscow State University</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2035</year>
      </pub-date>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>In this work, we consider the applicability of face recognition algorithms to data obtained from a dynamic vision sensor. We propose a baseline method that solves this problem with a neural network pipeline comprising reconstruction, detection, and recognition. We consider various modifications of this algorithm and their influence on the quality of the model. We collected a small test dataset recorded with a DVS sensor. We investigate the usefulness of simulated data, and of different approaches to generating it, for training the model. We also examine how well an algorithm trained on synthetic data transfers, with the help of fine-tuning, to data recorded by the real sensor. All these variations are compared with one another and with conventional face recognition from RGB images on different datasets. The results show that DVS data can be used to perform face recognition with quality similar to that achieved on RGB data.</p>
      </abstract>
      <kwd-group>
        <kwd>DVS</kwd>
        <kwd>Face Recognition</kwd>
        <kwd>Data Simulation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In recent years, a new type of camera, the Dynamic Vision Sensor (DVS), has been gaining popularity. Whereas a traditional camera records information at a fixed frequency (usually 25–30 frames per second), a dynamic vision sensor records only the fact that the illumination level in a pixel has changed by more than a certain threshold. These cameras thus work on the same principle as the human eye, which responds only to changes. This approach eliminates a large amount of redundant static data and focuses only on dynamic events. Such sensors have several advantages: high speed (which allows very fast events to be captured in fine detail), low power and memory consumption (an important feature for embedded systems, where there is no room for a large battery or hard drive), and high sensitivity (a key property for recording under extreme lighting conditions).</p>
      <p>Publication is supported by RFBR grant 18-08-01484.</p>
      <p>
        Such cameras are quite expensive and have low resolution; however, because technology in this area is developing rapidly, algorithms need to be developed for the applied problems these sensors can address, such as 3D reconstruction and object detection and tracking [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. One of these tasks is person identification, since such cameras are now often used in video surveillance systems. A person in a frame can be recognized in various ways, for example, by gait [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or by face. If the face is distinguishable in the frame and large enough, it makes sense to recognize the person by it.
      </p>
      <p>Face recognition systems are highly relevant today, since they provide one of the most effective means of contactless person identification. They are used in security systems, bank card verification, automatic tagging of people, forensics, online payments, etc. The face recognition problem can be decomposed into several subtasks: finding the face in the image, normalizing the found face, and, finally, identifying the person. In this paper, we propose a new method for face recognition using data from a DVS sensor.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <sec id="sec-2-1">
        <title>Detection</title>
        <p>
          Face detection is a special case of object detection, but it has its own specifics. The human face has distinctive features, which the first approaches in this area searched for and analyzed. A major breakthrough was [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], which used Haar-like features and a cascade of detectors to find faces. However, such algorithms were not robust, since the appearance of faces varies greatly with lighting and viewing angle.
        </p>
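        <p>The efficiency of cascade detectors such as [5] rests on the integral image, which lets the sum over any rectangle, and hence any Haar-like feature, be evaluated with four array lookups. The sketch below is our own NumPy illustration of that idea under simple assumptions (a gray-scale image given as a 2-D integer array; the helper names are hypothetical). It is not the original detector, which additionally learns which features to use and chains them into a boosted cascade.</p>
        <preformat preformat-type="code">
```python
import numpy as np

def integral_image(img):
    """Summed-area table with a leading zero row and column:
    ii[y, x] equals img[:y, :x].sum()."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of the h-by-w rectangle whose top-left pixel is (y, x): four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, y, x, h, w):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, y, x, h, half) - rect_sum(ii, y, x + half, h, half)

img = np.arange(1, 13).reshape(3, 4)     # toy 3x4 "image" with values 1..12
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 2, 2))          # 14  (1 + 2 + 5 + 6)
print(two_rect_feature(ii, 0, 0, 3, 4))  # -12 (33 - 45)
```
        </preformat>
        <p>Because the table is computed once per image, each feature in a detection window costs constant time regardless of its size.</p>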
        <p>
          Later, deformable part models [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] were proposed for the detection problem. However, these methods were computationally costly and required complex annotation for training.
        </p>
        <p>
          As with most computer vision tasks, the detection problem can be solved with deep learning methods, whose popularity has grown significantly since the work [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. These methods have also been successfully applied to face detection, for example, in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. MTCNN [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], which is also based on this approach, uses a cascade of three lightweight neural networks to find faces in an image.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Recognition</title>
        <p>
          The face recognition problem has long been of interest to the scientific community. The first systems for solving it were developed as early as 1964 [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Since then, the quality of this technology has greatly increased, and modern algorithms can, in some settings, distinguish faces better than people do [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>
          Various methods have been used to solve the face recognition problem, and these methods have changed greatly over time. The first algorithms attempted to distinguish between faces by finding distinctive features such as eye color, face proportions, etc. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. The work [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] made a major contribution to the field by representing faces through their projections onto the principal eigenvectors of a training set of faces (eigenfaces).
        </p>
        <p>
          However, most modern methods [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] recognize faces by computing embeddings for them. These methods became especially popular after convolutional neural networks came into widespread use [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. The same approach is used in the popular FaceNet work [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
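        <p>FaceNet [16] learns such embeddings with a triplet loss, which pulls an anchor face toward another image of the same person (the positive) and pushes it away from an image of a different person (the negative) by at least a margin. A minimal NumPy sketch of this loss on L2-normalized embeddings follows; the function names and the default margin of 0.2 are our assumptions, and the sketch does not describe the exact training setup used in this work.</p>
        <preformat preformat-type="code">
```python
import numpy as np

def l2_normalize(x):
    """Scale each embedding (last axis) to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean of max(0, ||a - p||^2 - ||a - n||^2 + margin) over a batch of triplets."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    d_ap = np.sum((a - p) ** 2, axis=-1)   # squared distance to the same identity
    d_an = np.sum((a - n) ** 2, axis=-1)   # squared distance to another identity
    return float(np.maximum(0.0, d_ap - d_an + margin).mean())

anchor = np.array([[1.0, 0.0]])
print(triplet_loss(anchor, anchor, -anchor))   # 0.0: the negative is already far enough
print(triplet_loss(anchor, -anchor, anchor))   # 4.2: positive and negative swapped
```
        </preformat>
        <p>Normalizing the embeddings bounds every squared distance by 4, so the margin has a fixed scale relative to the distances it separates.</p>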
      </sec>
      <sec id="sec-2-3">
        <title>Reconstruction</title>
        <p>
          One of the key components of this work is an algorithm for reconstructing frames from the event stream of a dynamic vision sensor, proposed in [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. It also relies on artificial neural networks, namely recurrent neural networks, whose main feature is the ability to memorize the state obtained by processing an element of a sequence and to use it in subsequent computations. In this algorithm, the neural network receives as input the stream of events from the dynamic vision sensor over a certain period of time, and the model reconstructs an image that visually resembles a grayscale image.
        </p>
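        <p>Before the recurrent network of [17] sees a time window of events, the events have to be packed into a tensor. To our understanding, [17] uses a spatio-temporal voxel grid in which each event's polarity is shared between its two nearest temporal bins. The sketch below implements only that conversion (no network); the array layout and the function name are our own choices for illustration.</p>
        <preformat preformat-type="code">
```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """events: (N, 4) array with columns (x, y, ts, p), p in {-1, +1}.
    Returns a (num_bins, height, width) grid in which each event's polarity is
    split linearly between its two nearest temporal bins."""
    grid = np.zeros((num_bins, height, width), dtype=np.float64)
    if len(events) == 0:
        return grid
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    ts = events[:, 2]
    p = events[:, 3]
    # Map the timestamps of the window onto the range [0, num_bins - 1].
    span = max(ts.max() - ts.min(), 1e-9)
    tn = (num_bins - 1) * (ts - ts.min()) / span
    left = np.floor(tn).astype(int)
    frac = tn - left
    right = np.minimum(left + 1, num_bins - 1)
    # np.add.at accumulates correctly when several events hit the same cell.
    np.add.at(grid, (left, y, x), p * (1.0 - frac))
    np.add.at(grid, (right, y, x), p * frac)
    return grid
```
        </preformat>
        <p>The number of bins is a free parameter here; in a recurrent setup, the grid of each window would be fed to the model together with the hidden state carried over from the previous window.</p>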
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Proposed method</title>
      <sec id="sec-3-1">
        <title>Formal problem</title>
        <p>The face recognition problem can be posed in two forms: identification and verification. In this paper, we consider the verification form.</p>
        <p>Data from the dynamic vision sensor arrive as a set of events.</p>
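        <p>An event is a tuple <inline-formula><tex-math>$(x, y, t_s, p)$</tex-math></inline-formula>, where <inline-formula><tex-math>$x, y \in \mathbb{Z},\ x \in [0, N],\ y \in [0, M]$</tex-math></inline-formula> are the coordinates of the pixel in the <inline-formula><tex-math>$N \times M$</tex-math></inline-formula> matrix, <inline-formula><tex-math>$t_s \in \mathbb{R}$</tex-math></inline-formula> is the timestamp, and <inline-formula><tex-math>$p \in \{-1, 1\}$</tex-math></inline-formula> is the polarity of the change (the brightness in the pixel decreased or increased by a given threshold).</p>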
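        <p>Algorithm input: sets of events <inline-formula><tex-math>$T_1$</tex-math></inline-formula> and <inline-formula><tex-math>$T_2$</tex-math></inline-formula> received from the dynamic vision sensor.</p>
        <p>Algorithm output: <inline-formula><tex-math>$A \in \{0, 1\}$</tex-math></inline-formula>, where <inline-formula><tex-math>$A = 1$</tex-math></inline-formula> if the sets <inline-formula><tex-math>$T_1$</tex-math></inline-formula> and <inline-formula><tex-math>$T_2$</tex-math></inline-formula> describe the same person and <inline-formula><tex-math>$A = 0$</tex-math></inline-formula> otherwise.</p>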
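        <p>The event definition above maps naturally onto a small immutable record. The Python sketch below is illustrative only (the class name and the sample events are hypothetical); freezing the dataclass makes events hashable, so the inputs of the algorithm can be held in actual Python sets.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """One DVS event (x, y, ts, p)."""
    x: int      # pixel column
    y: int      # pixel row
    ts: float   # timestamp
    p: int      # polarity: +1 brightness increased, -1 decreased

    def __post_init__(self):
        # Reject polarities outside {-1, +1}.
        if self.p not in (-1, 1):
            raise ValueError("polarity must be -1 or +1")

# Hypothetical input sets for the verification algorithm.
T1 = {Event(10, 20, 0.001, 1), Event(11, 20, 0.002, -1)}
T2 = {Event(40, 22, 0.015, 1)}
```
        </preformat>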
      </sec>
      <sec id="sec-3-2">
        <title>Our method</title>
        <p>
          Since the main source of information in computer vision tasks is usually an image or a sequence of images rather than an event stream produced by a DVS, the stream of events must be converted into a visual representation. Such a visualization can be made in different ways. The simplest one is to set time marks, count the events that occurred between them, and visualize the counts in gray scale. However, this approach turned out not to provide satisfactory quality: detectors could not find faces on such images. Therefore, we used reconstructions produced by the neural network of [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] for visualization, which yield much better results (see Fig. 1).
        </p>
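        <p>For reference, the simple counting visualization mentioned above can be sketched as follows. The window length and the per-frame normalization to the 0..255 range are our assumptions, since the text does not specify them; event coordinates and timestamps are passed as NumPy arrays, and the function name is hypothetical.</p>
        <preformat preformat-type="code">
```python
import numpy as np

def event_count_frames(x, y, ts, height, width, window):
    """Split the stream into consecutive windows of length `window`, count the
    events per pixel in each window, and scale every frame to 0..255."""
    idx = ((ts - ts.min()) // window).astype(int)          # window index of each event
    frames = np.zeros((idx.max() + 1, height, width))
    np.add.at(frames, (idx, y, x), 1.0)                    # polarity is ignored
    peak = frames.max(axis=(1, 2), keepdims=True)
    frames = np.divide(frames, peak, out=np.zeros_like(frames), where=peak > 0)
    return (255 * frames).astype(np.uint8)
```
        </preformat>
        <p>Such frames show only where activity happened, not the underlying brightness, which is consistent with the observation that face detectors struggle on them.</p>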
        <p>
          The proposed baseline method works as follows (see Fig. 2). First, the stream of events from the sensor is reconstructed into frames with the model of [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. Then faces are located in the frames with the detector of [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], and a neural network computes internal representations of the faces in the form of vectors <inline-formula><tex-math>$f_1, f_2 \in \mathbb{R}^n$</tex-math></inline-formula> [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Finally, the proximity of these vectors is measured with the cosine similarity (1). If the similarity is higher than a specified threshold, 1 is predicted; otherwise, 0.
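        </p>
        <disp-formula id="eq1">
          <label>(1)</label>
          <tex-math>$$\cos(f_1, f_2) = \frac{f_1 \cdot f_2}{\|f_1\|\,\|f_2\|} = \frac{\sum_{i=1}^{n} f_{1i} f_{2i}}{\sqrt{\sum_{i=1}^{n} f_{1i}^2}\,\sqrt{\sum_{i=1}^{n} f_{2i}^2}}$$</tex-math>
        </disp-formula>
        <p>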
The dynamic vision sensor is a fairly new type of camera, and very few datasets have been recorded with it so far. To the best of our knowledge, there are no publicly available DVS datasets for the face recognition task. Therefore, we propose to take collections of color videos gathered for face recognition and simulate dynamic vision sensor data from them. This is done in two steps: first, the intermediate values in each pixel are interpolated between two neighboring frames; second, at each pixel the change in intensity between adjacent interpolated frames is compared with a threshold, and an event is generated if the change exceeds it. As Fig. 3 shows, real and simulated event streams look very similar. This suggests that findings obtained on simulated data should transfer fairly well to real data.</p>
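        <p>The two-step simulation can be sketched in NumPy as below. The sketch follows the description literally and thresholds the raw intensity difference between adjacent linearly interpolated frames. The threshold, the number of interpolation steps, the function name, and the assumption that frames are gray-scale arrays scaled to [0, 1] are our choices; practical simulators often work with log intensity instead, because real DVS pixels respond to relative changes in brightness.</p>
        <preformat preformat-type="code">
```python
import numpy as np

def simulate_events(frame0, frame1, t0, t1, threshold=0.1, num_steps=4):
    """Generate DVS-like events between two gray-scale frames.
    Returns a list of (x, y, ts, p) tuples."""
    start = frame0.astype(np.float64)
    end = frame1.astype(np.float64)
    prev = start
    events = []
    for k in range(1, num_steps + 1):
        alpha = k / num_steps
        cur = (1.0 - alpha) * start + alpha * end   # linearly interpolated frame
        ts = t0 + alpha * (t1 - t0)                 # timestamp of this frame
        diff = cur - prev                           # change since the adjacent frame
        ys, xs = np.nonzero(np.abs(diff) > threshold)
        for y, x in zip(ys, xs):
            events.append((int(x), int(y), ts, 1 if diff[y, x] > 0 else -1))
        prev = cur
    return events
```
        </preformat>
        <p>With linear interpolation every step changes a pixel by the same amount, so a pixel either fires at every step or never fires between the two frames; this is one way to see why a better interpolator can change the simulated stream.</p>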
        <p>
          Since linear interpolation of intermediate frames is a crude approximation that produces blurred frames, we improved the simulation method with a better approximation. To do this, we used the results of [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], a method that creates a slow-motion effect, and incorporated them into the simulation process. This approach generates intermediate frames from color video sequences before the dynamic vision sensor data are simulated, which smooths the resulting visualizations. We used the variant that creates one intermediate frame. Fig. 4 shows the resulting visual difference in the images.
        </p>
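        <p>Plugging a learned interpolator into the simulation only changes how the intermediate frame is produced. The sketch below inserts one frame between every pair of neighboring frames, as in the variant used here. The default per-pixel average is a stand-in, and the interpolate argument is a hypothetical hook where a model such as [18] would be called; no such model is implemented here.</p>
        <preformat preformat-type="code">
```python
import numpy as np

def linear_midpoint(a, b):
    """Baseline interpolator: per-pixel average of the neighboring frames."""
    return 0.5 * (np.asarray(a, dtype=np.float64) + np.asarray(b, dtype=np.float64))

def insert_intermediate_frames(frames, interpolate=linear_midpoint):
    """Return the sequence with one interpolated frame between each pair of
    neighboring frames, doubling the effective frame rate before simulation."""
    if not frames:
        return []
    out = [frames[0]]
    for a, b in zip(frames[:-1], frames[1:]):
        out.append(interpolate(a, b))   # e.g. a learned slow-motion model
        out.append(b)
    return out
```
        </preformat>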
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental Evaluation</title>
      <sec id="sec-4-1">
        <title>Datasets</title>
        <p>
          YouTubeFaces. The main dataset that met the criteria for simulation was the YouTubeFaces Dataset [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], which consists of videos collected from YouTube, each showing a specific person. Its main advantage is the large number of people: the collection consists of 3425 video sequences of 1595 people. Owing to the large number of subjects in this dataset, it was possible to fine-tune the neural network. Two-thirds of the collection was used for training and one-third for testing, with all videos of a given person assigned entirely to one of the two subsets. The original network, trained on the VGGFace2 [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] collection, was fine-tuned on the training set.
        </p>
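        <p>A person-disjoint split like the one described above can be sketched as follows. We assume the two-thirds fraction is applied to identities rather than to individual videos, and that videos are given as (video_id, person_id) pairs; both the input format and the function name are hypothetical.</p>
        <preformat preformat-type="code">
```python
import random
from collections import defaultdict

def split_by_identity(videos, train_fraction=2 / 3, seed=0):
    """videos: iterable of (video_id, person_id) pairs.
    Returns (train_ids, test_ids) such that no person appears in both."""
    by_person = defaultdict(list)
    for video_id, person_id in videos:
        by_person[person_id].append(video_id)
    people = sorted(by_person)
    random.Random(seed).shuffle(people)             # reproducible random assignment
    n_train = round(len(people) * train_fraction)
    train = [v for person in people[:n_train] for v in by_person[person]]
    test = [v for person in people[n_train:] for v in by_person[person]]
    return train, test
```
        </preformat>
        <p>Splitting by identity keeps the test identities unseen during fine-tuning, so the evaluation measures generalization to new people rather than memorization of known faces.</p>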
        <p>
          ChokePoint. In addition, we use as a test set a dataset recorded under conditions similar to real usage scenarios of a dynamic vision sensor. For this purpose, the ChokePoint [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] dataset was selected. It contains 48 video sequences of 40 people passing through the entrance to a room. Along with the video, frame-by-frame annotations of the person in the frame are provided. The camera viewing angles are similar to those in video surveillance systems, which reflects a plausible placement of a dynamic vision sensor used for this task.
        </p>
        <p>GML DVS. To check how well the model transfers to real data obtained from a dynamic vision sensor, we captured a small dataset of eight video sequences containing eight people. The face detector automatically found 80 faces, which were then manually labeled.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Results</title>
        <p>To evaluate the quality of the proposed method, we run a verification experiment: pairs of objects are selected, and their similarity is compared with a threshold to decide whether they belong to the same person. Varying the threshold traces out the ROC curve, and the area under it (AUC) serves as an indicator of the overall performance of the model. Where possible, the method is compared with verification on RGB images. The variants with the improved reconstruction and with fine-tuning on those reconstructions are also examined. The results are presented in Tables 1 and 2.</p>
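        <p>The verification protocol can be sketched as follows: cosine similarity between two embeddings, a threshold decision, and the AUC computed over labeled pairs. The sketch uses the rank-statistic form of the AUC (the probability that a genuine pair scores higher than an impostor pair, with ties counted as one half), which equals the area under the ROC curve; the function names are our own.</p>
        <preformat preformat-type="code">
```python
import numpy as np

def cosine_similarity(f1, f2):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2)))

def verify(f1, f2, threshold):
    """A = 1 (same person) if the similarity exceeds the threshold, else 0."""
    return int(cosine_similarity(f1, f2) > threshold)

def roc_auc(scores, labels):
    """Area under the ROC curve from pair scores and 0/1 same-person labels."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()   # genuine pair ranked higher
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```
        </preformat>
        <p>Because the AUC aggregates over all thresholds, it compares models without committing to a particular operating point.</p>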
        <p>The recognition results on DVS reconstructions are quite close to those on RGB images, and fine-tuning improves the quality of the model. Furthermore, fine-tuning also improves performance on the data obtained from the real DVS sensor, which was already fairly good compared with the simulated data; this supports the portability of the model.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>This paper explores the possibility of solving the face recognition problem with dynamic vision sensor data. It implements a baseline solution based on a neural network model. The results show that existing methods can be applied to this task with quality similar to that achieved on RGB images. Using simulated frames proved to be an effective way to improve the performance of the model, which is especially helpful given the scarcity of real data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>C.</given-names>
            <surname>Posch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Serrano-Gotarredona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Linares-Barranco</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Delbruck</surname>
          </string-name>
          .
          <article-title>Retinomorphic event-based vision sensors: Bioinspired cameras with spiking output</article-title>
          .
          <source>Proceedings of the IEEE</source>
          ,
          <volume>102</volume>
          (
          <issue>10</issue>
          ):
          <fpage>1470</fpage>
          -
          <lpage>1484</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Anna</given-names>
            <surname>Sokolova</surname>
          </string-name>
          and
          <string-name>
            <given-names>Anton</given-names>
            <surname>Konushin</surname>
          </string-name>
          .
          <article-title>Human identification by gait from event-based camera</article-title>
          .
          <source>In 2019 16th International Conference on Machine Vision Applications (MVA)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . IEEE,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Anna</given-names>
            <surname>Sokolova</surname>
          </string-name>
          and
          <string-name>
            <given-names>Anton</given-names>
            <surname>Konushin</surname>
          </string-name>
          .
          <article-title>Pose-based deep gait recognition</article-title>
          .
          <source>IET Biometrics</source>
          ,
          <volume>8</volume>
          (
          <issue>2</issue>
          ):
          <fpage>134</fpage>
          -
          <lpage>143</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Yanxiang</given-names>
            <surname>Wang</surname>
          </string-name>
          , Bowen Du, Yiran Shen, Kai Wu,
          <string-name>
            <given-names>Guangrong</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jianguo</given-names>
            <surname>Sun</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Hongkai</given-names>
            <surname>Wen</surname>
          </string-name>
          .
          <article-title>Ev-gait: Event-based robust gait recognition using dynamic vision sensors</article-title>
          .
          <source>In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>June 2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>P.</given-names>
            <surname>Viola</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <article-title>Rapid object detection using a boosted cascade of simple features</article-title>
          .
          <source>In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001)</source>
          , volume
          <volume>1</volume>
          , pages I-I,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Felzenszwalb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Girshick</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>McAllester</surname>
          </string-name>
          .
          <article-title>Cascade object detection with deformable part models</article-title>
          .
          <source>In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source>
          , pages
          <fpage>2241</fpage>
          -
          <lpage>2248</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Alex</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          , Ilya Sutskever, and
          <string-name>
            <given-names>Geoffrey E</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <article-title>Imagenet classification with deep convolutional neural networks</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>1097</fpage>
          -
          <lpage>1105</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Haoxiang</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zhe</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xiaohui</given-names>
            <surname>Shen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jonathan</given-names>
            <surname>Brandt</surname>
          </string-name>
          .
          <article-title>A convolutional neural network cascade for face detection</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          , pages
          <fpage>5325</fpage>
          -
          <lpage>5334</lpage>
          ,
          <year>June 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Kaipeng</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Zhanpeng Zhang,
          <string-name>
            <given-names>Zhifeng</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yu</given-names>
            <surname>Qiao</surname>
          </string-name>
          .
          <article-title>Joint face detection and alignment using multitask cascaded convolutional networks</article-title>
          .
          <source>IEEE Signal Processing Letters</source>
          ,
          <volume>23</volume>
          (
          <issue>10</issue>
          ):
          <fpage>1499</fpage>
          -
          <lpage>1503</lpage>
          ,
          <year>Oct 2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
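          10.
          <string-name>
            <given-names>Jan</given-names>
            <surname>Bergstra</surname>
          </string-name>
          and
          <string-name>
            <given-names>Karl</given-names>
            <surname>de Leeuw</surname>
          </string-name>
          .
          <source>The history of information security: A comprehensive handbook</source>
          , pages
          <fpage>264</fpage>
          -
          <lpage>265</lpage>
          ,
          <year>2007</year>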
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
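          11.
          <string-name>
            <given-names>P. Jonathon</given-names>
            <surname>Phillips</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alice J.</given-names>
            <surname>O'Toole</surname>
          </string-name>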
          .
          <article-title>Comparison of human and computer performance across face recognition experiments</article-title>
          .
          <source>Image and Vision Computing</source>
          ,
          <volume>32</volume>
          (
          <issue>1</issue>
          ):
          <fpage>74</fpage>
          -
          <lpage>85</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Brunelli</surname>
          </string-name>
          and
          <string-name>
            <given-names>Tomaso</given-names>
            <surname>Poggio</surname>
          </string-name>
          .
          <article-title>Face recognition: Features versus templates</article-title>
          .
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          ,
          <volume>15</volume>
          (
          <issue>10</issue>
          ):
          <fpage>1042</fpage>
          -
          <lpage>1052</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Turk</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alex</given-names>
            <surname>Pentland</surname>
          </string-name>
          .
          <article-title>Face recognition using eigenfaces</article-title>
          .
          <source>In Proceedings. 1991 IEEE computer society conference on computer vision and pattern recognition</source>
          , pages
          <fpage>586</fpage>
          -
          <lpage>587</lpage>
          ,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Yaniv</given-names>
            <surname>Taigman</surname>
          </string-name>
          , Ming Yang,
          <string-name>
            <given-names>Marc'Aurelio</given-names>
            <surname>Ranzato</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Lior</given-names>
            <surname>Wolf</surname>
          </string-name>
          .
          <article-title>Deepface: Closing the gap to human-level performance in face verification</article-title>
          .
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          , pages
          <fpage>1701</fpage>
          -
          <lpage>1708</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Yi</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xiaogang</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xiaoou</given-names>
            <surname>Tang</surname>
          </string-name>
          .
          <article-title>Deeply learned face representations are sparse, selective, and robust</article-title>
          .
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          , pages
          <fpage>2892</fpage>
          -
          <lpage>2900</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Florian</given-names>
            <surname>Schroff</surname>
          </string-name>
          , Dmitry Kalenichenko, and
          <string-name>
            <given-names>James</given-names>
            <surname>Philbin</surname>
          </string-name>
          .
          <article-title>Facenet: A unified embedding for face recognition and clustering</article-title>
          .
          <source>2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>Jun 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Henri</given-names>
            <surname>Rebecq</surname>
          </string-name>
          , Rene Ranftl, Vladlen Koltun, and
          <string-name>
            <given-names>Davide</given-names>
            <surname>Scaramuzza</surname>
          </string-name>
          .
          <article-title>High speed and high dynamic range video with an event camera</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          , pages 1-1
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Huaizu</given-names>
            <surname>Jiang</surname>
          </string-name>
          , Deqing Sun, Varun Jampani,
          <string-name>
            <given-names>Ming-Hsuan</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Erik</given-names>
            <surname>Learned-Miller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jan</given-names>
            <surname>Kautz</surname>
          </string-name>
          .
          <article-title>Super slomo: High quality estimation of multiple intermediate frames for video interpolation</article-title>
          .
          <source>2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>Jun 2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>L.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hassner</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Maoz</surname>
          </string-name>
          .
          <article-title>Face recognition in unconstrained videos with matched background similarity</article-title>
          .
          <source>In CVPR 2011</source>
          , pages
          <fpage>529</fpage>
          -
          <lpage>534</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>Q.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. M.</given-names>
            <surname>Parkhi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <article-title>Vggface2: A dataset for recognising faces across pose and age</article-title>
          .
          <source>In International Conference on Automatic Face and Gesture Recognition</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>Yongkang</given-names>
            <surname>Wong</surname>
          </string-name>
          , Shaokang Chen, Sandra Mau, Conrad Sanderson, and
          <string-name>
            <given-names>Brian C.</given-names>
            <surname>Lovell</surname>
          </string-name>
          .
          <article-title>Patch-based probabilistic image quality assessment for face selection and improved video-based face recognition</article-title>
          .
          <source>In IEEE Biometrics Workshop, Computer Vision and Pattern Recognition (CVPR) Workshops</source>
          , pages
          <fpage>81</fpage>
          -
          <lpage>88</lpage>
          . IEEE,
          <year>June 2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>