<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detecting Deepfakes with Multi-Metric Loss</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ziwei Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xin Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rongrong Ni</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yao Zhao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Beijing Jiaotong University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In recent years, DeepFake techniques have advanced to generate forged content so realistic that it can jeopardize personal privacy and national security. We observe a distribution discrepancy between genuine faces and faces tampered by DeepFake techniques: the embedding vectors of genuine faces are tightly distributed in the embedding space, while those of tampered faces are comparatively scattered. We therefore propose a novel DeepFake detection method based on Multi-metric Loss. Specifically, real and fake faces are mapped onto an embedding space characterized by intra-class compactness and inter-class separation. Then, by adding Weight-Center Loss to project genuine faces onto a more compact region of the embedding space, the distance between the two types of sample clusters is further expanded, thereby improving the separability of genuine and tampered samples. Moreover, the Adaptive Hardness-aware Expander is designed to further improve the feature description ability of the model, because the metric is always challenged at a proper difficulty. Extensive experiments show that our approach achieves state-of-the-art performance on current datasets.</p>
      </abstract>
      <kwd-group>
        <kwd>Deepfakes</kwd>
        <kwd>Multi-metric Loss</kwd>
        <kwd>Adaptive Hardness-aware Expander</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>Of various digital media, videos containing digital hu</title>
        <p>man faces, especially the ones involving personal
identiifcation information, are most vulnerable to be attacked.</p>
        <p>These assaults are collectively referred to as DeepFake manipulations. Developing effective methods capable of detecting DeepFake videos therefore carries substantial weight. Since the existing manipulations tamper with specific areas frame by frame, artifacts and noise appear in the spurious videos, so previous researchers have proposed many handcrafted methods [1, 2, 3, 4] and data-driven methods [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16] to find manipulation traces.</p>
        <p>Figure 1: DFDC and Celeb-DF dataset distribution visualization by t-SNE. (a) DFDC; (b) Celeb-DF. The projections of real face features are tightly distributed, while the fakes are comparatively scattered.</p>
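The compact-versus-scattered observation in Figure 1 can be checked numerically on embedding vectors. The sketch below is illustrative only: the Gaussian "embeddings" are synthetic stand-ins, not outputs of the actual detector. It compares the mean distance to the cluster centroid for a tight "real" cluster and a dispersed "fake" cluster.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for embedding vectors: "real" faces cluster tightly,
# "fake" faces scatter widely (the discrepancy visualized in Figure 1).
real = rng.normal(loc=0.0, scale=0.1, size=(500, 128))
fake = rng.normal(loc=0.0, scale=1.0, size=(500, 128))

def mean_dist_to_centroid(x):
    """Average Euclidean distance from each vector to the cluster centroid."""
    centroid = x.mean(axis=0)
    return float(np.linalg.norm(x - centroid, axis=1).mean())

spread_real = mean_dist_to_centroid(real)
spread_fake = mean_dist_to_centroid(fake)
assert spread_real < spread_fake  # the real cluster is markedly more compact
```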
        <p>Due to the uncertain counterfeit methods and manipulation quality in DeepFake videos, the spurious data is scattered across the whole feature space. Genuine human faces, by contrast, concentrate close to a non-linear low-dimensional manifold [17] in the feature space. As shown in Figure 1, the vectors of real faces are tightly distributed, while the fakes are comparatively scattered. We therefore consider that this distribution discrepancy also exists in the embedding space obtained by mapping the feature space. Existing detection schemes, however, do not account for the distribution discrepancy between the two types of samples.</p>
        <p>To this end, we propose a DeepFake detection framework with Multi-metric Loss, as shown in Figure 2. Triplet Loss, Cross-Entropy Loss and Weight-Center Loss together constitute the Multi-metric Loss, acting on different levels and on face sample clusters with diverse labels (real/fake). Under the restriction of Triplet Loss and Cross-Entropy Loss, real and fake faces are mapped onto an embedding space characterized by intra-class compactness and inter-class separation. Then, by adding Weight-Center Loss, the real faces are projected to a more compact region. The method thus excavates the fundamental distinction between the two types of samples by extending the distance between the two sample clusters in the embedding space, thereby improving the separability of genuine and spurious videos. In the end-stage of training, to further improve the feature description ability of the model, we design the Adaptive Hardness-aware Expander (AHE). Rigorous experiments on the FaceForensics++ [6], DFDC [18] and Celeb-DF [19] datasets show that the proposed method based on Multi-metric Loss is highly effective and achieves state-of-the-art performance.</p>
        <p>International Workshop on Safety &amp; Security of Deep Learning, August 19th–26th, 2021, Montreal-themed Virtual Reality
rrni@bjtu.edu.cn (R. Ni); yzhao@bjtu.edu.cn (Y. Zhao)</p>
        <p>© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org, ISSN 1613-0073)</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>With the huge risks posed by face forgery technology, there is an urgent need to investigate DeepFake detection methods. Existing detection techniques mainly fall into two categories: handcrafted and data-driven methods.</p>
      <p>Handcrafted Methods. Given the limited face manipulation techniques of the time, early works achieved DeepFake detection through handcrafted features, mainly exploiting eye blinking [1], incomplete details in the eyes and teeth [2], face warping [3] and head poses [4]. With the development of the generative adversarial network (GAN) [20], a variety of tampering technologies have emerged and forged faces have become more realistic, so the effectiveness of the early handcrafted methods has gradually weakened.</p>
      <p>Data-driven Methods. Given the powerful feature representation capabilities of deep neural networks, data-driven methods have received widespread attention. First, classification networks such as MesoNet [5], XceptionNet [6], the Capsule network [7], and R3D and C3D [8] were applied to detect fake faces. Zhou et al. [9] then proposed a two-stream neural network to capture tampering artifacts and local noise residuals. The adaptive face weighting layer [10] was designed to make the network focus on forgery details. The model in [11] was trained to mark the blending boundary of forged images. Considering the inconsistent warping left by manipulation between frames, the methods in [12, 13, 14] were proposed. The methods in [15, 16] introduced Deep Metric Learning to DeepFake detection for the first time. Kumar et al. [15] mainly explored their method's effectiveness for detecting videos with a high compression factor. Feng et al. [16] used the difference of the full face image in videos as the feature for DeepFake detection. Although they also mapped data onto the embedding space based on Deep Metric Learning, they followed the traditional metric strategy and imposed the same constraint on both types of samples. In our work, considering the distribution discrepancy of real and fake data, different levels of classification constraints are imposed on the two kinds of sample clusters. Specifically, we design the Multi-metric Loss to further widen the distance between the real cluster and the fakes by capturing the fundamental distinction between spurious and genuine videos, and the Adaptive Hardness-aware Expander to further improve the feature description ability of the model.</p>
      <p>3. Proposed Approach</p>
      <p>In this section, we give an overview of our framework. As aforementioned, the embedding vectors of real faces are aggregated in the embedding space, while the fakes are relatively scattered. Motivated by this observation, two key components are integrated into the framework: 1) Multi-metric Loss is designed to mine the fundamental distinction between real and fake faces so as to improve separability; 2) the Adaptive Hardness-aware Expander is used to further improve the feature description ability of the model. The framework is depicted in Figure 2.</p>
      <p>3.1. Multi-metric Loss</p>
      <p>Let X denote the data space, from which we sample a set of facial area maps X = [x_1, x_2, ..., x_N]. Each datum x_i has a label y_i ∈ {0, 1} representing real or fake. Let h : X → F be the mapping from the data space to the feature space, where the extracted feature h(x) preserves the semantic characteristics of its corresponding data point x. The feature is then projected onto the embedding space Z with the mapping g : F → Z. Since the projection can be incorporated into the deep network, we can directly learn the mapping f(·; θ) = g ∘ h : X → Z from the data space to the embedding space, where θ denotes the network parameters.</p>
      <p>Based on the data distribution discrepancy, namely that the embedding vectors of real faces are tightly distributed while the fakes are comparatively scattered, we deem that different levels of classification constraints should be imposed, so as to mine the fundamental distinction between spurious and genuine videos, as shown in Figure 3. Multi-metric Loss is formulated as follows:

Loss = ℒ_wh−c + λ ℒ_tri + μ ℒ_ce    (1)</p>
      <p>3.1.1. Triplet Loss</p>
      <p>Under the constraint of Triplet Loss, the mapping from high-dimensional sparse features to low-dimensional dense vectors is learned. Reflected in the embedding space, the distribution of the data is characterized by intra-class compactness and inter-class separation. Let f(x_a; θ) be the anchor embedding vector. The embedding vectors with the same and different labels relative to f(x_a; θ) are defined as f(x_p; θ) and f(x_n; θ), respectively. Triplet Loss is formulated as follows:

ℒ_tri := [S_an − S_ap + γ]_+    (2)

where S_ap = ⟨f(x_a; θ), f(x_p; θ)⟩ indicates the similarity of the positive pair, S_an = ⟨f(x_a; θ), f(x_n; θ)⟩ is the similarity of the negative pair, ⟨·, ·⟩ denotes the dot product, and γ is the metric margin.</p>
      <p>3.1.2. Cross-Entropy Loss</p>
      <p>In our approach, Cross-Entropy (CE) Loss and Triplet Loss act jointly. Specifically, CE Loss encourages the separation of real embedding vectors from the fakes. Simultaneously, Triplet Loss is used to achieve intra-class compactness and inter-class separation, so as to initially separate the two types of sample clusters.</p>
      <p>3.1.3. Weight-Center Loss</p>
      <p>Considering the distribution discrepancy of genuine and tampered data, we hope to further widen the distance between the two categories of sample clusters by capturing the fundamental distinction between real and fake videos. Under the action of Triplet Loss and CE Loss, the network has acquired a preliminary classification capability. On this basis, we design Weight-Center Loss for the real sample cluster to capture the fundamental distinction between the two types of samples.</p>
      <p>Some embedding vectors are far from the center of the real sample cluster, possibly due to interference that has nothing to do with judging real and fake videos. Therefore, Weight-Center Loss is proposed, acting only on the cluster of real samples. We define a sample that is far from the center of the real sample cluster, compared to the surrounding samples, as a deviating sample. The loss adaptively imposes a larger penalty on deviating samples and a smaller penalty on adjacent samples. Simultaneously, the center of the real sample cluster is continuously updated. Through these operations, real faces are projected to a more compact region in the embedding space, so as to broaden the distance between the real sample cluster and the fake sample cluster. Weight-Center Loss is formulated as follows:

ℒ_wh−c = (1/α) log[1 + Σ_{i∈P} e^{−α(S_ic − β)}]    (3)

where P is the collection of real embedding vectors, S_ic is the similarity of the center sample pair {f(x_i; θ), c}, f(x_i; θ) and c are the real embedding vectors and the iterative center, and α, β are fixed hyperparameters. It is worth noting that the center is iterated continuously. Based on [21], we can obtain the generic definition of the penalty weight of a sample pair, and hence the penalty weight of the center sample pair {f(x_i; θ), c}.</p>
      <p>The distances of the negative pairs {x, x̃−} are manipulated, while for the other samples {x, x+} we perform no transformation. The reduction in the distance between negative pairs then raises the hardness level, so that the measurement process is always at an appropriate level of difficulty during the training cycle. As shown in Figure 4, to simplify the representation, we use x, x+ and x− to denote the anchor embedding vector f(x_a; θ), the positive embedding vector f(x_p; θ) and the negative embedding vector f(x_n; θ), respectively.</p>
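A minimal NumPy sketch of Eqs. (1)–(3), assuming dot-product similarity on unit-norm embeddings. The placement of the weights λ (`lam`) and μ (`mu`) in Eq. (1), and the default values of α, β and γ, are assumptions for illustration rather than the paper's exact settings.

```python
import numpy as np

def triplet_loss(a, p, n, gamma=1.0):
    """Eq. (2): [S_an - S_ap + gamma]_+ with dot-product similarities."""
    s_ap = float(np.dot(a, p))  # similarity of the positive pair
    s_an = float(np.dot(a, n))  # similarity of the negative pair
    return max(s_an - s_ap + gamma, 0.0)

def cross_entropy(prob_real, label):
    """Binary CE on the predicted probability that the sample is real."""
    prob = prob_real if label == 1 else 1.0 - prob_real
    return -float(np.log(prob))

def weight_center_loss(reals, center, alpha=2.0, beta=0.5):
    """Eq. (3): (1/alpha) * log(1 + sum_{i in P} exp(-alpha * (S_ic - beta))).
    Acts only on the real-sample cluster; S_ic is the similarity between
    a real embedding f(x_i; theta) and the iterative center c."""
    s_ic = reals @ center
    return float(np.log1p(np.exp(-alpha * (s_ic - beta)).sum()) / alpha)

def multi_metric_loss(a, p, n, reals, center, prob_real, label,
                      lam=2.0, mu=1.0):
    """Eq. (1), assuming lam and mu weight the triplet and CE terms."""
    return (weight_center_loss(reals, center)
            + lam * triplet_loss(a, p, n)
            + mu * cross_entropy(prob_real, label))

# Toy batch of unit-norm embeddings.
rng = np.random.default_rng(1)
unit = lambda v: v / np.linalg.norm(v)
anchor, pos, neg = (unit(rng.normal(size=8)) for _ in range(3))
reals = np.stack([unit(rng.normal(size=8)) for _ in range(4)])
center = unit(reals.mean(axis=0))

loss = multi_metric_loss(anchor, pos, neg, reals, center,
                         prob_real=0.9, label=1)
assert loss > 0.0
```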
      <p>First, a toy example that constructs an augmented harder negative sample x̃− by linear interpolation is presented:

x̃− = x + λ (x− − x),  λ ∈ [0, 1]    (6)</p>
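Equation (6) can be sketched as a one-line interpolation. Moving the synthetic negative x̃− toward the anchor shrinks the negative-pair distance and thus raises the triplet's hardness; note that λ here is the interpolation coefficient of Eq. (6), not the loss weight in Eq. (1).

```python
import numpy as np

def expand_negative(x, x_neg, lam):
    """Eq. (6): build a harder negative by linear interpolation,
    x_tilde = x + lam * (x_neg - x), with lam in [0, 1]."""
    assert 0.0 <= lam <= 1.0
    return x + lam * (x_neg - x)

x = np.array([0.0, 0.0])       # anchor embedding
x_neg = np.array([4.0, 0.0])   # original negative embedding
x_tilde = expand_negative(x, x_neg, lam=0.5)

# The augmented negative lies closer to the anchor than the original one,
# so the triplet {x, x+, x_tilde} is harder to separate.
assert np.linalg.norm(x_tilde - x) < np.linalg.norm(x_neg - x)
```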
      <sec id="sec-2-1">
        <title>4. Experiments</title>
        <p>In this section, we first explore the optimal settings for our approach and then present extensive experimental results to demonstrate the effectiveness of our method.</p>
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Adaptive Hardness-aware Expander</title>
        <p>In the end-stage of training, the original samples are already well separable under the action of Multi-metric Loss, and continuing to train on them cannot further improve the model's feature description ability. To address this limitation, we propose the Adaptive Hardness-aware Expander, as shown in Figure 4.</p>
        <p>We construct the hardness-aware triplet {x, x+, x̃−} in the embedding space, where manipulating the distances among samples directly alters the hardness level of the triplet.</p>
        <sec id="sec-2-2-1">
          <title>4.1. Implement Details</title>
          <p>For all real/fake video frames, we use the face extractor MTCNN to detect faces and save the aligned facial images as inputs with the size of 256 × 256. λ and μ in Eq. (1) and α in Eq. (3) are set to 2.0, 1.0 and 2.0 to impose different levels of classification constraints. The margin of Triplet Loss in Eq. (2) is set to 1.0. Optimization is performed using the SGD optimizer with weight decay 5e−4. The initial learning rate is kept at 0.01 and divided by 10 after every 3000 iterations. We adopt ResNet-34, pre-trained on the ImageNet dataset, as the backbone network. Our model is trained on 4 RTX 2080Ti GPUs with batch size 16, and the total number of iterations is set to 10,000.</p>
          <p>The generation of adaptive hardness-aware samples forces the network to pay more attention to key features that characterize truth and counterfeit under the constraint of Multi-metric Loss, thereby improving the feature description ability of the model and achieving better classification performance. Therefore, our method can achieve state-of-the-art performance on the FaceForensics++, DFDC and Celeb-DF datasets.</p>
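The learning-rate schedule stated above (initial rate 0.01, divided by 10 after every 3000 iterations over a 10,000-iteration run) can be written as a small helper. This is a sketch of the stated schedule, not the authors' training code.

```python
def learning_rate(iteration, base_lr=0.01, drop_every=3000, factor=10.0):
    """Step decay: divide the base rate by `factor` every `drop_every` iterations."""
    return base_lr / (factor ** (iteration // drop_every))

# Rates at the start of each decay interval of the 10,000-iteration run.
schedule = [learning_rate(t) for t in (0, 3000, 6000, 9000)]
```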
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>4.2. Comparison with Previous Methods</title>
        <p>In this section, we compare our method with previous DeepFake detection methods, showing the performance of various methods on the FaceForensics++ [6], DFDC [18] and Celeb-DF [19] datasets. We adopt ACC (accuracy) and AUC (area under the Receiver Operating Characteristic curve) as the evaluation metrics for the experiments.</p>
        <p>The evaluation results on the individual datasets are shown in Table 1 and Table 2. The results indicate that our model trained with Multi-metric Loss and AHE has a significant improvement over previous metric-learning methods [15, 16], especially on the DFDC and Celeb-DF datasets. The reason is that different levels of classification constraints, based on the observed distribution discrepancy, are imposed to mine the fundamental distinction between spurious and genuine videos, so the method still works on tampered videos without obvious artifacts. At the same time, the generation of adaptive hardness-aware samples forces the network to pay more attention to key features that characterize truth and counterfeit under the constraint of Multi-metric Loss, thereby improving the feature description ability of the model and achieving better classification performance.</p>
        <p>4.3. Ablation Study</p>
        <p>To verify the effectiveness of Multi-metric Loss and the Adaptive Hardness-aware Expander, we conduct ablation studies; the results are shown in Table 3, Figure 5 and Figure 6.</p>
        <p>4.3.1. Effectiveness of Multi-metric Loss</p>
        <p>To confirm the effectiveness of Multi-metric Loss, we evaluate how different levels of classification constraints affect the detection accuracy. We train the model on FF++ (c23); the other hyperparameters are kept the same as the settings in Table 1.</p>
        <table-wrap id="tbl3">
          <label>Table 3</label>
          <caption><p>The ablation study of Multi-metric Loss (ACC on FF++).</p></caption>
          <table>
            <thead>
              <tr><th>Loss</th><th>DF</th><th>F2F</th><th>FS</th><th>NT</th></tr>
            </thead>
            <tbody>
              <tr><td>Triplet Loss</td><td>0.946</td><td>0.925</td><td>0.939</td><td>0.810</td></tr>
              <tr><td>+ Cross-Entropy Loss</td><td>0.962</td><td>0.933</td><td>0.942</td><td>0.870</td></tr>
              <tr><td>+ Weight-Center Loss</td><td>0.985</td><td>0.974</td><td>0.995</td><td>0.938</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The t-SNE plots of the four different manipulation methods in the FF++ dataset are reported in Figure 5. It can be found that the separability of the samples is poor when Triplet Loss acts independently, as shown in the first row of Figure 5. The reason is that the data selection in the batch results in an uneven data distribution, which makes it difficult to divide the interface. When Cross-Entropy Loss is introduced, the data distribution of the different manipulations in FF++ is shown in the second row of Figure 5. Cross-Entropy Loss encourages the separation of real embedding vectors from fake embedding vectors, and Triplet Loss helps constrain intra-class compactness and inter-class separation, thereby improving the separability of the samples. In the third row of Figure 5, Weight-Center Loss is added; it acts only on the real cluster. By mining the features representing authenticity, the real sample clusters are tightly clustered, thereby further extending the distance between the two types of sample clusters in the embedding space. The ACC ablation studies of Multi-metric Loss on FF++ are reported in Table 3, which further confirm the effectiveness of Multi-metric Loss.</p>
        <p>Note that Triplet Loss and Cross-Entropy Loss work during the entire training stage, while Weight-Center Loss works only in the middle and end stages of training. The main reason is that the center point of the real samples is unstable at the beginning of training, which would cause the network to optimize in the wrong direction.</p>
        <p>4.3.2. Effectiveness of AHE</p>
        <p>To confirm the effectiveness of the Adaptive Hardness-aware Expander, we analyze the class activation maps for the four different manipulation methods, as shown in Figure 6. The class activation maps corresponding to the operation of the Expander indicate that synthetic samples with adaptive hardness force the network to pay more attention to key features that characterize authenticity and counterfeit under the constraint of Multi-metric Loss, thereby improving the feature description ability of the model. For example, NeuralTextures (NT) is a tampering scheme that modifies only the mouth area. Before the Adaptive Hardness-aware Expander is used, the class activation map shows that the nose and mouth regions together provide evidence that the video is tampered; afterwards, the network pays more attention to the tampered mouth area, which demonstrates the interpretability of our proposed method.</p>
      </sec>
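Both evaluation metrics can be computed from per-frame scores. The sketch below is an illustrative implementation (not the authors' evaluation code), using a 0.5 threshold for ACC and the rank-based (Mann-Whitney U) formulation of AUC; it assumes no tied scores.

```python
import numpy as np

def accuracy(scores, labels, threshold=0.5):
    """ACC: fraction of samples whose thresholded score matches the label."""
    preds = (np.asarray(scores) >= threshold).astype(int)
    return float((preds == np.asarray(labels)).mean())

def auc(scores, labels):
    """AUC via the rank (Mann-Whitney U) formulation; assumes no tied scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = scores.argsort()
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)  # rank 1 = lowest score
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return float((ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

# Toy scores: perfectly separated real (1) and fake (0) samples.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 1, 0, 0]
assert accuracy(scores, labels) == 1.0
assert auc(scores, labels) == 1.0
```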
    </sec>
    <sec id="sec-3">
      <title>5. Conclusion</title>
      <sec id="sec-3-1">
        <title/>
        <p>In this work, we propose a DeepFake detection method based on Multi-metric Loss, motivated by the distribution discrepancy whereby the embedding vectors of genuine faces are tightly distributed in the embedding space while those of tampered faces are comparatively scattered. Multi-metric Loss improves the separability of genuine and tampered samples by further widening the distance between the two types of sample clusters. In addition, adaptive hardness-aware samples are generated to keep the metric challenged at a proper difficulty, so as to improve the feature description ability of the model. Our method achieves good improvements on extensive metrics.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <sec id="sec-4-1">
        <title/>
        <p>This work was supported in part by the National Key Research and Development of China (2018YFC0807306), National NSF of China (U1936212), and the Beijing Fund-Municipal Education Commission Joint Project (KZ202010015023).</p>
        <p>References</p>
        <p>[1] Y. Li, M. Chang, S. Lyu, In Ictu Oculi: Exposing AI Created Fake Videos by Detecting Eye Blinking, in: 2018 IEEE International Workshop on Information Forensics and Security (WIFS), 2018, pp. 1–7.</p>
        <p>[2] F. Matern, C. Riess, M. Stamminger, Exploiting Visual Artifacts to Expose Deepfakes and Face Manipulations, in: 2019 IEEE Winter Applications of Computer Vision Workshops (WACVW), 2019, pp. 83–92.</p>
        <p>[3] Y. Li, S. Lyu, Exposing DeepFake Videos By Detecting Face Warping Artifacts, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019, pp. 46–52.</p>
        <p>[4] X. Yang, Y. Li, S. Lyu, Exposing Deep Fakes Using Inconsistent Head Poses, in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 8261–8265.</p>
        <p>[5] D. Afchar, V. Nozick, J. Yamagishi, I. Echizen, MesoNet: a Compact Facial Video Forgery Detection Network, in: 2018 IEEE International Workshop on Information Forensics and Security (WIFS), 2018, pp. 1–7.</p>
        <p>[6] A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, M. Niessner, FaceForensics++: Learning to Detect Manipulated Facial Images, in: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 1–11.</p>
        <p>[7] H. Nguyen, J. Yamagishi, I. Echizen, Use of a Capsule Network to Detect Fake Images and Videos, 2019. arXiv:1910.12467v2.</p>
        <p>[8] I. Ganiyusufoglu, L. M. Ngô, N. Savov, S. Karaoglu, T. Gevers, Spatio-temporal Features for Generalized Detection of Deepfake Videos, 2020. arXiv:2010.11844.</p>
        <p>[9] P. Zhou, X. Han, V. I. Morariu, L. S. Davis, Two-Stream Neural Networks for Tampered Face Detection, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2017.</p>
        <p>[10] N. Bonettini, E. D. Cannas, S. Mandelli, L. Bondi, P. Bestagini, S. Tubaro, Video Face Manipulation Detection Through Ensemble of CNNs, in: 2020 25th International Conference on Pattern Recognition (ICPR), 2020, pp. 5012–5019.</p>
        <p>[11] L. Li, J. Bao, T. Zhang, H. Yang, D. Chen, F. Wen, B. Guo, Face X-Ray for More General Face Forgery Detection, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 5000–5009.</p>
        <p>[12] D. Güera, E. J. Delp, Deepfake Video Detection Using Recurrent Neural Networks, in: 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2018, pp. 1–6.</p>
        <p>[13] E. Sabir, J. Cheng, A. Jaiswal, W. AbdAlmageed, I. Masi, P. Natarajan, Recurrent Convolutional Strategies for Face Manipulation Detection in Videos, in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019, pp. 80–87.</p>
        <p>[14] K. Chugh, P. Gupta, A. Dhall, R. Subramanian, Not Made for Each Other: Audio-Visual Dissonance-Based Deepfake Detection and Localization, in: Proceedings of the 28th ACM International Conference on Multimedia, 2020, pp. 439–447.</p>
        <p>[15] A. Kumar, A. Bhavsar, R. Verma, Detecting Deepfakes with Metric Learning, in: 2020 8th International Workshop on Biometrics and Forensics (IWBF), 2020, pp. 1–6.</p>
        <p>[16] K. Feng, J. Wu, M. Tian, A Detect method for deepfake video based on full face recognition, in: 2020 IEEE International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA), 2020, pp. 1121–1125.</p>
        <p>[17] N. Lei, Z. Luo, S. Yau, X. D. Gu, Geometric Understanding of Deep Learning, 2018. arXiv:1805.10451.</p>
        <p>[18] B. Dolhansky, J. Bitton, B. Pflaum, J. Lu, R. Howes, M. Wang, C. Canton, The DeepFake Detection Challenge Dataset, 2020. arXiv:2006.07397.</p>
        <p>[19] Y. Li, X. Yang, P. Sun, H. Qi, S. Lyu, Celeb-DF: A Large-Scale Challenging Dataset for DeepFake Forensics, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 3204–3213.</p>
        <p>[20] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative Adversarial Nets, in: 2014 Annual Conference on Neural Information Processing Systems (NIPS), 2014, pp. 2672–2680.</p>
        <p>[21] X. Wang, X. Han, W. Huang, D. Dong, M. R. Scott, Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning, in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 5017–5025.</p>
        <p>[22] Y. Wen, K. Zhang, Z. Li, Y. Qiao, A Discriminative Feature Learning Approach for Deep Face Recognition, in: Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, 2016, pp. 499–515.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>