=Paper= {{Paper |id=Vol-3026/paper16 |storemode=property |title=Learning Contextual Representations of Citations via Graph Transformer |pdfUrl=https://ceur-ws.org/Vol-3026/paper16.pdf |volume=Vol-3026 |authors=Hyeon-Ju Jeon,Gyu-Sik Choi,Se-Young Cho,Hanbin Lee,Hee Yeon Ko,Jason J. Jung,O-Joun Lee,Myeong-Yeon Yi }} ==Learning Contextual Representations of Citations via Graph Transformer== https://ceur-ws.org/Vol-3026/paper16.pdf
            Learning Contextual Representations
            of Citations via Graph Transformer ⋆

     Hyeon-Ju Jeon1,2[0000−0002−2400−8360] , Gyu-Sik Choi3 , Se-Young Cho3 ,
      Hanbin Lee4 , Hee Yeon Ko5 , Jason J. Jung1,⋆⋆[0000−0003−0050−7445] ,
           O-Joun Lee6[0000−0001−8921−5443] , and Myeong-Yeon Yi7
                  1
                     Chung-Ang University, Dongjak-gu, Seoul, Korea
                             {hyeonju,j3ung}@cau.ac.kr
      2
        Korea Institute of Atmospheric Prediction Systems, Dongjak-gu, Seoul, Korea
                                  hjjeon@kiaps.org
                      3
                        Sogang University, Mapo-gu, Seoul, Korea
                          {gyusik19,wnl383}@sogang.ac.kr
               4
                 Incheon National University, Yeonsu-gu, Incheon, Korea
                                  gksqls@inu.ac.kr
                    5
                      Soongsil University, Dongjak-gu, Seoul, Korea
                              0525ojkmt@soongsil.ac.kr
           6
             Catholic University of Korea, Bucheon-si, Gyeonggi-do, Korea
                                ojlee@catholic.ac.kr
                  7
                    NAVER Corp., Seongnam-si, Gyeonggi-do, Korea
                            myeongyeon.yi@navercorp.com




        Abstract. This study aims to represent citations based on the citation
        context extracted from the citation network. Researchers cite papers
        for various purposes to describe their arguments in a logical structure.
        Thus, citations play different roles depending on where they appear in
        the citing paper. In this paper, we first present a definition of the
        citation context and initialize the embedding vector based on the ci-
        tation order and location. Then, based on the graph transformer model,
        we learn contextual citation embeddings. To represent the citation con-
        text, we consider the following three parts: (i) textual features of the
        paper, (ii) positional features of the citation context, and (iii) struc-
        tural features of the citation network, by applying the self-attention
        mechanism.

        Keywords: Citation Context · Citation Network · Network Embedding ·
        Positional Embedding · Graph Transformer

⋆
   Copyright © by the paper’s authors. Use permitted under Creative Commons Li-
   cense Attribution 4.0 International (CC BY 4.0). In: N. D. Vo, O.-J. Lee, K.-H. N.
   Bui, H. G. Lim, H.-J. Jeon, P.-M. Nguyen, B. Q. Tuyen, J.-T. Kim, J. J. Jung, T. A.
   Vo (eds.): Proceedings of the 2nd International Conference on Human-centered Arti-
   ficial Intelligence (Computing4Human 2021), Da Nang, Viet Nam, 28-October-2021,
   published at http://ceur-ws.org
⋆⋆
   Corresponding Author.




Fig. 1: Illustration of the citation embedding. A citing paper pi cites papers
p1 , p2 , p3 . In the citation network, each node represents a paper. The citation
embedding refers to the context-based embedding of a citation. If the citation
locations differ, distinct meanings are reflected in the citation embeddings, even
when the citations appear in the same paper.


1   Introduction

The exponential growth of academic papers has given rise to various services
(e.g., citation recommendation [3, 7, 13], bibliographical retrieval [15], and so
on). Such services need a precise analysis of the scientific impact and content of
papers [9].
    There have been various studies [14, 16] on citation analysis to assess the
quality of papers and understand their context. These studies have mostly
applied citation frequency-based and content-based approaches. The frequency-
based approaches assign the same weight to every citation regardless of its
purpose. As shown in Fig. 1, when two papers p1 and p2 are cited by pi , suppose
that p1 is located in the introduction section, and p2 is located in the evaluation
section. In this case, p1 and p2 are cited for different purposes, and their
importance also differs.
    To solve this problem, it is necessary to understand the overall context of
the citation in the paper. The content-based approaches [6, 19] attempted to
learn the contextual features of the paper using a language model based on
RNN/LSTM. Nevertheless, these studies concentrated not on discovering the
citation context or the roles of citations, but on measuring the content similarity
between two papers.
    Thereby, in this paper, we define and extract the citation context in citation
networks. First of all, we assume that the cited papers compose the contents of
the citing paper, and that the order and location of the cited papers reflect the
role of each paper in the citing paper. To represent citations, we propose an
embedding method considering (i) textual features of the paper, (ii) positional
features of the citation context, and (iii) structural features of the citation
network, by applying the self-attention mechanism [18]. The proposed method
can represent global

citation features using fewer layers than the conventional GCN model. It is also
efficient for learning the context of long papers.
    Finally, based on the graph transformer [20], the proposed method generates
pre-trained citation vectors considering the influence and correlation between
cited papers. These results can be used in various tasks such as citation classi-
fication, research topic discovery, and paper evaluation in the future.




2     Related work


This section introduces the existing methods for analyzing citation relation-
ships in the citation network. To deal with large citation networks, various
studies investigated the co-citation frequency.
     Boyack and Klavans [2] focused on network theory, which can measure
node importance and weight, to analyze co-citation relationships and biblio-
graphic coupling. Although this approach reflects features at the network-
structure level, it is difficult to say that the different roles of citations are
considered. To solve this problem, Habib and Afzal [5] exploited the distribution
of citations across sections to capture the citation context. Nevertheless, it is
necessary to analyze the distinguishing characteristics of co-cited papers at the
content level. The proximity-based methods [4, 12] were proposed for weighting
edges of the co-citation network by using contexts. The edge weight was based
on the strength of the co-citation context at the sentence level. Also, Ahmad
and Afzal [1] showed that traditional co-citation analysis can produce better
results when combined with metadata of the paper (e.g., author, affiliation,
venue, and so on).
    The above approaches focused on comparing content-based similarities in
consideration of the relationship between cited papers. While these are effective
for specific tasks such as citation recommendation and search, it is difficult for
them to generate widely applicable representations by unsupervised learning.
Thereby, a few studies conducted network representation learning [11] for em-
bedding paper nodes based on the citation context at the network-structure
level. VOPRec [10] learned vector representations of papers by combining text
information with structural identity in the citation network. DocCit2Vec [21],
which represents papers based on the citation context at the document level, is
used for recommendation systems by applying the attention mechanism.
    However, it is difficult for these methods to consider the contextual features
reflected in the structure of papers. From this perspective, we extract the con-
text of a citation through citation networks constructed according to the citing
section. After that, initial embedding is performed considering the network
structure and textual features so that the transformer model can learn various
features of citations.




         Fig. 2: Architecture of the contextual citation embedding model.


3     Learning representation of citation context

In this section, we introduce the detailed approach of the contextual citation
embedding model. As illustrated in Fig. 2, the model comprises three compo-
nents: (1) extracting the citation context, (2) initializing the citation embed-
ding, and (3) a graph-transformer based encoder. The graph transformer model
learns a representation of a target citation by fusing the input initial embedding
vectors. To extract the context of the citation in the first component, we define
our citation network as follows.

Definition 1 (Citation Network). The citation network N contains a set of
paper nodes P. There are citation relationships (C ∈ R|P|×|P| ) between paper
nodes. When paper pi cites paper pj in the nth section, the citation relationship
has a weight w ∈ {0, · · · , n, · · · , N }. This can be formulated as follows:

                                  N = ⟨P, C, w, t⟩ ,                           (1)

where t refers to a textual feature vector of P.

   To consider the different compositions of the sections of the paper, we rear-
range the paper into four sections numbered from 0 to 3: 0 represents the intro-
duction, 1 represents related work, 2 represents the methodology, and 3 repre-
sents the results. In this case, the maximum section number is 3. Also, for the
textual features of each paper pi , different word embedding models can be used.
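As a concrete illustration, the rearrangement into four canonical sections can be sketched as a simple title-normalization step. The keyword table and the fallback rule below are our own assumptions for illustration, not part of the paper:

```python
# Hypothetical mapping from raw section titles to the four
# canonical section ids used as citation-edge weights (0-3).
SECTION_IDS = {
    "introduction": 0,
    "related work": 1,
    "methodology": 2,
    "result": 3,
}

def section_id(raw_title: str) -> int:
    """Normalize a raw section title to one of the four ids.

    Unmatched titles fall back to the methodology bucket (2);
    this fallback rule is an assumption, not from the paper.
    """
    title = raw_title.strip().lower()
    for key, sid in SECTION_IDS.items():
        if key in title:
            return sid
    return 2
```

In this sketch, a heading such as "Experimental Results" maps to section id 3 because it contains the keyword "result".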


3.1     Extracting citation context

Instead of working on the entire citation network N , we extract the citation
context from the citation network. Existing network embedding methods use
a node sampling approach that is weighted according to the importance of the
node. However, since the importance of cited papers is determined by the
purpose of the citations, an analysis of the purpose and characteristics of each
cited paper is necessary.
   As stated in Sect. 1, we assume that the citation order and location of a
paper relate to the purpose of the citation. Thus, we extract various subgraphs
for the target paper by sampling the cited papers for each section rather than
sampling from the entire set of cited papers. We define these subgraphs as the
citation context:
Definition 2 (Citation Context). Given an input citation network N , for a
paper pi in the network, the citation context is a set of papers sampled at each
section n ∈ [0, N ]. This can be formulated as follows:

                            Γ (pi ) = ⟨Γ (pi,0 ), · · · , Γ (pi,N )⟩ ,                   (2)

where Γ (pi,n ) represents the contextual citations in section n. This can be for-
mulated as follows:

                      Γ (pi,n ) = {pj | pj ∈ P ∖ {pi } ∧ w(i, j) = n}.                     (3)
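Equation (3) translates directly into a set comprehension over the weighted citation edges. The dictionary representation of the weight function w below is an assumption made for illustration:

```python
def citation_context(weights: dict, p_i: str, n: int) -> set:
    """Gamma(p_i, n): the papers p_j cited by p_i in section n.

    `weights` maps (citing, cited) pairs to section ids, i.e. the
    edge-weight function w of Definition 1 (representation assumed).
    """
    return {p_j for (src, p_j), w in weights.items()
            if src == p_i and p_j != p_i and w == n}
```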

     To efficiently extract the citation context for a batch of papers during the
training of the embedding model, we extend a node sampling algorithm to enable
node sampling for each section. The sampling method iteratively samples a list
of papers for a target paper pi using an adaptive sampling depth Kn per section.
Let S k−1
     pi ,n refer to the bag of papers sampled at the (k − 1)th step in the nth
section. For each paper node in S k−1
                                  pi ,n , we randomly sample its cited papers with
replacement from its one-hop neighbors in the citation network at the k th step.
Through this process, the papers in pi ’s citation context Γ (pi ) can cover both
local neighbors of pi and papers far away.
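The iterative section-wise sampling described above might be sketched as follows. The adjacency representation, the fan-out parameter, and the behavior for papers without citations in a section are assumptions, not details given in the paper:

```python
import random

def sample_context(adj, p_i, depths, num_samples=2, seed=0):
    """Section-wise citation-context sampling (sketch).

    adj[(paper, section)] -> list of papers cited in that section.
    depths[n] -> sampling depth K_n for section n.
    At each step k, cited papers are drawn with replacement from the
    one-hop neighbors of every paper in the (k-1)-th step's bag.
    The fan-out `num_samples` is an assumed hyper-parameter.
    """
    rng = random.Random(seed)
    context = {}
    for n, k_n in depths.items():
        bag = [p_i]
        sampled = []
        for _ in range(k_n):
            next_bag = []
            for p in bag:
                neighbors = adj.get((p, n), [])
                if neighbors:
                    next_bag += rng.choices(neighbors, k=num_samples)
            sampled += next_bag
            bag = next_bag
        context[n] = sampled
    return context
```

Because each step expands from the previous bag, a depth of K_n = 2 already reaches two-hop neighbors, which is how the context covers papers far away from pi.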

3.2    Initializing the citation embedding
Based on the citation context concept, we obtain the set of sampled subgraph
batches for all the nodes as G = {g1 , g2 , · · · , g|P| }, where gi represents the
subgraph sampled for the target paper pi . Different from general graphs, in
which the nodes are orderless, cited papers in a paper are logically arranged, so
the order of citations is meaningful. Therefore, the citation context is serialized
in the order cited in the paper. Formally, we concatenate the target paper pi
and its ordered contextual citations gi , denoted by Ipi = [pi , pi,1 , pi,2 , · · · , pi,S ],
where pi,j is the j th node in gi , and 1 ≤ j ≤ S. In this section, we define paper
embeddings along the citation order in a paper. The paper embeddings will be
the input to the graph-transformer model.
    For the textual embedding, we embed the textual feature vector tj into a
shared feature space for each paper pj ∈ Ipi in the citation context gi . Simple
fully connected layers can be used for the textual input. This can be formulated
as follows:

                           xtext (pj ) = Embedding(tj ) ∈ Rd ,                           (4)

where d indicates the dimension of the shared feature space.
    The position of a paper in the citation context Ipi reflects the purpose and
characteristics of the citation with respect to the target paper pi . Thus, we
suggest that the order of papers in Ipi is significant for learning citation repre-
sentations. The following position-id embedding is used to identify the cited
paper order information of an input list:
                         xpos (pj ) = Embedding[p(j)] ∈ Rd ,                     (5)
where p(j) indicates the position-id of paper pj in Ipi .
   Our main objective is to obtain the representation of the target paper pi
based on its structural roles. To identify the role of each paper, we use an
embedding method based on the Weisfeiler-Lehman (WL) algorithm [17]. This
can be formulated as follows:
                         xrole (pj ) = Embedding[r(j)] ∈ Rd ,                    (6)
where r(j) refers to the role label of paper pj .
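A minimal sketch of the one-dimensional WL relabeling that could supply the role labels r(j) is shown below. The uniform initialization and the number of iterations are assumptions; the paper only states that the WL algorithm [17] underlies the role embedding:

```python
def wl_role_labels(adj, iterations=2):
    """One-dimensional Weisfeiler-Lehman relabeling (sketch).

    adj: dict mapping node -> list of neighbor nodes.
    Returns an integer role label per node; nodes whose local
    neighborhoods are indistinguishable after `iterations` rounds
    share a label, which plays the part of r(j) in Eq. (6).
    """
    labels = {v: 0 for v in adj}  # uniform initial label (assumed)
    for _ in range(iterations):
        # signature = own label plus sorted multiset of neighbor labels
        signatures = {
            v: (labels[v], tuple(sorted(labels[u] for u in adj[v])))
            for v in adj
        }
        # compress each distinct signature to a fresh integer label
        mapping = {s: i for i, s in enumerate(sorted(set(signatures.values())))}
        labels = {v: mapping[signatures[v]] for v in adj}
    return labels
```

Structurally equivalent nodes (e.g. two leaves attached to the same hub) end up with the same label, which is exactly the notion of role the embedding table indexes.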
    After computing the three embedding terms, we aggregate them to form the
initial input paper embedding of the graph transformer model. The embedding
fusion is formalized as follows:
                   x(pj ) = xtext (pj ) + xpos (pj ) + xrole (pj ) ∈ Rd .        (7)
We define the embedding fusion function as the summation of the three em-
bedding terms.
   Finally, given a target paper pi , we obtain the initial paper embedding of
each paper in its cited-paper substructure. The initial paper embeddings for
the papers in the citation context Ipi can be stacked into an embedding matrix.
The embedding matrix is represented by X(pi ) = [x(p1 ), x(p2 ), · · · , x(pS )] ∈ RS×d .
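The embedding fusion of Eq. (7) and the stacking into X(pi) can be sketched with NumPy. The random tables stand in for learned parameters, and all dimensions and names here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, S, num_roles, max_pos = 8, 4, 16, 32  # illustrative sizes

# Hypothetical learned tables; in practice these are trainable.
W_text = rng.normal(size=(d, d))            # projection of textual features t_j
E_pos = rng.normal(size=(max_pos, d))       # position-id embedding table
E_role = rng.normal(size=(num_roles, d))    # WL role-label embedding table

def initial_embedding(t, pos_id, role_id):
    """x(p_j) = x_text(p_j) + x_pos(p_j) + x_role(p_j), Eq. (7)."""
    return t @ W_text + E_pos[pos_id] + E_role[role_id]

# Stack the S papers of the citation context into X(p_i) in R^{S x d}.
texts = rng.normal(size=(S, d))             # placeholder textual vectors t_j
X = np.stack([initial_embedding(texts[j], j, j % num_roles)
              for j in range(S)])
```

Summation (rather than concatenation) keeps all three signals in the same d-dimensional space, so the matrix X(pi) can be fed to the attention layers unchanged.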

3.3     Graph-transformer based encoder
The goal of the graph-transformer model is to aggregate the initial embedding
of each paper and generate a low-dimensional embedding vector for each paper.
A number of attention layers are stacked to compose the transformer module.
A single layer can be formulated as:
                                              Q(l) K(l)⊤ 
            H(l) = attention H(l−1) = softmax            V(l) ,             (8)
                                                  √
                                                    d
where H(l) and H(l−1) denote the output embeddings of the lth and (l − 1)th
layers, Q(l) , K(l) , and V(l) are the query, key, and value matrices, respectively,
and d is the dimension of the paper embedding. Specifically, Q(l) , K(l) , and
V(l) are calculated as follows:
                                           (l)
                                Q(l) = H(l−1) WQ ,
                                           (l)
                                K(l) = H(l−1) WK ,                              (9)
                                           (l)
                                V(l) = H(l−1) WV ,

        (l)   (l)        (l)
where WQ , WK , and WV are the weight matrices of the lth attention layer.
    The input of the graph-transformer model, H(0) , is the embedding matrix
of the target paper, X(pi ). The output of the last attention layer, H(L) , is
defined as the output paper embedding matrix Z of the transformer model.
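Eqs. (8)-(9) can be sketched as a single NumPy attention layer. The shapes are illustrative and the random weight matrices stand in for the trainable parameters of a real implementation:

```python
import numpy as np

def attention_layer(H, W_q, W_k, W_v):
    """One graph-transformer attention layer, Eqs. (8)-(9)."""
    Q, K, V = H @ W_q, H @ W_k, H @ W_v          # Eq. (9)
    d = H.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                # scaled dot-product
    # row-wise softmax over the papers of the citation context
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                           # Eq. (8)

rng = np.random.default_rng(0)
S, d = 5, 8
H0 = rng.normal(size=(S, d))  # X(p_i): initial paper embeddings
H1 = attention_layer(H0, *(rng.normal(size=(d, d)) for _ in range(3)))
```

Stacking L such layers and reading off the last output gives the paper embedding matrix Z described above.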


4     Conclusion and future work

In this paper, we have proposed learning representations of citations based on
the contextual citation network. We have defined the citation context by sam-
pling a different number of papers per section. Using a graph transformer model,
paper vectors were generated based on salient citations within the citation con-
text. According to our initial assumption, the results of the embedding model
can reflect the role of each citation in the paper.
    The citation purpose of a paper can change dynamically [8]. As future work,
we can represent the paper with the meaning of citations that change over time.
In addition, various bibliographic entities, such as highly reputed journals and
authors, affect citations. If the graph transformer model is extended to hetero-
geneous networks in the future, rich interactions between bibliographic entities
can be analyzed. Finally, we intend to examine the proposed embedding model
on a large contextual citation network.


Acknowledgements This work was supported by the Korea Foundation for
Women In Science, Engineering and Technology (WISET) grant funded by the
Ministry of Science and ICT (MSIT) under the team research program for
female engineering students (WISET Contract No. 2021-178).


References
 1. Ahmad, S., Afzal, M.T.: Combining co-citation and metadata for recom-
    mending more related papers. In: 2017 International Conference on Fron-
    tiers of Information Technology (FIT 2017). pp. 218–222. IEEE (Dec 2017).
    https://doi.org/10.1109/fit.2017.00046
 2. Boyack, K.W., Klavans, R.: Co-citation analysis, bibliographic coupling, and direct
    citation: Which citation approach represents the research front most accurately?
    Journal of the American Society for Information Science and Technology 61(12),
    2389–2404 (Dec 2010). https://doi.org/10.1002/asi.21419
 3. Cai, X., Zheng, Y., Yang, L., Dai, T., Guo, L.: Bibliographic network representation
    based personalized citation recommendation. IEEE Access 7, 457–467 (Dec 2019).
    https://doi.org/10.1109/access.2018.2885507
 4. Eto, M.: Extended co-citation search: Graph-based document retrieval on a co-
    citation network containing citation context information. Information Processing &
    Management 56(6), 102046 (Nov 2019). https://doi.org/10.1016/j.ipm.2019.05.007
 5. Habib, R., Afzal, M.T.: Sections-based bibliographic coupling for re-
    search paper recommendation. Scientometrics 119(2), 643–656 (Mar 2019).
    https://doi.org/10.1007/s11192-019-03053-8

 6. Huang, W., Kataria, S., Caragea, C., Mitra, P., Giles, C.L., Rokach, L.: Recom-
    mending citations: translating papers into references. In: Chen, X., Lebanon, G.,
    Wang, H., Zaki, M.J. (eds.) Proceedings of the 21st ACM international conference
    on Information and knowledge management (CIKM 2012). pp. 1910–1914. ACM
    Press, Maui, HI, USA (Oct 2012). https://doi.org/10.1145/2396761.2398542
 7. Huang, W., Wu, Z., Liang, C., Mitra, P., Giles, C.L.: A neural probabilistic model
    for context based citation recommendation. In: Bonet, B., Koenig, S. (eds.) Pro-
    ceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI 2015). pp.
    2404–2410. AAAI Press, Austin, Texas, USA (Jan 2015)
 8. Jeon, H.J., Jung, J.J.: Discovering the role model of authors by embedding re-
    search history. Journal of Information Science 0(0), 01655515211034407 (2021).
    https://doi.org/10.1177/01655515211034407
 9. Jeon, H.J., Lee, O.J., Jung, J.J.: Is performance of scholars correlated to
    their research collaboration patterns? Frontiers in Big Data 2(39) (Nov 2019).
    https://doi.org/10.3389/fdata.2019.00039
10. Kong, X., Mao, M., Wang, W., Liu, J., Xu, B.: VOPRec: Vector representation
    learning of papers with text information and structural identity for recommen-
    dation. IEEE Transactions on Emerging Topics in Computing 9(1), 226–237 (Jan
    2021). https://doi.org/10.1109/tetc.2018.2830698
11. Lee, O.J., Jeon, H.J., Jung, J.J.: Learning multi-resolution representations of re-
    search patterns in bibliographic networks. Journal of Informetrics 15(1), 101126
    (Feb 2021). https://doi.org/10.1016/j.joi.2020.101126
12. Liu, S., Chen, C.: The proximity of co-citation. Scientometrics 91(2), 495–511 (Dec
    2011). https://doi.org/10.1007/s11192-011-0575-7
13. Ma, S., Zhang, C., Liu, X.: A review of citation recommendation: from tex-
    tual content to enriched context. Scientometrics 122(3), 1445–1472 (Jan 2020).
    https://doi.org/10.1007/s11192-019-03336-0
14. MacRoberts, M.H., MacRoberts, B.R.: The mismeasure of science: Citation anal-
    ysis. Journal of the Association for Information Science and Technology 69(3),
    474–482 (Nov 2017). https://doi.org/10.1002/asi.23970
15. Raamkumar, A.S., Foo, S., Pang, N.: Using author-specified keywords in build-
    ing an initial reading list of research papers in scientific paper retrieval and rec-
    ommender systems. Information Processing & Management 53(3), 577–594 (May
    2017). https://doi.org/10.1016/j.ipm.2016.12.006
16. Roman, M., Shahid, A., Uddin, M.I., Hua, Q., Maqsood, S.: Exploiting con-
    textual word embedding of authorship and title of articles for discovering ci-
    tation intent classification. Complexity 2021, 5554874:1–5554874:13 (Apr 2021).
    https://doi.org/10.1155/2021/5554874
17. Shervashidze, N., Schweitzer, P., van Leeuwen, E.J., Mehlhorn, K., Borgwardt,
    K.M.: Weisfeiler-lehman graph kernels. Journal of Machine Learning Research 12,
    2539–2561 (Sep 2011)
18. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez,
    A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. In: Guyon, I.,
    von Luxburg, U., Bengio, S., Wallach, H.M., Fergus, R., Vishwanathan,
    S.V.N., Garnett, R. (eds.) Advances in Neural Information Processing
    Systems 30: Annual Conference on Neural Information Processing Sys-
    tems (NIPS 2017). pp. 5998–6008. Long Beach, CA, USA (Dec 2017),
    https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-
    Abstract.html

19. Wang, J., Zhu, L., Dai, T., Wang, Y.: Deep memory network with bi-LSTM for
    personalized context-aware citation recommendation. Neurocomputing 410, 103–
    113 (Oct 2020). https://doi.org/10.1016/j.neucom.2020.05.047
20. Zhang, J., Zhang, H., Xia, C., Sun, L.: Graph-bert: Only attention is needed
    for learning graph representations (2020), https://arxiv.org/abs/2001.05140,
    abs/2001.05140
21. Zhang, Y., Ma, Q.: Citation recommendations considering content and structural
    context embedding. In: Lee, W., Chen, L., Moon, Y., Bourgeois, J., Bennis, M., Li,
    Y., Ha, Y., Kwon, H., Cuzzocrea, A. (eds.) 2020 IEEE International Conference on
    Big Data and Smart Computing (BigComp 2020). pp. 1–7. IEEE, Busan, Korea
    (South) (Feb 2020). https://doi.org/10.1109/bigcomp48618.2020.0-109