Exploring the Impact of Negative Sampling on Patent Citation Recommendation

Rima Dessí¹, Hidir Aras¹ and Mehwish Alam²

¹ FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Germany
² Telecom Paris, Institut Polytechnique de Paris, France


Abstract

Due to the ever-increasing number of patents published every day, patent citation recommendation has become a challenging task. Since patent citations may lead to legal and economic consequences, recommending patent citations is even more challenging than recommending scientific article citations. One of the crucial components of a patent citation algorithm is negative sampling, which is also part of many other tasks such as text classification, knowledge graph completion, etc. This paper proposes a transformer-based ranking model for patent citation recommendation. It further experimentally compares the performance of patent citation recommendation under various state-of-the-art negative sampling approaches in order to measure their effectiveness and aid future developments. These experiments are performed on a newly collected dataset of US patents from Google Patents.


1. Introduction

Negative sampling is a crucial task for several applications such as recommender systems [1, 2, 3], text classification [4, 5, 6], computer vision [7], etc. In order to train a machine learning model, it is essential to have an accurately labeled dataset that includes sufficient positive and negative samples for each class. However, in many applications, such as recommender systems, obtaining negative samples is quite challenging. In fact, it is easy to collect positive samples for a patent citation recommendation system by considering patents' actual citations; however, generating negative samples (i.e., potential citations that are irrelevant to the given patent) is much harder [2]. In this paper, we focus on the impact of negative sampling in the context of patent citation recommendation and its role in improving the performance of citation recommendation systems.

Patent citation recommendation [3, 1, 2] is quite challenging due to the ever-increasing number of available patents, as well as their complex structure and domain-specific vocabulary. Manually finding potentially relevant citations among a massive number of patents is time-consuming and expensive. Therefore, efficient and effective tools for automatically recommending citations for patents have become indispensable. In contrast to paper citations, patent citations carry economic and legal significance [2]. In other words, missing prior relevant patents can have critical outcomes for patent applicants. Furthermore, the number of citations that a patent receives can determine its business value. Therefore, identifying the right prior art patents to be cited is a significant task for both the patent applicant and the examiner.

Recently, several patent citation recommendation systems have been proposed [3, 1, 2]. Most of the approaches are based on two steps, i.e., retrieval and ranking. While the retrieval phase aims to find the most relevant citation candidates, the ranking phase focuses on ranking the most relevant potential citations from the candidate list with respect to a score. The ranking function is often trained on a large amount of labeled data which includes both negative and positive samples.

Several techniques [8, 9] have been proposed to generate negative samples from a dataset that contains positive samples as well as unlabeled samples. Negative sampling aims to select the most representative negative instances from a given dataset. In the context of patent citation recommendation systems, the positive samples are the patents' actual citations, and each unlabeled sample could belong either to the positive class or the negative class depending on the content of the given patent. The type and proportion of negative samples play an important role in the performance of such systems. In other words, it is essential for the performance of the ranking model that it is trained on representative samples from each class, which helps the model to distinguish between positive and negative samples. Although several negative sampling approaches have been proposed for recommender systems [8, 9], none of them has been specifically applied to the patent domain. They seem to work well for item recommendation systems; however, it is important to note that the user-item relation differs from the

PatentSemTech'23: 4th Workshop on Patent Text Mining and Semantic Technologies, July 27th, 2023, Taipei, Taiwan.
rima.dessi@fiz-karlsruhe.de (R. Dessí); hidir.aras@fiz-karlsruhe.de (H. Aras); mehwish.alam@telecom-paris.fr (M. Alam)
ORCID: 0000-0001-8332-7241 (R. Dessí); 0000-0002-3117-4885 (H. Aras); 0000-0002-7867-6612 (M. Alam)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
patent-citation relation. In other words, each citation is itself a patent, so patents and citations can be modeled in the same way to find relevancy, whereas users and items have to be represented differently. For instance, to model a user there exist different types of features such as age, country, gender, purchase history, etc., yet patents are mostly modeled based on their textual content, e.g., title, abstract, and claims.

In this paper, we explore the impact of negative sampling on the ranking of patent citation recommendations. To this end, we investigate three different sampling techniques, namely, random, nearest-neighbor, and Cooperative Patent Classification (CPC) code-based sampling. After sampling, we train a transformer-based ranking model separately for each dataset and compare the results. Additionally, we analyze the impact of different feature combinations (e.g., abstract, claim, title) as well as the effect of varying negative sample proportions on the performance of the ranking system.

Overall, the main contributions of the paper are as follows:

    • Generating training data for patent citation ranking systems using various negative sampling techniques and different proportions of negative samples.
    • Demonstrating the impact of the negative samples on the performance of a transformer-based ranking model.
    • Releasing 4 different datasets¹ which can be exploited for the patent citation recommendation task.

2. Related Work

This study aims to explore the impact of negative sampling on patent citation recommendation systems; hence, this section presents prior related studies on Patent Citation Recommendation and Negative Sampling Techniques.

Patent Citation Recommendation Recent works [3, 1, 2] employ machine learning approaches for patent citation recommendation. The proposed citation recommendation frameworks consist of 2 main phases, namely, retrieval (i.e., candidate generation) and ranking. The first stage of [2] is based on textual similarity to generate the candidate list, and for the second step, RankSVM is utilized to rank the generated candidates. The most recent study [1] utilizes cosine similarity for the candidate generation phase, whereas for the ranking phase a deep neural network model is proposed. Moreover, [3] presents a patent citation recommendation system for patent examiners, who are usually responsible for prior art search and for assessing the patentability of patent applications. To this end, the proposed model exploits the textual content and bibliographic information of the patents as well as the citations assigned by the patent applicant.

The aforementioned studies show that there is large room for improvement in the recommendation results. In this paper, we focus on exploring the impact of the negative sampling strategy on patent citation recommendation.

Negative Sampling Techniques Despite the importance of negative sampling for recommender systems, literature on this topic is quite limited. [9] proposes a negative sampling method for graph-based user-item recommendation systems. The model is sophisticated and cannot easily be applied to other recommendation systems, e.g., citation recommendation, due to the nature of the data. The model divides the items into three regions based on their distance to the positive items. The experiments suggest that selecting negative samples from the intermediate region (i.e., items that are not too far from the positive samples) provides better performance than selecting items that are very close to or very far from the positive samples. [8] presents a negative sampling model which is specifically designed for graph neural networks for collaborative filtering. The model utilizes a user-item graph to generate the negative samples.

The studies discussed in this section are mostly focused on items and users; our study, however, focuses on patents. Patents pose the following challenges compared to the previously discussed systems: (1) patent-citation data is often sparser than user-item interaction data, which makes it quite challenging to find the most relevant and similar patents; (2) patents have a unique structure that consists of textual data (e.g., title, abstract, claims, description) as well as metadata (e.g., CPC and IPC codes, family information, etc.).

3. Patent Citation Ranking Model

Citation recommendation (CR) systems assist patent applicants, examiners, etc. in finding relevant patents that can be cited by patents under consideration. Similar to general recommendation systems, CR systems in general consist of 2 main steps, namely, retrieval and ranking. In the retrieval phase, various techniques are used to identify a candidate list of citations that are potentially relevant to the given patent. In the second phase, the selected candidates are ranked by the ranking system, often by applying different machine learning methods. The scores are usually 𝑃(𝑦|𝑋), the probability of 𝑦 given 𝑋, where 𝑦 is a potential citation and 𝑋 is a patent. In order to compute such a probability in the context of patent citation ranking systems, both contextual features of the citation and the patent are exploited.

¹ https://doi.org/10.5281/zenodo.7870197
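The retrieve-then-rank scheme described above can be sketched as follows. The scorer is a hypothetical stand-in for a trained model estimating 𝑃(𝑦|𝑋) for a (patent, candidate) pair; the retrieval step, patent identifiers, and scores are illustrative, not the authors' implementation.

```python
# Illustrative two-phase citation recommendation: retrieval then ranking.
# The score function stands in for a trained ranking model; a real system
# would retrieve candidates via textual or dense similarity.

def retrieve(patent_id, corpus, k=100):
    """Retrieval phase: return up to k candidate citations for a patent.
    Here we naively take the first k other patents in the corpus."""
    return [p for p in corpus if p != patent_id][:k]

def rank(patent_id, candidates, score_fn, threshold=0.5):
    """Ranking phase: score each candidate with P(y|X) and keep those
    above the threshold, sorted by descending score."""
    scored = [(c, score_fn(patent_id, c)) for c in candidates]
    relevant = [(c, s) for c, s in scored if s > threshold]
    return sorted(relevant, key=lambda cs: cs[1], reverse=True)

if __name__ == "__main__":
    corpus = ["US-001", "US-002", "US-003", "US-004"]
    # Toy scorer: pretend the trained model assigns fixed probabilities.
    toy_scores = {"US-002": 0.92, "US-003": 0.41, "US-004": 0.77}
    ranked = rank("US-001", retrieve("US-001", corpus),
                  lambda p, c: toy_scores[c])
    print(ranked)  # [('US-002', 0.92), ('US-004', 0.77)]
```

The 0.5 threshold matches the relevance cut-off used in the experiments of Section 4.3.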
Figure 1: The general architectural overview of the ranking model.

In this paper, we narrow our focus to exploring the impact of different negative sampling techniques and proportions on the performance of the patent citation ranking model.

To this end, we design a transformer-based ranking model which is capable of accurately ranking relevant as well as irrelevant citations for a given patent. Figure 1 illustrates the ranking model, i.e., the deep neural network model that has been designed for this study. It consists of a transformer block which is integrated as a layer, followed by a pooling layer, a dense layer, and a final sigmoid layer. The model takes as input textual parts of patents and potential citations, such as abstracts, claims, and titles. The output of the model is 𝑃(𝑦 = 1|𝑋), where 𝑦 is a binary class label (either 1 or 0). The input of the model is 2 pieces of text, one from a patent and one from its potential citation (e.g., title, abstract, claim, etc.), and the output is the relevancy score of the citation with respect to the given patent. As Figure 1 illustrates for an example input patent and potential citation, the abstracts are first tokenized and the embeddings of the tokens are used as input to the transformer block. The embeddings are randomly initialized.

4. Experimental Results

In this section, we first present the negative sampling methods that we propose. Second, we describe the datasets that have been generated by applying the selected sampling techniques. Finally, we present the experimental results obtained with the proposed ranking model trained on the generated datasets.

4.1. Negative Sampling Methods

In this study, we investigate three different negative sampling methods to assess the performance of the citation ranking model (see Section 3) as well as to demonstrate the impact of these samples on the performance. In the following, the exploited techniques are explained:

    • Random Sampling: In this method, the negative samples are selected randomly. The recommendation datasets consist of positive samples as well as unlabeled samples. The negative samples are randomly selected from the unlabeled samples for each patent.
    • Nearest Neighbor Sampling: First, all the patents and their citations are embedded into a common vector space by exploiting Sentence Transformers with BERT for Patents², which has been trained by Google on over 100M patents. In order to obtain the embedding representation of patents and citations, the abstracts have been used. In the second step, to find the nearest neighbors of each patent in the vector space, Faiss³, a library for efficient similarity search over dense vectors, is used.
    • CPC code-based Sampling: The Cooperative Patent Classification (CPC⁴) is a system that is utilized to classify patents based on their technical features. The classification system consists of 9 main sections, A-H and Y, and each main section consists of classes and subclasses. For generating negative samples, given a patent, we select the negative samples from the unlabeled examples of a given dataset by ensuring that the selected instances have the identical CPC subclass code as the patent.

It should be noted that the Nearest Neighbor Sampling and CPC code-based Sampling techniques aim to enable the model to distinguish relevant from irrelevant citations among semantically similar patents and within the same technical field, respectively.

4.2. Generated Datasets

In order to apply the different negative sampling methods (see Section 4.1), we first randomly collected around 250,000 US patents from Google Patents⁵. Each patent has roughly 27 citations on average. The positive samples are constructed by pairing patents with their actual citations. Since this paper explores the impact of the negative sampling techniques as well as the proportion of negative samples on the performance of the patent citation ranking model, 2 different groups of datasets have been generated.

² https://huggingface.co/anferico/bert-for-patents
³ https://faiss.ai/
⁴ https://www.epo.org/searching-for-patents/helpful-resources/first-time-here/classification/cpc.html
⁵ https://pypi.org/project/google-patent-scraper/
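The three negative sampling strategies of Section 4.1 can be sketched in plain Python as follows. The toy records, embedding vectors, and CPC codes are illustrative stand-ins; the actual pipeline uses BERT-for-Patents abstract embeddings and Faiss for the nearest-neighbor search.

```python
# Sketch of the three negative sampling strategies (toy data, not the
# production pipeline: real embeddings come from BERT for Patents and
# the neighbor search is done with Faiss).
import math
import random

def random_negatives(patent, unlabeled, n):
    """Random sampling: draw n negatives uniformly from the unlabeled pool."""
    pool = [p for p in unlabeled if p["id"] != patent["id"]]
    return random.sample(pool, n)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def nearest_neighbor_negatives(patent, unlabeled, n):
    """Nearest-neighbor sampling: pick the unlabeled patents whose
    abstract embeddings lie closest to the patent in the shared space."""
    pool = [p for p in unlabeled if p["id"] != patent["id"]]
    pool.sort(key=lambda p: cosine(patent["vec"], p["vec"]), reverse=True)
    return pool[:n]

def cpc_negatives(patent, unlabeled, n):
    """CPC code-based sampling: negatives must share the CPC subclass."""
    pool = [p for p in unlabeled
            if p["id"] != patent["id"] and p["cpc"] == patent["cpc"]]
    return pool[:n]

if __name__ == "__main__":
    patent = {"id": "US-1", "vec": [1.0, 0.0], "cpc": "G06F16"}
    unlabeled = [
        {"id": "US-2", "vec": [0.9, 0.1], "cpc": "G06F16"},
        {"id": "US-3", "vec": [0.0, 1.0], "cpc": "H04L9"},
    ]
    print([p["id"] for p in nearest_neighbor_negatives(patent, unlabeled, 1)])
    print([p["id"] for p in cpc_negatives(patent, unlabeled, 2)])
```

The two harder strategies deliberately select negatives that resemble true citations, either semantically (embedding neighbors) or technically (same CPC subclass).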
In the first group, the focus is on investigating the different negative sampling techniques, whereas in the second group, the focus is on examining the impact of different proportions of negative samples.

By applying the above techniques, we generated three different datasets which are utilized to investigate the impact of the negative sampling techniques. The number of generated negative samples is equal to the number of existing positive samples in each dataset to ensure a balanced dataset. Due to computational constraints, we selected 1 million samples from each generated dataset. In order to compare the performance of the ranking model across the different negative sampling techniques, we trained three distinct ranking models by utilizing the generated datasets.

Further datasets have been generated to explore the effect of negative sample proportions. In other words, for each positive pair, a varying number of negative samples, i.e., 2, 3, and 5, is generated randomly. Similarly, for each dataset, three distinct ranking models are trained.

Table 1
Comparison of Performance for Different Negative Sampling Techniques

    Sampling Method       Accuracy
    Random                 0.887
    nearest-neighbor       0.71
    CPC subclass           0.70

Table 2
Comparison of Performance for Different Negative Sampling Proportions

    Negative Sample Proportion             Accuracy
    0.67 (2 neg. samples for each pos.)     0.888
    0.75 (3 neg. samples for each pos.)     0.891
    0.83 (5 neg. samples for each pos.)     0.911

Table 3
Comparison of Performance for Different Feature Combinations

    Feature Combination    Accuracy
    Abstract                0.887
    Claim                   0.868
    Title                   0.504

4.3. Evaluation of the Patent Citation Ranking Model with the Generated Datasets

In order to assess the performance of the ranking model, three different sets of experiments have been conducted. In each experiment, the transformer-based ranking model (see Section 3) has been trained and evaluated on a given dataset. As mentioned before, the datasets consist of positive and negative samples, where each positive sample is an actual citation of the corresponding patent and the negative samples are the generated ones, i.e., irrelevant citations of the corresponding patent.

In the first and second sets of experiments (see Tables 1 and 2), the model takes the abstracts of a patent and a potential citation as input and computes the probability score which is used to rank the given pair. The threshold of the ranking model is set to 0.5: the potential citation is considered to be relevant if the score is above the threshold, otherwise it is considered to be irrelevant. In the third set of experiments (see Table 3), the same ranking system has been applied with different features. In other words, the abstract, claim, and title of patents and citations have been utilized separately as input to the ranking model, to explore the impact of the individual features on identifying relevant and irrelevant citations.

Table 1 shows the performance of the ranking model on the datasets that have been created by the different sampling techniques, namely, random, nearest-neighbor, and CPC subclass-based. The random sampling approach, which is the most straightforward one, provides the best performance with 0.887 accuracy. The reason is that random sampling creates more diverse samples, which enables the model to distinguish between relevant and irrelevant citations. From the results in Table 1 it can also be concluded that cited patents are semantically similar to the citing patent and share the same technical content, which makes nearest-neighbor and CPC-based negatives harder to separate from true citations.

Table 2 presents the experimental results of the ranking model on datasets which contain different proportions of randomly selected negative samples. According to the results presented in this table, as the number of negative samples increases, the accuracy also increases. Conventionally, when training a machine learning model, it is common practice to have a balanced dataset that consists of a roughly equal number of positive and negative samples. However, depending on the problem and the domain, an imbalanced dataset can yield higher accuracy than a balanced one. For instance, for image classification, the experimental results of [7] show that an imbalanced dataset enhances the performance of the ranking algorithm. Similarly, in our experiments, the best performance (see Table 2) has been achieved with the imbalanced dataset. The reason can be attributed to the model's ability to distinguish positive samples from negative samples when being trained mostly with negative samples. Further, the results also suggest that patents cite relevant patents and that there are often no missing citations.
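The construction of training pairs with a configurable number of random negatives per positive, as used for the proportion experiments of Table 2, can be sketched as follows. The helper and toy citation data are illustrative, not the actual dataset-generation code.

```python
# Sketch of training-pair generation with k random negatives per positive
# pair (k = 1 gives the balanced datasets; k = 2, 3, 5 give the imbalanced
# proportion datasets of Table 2). All patent IDs are illustrative.
import random

def build_pairs(citations, corpus, negatives_per_positive, seed=0):
    """Return (patent, candidate, label) triples: label 1 for actual
    citations, label 0 for randomly drawn non-cited patents."""
    rng = random.Random(seed)
    pairs = []
    for patent, cited in citations.items():
        non_cited = [p for p in corpus if p != patent and p not in cited]
        for c in cited:
            pairs.append((patent, c, 1))
            for neg in rng.sample(non_cited, negatives_per_positive):
                pairs.append((patent, neg, 0))
    return pairs

if __name__ == "__main__":
    citations = {"US-1": ["US-2"]}
    corpus = ["US-1", "US-2", "US-3", "US-4", "US-5"]
    pairs = build_pairs(citations, corpus, negatives_per_positive=2)
    negatives = sum(1 for _, _, y in pairs if y == 0)
    print(round(negatives / len(pairs), 2))  # 0.67, i.e. 2 negatives per positive
```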
Finally, Table 3 shows the accuracy of the ranking model for different feature combinations. Typically, the claims of a patent give a clear definition of what the patent legally protects, and the abstract gives a brief summary of the technical content of the patent document. Claims are often long and hard to model as a feature of a transformer-based ranking model due to their complexity. Therefore, in order to use claims as a feature, we collected from each patent and citation their first independent claims⁶, which present the fundamental features of the invention. In other words, a claim focuses on a single characteristic of the invention, whereas the abstract provides a brief summary of the information presented in the description, claims, and drawings. Therefore, the abstract carries more information than a single claim. Titles are often short and alone do not carry sufficient semantic information to help the model distinguish between relevant and irrelevant citations.

Exploiting all dependent and independent claims as input to the ranking model would probably increase the accuracy due to the additional contextual information. However, claims are often long texts and therefore require special effort to be modeled efficiently and effectively. We leave this as future work.

Overall, based on the experiments, it can be concluded that the employed negative sampling technique and the negative sample proportion play a significant role in the patent recommendation system.

5. Conclusion and Future Work

This paper targets the problem of negative sampling approaches for patent citation recommendation. More specifically, it proposes a transformer-based architecture for ranking citations for citation recommendation. The features used for this purpose include patent title, abstract, and claims. It further performs an experimental comparison of various negative sampling approaches for patent recommendation, namely random negative sampling, negative sampling based on nearest neighbors, and sampling based on the CPC class hierarchy. The experiments were conducted on newly generated datasets extracted from Google Patents. The results suggest that random negative sampling performs best in terms of accuracy. Moreover, the most effective features are the patent abstract and the claim. In future work, we plan to employ a retrieval model to generate a candidate list for each given patent and then apply the ranking model to the candidate list to obtain a complete patent citation recommendation system.

References

[1] J. Choi, J. Lee, J. Yoon, S. Jang, J. Kim, S. Choi, A two-stage deep learning-based system for patent citation recommendation, Scientometrics (2022).
[2] S. Oh, Z. Lei, W. Lee, P. Mitra, J. Yen, CV-PCR: a context-guided value-driven framework for patent citation recommendation, in: CIKM, 2013.
[3] T. Fu, Z. Lei, W. Lee, Patent citation recommendation for examiners, in: ICDM, IEEE Computer Society, 2015.
[4] T. Jiang, D. Wang, L. Sun, et al., LightXML: Transformer with dynamic negative sampling for high-performance extreme multi-label text classification, in: AAAI, 2021.
[5] R. Türker, L. Zhang, M. Alam, H. Sack, Weakly supervised short text categorization using world knowledge, in: ISWC, 2020.
[6] R. Türker, L. Zhang, M. Koutraki, H. Sack, Knowledge-based short text categorization using entity and category embedding, in: ESWC, 2019.
[7] F. Perronnin, Z. Akata, Z. Harchaoui, C. Schmid, Towards good practice in large-scale learning for image classification, in: 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[8] T. Huang, Y. Dong, M. Ding, et al., MixGCF: An improved training method for graph neural network-based recommender systems, in: SIGKDD, 2021.
[9] Z. Yang, M. Ding, X. Zou, J. Tang, B. Xu, C. Zhou, H. Yang, Region or global? A principle for negative sampling in graph-based recommendation, IEEE Transactions on Knowledge and Data Engineering (2022).

⁶ https://new.epo.org/en/legal/guidelines-epc/2023/f_iv_3_4.html