
Taipei, Taiwan. Email: rima.dessi@fiz-karlsruhe.de (R. Dessí); hidir.aras@fiz-karlsruhe.de (H. Aras); mehwish.alam@telecom-paris.fr (M. Alam)

Exploring the Impact of Negative Sampling on Patent Citation Recommendation

Rima Dessí

Hidir Aras

Mehwish Alam

FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Germany
Telecom Paris, Institut Polytechnique de Paris, France

2023


Due to the increasing number of patents published every day, patent citation recommendation has become a challenging task. Since patent citations may have legal and economic consequences, patent citation recommendation is even more challenging than citation recommendation for scientific articles. One of the crucial components of a patent citation recommendation algorithm is negative sampling, which is also part of many other tasks such as text classification and knowledge graph completion. This paper focuses on proposing a transformer-based ranking model for patent recommendation. It further experimentally compares the performance of patent recommendation under various state-of-the-art negative sampling approaches, measuring their effectiveness to aid future developments. The experiments are performed on a newly collected dataset of US patents from Google Patents.

1. Introduction

Negative sampling is a crucial task for several applications such as recommender systems [1, 2, 3], text classification [4, 5, 6], and computer vision [7]. In order to train a machine learning model, it is essential to have an accurately labeled dataset that includes sufficient positive and negative samples for each class. However, in many applications such as recommender systems, obtaining negative samples is quite challenging. In fact, it is easy to collect positive samples for a patent citation recommendation system by considering patents’ actual citations; generating negative samples (i.e., potential citations that are irrelevant to the given patent) is much harder [2]. In this paper, we focus on the impact of negative sampling in the context of patent citation recommendation and its role in improving the performance of citation recommendation systems.

Patent citation recommendation [3, 1, 2] is quite challenging due to the ever-increasing number of available patents, their complex structure, and the usage of domain-specific vocabulary. Manually finding potentially relevant citations in a massive amount of patents is time-consuming and expensive. Therefore, efficient and effective tools for automatically recommending citations for patents have become indispensable. In contrast to paper citations, patent citations carry economic and legal significance [2]. In other words, missing prior relevant patents can have critical outcomes for patent applicants. Furthermore, the number of citations that a patent receives can determine the business value of the patent. Therefore, identifying the right prior art patents to be cited is a significant task for both the patent applicant and the examiner.

Recently, several patent citation recommendation systems have been proposed [3, 1, 2]. Most of the approaches are based on two steps, i.e., retrieval and ranking. While the retrieval phase aims to find the most relevant citation candidates, the ranking phase focuses on ranking the most relevant potential citations from the candidate list with respect to a score. The ranking function is often trained on a large amount of labeled data which includes both negative and positive samples.

Several techniques [8, 9] have been proposed to generate negative samples from a dataset that contains positive samples as well as unlabeled samples. Negative sampling aims to find the best technique to select the most representative negative instances from a given dataset. In the context of patent citation recommendation systems, the positive samples are the patents’ actual citations, and each unlabeled sample could belong either to the positive class or to the negative class based on the content of the given patent. The type and proportion of negative samples play an important role in the performance of such systems. In other words, it is essential for the performance of the ranking model that it is trained on representative samples from each class, which helps the model to distinguish between positive and negative samples.

Although several negative sampling approaches have been proposed for recommender systems [8, 9], none of them has been applied specifically to the patent domain. They seem to work well with item recommendation systems; however, it is important to note that the user-item relation differs from the patent-citation relation. Each citation is itself a patent, so patents and citations can be modeled in the same way to find relevancy, whereas users and items have to be represented differently. For instance, to model a user there exist different types of features such as age, country, gender, purchase history, etc. Patents, in contrast, are mostly modeled based on their textual content, e.g., title, abstract, and claims.

In this paper, we explore the impact of negative sampling on the ranking of patent citation recommendations. To this end, we investigate three different sampling techniques, namely random, nearest-neighbor, and Cooperative Patent Classification (CPC) code-based sampling. After sampling, we train a transformer-based ranking model separately for each dataset and compare the results. Additionally, we analyze the impact of different feature combinations (e.g., abstract, claim, title) as well as the effect of varying negative sample proportions on the performance of the ranking system.

Overall, the main contributions of the paper are as follows:

• We generate training data for patent citation ranking systems using various negative sampling techniques and different proportions of negative samples.
• We demonstrate the impact of the negative samples on the performance of a transformer-based ranking model.
• We release 4 different datasets¹ which can be exploited for the patent citation recommendation task.

2. Related Work

This study aims to explore the impact of negative sampling on patent citation recommendation systems; hence, this section presents prior related studies on Patent Citation Recommendation and Negative Sampling Techniques.

Patent Citation Recommendation. Recent works [3, 1, 2] employ machine learning approaches for patent citation recommendation. The proposed citation recommendation frameworks consist of 2 main phases, namely retrieval (i.e., candidate generation) and ranking. The first stage of [2] is based on textual similarity to generate the candidate list, and for the second step, RankSVM is utilized to rank the generated candidates. The most recent study [1] utilizes cosine similarity for the candidate generation phase, whereas for the ranking phase a deep neural network model is proposed. Moreover, [3] presents a patent citation recommendation system for patent examiners, who are usually responsible for the prior art search and for assessing the patentability of patent applications. To this end, the proposed model exploits the textual content and bibliographic information of the patents as well as the citations assigned by the patent applicant. The aforementioned studies show that there is large room for improvement in the recommendation results. In this paper, we focus on exploring the impact of the negative sampling strategy on patent citation recommendation.

Negative Sampling Techniques. Despite the importance of negative sampling for recommender systems, literature on this topic is quite limited. [9] proposes a negative sampling method for graph-based user-item recommendation systems. The model is sophisticated and cannot be easily applied to other recommendation systems, e.g., citation recommendation, due to the nature of the data. The model divides the items into three regions based on their distance to the positive items. The experiments suggest that selecting negative samples from the intermediate region (i.e., items that are not too far from the positive samples) provides better performance than selecting items that are very close to or too far from the positive samples. [8] presents a negative sampling model which is specifically designed for graph neural networks for collaborative filtering. The model utilizes a user-item graph to generate the negative samples.

The studies discussed in this section are mostly focused on items and users, whereas our study focuses on patents. Patents pose the following challenges compared to the previously discussed systems: (1) Patent-citation data is often sparser than user-item interaction data. Therefore, it is quite challenging to find the most relevant and similar patents. (2) Patents have a unique structure that consists of textual data (e.g., title, abstract, claim, description) as well as metadata (e.g., CPC and IPC code, family information, etc.).

3. Patent Citation Ranking Model

Citation recommendation (CR) systems assist patent applicants, examiners, etc. in finding relevant patents that can be cited for patents under consideration. Similar to general recommendation systems, CR systems generally consist of 2 main steps, namely retrieval and ranking. In the retrieval phase, various techniques are used to identify a candidate list of citations that are potentially relevant to the given patent. In the second phase, the selected candidates are ranked by the ranking system, often by applying different machine learning methods. The scores are usually $P(y \mid X)$, the probability of $y$ given $X$, such that $y$ is a potential citation and $X$ is a patent. In order to compute such a probability in the context of patent citation ranking systems, contextual features of both the citation and the patent are exploited.
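The two-step structure described above (retrieval of candidates, then ranking by a score) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `retrieve` and `rank`, the toy corpus, and the bag-of-words cosine similarity, which merely stands in for a trained ranking model that would estimate $P(y \mid X)$.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k):
    """Step 1 (retrieval): keep the k corpus entries most similar to the query."""
    q = bow(query)
    by_similarity = sorted(
        corpus, key=lambda pid: cosine(q, bow(corpus[pid])), reverse=True
    )
    return by_similarity[:k]


def rank(query, candidates, corpus, score_fn):
    """Step 2 (ranking): order candidates by score_fn(query, candidate_text)."""
    scored = [(pid, score_fn(query, corpus[pid])) for pid in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def toy_score(query, candidate_text):
    """Hypothetical stand-in for a trained ranking model's relevance score."""
    return cosine(bow(query), bow(candidate_text))


# Toy corpus: patent id -> abstract-like text (invented for this example).
corpus = {
    "A": "battery electrode lithium cell",
    "B": "lithium battery charging circuit",
    "C": "garden hose nozzle",
}
query = "lithium battery electrode"
candidates = retrieve(query, corpus, k=2)
ranking = rank(query, candidates, corpus, toy_score)
```

In a real system the retrieval step would typically be cheap and recall-oriented, while `score_fn` would be the comparatively expensive learned model applied only to the shortlisted candidates.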

In this paper, we narrow our focus to the impact of different negative sampling techniques and proportions on the performance of the patent citation ranking model.

To this end, we design a transformer-based ranking model which, given a patent, is capable of accurately ranking relevant as well as irrelevant citations.

Figure 1 illustrates the ranking model, i.e., the deep neural network model that has been designed for this study. It consists of a transformer block which is integrated as a layer, followed by a pooling layer, a dense layer, and a final sigmoid layer. The input of the model is 2 pieces of text, one from a patent and one from its potential citation (e.g., title, abstract, claim, etc.). The output of the model is $P(Y = 1 \mid X)$, where $Y$ is a binary class label (either 1 or 0); this value is the relevancy score of the citation with respect to the given patent. Figure 1 shows an example with an input patent and its potential citation: first, the abstracts are tokenized, and then the embeddings of the tokens are used as input to the transformer block. The embeddings are randomly initialized.
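Read as a formula, the layer sequence described above can be sketched as follows. The notation is assumed for illustration and does not come from the paper: $E$ denotes the randomly initialized token embeddings, $H$ the transformer output, $h$ the output of the dense layer, and $w$, $b$ the parameters of the final sigmoid unit. The text states neither the pooling type nor how the two input texts are combined, so pooling is written generically as $\mathrm{Pool}$ and the input is treated as a single token sequence.

```latex
\begin{align}
X &= [x_1, \dots, x_n]
  && \text{tokens of the patent text and the candidate citation text} \\
H &= \mathrm{Transformer}\bigl(E(X)\bigr)
  && \text{contextualized token representations} \\
h &= \mathrm{Dense}\bigl(\mathrm{Pool}(H)\bigr)
  && \text{fixed-size representation of the pair} \\
P(Y = 1 \mid X) &= \sigma\bigl(w^{\top} h + b\bigr),
  && \sigma(z) = \frac{1}{1 + e^{-z}}
\end{align}
```

Because the sigmoid maps any real value into $(0, 1)$, the output can be used directly as a relevancy score for sorting candidate citations.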

4. Experimental Results

In this section, we first present the negative sampling methods that we investigate. Second, we describe the datasets that have been generated by applying the selected sampling techniques. Finally, we report the experimental results obtained with the proposed ranking model, which was trained on the generated datasets.

4.1. Negative Sampling Methods

In this study, we investigate three different negative sampling methods to assess the performance of the citation ranking model (see Section 3) as well as to demonstrate the significance of these samples for the performance.

The exploited techniques are explained in the following:

• Random Sampling: In this method, the negative samples are selected randomly. The recommendation datasets consist of positive samples as well as unlabeled samples. The negative samples are randomly selected from the unlabeled samples for each patent.
• Nearest Neighbor Sampling: First, all the patents and their citations are embedded into a common vector space by exploiting Sentence Transformers with BERT for Patents², which has been trained by Google on over 100M patents. The abstracts are used to obtain the embedding representations of patents and citations. In the second step, to find the nearest neighbor for each patent in the vector space, Faiss³, a library for efficient similarity search of dense vectors, is used.
• CPC code-based Sampling: The Cooperative Patent Classification (CPC⁴) is a system that is utilized to classify patents based on their technical features. The classification system consists of 9 main sections, A-H and Y. Each main section consists of classes and subclasses. To generate negative samples for a given patent, we select them from the unlabeled examples of a given dataset by ensuring that the selected instances have the same CPC subclass code as the patent.

It should be noted that the Nearest Neighbor Sampling and CPC code-based Sampling techniques aim to enable the model to distinguish between relevant and irrelevant citations among semantically similar patents and among patents within the same technical field, respectively.
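The three sampling methods above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: the data layout (`all_ids`, `citations`, `embeddings`, `cpc`) and all function names are hypothetical, the unlabeled pool is assumed to be every patent that is neither the patent itself nor one of its citations, and a brute-force cosine search over toy vectors stands in for the BERT for Patents embeddings and the Faiss index.

```python
import math
import random


def unlabeled_pool(pid, all_ids, citations):
    """Unlabeled candidates: every id except the patent and its actual citations."""
    cited = citations.get(pid, set())
    return [c for c in all_ids if c != pid and c not in cited]


def random_sampling(pid, all_ids, citations, k, rng):
    """Random sampling: draw up to k unlabeled candidates uniformly at random."""
    pool = unlabeled_pool(pid, all_ids, citations)
    return rng.sample(pool, min(k, len(pool)))


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbor_sampling(pid, all_ids, citations, embeddings, k):
    """Nearest-neighbor sampling: the k unlabeled candidates closest to the patent.

    Brute-force search; a dense-vector index would replace this at scale.
    """
    pool = unlabeled_pool(pid, all_ids, citations)
    pool.sort(key=lambda c: cosine(embeddings[pid], embeddings[c]), reverse=True)
    return pool[:k]


def cpc_sampling(pid, all_ids, citations, cpc, k, rng):
    """CPC code-based sampling: unlabeled candidates sharing the patent's CPC subclass."""
    pool = [c for c in unlabeled_pool(pid, all_ids, citations) if cpc[c] == cpc[pid]]
    return rng.sample(pool, min(k, len(pool)))


# Toy data, invented for this example.
all_ids = ["P1", "P2", "P3", "P4", "P5"]
citations = {"P1": {"P2"}}  # P1 actually cites P2 (a positive sample)
embeddings = {
    "P1": [1.0, 0.0], "P2": [0.9, 0.1], "P3": [0.8, 0.2],
    "P4": [0.0, 1.0], "P5": [0.5, 0.5],
}
cpc = {"P1": "H01M", "P2": "H01M", "P3": "H01M", "P4": "A01B", "P5": "H01M"}

rng = random.Random(42)
random_negs = random_sampling("P1", all_ids, citations, 2, rng)
nn_negs = nearest_neighbor_sampling("P1", all_ids, citations, embeddings, 2)
cpc_negs = cpc_sampling("P1", all_ids, citations, cpc, 2, rng)
```

The sketch makes the difference in difficulty visible: random sampling may return the unrelated patent `P4`, whereas the other two methods only return candidates that resemble `P1` in embedding space or share its CPC subclass.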

4.2. Generated Datasets

In order to apply the different negative sampling methods (see Section 4.1), we first randomly collected around 250,000 US patents from Google Patents⁵. Each patent has on average roughly 27 citations. The positive samples are constructed by pairing patents with their actual citations. Since this paper explores the impact of negative sampling techniques as well as of the proportion of negative samples on the performance of the patent citation ranking model, 2 different groups of datasets have been generated. In the first group, the focus is on investigating the different negative sampling techniques, whereas in the second group, the focus is on examining the impact of different proportions of negative samples.

By applying the above techniques, we generated three different datasets which are utilized to investigate the impact of negative sampling techniques. The number of generated negative samples is equal to the number of existing positive samples in the dataset to ensure a balanced dataset. Due to computational difficulties, we selected 1 million samples from each generated dataset. In order to compare the performance of the ranking model on the different negative sampling techniques, we trained three distinct ranking models by utilizing the generated datasets.

Further datasets have been generated to explore the effect of negative sample proportions. In other words, for each positive pair, a varying number of negative samples, i.e., 2, 3, and 5, is generated randomly. Similarly, three distinct ranking models are trained, one for each dataset.

² https://huggingface.co/anferico/bert-for-patents
³ https://faiss.ai/
⁴ https://www.epo.org/searching-for-patents/helpfulresources/first-time-here/classification/cpc.html
⁵ https://pypi.org/project/google-patent-scraper/

Table 1: Comparison of Performance for Different Negative Sampling Techniques

Sampling Method      Accuracy
Random               0.887
Nearest-neighbor     0.71
CPC subclass         0.70

Table 2: Comparison of Performance for Different Negative Sampling Proportion

Negative Sample Proportion              Accuracy
0.67 (2 neg. samples for each pos.)     0.888
0.75 (3 neg. samples for each pos.)     0.891
0.83 (5 neg. samples for each pos.)     0.911

Table 3: Comparison of Performance for Different Feature Combinations

Feature Combination     Accuracy
Abstract                0.887
Claim                   0.868
Title                   0.504
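The proportions reported in Table 2 follow directly from the number of negatives per positive: with k negatives for each positive pair, the share of negatives is k / (k + 1). The short sketch below reproduces this arithmetic and shows one way such a dataset could be assembled. The function names and the toy identifiers are assumptions for illustration only.

```python
import random


def negative_proportion(negatives_per_positive):
    """Share of negative samples when each positive gets k negatives: k / (k + 1)."""
    k = negatives_per_positive
    return k / (k + 1)


def build_pairs(positive_pairs, unlabeled, k, rng):
    """Label each (patent, citation) pair 1 and add up to k random negatives labeled 0."""
    data = []
    for patent, citation in positive_pairs:
        data.append((patent, citation, 1))
        pool = unlabeled[patent]
        for negative in rng.sample(pool, min(k, len(pool))):
            data.append((patent, negative, 0))
    return data


# Proportions for the settings used in the experiments: 2, 3, and 5 negatives.
proportions = {k: round(negative_proportion(k), 2) for k in (2, 3, 5)}
# proportions == {2: 0.67, 3: 0.75, 5: 0.83}

# Toy example with invented identifiers.
positives = [("P1", "P2")]
unlabeled = {"P1": ["P3", "P4", "P5", "P6", "P7"]}
dataset = build_pairs(positives, unlabeled, k=3, rng=random.Random(0))
```

The balanced datasets used for comparing the sampling techniques correspond to k = 1, i.e., a negative proportion of 0.5.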

4.3. Evaluation of Patent Citation Ranking Model with the Generated Datasets

In order to assess the performance of the ranking model, three different sets of experiments have been conducted. In each experiment, the transformer-based ranking model (see Section 3) has been trained and evaluated on a given dataset. As mentioned before, the datasets consist of positive and negative samples, where each positive sample is an actual citation of the corresponding patent and the negative samples are generated, irrelevant citations of the corresponding patent.

In the first and second sets of experiments (see Tables 1 and 2), the model takes the abstracts of a patent and a potential citation as input and computes the probability score which is used to rank the given pair. The threshold of the ranking model is set to 0.5. The potential citation is considered to be relevant if the score is above the threshold; otherwise, it is considered to be irrelevant. In the third set of experiments (see Table 3), the same ranking system has been applied with different features. In other words, the abstract, claim, and title of patents and citations have each been used separately as input to the ranking model, to explore the impact of individual features on identifying relevant and irrelevant citations.

Table 1 illustrates the performance of the ranking model on the datasets that have been created by the different sampling techniques, namely random, nearest-neighbor, and CPC subclass-based. The random sampling approach, which is the most straightforward one, provides the best performance with 0.887 accuracy. The reason is that random sampling creates more diverse samples, which enables the model to distinguish between relevant and irrelevant citations. According to the results in Table 1, it can be concluded that cited patents are semantically similar and share the same technical content.

Table 2 presents the experimental results of the ranking model on datasets which contain different proportions of randomly selected negative samples. According to the results presented in this table, as the number of negative samples increases, the accuracy also increases. Conventionally, when training a machine learning model, it is common practice to use a balanced dataset that consists of a roughly equal number of positive and negative samples. However, depending on the problem and the domain, an imbalanced dataset could yield higher accuracy than a balanced dataset. For instance, for image classification, the experimental result of [7] shows that an imbalanced dataset enhances the performance of the ranking algorithm. Similarly, in our experiments, the best performance (see Table 2) has been achieved with the imbalanced dataset. This can be attributed to the model’s ability to distinguish positive samples from negative samples when it is trained mostly on negative samples. Further, the results also show that patents cite relevant patents and that often there are no missing citations.
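The decision rule and the accuracy metric used in these experiments can be made concrete with a small sketch. The threshold of 0.5 is taken from the text above; the function names, scores, and labels are invented for illustration and are not results from the paper.

```python
def classify(score, threshold=0.5):
    """Return 1 (relevant) if the score is above the threshold, else 0 (irrelevant)."""
    return 1 if score > threshold else 0


def accuracy(scores, labels, threshold=0.5):
    """Fraction of pairs whose thresholded prediction matches the true label."""
    predictions = [classify(s, threshold) for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical model outputs P(Y = 1 | X) for five patent-citation pairs.
scores = [0.91, 0.40, 0.62, 0.08, 0.55]
labels = [1, 0, 1, 0, 0]  # 1 = actual citation, 0 = generated negative

result = accuracy(scores, labels)  # 4 of 5 predictions are correct
```

Only the last pair is misclassified here: its score of 0.55 lies above the threshold although the pair is a negative sample.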

Finally, Table 3 illustrates the accuracy of the ranking model on different feature combinations. Typically, the claims of a patent give a clear definition of what the patent legally protects, and the abstract gives a brief summary of the technical content of the patent document. Claims are often long and, due to their complexity, hard to model as a feature of a transformer-based ranking model. Therefore, in order to use claims as a feature, we collected from each patent and citation the first independent claim⁶, which presents the fundamental features of the invention. In other words, a claim focuses on a single characteristic of the invention, whereas an abstract provides a brief summary of the information presented in the description, claims, and drawings. Therefore, the abstract carries more information than a single claim. Titles are often short and alone do not carry sufficient semantic information to help the model distinguish between relevant and irrelevant citations.

Exploiting all dependent and independent claims as input to the ranking model would probably increase the accuracy due to more contextual information. However, claims are often long texts; therefore, special effort is required to model them efficiently and effectively. We leave this as future work.

Overall, based on the experiments, it can be concluded that the negative sampling technique employed and the negative sample proportion play a significant role in the patent recommendation system.

5. Conclusion and Future Work

This paper targets the problem of negative sampling approaches for patent citation recommendation. More specifically, it proposes a transformer-based architecture for ranking citations for citation recommendation. The features used for this purpose include patent title, abstract, and claims. It further performs an experimental comparison of various negative sampling approaches for patent recommendation, namely random negative sampling and negative sampling based on nearest neighbors as well as on the CPC class hierarchy. The experiments were conducted on newly generated datasets extracted from Google Patents. The results suggest that random negative sampling performs best in terms of accuracy. Moreover, the most effective features are the patent abstract and the claim. In future work, we plan to employ a retrieval model to generate a candidate list for each given patent and then apply the ranking model to the candidate list to present a complete patent citation recommendation system.

References

[1] J. Choi, J. Lee, J. Yoon, S. Jang, J. Kim, S. Choi, A two-stage deep learning-based system for patent citation recommendation, Scientometrics (2022).
[2] S. Oh, Z. Lei, W. Lee, P. Mitra, J. Yen, CV-PCR: a context-guided value-driven framework for patent citation recommendation, in: CIKM, 2013.
[3] T. Fu, Z. Lei, W. Lee, Patent citation recommendation for examiners, in: ICDM, IEEE Computer Society, 2015.
[4] T. Jiang, D. Wang, L. Sun, et al., Lightxml: Transformer with dynamic negative sampling for high-performance extreme multi-label text classification, in: AAAI, 2021.
[5] R. Türker, L. Zhang, M. Alam, H. Sack, Weakly supervised short text categorization using world knowledge, in: ISWC, 2020.
[6] R. Türker, L. Zhang, M. Koutraki, H. Sack, Knowledge-based short text categorization using entity and category embedding, in: ESWC, 2019.
[7] F. Perronnin, Z. Akata, Z. Harchaoui, C. Schmid, Towards good practice in large-scale learning for image classification, in: 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[8] T. Huang, Y. Dong, M. Ding, et al., Mixgcf: An improved training method for graph neural network-based recommender systems, in: SIGKDD, 2021.
[9] Z. Yang, M. Ding, X. Zou, J. Tang, B. Xu, C. Zhou, H. Yang, Region or global a principle for negative sampling in graph-based recommendation, IEEE Transactions on Knowledge and Data Engineering (2022).