=Paper=
{{Paper
|id=Vol-3604/paper3
|storemode=property
|title=Exploring the Impact of Negative Sampling on Patent Citation Recommendation
|pdfUrl=https://ceur-ws.org/Vol-3604/paper3.pdf
|volume=Vol-3604
|authors=Rima Dessi,Hidir Aras,Mehwish Alam
|dblpUrl=https://dblp.org/rec/conf/patentsemtech/DessiAA23
}}
==Exploring the Impact of Negative Sampling on Patent Citation Recommendation==
Rima Dessí¹, Hidir Aras¹ and Mehwish Alam²

¹ FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Germany

² Telecom Paris, Institut Polytechnique de Paris, France

PatentSemTech'23: 4th Workshop on Patent Text Mining and Semantic Technologies, July 27th, 2023, Taipei, Taiwan. Contact: rima.dessi@fiz-karlsruhe.de (R. Dessí); hidir.aras@fiz-karlsruhe.de (H. Aras); mehwish.alam@telecom-paris.fr (M. Alam). ORCID: 0000-0001-8332-7241 (R. Dessí); 0000-0002-3117-4885 (H. Aras); 0000-0002-7867-6612 (M. Alam). © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Abstract: Due to the increasing number of patents being published every day, patent citation recommendation has become a challenging task. Since patent citations may lead to legal and economic consequences, patent citation recommendation is even more challenging than scientific article citation recommendation. One of the crucial components of a patent citation algorithm is negative sampling, which is also a part of many other tasks such as text classification, knowledge graph completion, etc. This paper focuses on proposing a transformer-based ranking model for patent citation recommendation. It further experimentally compares the performance of patent citation recommendation based on various state-of-the-art negative sampling approaches, in order to measure the effectiveness of these approaches and aid future developments. The experiments are performed on a newly collected dataset of US patents from Google Patents.

===1. Introduction===

Negative sampling is a crucial task for several applications such as recommender systems [1, 2, 3], text classification [4, 5, 6], computer vision [7], etc. In order to train a machine learning model, it is essential to have an accurately labeled dataset that includes sufficient positive and negative samples for each class. However, in many applications, such as recommender systems, obtaining negative samples is quite challenging. In fact, it is easy to collect positive samples for a patent citation recommendation system by considering patents' actual citations; generating negative samples (i.e., potential citations that are irrelevant to the given patent), however, is much harder [2]. In this paper, we focus on the impact of negative sampling in the context of patent citation recommendation and its role in improving the performance of citation recommendation systems.

Patent citation recommendation [3, 1, 2] is quite challenging due to the ever-increasing number of available patents, their complex structure, and the usage of domain-specific vocabulary. Manually finding potentially relevant citations in a massive collection of patents is time-consuming and expensive. Therefore, efficient and effective tools for automatically recommending citations for patents have become indispensable. In contrast to paper citations, patent citations carry economic and legal significance [2]. In other words, missing relevant prior art patents can have critical consequences for patent applicants. Furthermore, the number of citations that a patent receives can determine its business value. Therefore, identifying the right prior art patents to be cited is a significant task for both the patent applicant and the examiner.

Recently, several patent citation recommendation systems have been proposed [3, 1, 2]. Most of the approaches are based on two steps, i.e., retrieval and ranking. While the retrieval phase aims to find the most relevant citation candidates, the ranking phase focuses on ranking the most relevant potential citations from the candidate list with respect to a score. The ranking function is often trained on a large amount of labeled data which includes both negative and positive samples.

Several techniques [8, 9] have been proposed to generate negative samples from a dataset that contains positive samples as well as unlabeled samples. Negative sampling aims to select the most representative negative instances from a given dataset. In the context of patent citation recommendation systems, the positive samples are the patents' actual citations, and each unlabeled sample could belong either to the positive or the negative class depending on the content of the given patent. The type and proportion of negative samples play an important role in the performance of such systems. In other words, it is essential for the performance of the ranking model to be trained on representative samples from each class, which helps the model to distinguish between positive and negative samples. Although several negative sampling approaches have been proposed for recommender systems [8, 9], none of them has been specifically applied to the patent domain. They seem to work well for item recommendation systems; however, it is important to note that the user-item relation differs from the patent-citation relation. Each citation is itself a patent, so patents and citations can be modeled in the same way to find relevancy, whereas users and items must be represented differently. For instance, to model a user there exist different types of features such as age, country, gender, purchase history, etc.
Patents, in contrast, are mostly modeled based on their textual content, e.g., title, abstract, and claims.

In this paper, we explore the impact of negative sampling on the ranking of patent citation recommendations. To this end, we investigate three different sampling techniques, namely random, nearest-neighbor, and Cooperative Patent Classification (CPC) code-based sampling. After sampling, we train a transformer-based ranking model separately for each dataset and compare the results. Additionally, we analyze the impact of different feature combinations (e.g., abstract, claim, title) as well as the effect of varying negative sample proportions on the performance of the ranking system.

Overall, the main contributions of the paper are as follows:

• Generating training data for patent citation ranking systems using various negative sampling techniques and different proportions of negative samples.

• Demonstration of the impact of the negative samples on the performance of a transformer-based ranking model.

• We release four different datasets which can be exploited for the patent citation recommendation task.

===2. Related Work===

This study aims to explore the impact of negative sampling on patent citation recommendation systems; hence, this section presents prior related studies on patent citation recommendation and negative sampling techniques.

Patent Citation Recommendation. Recent works [3, 1, 2] employ machine learning approaches for patent citation recommendation. The proposed citation recommendation frameworks consist of two main phases, namely retrieval (i.e., candidate generation) and ranking. The first stage of [2] is based on textual similarity to generate the candidate list, and for the second step RankSVM is utilized to rank the generated candidates. The most recent study [1] utilizes cosine similarity for the candidate generation phase, whereas for the ranking phase a deep neural network model is proposed. Moreover, [3] presents a patent citation recommendation system for patent examiners, who are usually responsible for prior art search and for assessing the patentability of patent applications. To this end, the proposed model exploits the textual content and bibliographic information of the patents as well as the citations assigned by the patent applicant. The aforementioned studies show that there is large room for improvement in the recommendation results. In this paper, we focus on exploring the impact of the negative sampling strategy on patent citation recommendation.

Negative Sampling Techniques. Despite the importance of negative sampling for recommender systems, the literature on this topic is quite limited. [9] proposes a negative sampling method for graph-based user-item recommendation systems. The model is sophisticated and cannot easily be applied to other recommendation systems, e.g., citation recommendation, due to the nature of the data. The model divides the items into three regions based on the distance to the positive items. The experiments suggest that selecting negative samples from the intermediate region (i.e., items that are not too far from the positive samples) provides better performance than items that are very close to or very far from the positive samples. [8] presents a negative sampling model which is specifically designed for graph neural networks for collaborative filtering. The model utilizes a user-item graph to generate the negative samples.

The studies discussed in this section are mostly focused on items and users, whereas our study focuses on patents. Patents pose the following challenges compared to the previously discussed systems: (1) patent-citation data is often sparser than user-item interaction data, which makes it quite challenging to find the most relevant and similar patents; (2) patents have a unique structure that consists of textual data (e.g., title, abstract, claims, description) as well as metadata (e.g., CPC and IPC codes, family information, etc.).

===3. Patent Citation Ranking Model===

Citation recommendation (CR) systems assist patent applicants, examiners, etc. in finding relevant patents that can be cited for the patents under consideration. Similar to general recommendation systems, CR systems in general consist of two main steps, namely retrieval and ranking. In the retrieval phase, various techniques are used to identify a candidate list of citations that are potentially relevant to the given patent.
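The retrieve-then-rank design described above can be sketched as follows. This is a simplified stand-in, not the authors' implementation: retrieval here is brute-force cosine similarity over toy embedding vectors, and `score_fn` is a placeholder for the trained ranking model that produces P(y|X).

```python
import math

def cosine(a, b):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb or 1e-12)

def retrieve(query, corpus, k=3):
    """Retrieval phase: indices of the k patents most similar to the
    query embedding (a stand-in for any candidate generator)."""
    order = sorted(range(len(corpus)), key=lambda i: -cosine(query, corpus[i]))
    return order[:k]

def rank(query, corpus, candidates, score_fn):
    """Ranking phase: reorder the candidate list by a relevance score;
    in the paper this score comes from the trained ranking model."""
    return sorted(candidates, key=lambda i: -score_fn(query, corpus[i]))
```

Keeping the two phases separate means the expensive scoring model only ever sees the short candidate list, not the whole patent corpus.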
In the second phase, the selected candidates are ranked by the ranking system, often by applying different machine learning methods. The scores are usually P(y|X), the probability of y given X, such that y is a potential citation and X is a patent. In order to compute such a probability in the context of patent citation ranking systems, contextual features of both the citation and the patent are exploited.

In this paper, we narrow our focus to exploring the impact of different negative sampling techniques and proportions on the performance of the patent citation ranking model. To this end, we design a transformer-based ranking model which is capable of accurately ranking relevant as well as irrelevant citations for a given patent. Figure 1 illustrates the ranking model, i.e., the deep neural network model that has been designed for this study. It consists of a transformer block which is integrated as a layer, followed by a pooling layer, a dense layer, and a final sigmoid layer. The model takes as input textual parts of patents and potential citations, such as abstracts, claims, and titles. The input of the model is two pieces of text, one from the patent and one from its potential citation (e.g., title, abstract, claim, etc.), and the output is P(y = 1|X), where y is a binary class label (either 1 or 0), i.e., the relevancy score of the citation for the given patent. As Figure 1 illustrates for an example input patent and its potential citation, first the texts are tokenized and the embeddings of the tokens are used as input to the transformer block. The embeddings are randomly initialized.

Figure 1: The general architectural overview of the ranking model.

===4. Experimental Results===

In this section, we first present the negative sampling methods that we propose. Second, we describe the datasets that have been generated by applying the selected sampling techniques. Finally, we report the experimental results obtained with the proposed ranking model trained on the generated datasets.

====4.1. Negative Sampling Methods====

In this study, we investigate three different negative sampling methods to assess the performance of the citation ranking model (see Section 3) as well as to demonstrate the significance of these samples for the performance. In the following, the exploited techniques are explained:

• Random Sampling: In this method, the negative samples are selected randomly. The recommendation datasets consist of positive samples as well as unlabeled samples. The negative samples are randomly selected from the unlabeled samples for each patent.

• Nearest Neighbor Sampling: First, all the patents and their citations are embedded into a common vector space by exploiting Sentence Transformers with BERT for Patents (https://huggingface.co/anferico/bert-for-patents), which has been trained by Google on over 100M patents. In order to obtain the embedding representation of patents and citations, the abstracts have been exploited. In the second step, Faiss (https://faiss.ai/), a library for efficient similarity search of dense vectors, is used to find the nearest neighbors of each patent in the vector space.

• CPC code-based Sampling: The Cooperative Patent Classification (CPC, https://www.epo.org/searching-for-patents/helpful-resources/first-time-here/classification/cpc.html) is a system that is utilized to classify patents based on their technical features. The classification system consists of nine main sections, A-H and Y. Each main section consists of classes and subclasses. For generating negative samples, given a patent, we select the negative samples from the unlabeled examples of a given dataset by ensuring that the selected instances have the identical CPC subclass code as the patent.

It should be noted that the Nearest Neighbor Sampling and CPC code-based Sampling techniques aim to enable the model to distinguish relevant from irrelevant citations among semantically similar patents and among patents within the same technical field, respectively.

====4.2. Generated Datasets====

In order to apply the different negative sampling methods (see Section 4.1), we first randomly collected around 250,000 US patents from Google Patents (https://pypi.org/project/google-patent-scraper/). Each patent has roughly 27 citations on average. The positive samples are constructed by pairing patents with their actual citations. Since this paper explores the impact of both the negative sampling technique and the proportion of negative samples on the performance of the patent citation ranking model, two groups of datasets have been generated. The first group focuses on investigating the different negative sampling techniques, whereas the second group examines the impact of different proportions of negative samples. The generated datasets are available at https://doi.org/10.5281/zenodo.7870197.

By applying the above techniques, we generated three different datasets which are utilized to investigate the impact of the negative sampling techniques. The number of generated negative samples is equal to the number of existing positive samples in each dataset to ensure a balanced dataset. Due to computational constraints, we selected 1 million samples from each generated dataset. In order to compare the performance of the ranking model under the different negative sampling techniques, we trained three distinct ranking models on the generated datasets.

Further datasets have been generated to explore the effect of negative sample proportions. In other words, for each positive pair, a varying number of negative samples, i.e., 2, 3, and 5, is generated randomly. Similarly, three distinct ranking models are trained, one for each dataset.

====4.3. Evaluation of the Patent Citation Ranking Model with the Generated Datasets====

In order to assess the performance of the ranking model, three different sets of experiments have been conducted. In each experiment, the transformer-based ranking model (see Section 3) has been trained and evaluated on a given dataset. As mentioned before, the datasets consist of positive and negative samples, where each positive sample is an actual citation of the corresponding patent and the negative samples are generated, irrelevant citations for the corresponding patent.

In the first and second sets of experiments (see Tables 1 and 2), the model takes the abstract of a patent and of a potential citation as input and computes a probability score which is used to rank the given pair. The threshold of the ranking model is set to 0.5: the potential citation is considered relevant if the score is above the threshold, otherwise it is considered irrelevant. In the third set of experiments (see Table 3), the same ranking system has been applied with different features. In other words, the abstract, claims, and title of patents and citations have been utilized separately as input to the ranking model, to explore the impact of individual features on identifying relevant and irrelevant citations.

Table 1 shows the performance of the ranking model on the datasets created with the different sampling techniques, namely random, nearest-neighbor, and CPC subclass-based. The random sampling approach, which is the most straightforward one, provides the best performance with 0.887 accuracy. The reason is that random sampling creates more diverse samples, which enables the model to distinguish between relevant and irrelevant citations. From the results in Table 1 it can further be concluded that cited patents are semantically similar and share the same technical content, which makes the nearest-neighbor and CPC-based negatives harder to separate from true citations.

{| class="wikitable"
|+ Table 1: Comparison of performance for different negative sampling techniques
! Sampling Method !! Accuracy
|-
| Random || 0.887
|-
| Nearest-neighbor || 0.71
|-
| CPC subclass || 0.70
|}

Table 2 presents the experimental results of the ranking model on datasets which contain different proportions of randomly selected negative samples. According to the results presented in this table, as the proportion of negative samples increases, the accuracy also increases.

{| class="wikitable"
|+ Table 2: Comparison of performance for different negative sample proportions
! Negative Sample Proportion !! Accuracy
|-
| 0.67 (2 neg. samples for each pos.) || 0.888
|-
| 0.75 (3 neg. samples for each pos.) || 0.891
|-
| 0.83 (5 neg. samples for each pos.) || 0.911
|}

Conventionally, when training a machine learning model, it is common practice to use a balanced dataset that consists of a roughly equal number of positive and negative samples. However, depending on the problem and the domain, an imbalanced dataset can yield higher accuracy than a balanced one. For instance, for image classification, the experimental results of [7] show that an imbalanced dataset enhances the performance of the ranking algorithm. Similarly, in our experiments, the best performance (see Table 2) has been achieved with the most imbalanced dataset. This can be attributed to the model's improved ability to distinguish positive from negative samples when trained mostly with negative samples. Further, the results also indicate that patents cite relevant patents and that there are often no missing citations.

{| class="wikitable"
|+ Table 3: Comparison of performance for different feature combinations
! Feature Combination !! Accuracy
|-
| Abstract || 0.887
|-
| Claim || 0.868
|-
| Title || 0.504
|}

Finally, Table 3 shows the accuracy of the ranking model for the different feature combinations.
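The nearest-neighbor and CPC-based strategies of Section 4.1 can be sketched as follows. This is an illustrative simplification rather than the paper's implementation: brute-force cosine similarity stands in for the Faiss index over BERT-for-Patents embeddings, and the embeddings, CPC subclass codes, and helper names are toy values of ours.

```python
import math
import random

def nearest_neighbor_negatives(patent, embeddings, citations, k=2):
    """Nearest-neighbor sampling: pick the unlabeled patents whose
    (abstract) embeddings lie closest to the query patent; brute-force
    cosine similarity is used here in place of a Faiss index."""
    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
        return num / (den or 1e-12)

    pool = [p for p in embeddings if p != patent and p not in citations[patent]]
    pool.sort(key=lambda p: -cos(embeddings[patent], embeddings[p]))
    return pool[:k]

def cpc_negatives(patent, cpc_subclass, citations, num, seed=0):
    """CPC-based sampling: negatives are unlabeled patents that share
    the query patent's CPC subclass code."""
    pool = [p for p in cpc_subclass
            if p != patent and p not in citations[patent]
            and cpc_subclass[p] == cpc_subclass[patent]]
    return random.Random(seed).sample(pool, min(num, len(pool)))
```

Both helpers deliberately produce "hard" negatives, semantically close or from the same technical field, which, per Table 1, the trained model finds harder to separate from true citations than randomly sampled ones.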
Typically, the claims of a patent give a clear definition of what the patent legally protects, while the abstract gives a brief summary of the technical content of the patent document. Claims are often long and hard to model as a feature of a transformer-based ranking model due to their complexity. Therefore, in order to use claims as a feature, we collected from each patent and citation their first independent claim (https://new.epo.org/en/legal/guidelines-epc/2023/f_iv_3_4.html), which presents the fundamental features of the invention. In other words, a claim focuses on a single characteristic of the invention, whereas an abstract provides a brief summary of the information presented in the description, claims, and drawings. Therefore, the abstract carries more information than a single claim. Titles are often short and alone do not carry sufficient semantic information to help the model distinguish between relevant and irrelevant citations.

Exploiting all dependent and independent claims as input to the ranking model would probably increase the accuracy due to the additional contextual information. However, claims are often long texts and therefore require special effort to be modeled efficiently and effectively. We leave this as future work.

Overall, based on the experiments it can be concluded that the employed negative sampling technique and the negative sample proportion play a significant role in a patent recommendation system.

===5. Conclusion and Future Work===

This paper targets the problem of negative sampling approaches for patent citation recommendation. More specifically, it proposes a transformer-based architecture for ranking citations for citation recommendation. The features used for this purpose include the patent title, abstract, and claims. It further performs an experimental comparison of various negative sampling approaches for patent recommendation, such as random negative sampling, negative sampling based on nearest neighbors, as well as on the CPC class hierarchy. The experiments were conducted on newly generated datasets extracted from Google Patents. The results suggest that random negative sampling performs best in terms of accuracy. Moreover, the most effective features are the patent abstract and the claims. In future work, we plan to employ a retrieval model to generate a candidate list for each given patent and then apply the ranking model to the candidate list, yielding a complete patent citation recommendation system.

===References===

[1] J. Choi, J. Lee, J. Yoon, S. Jang, J. Kim, S. Choi, A two-stage deep learning-based system for patent citation recommendation, Scientometrics (2022).

[2] S. Oh, Z. Lei, W. Lee, P. Mitra, J. Yen, CV-PCR: a context-guided value-driven framework for patent citation recommendation, in: CIKM, 2013.

[3] T. Fu, Z. Lei, W. Lee, Patent citation recommendation for examiners, in: ICDM, IEEE Computer Society, 2015.

[4] T. Jiang, D. Wang, L. Sun, et al., LightXML: Transformer with dynamic negative sampling for high-performance extreme multi-label text classification, in: AAAI, 2021.

[5] R. Türker, L. Zhang, M. Alam, H. Sack, Weakly supervised short text categorization using world knowledge, in: ISWC, 2020.

[6] R. Türker, L. Zhang, M. Koutraki, H. Sack, Knowledge-based short text categorization using entity and category embedding, in: ESWC, 2019.

[7] F. Perronnin, Z. Akata, Z. Harchaoui, C. Schmid, Towards good practice in large-scale learning for image classification, in: CVPR, 2012.

[8] T. Huang, Y. Dong, M. Ding, et al., MixGCF: An improved training method for graph neural network-based recommender systems, in: SIGKDD, 2021.

[9] Z. Yang, M. Ding, X. Zou, J. Tang, B. Xu, C. Zhou, H. Yang, Region or global? A principle for negative sampling in graph-based recommendation, IEEE Transactions on Knowledge and Data Engineering (2022).