<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Taipei, Taiwan.
rima.dessi@fiz-karlsruhe.de (R. Dessí);
hidir.aras@fiz-karlsruhe.de (H. Aras);
mehwish.alam@telecom-paris.fr (M. Alam)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Exploring the Impact of Negative Sampling on Patent Citation Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rima Dessí</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hidir Aras</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mehwish Alam</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FIZ Karlsruhe - Leibniz Institute for Information Infrastructure</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Telecom Paris Institut Polytechnique de Paris</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Due to the increasing number of patents being published every day, patent citation recommendation has become a challenging task. Since patent citations may have legal and economic consequences, recommending them is even more challenging than recommending citations for scientific articles. One of the crucial components of a patent citation recommendation algorithm is negative sampling, which is also part of many other tasks such as text classification and knowledge graph completion. This paper proposes a transformer-based ranking model for patent recommendations. It further experimentally compares the performance of patent recommendations under various state-of-the-art negative sampling approaches, in order to measure the effectiveness of these approaches and aid future developments. The experiments are performed on a newly collected dataset of US patents from Google Patents.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Negative sampling is a crucial task for several applications such as recommender systems [1, 2, 3], text classification [4, 5, 6], and computer vision [7]. Training a machine learning model requires an accurately labeled dataset that includes sufficient positive and negative samples for each class. However, in many applications such as recommender systems, obtaining negative samples is quite challenging. For a patent citation recommendation system, it is easy to collect positive samples by considering patents’ actual citations; generating negative samples (i.e., potential citations that are irrelevant to the given patent) is much harder [2]. In this paper, we focus on the impact of negative sampling in the context of patent citation recommendation and its role in improving the performance of citation recommendation systems.</p>
      <p>Patent citation recommendation [3, 1, 2] is quite challenging due to the ever-increasing number of available patents, their complex structure, and their use of domain-specific vocabulary. Manually finding potentially relevant citations in a massive number of patents is time-consuming and expensive. Therefore, efficient and effective tools for automatically recommending citations for patents have become indispensable. In contrast to paper citations, patent citations carry economic and legal significance [2]. In other words, missing relevant prior patents can have critical consequences for patent applicants. Furthermore, the number of citations that a patent receives can determine its business value. Therefore, identifying the right prior art patents to cite is a significant task for both the patent applicant and the examiner.</p>
      <p>Recently, several patent citation recommendation systems have been proposed [3, 1, 2]. Most of the approaches are based on two steps, i.e., retrieval and ranking. While the retrieval phase aims to find the most relevant citation candidates, the ranking phase orders the potential citations in the candidate list with respect to a score. The ranking function is often trained on a large amount of labeled data which includes both negative and positive samples.</p>
      <p>Several techniques [8, 9] have been proposed to generate negative samples from a dataset that contains positive samples as well as unlabeled samples. Negative sampling aims to find the best technique for selecting the most representative negative instances from a given dataset. In the context of patent citation recommendation systems, the positive samples are the patents’ actual citations, and each unlabeled sample could belong either to the positive class or to the negative class based on the content of the given patent. The type and proportion of negative samples play an important role in the performance of such systems. In other words, the ranking model must be trained on representative samples from each class, which helps it to distinguish between positive and negative samples. Although several negative sampling approaches have been proposed for recommender systems [8, 9], none of them has been applied specifically to the patent domain. They seem to work well with item recommendation systems; however, the user-item relation differs from the patent-citation relation. Each citation is itself a patent, so patents and citations can be modeled in the same way to find relevancy, whereas users and items should be represented differently. For instance, a user can be modeled with different types of features such as age, country, gender, and purchase history, while patents are mostly modeled based on their textual content, e.g., title, abstract, and claims.</p>
      <p>In this paper, we explore the impact of negative sampling on the ranking of patent citation recommendations. To this end, we investigate three different sampling techniques, namely random, nearest-neighbor, and Cooperative Patent Classification (CPC) code-based sampling. After sampling, we train a transformer-based ranking model separately for each dataset and compare the results. Additionally, we analyze the impact of different feature combinations (e.g., abstract, claim, title) as well as the effect of varying negative sample proportions on the performance of the ranking system.</p>
      <p>Overall, the main contributions of the paper are as follows:</p>
      <list list-type="bullet">
        <list-item><p>Generation of training data for patent citation ranking systems using various negative sampling techniques and different proportions of negative samples.</p></list-item>
        <list-item><p>Demonstration of the impact of the negative samples on the performance of a transformer-based ranking model.</p></list-item>
        <list-item><p>Release of 4 different datasets<sup>1</sup> which can be exploited for the patent citation recommendation task.</p></list-item>
      </list>
      <sec id="sec-1-1">
        <title>2. Related Work</title>
        <p>This study aims to explore the impact of negative sampling on patent citation recommendation systems; hence, this section presents prior related studies on Patent Citation Recommendation and Negative Sampling Techniques.</p>
        <p><bold>Patent Citation Recommendation.</bold> Recent works [3, 1, 2] employ machine learning approaches for patent citation recommendation. The proposed citation recommendation frameworks consist of 2 main phases, namely retrieval (i.e., candidate generation) and ranking. The first stage of [2] uses textual similarity to generate the candidate list, and for the second step, RankSVM is utilized to rank the generated candidates. The most recent study [1] utilizes cosine similarity for the candidate generation phase, whereas for the ranking phase a deep neural network model is proposed. Moreover, [3] presents a patent citation recommendation system for patent examiners, who are usually responsible for the prior art search and for assessing the patentability of patent applications. To this end, the proposed model exploits the textual content and bibliographic information of the patents as well as the citations assigned by the patent applicant.</p>
        <p>The aforementioned studies show that there is considerable room for improvement in the recommendation results. In this paper, we focus on exploring the impact of the negative sampling strategy on patent citation recommendation.</p>
        <p><bold>Negative Sampling Techniques.</bold> Despite the importance of negative sampling for recommender systems, literature on this topic is quite limited. [9] proposes a negative sampling method for graph-based user-item recommendation systems. The model is sophisticated and cannot be easily applied to other recommendation systems, e.g., citation recommendation, due to the nature of the data. The model divides the items into three regions based on their distance to the positive items. The experiments suggest that selecting negative samples from the intermediate region (i.e., items that are not too far from the positive samples) provides better performance than selecting items that are very close to or too far from the positive samples. [8] presents a negative sampling model which is specifically designed for graph neural networks for collaborative filtering. The model utilizes a user-item graph to generate the negative samples.</p>
        <p>The studies discussed in this section are mostly focused on items and users, whereas our study focuses on patents. Patents pose the following challenges compared to the previously discussed systems: (1) patent-citation data is often sparser than user-item interaction data, so it is quite challenging to find the most relevant and similar patents; (2) patents have a unique structure that consists of textual data (e.g., title, abstract, claim, description) as well as metadata (e.g., CPC and IPC code, family information, etc.).</p>
      </sec>
      <sec id="sec-1-2">
        <title>3. Patent Citation Ranking Model</title>
        <p>Citation recommendation (CR) systems assist patent applicants, examiners, etc. in finding relevant patents that can be cited for patents under consideration. Similar to general recommendation systems, CR systems generally consist of 2 main steps, namely retrieval and ranking. In the retrieval phase, various techniques are used to identify a candidate list of citations that are potentially relevant to the given patent. In the second phase, the selected candidates are ranked by the ranking system, often by applying different machine learning methods. The scores are usually <inline-formula><tex-math>P(y \mid X)</tex-math></inline-formula>, the probability of <inline-formula><tex-math>y</tex-math></inline-formula> given <inline-formula><tex-math>X</tex-math></inline-formula>, where <inline-formula><tex-math>y</tex-math></inline-formula> is a potential citation and <inline-formula><tex-math>X</tex-math></inline-formula> is a patent. In order to compute this probability in the context of patent citation ranking systems, contextual features of both the citation and the patent are exploited.</p>
        <p>In this paper, we narrow our focus to exploring the impact of different negative sampling techniques and proportions on the performance of the patent citation ranking model.</p>
        <p>To this end, we design a transformer-based ranking model which is capable of accurately ranking relevant as well as irrelevant citations for a given patent.</p>
        <p>Figure 1 illustrates the ranking model, i.e., the deep neural network model that has been designed for this study. It consists of a transformer block which is integrated as a layer, followed by a pooling layer, a dense layer, and a final sigmoid layer. The input of the model is 2 pieces of text, one from a patent and one from its potential citation (e.g., title, abstract, claim, etc.). The output of the model is <inline-formula><tex-math>P(Y = 1 \mid X)</tex-math></inline-formula>, where <inline-formula><tex-math>Y</tex-math></inline-formula> is a binary class label (either 1 or 0); it is the relevancy score of the citation to the given patent. Figure 1 shows an example of an input patent and its potential citation: first, the abstracts are tokenized, and then the embeddings of the tokens are utilized as input to the transformer block. The embeddings are randomly initialized.</p>
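        <p>The layer sequence described above (randomly initialized embeddings, a transformer block, pooling, a dense layer, and a sigmoid) can be sketched as an untrained forward pass. This is a minimal illustration, not the authors' implementation: the class name <monospace>TinyRanker</monospace>, the single-head attention with a residual connection, the mean pooling, the concatenation of the two token sequences, and all dimensions are assumptions made for the example.</p>
        <preformat>
```python
import math
import random


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[dot(row, col) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyRanker:
    """Hypothetical, untrained stand-in for the ranking model."""

    def __init__(self, vocab_size, dim=8, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.embed = random_matrix(vocab_size, dim, rng)  # random token embeddings
        self.wq = random_matrix(dim, dim, rng)            # attention projections
        self.wk = random_matrix(dim, dim, rng)
        self.wv = random_matrix(dim, dim, rng)
        self.dense = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        self.bias = 0.0

    def score(self, patent_ids, citation_ids):
        # Assumption: the two texts are concatenated into one token sequence.
        x = [self.embed[t] for t in patent_ids + citation_ids]
        # Transformer block, reduced to single-head self-attention + residual.
        q, k, v = matmul(x, self.wq), matmul(x, self.wk), matmul(x, self.wv)
        scale = math.sqrt(self.dim)
        weights = [softmax([dot(qi, kj) / scale for kj in k]) for qi in q]
        attended = matmul(weights, v)
        hidden = [[a + b for a, b in zip(h, e)] for h, e in zip(attended, x)]
        # Pooling layer: mean over all token positions.
        pooled = [sum(col) / len(hidden) for col in zip(*hidden)]
        # Dense layer followed by a sigmoid gives P(Y = 1 | X).
        logit = dot(self.dense, pooled) + self.bias
        return 1.0 / (1.0 + math.exp(-logit))


model = TinyRanker(vocab_size=20, dim=4, seed=0)
relevancy = model.score([1, 2, 3], [4, 5])  # token ids are made up
```
        </preformat>
        <p>Because the weights are random and no training step is shown, the returned value only demonstrates the data flow; a usable model would be trained on labeled patent-citation pairs.</p>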
      </sec>
    </sec>
    <sec id="sec-2">
      <title>4. Experimental Results</title>
      <p>In this section, we first present the negative sampling methods that we investigate. Second, we describe the datasets that have been generated by applying the selected sampling techniques. Finally, we report the experimental results obtained with the proposed ranking model, which was trained on the generated datasets.</p>
      <sec id="sec-2-1">
        <title>4.1. Negative Sampling Methods</title>
        <sec id="sec-2-1-1">
          <p>In this study, we investigate three different negative sampling methods to assess the performance of the citation ranking model (see Section 3) as well as to demonstrate the significance of these samples for the performance.</p>
          <p>The exploited techniques are explained below:
            <list list-type="bullet">
              <list-item><p><bold>Random Sampling:</bold> In this method, the negative samples are selected randomly. The recommendation datasets consist of positive samples as well as unlabeled samples. The negative samples are randomly selected from the unlabeled samples for each patent.</p></list-item>
              <list-item><p><bold>Nearest Neighbor Sampling:</bold> First, all the patents and their citations are embedded into a common vector space by exploiting Sentence Transformers with BERT for Patents<sup>2</sup>, which has been trained by Google on over 100M patents. The abstracts are used to obtain the embedding representations of patents and citations. In the second step, Faiss<sup>3</sup>, a library for efficient similarity search over dense vectors, is used to find the nearest neighbors of each patent in the vector space.</p></list-item>
              <list-item><p><bold>CPC code-based Sampling:</bold> The Cooperative Patent Classification (CPC<sup>4</sup>) is a system that is utilized to classify patents based on their technical features. The classification system consists of 9 main sections, A-H and Y. Each main section consists of classes and subclasses. To generate negative samples for a given patent, we select instances from the unlabeled examples of a given dataset, ensuring that the selected instances have the same CPC subclass code as the patent.</p></list-item>
            </list>
          </p>
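          <p>The three strategies can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the pipeline used in the paper: the function names and toy data are hypothetical, a brute-force cosine search stands in for Faiss, the two-dimensional vectors stand in for BERT for Patents embeddings, and actual citations are assumed to be excluded from the nearest-neighbor candidates.</p>
          <preformat>
```python
import math
import random


def unlabeled_candidates(patent, positives, pool):
    """Patents that are neither the query patent nor one of its citations."""
    return [c for c in pool if c != patent and c not in positives]


def random_negatives(patent, positives, pool, k, rng):
    """Random sampling: draw k unlabeled patents uniformly at random."""
    candidates = unlabeled_candidates(patent, positives, pool)
    return rng.sample(candidates, min(k, len(candidates)))


def cosine(u, v):
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def nearest_neighbor_negatives(patent, positives, pool, k, embeddings):
    """Nearest-neighbor sampling: the k most similar unlabeled patents.

    Brute-force cosine similarity replaces the Faiss index here.
    """
    candidates = unlabeled_candidates(patent, positives, pool)
    candidates.sort(key=lambda c: cosine(embeddings[patent], embeddings[c]),
                    reverse=True)
    return candidates[:k]


def cpc_negatives(patent, positives, pool, k, cpc_subclass, rng):
    """CPC code-based sampling: unlabeled patents sharing the CPC subclass."""
    candidates = [c for c in unlabeled_candidates(patent, positives, pool)
                  if cpc_subclass[c] == cpc_subclass[patent]]
    return rng.sample(candidates, min(k, len(candidates)))


# Toy data (all identifiers, vectors, and codes are made up).
pool = ["P1", "P2", "P3", "P4", "P5"]
embeddings = {"P1": [1.0, 0.0], "P2": [0.9, 0.1], "P3": [0.0, 1.0],
              "P4": [0.8, 0.3], "P5": [-1.0, 0.0]}
cpc_subclass = {"P1": "G06F", "P2": "G06F", "P3": "H04L",
                "P4": "G06F", "P5": "A61K"}
rng = random.Random(0)

random_neg = random_negatives("P1", {"P2"}, pool, 2, rng)
nn_neg = nearest_neighbor_negatives("P1", {"P2"}, pool, 1, embeddings)
cpc_neg = cpc_negatives("P1", {"P2"}, pool, 2, cpc_subclass, rng)
```
          </preformat>
          <p>In the toy data, P4 is both the closest unlabeled neighbor of P1 and the only unlabeled patent in the same subclass, so the last two strategies return the same hard negative while random sampling may return any of the three unlabeled patents.</p>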
        </sec>
        <sec id="sec-2-1-2">
          <p>It should be noted that the Nearest Neighbor Sampling and CPC code-based Sampling techniques aim to enable the model to distinguish between relevant and irrelevant citations among candidates that are semantically similar and that belong to the same technical field, respectively.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>4.2. Generated Datasets</title>
        <p>In order to apply the different negative sampling methods (see Section 4.1), we first randomly collected around 250,000 US patents from Google Patents<sup>5</sup>. Each patent has on average roughly 27 citations. The positive samples are constructed by pairing patents with their actual citations. Since this paper explores the impact of negative sampling techniques as well as the proportion of negative samples on the performance of the patent citation ranking model, 2 different groups of datasets have been generated. In the first group, the focus is on investigating the different negative sampling techniques, whereas in the second group, the focus is on examining the impact of different proportions of negative samples.</p>
        <p>By applying the above techniques, we generated three different datasets which are utilized to investigate the impact of negative sampling techniques. The number of generated negative samples is equal to the number of existing positive samples in the dataset to ensure a balanced dataset. Due to computational constraints, we selected 1 million samples from each generated dataset. In order to compare the performance of the ranking model across the different negative sampling techniques, we trained three distinct ranking models on the generated datasets.</p>
        <p>Further datasets have been generated to explore the effect of negative sample proportions. For each positive pair, a varying number of negative samples, i.e., 2, 3, and 5, is generated randomly. Similarly, a distinct ranking model is trained for each of these three datasets.</p>
        <p><sup>2</sup> https://huggingface.co/anferico/bert-for-patents</p>
        <p><sup>3</sup> https://faiss.ai/</p>
        <p><sup>4</sup> https://www.epo.org/searching-for-patents/helpfulresources/first-time-here/classification/cpc.html</p>
        <p><sup>5</sup> https://pypi.org/project/google-patent-scraper/</p>
        <table-wrap>
          <label>Table 1</label>
          <caption><p>Comparison of Performance for Different Negative Sampling Techniques</p></caption>
          <table>
            <thead>
              <tr><th>Sampling Method</th><th>Accuracy</th></tr>
            </thead>
            <tbody>
              <tr><td>Random</td><td>0.887</td></tr>
              <tr><td>Nearest-neighbor</td><td>0.71</td></tr>
              <tr><td>CPC subclass</td><td>0.70</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <table-wrap>
          <label>Table 2</label>
          <caption><p>Comparison of Performance for Different Negative Sampling Proportions</p></caption>
          <table>
            <thead>
              <tr><th>Negative Sample Proportion</th><th>Accuracy</th></tr>
            </thead>
            <tbody>
              <tr><td>0.67 (2 neg. samples for each pos.)</td><td>0.888</td></tr>
              <tr><td>0.75 (3 neg. samples for each pos.)</td><td>0.891</td></tr>
              <tr><td>0.83 (5 neg. samples for each pos.)</td><td>0.911</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <table-wrap>
          <label>Table 3</label>
          <caption><p>Comparison of Performance for Different Feature Combinations</p></caption>
          <table>
            <thead>
              <tr><th>Feature Combination</th><th>Accuracy</th></tr>
            </thead>
            <tbody>
              <tr><td></td><td>0.887</td></tr>
              <tr><td></td><td>0.868</td></tr>
              <tr><td></td><td>0.504</td></tr>
            </tbody>
          </table>
        </table-wrap>
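        <p>The negative sample proportions reported for these datasets follow from simple arithmetic: with k negatives per positive pair, the share of negatives is k / (k + 1). The sketch below shows that calculation together with a hypothetical way of building such a dataset by random sampling; the function names and the toy citation data are assumptions made for illustration.</p>
        <preformat>
```python
import random


def negative_proportion(k):
    """Share of negatives when every positive pair gets k negatives."""
    return k / (k + 1)


def build_dataset(citations, all_patents, k, seed=0):
    """Pair each patent with its citations (label 1) and k random negatives each (label 0)."""
    rng = random.Random(seed)
    rows = []
    for patent, cited in citations.items():
        unlabeled = [p for p in all_patents if p != patent and p not in cited]
        for positive in sorted(cited):
            rows.append((patent, positive, 1))
            for negative in rng.sample(unlabeled, k):
                rows.append((patent, negative, 0))
    return rows


proportions = {k: round(negative_proportion(k), 2) for k in (2, 3, 5)}

# Toy example: patent "A" cites "B"; the other patents are unlabeled.
rows = build_dataset({"A": {"B"}}, list("ABCDEFGH"), k=5)
```
        </preformat>
        <p>For k = 2, 3, and 5 the function gives 0.67, 0.75, and 0.83 after rounding, which matches the proportions listed in Table 2.</p>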
      </sec>
      <sec id="sec-2-5">
        <title>4.3. Evaluation of Patent Citation Ranking Model with the Generated Datasets</title>
        <p>In order to assess the performance of the ranking model, three different sets of experiments have been conducted. In each experiment, the transformer-based ranking model (see Section 3) has been trained and evaluated on a given dataset. As mentioned before, the datasets consist of positive and negative samples, where each positive sample is an actual citation of the corresponding patent and the negative samples are generated, irrelevant citations of the corresponding patent.</p>
        <p>In the first and second sets of experiments (see Tables 1 and 2), the model takes the abstract of a patent and of a potential citation as input and computes the probability score which is used to rank the given pair. The threshold of the ranking model is set to 0.5. The potential citation is considered to be relevant if the score is above the threshold; otherwise, it is considered to be irrelevant. In the third set of experiments (see Table 3), the same ranking system has been applied with different features. In other words, the abstract, claim, and title of patents and citations have been utilized separately as input to the ranking model, to explore the impact of individual features on identifying relevant and irrelevant citations.</p>
        <p>Table 1 shows the performance of the ranking model on datasets that have been created by the different sampling techniques, namely random, nearest-neighbor, and CPC subclass-based. The random sampling approach, which is the most straightforward one, provides the best performance with 0.887 accuracy. A likely reason is that random sampling creates more diverse samples, which enables the model to distinguish between relevant and irrelevant citations. According to the results in Table 1, it can be concluded that cited patents are semantically similar to the citing patent and share the same technical content.</p>
        <p>Table 2 presents the experimental results of the ranking model on datasets which contain different proportions of randomly selected negative samples. According to the results presented in this table, the accuracy increases as the number of negative samples increases. Conventionally, when training a machine learning model, it is common practice to have a balanced dataset that consists of a roughly equal number of positive and negative samples. However, depending on the problem and the domain, an imbalanced dataset could yield higher accuracy than a balanced dataset. For instance, for image classification, the experimental results of [7] show that an imbalanced dataset enhances the performance of the ranking algorithm. Similarly, in our experiments, the best performance (see Table 2) has been achieved with the imbalanced dataset. This can be attributed to the model’s ability to distinguish positive samples from negative samples after being trained mostly with negative samples. Further, the results also show that patents cite relevant patents and that there are often no missing citations.</p>
        <sec id="sec-2-5-1">
          <p>Finally, Table 3 shows the accuracy of the ranking model on different feature combinations. Typically, the claims of a patent give a clear definition of what the patent legally protects, and the abstract gives a brief summary of the technical content of the patent document. Claims are often long and, due to their complexity, hard to model as a feature of a transformer-based ranking model. Therefore, in order to use claims as a feature, we collected from each patent and citation its first independent claim<sup>6</sup>, which presents the fundamental features of the invention. In other words, a claim focuses on a single characteristic of the invention, whereas an abstract provides a brief summary of the information presented in the description, claims, and drawings. Therefore, the abstract carries more information than a single claim. Titles are often short and do not carry sufficient semantic information on their own to help the model distinguish between relevant and irrelevant citations.</p>
          <p>Exploiting all dependent and independent claims as input to the ranking model would probably increase the accuracy due to more contextual information. However, claims are often long texts, so modeling them efficiently and effectively requires special effort. We leave this for future work.</p>
          <p>Overall, based on the experiments, it can be concluded that the negative sampling technique employed and the negative sample proportion play a significant role in the patent recommendation system.</p>
        </sec>
        <sec id="sec-2-5-2">
          <title>5. Conclusion and Future Work</title>
          <p>This paper targets the problem of negative sampling approaches for patent citation recommendation. More specifically, it proposes a transformer-based architecture for ranking citations for citation recommendation. The features used for this purpose include patent title, abstract, and claims. It further performs an experimental comparison of various negative sampling approaches for patent recommendations, such as random negative sampling and negative sampling based on nearest neighbors as well as on the CPC class hierarchy. The experiments were conducted on newly generated datasets extracted from Google Patents. The results suggest that random negative sampling performs best in terms of accuracy. Moreover, the most effective features are the patent abstract and the claim. In future work, we plan to employ a retrieval model to generate a candidate list for each given patent and then apply the ranking model to the candidate list to present a complete patent citation recommendation system.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref><label>[1]</label><mixed-citation>J. Choi, J. Lee, J. Yoon, S. Jang, J. Kim, S. Choi, A two-stage deep learning-based system for patent citation recommendation, Scientometrics (2022).</mixed-citation></ref>
      <ref><label>[2]</label><mixed-citation>S. Oh, Z. Lei, W. Lee, P. Mitra, J. Yen, CV-PCR: a context-guided value-driven framework for patent citation recommendation, in: CIKM, 2013.</mixed-citation></ref>
      <ref><label>[3]</label><mixed-citation>T. Fu, Z. Lei, W. Lee, Patent citation recommendation for examiners, in: ICDM, IEEE Computer Society, 2015.</mixed-citation></ref>
      <ref><label>[4]</label><mixed-citation>T. Jiang, D. Wang, L. Sun, et al., LightXML: Transformer with dynamic negative sampling for high-performance extreme multi-label text classification, in: AAAI, 2021.</mixed-citation></ref>
      <ref><label>[5]</label><mixed-citation>R. Türker, L. Zhang, M. Alam, H. Sack, Weakly supervised short text categorization using world knowledge, in: ISWC, 2020.</mixed-citation></ref>
      <ref><label>[6]</label><mixed-citation>R. Türker, L. Zhang, M. Koutraki, H. Sack, Knowledge-based short text categorization using entity and category embedding, in: ESWC, 2019.</mixed-citation></ref>
      <ref><label>[7]</label><mixed-citation>F. Perronnin, Z. Akata, Z. Harchaoui, C. Schmid, Towards good practice in large-scale learning for image classification, in: 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.</mixed-citation></ref>
      <ref><label>[8]</label><mixed-citation>T. Huang, Y. Dong, M. Ding, et al., MixGCF: An improved training method for graph neural network-based recommender systems, in: SIGKDD, 2021.</mixed-citation></ref>
      <ref><label>[9]</label><mixed-citation>Z. Yang, M. Ding, X. Zou, J. Tang, B. Xu, C. Zhou, H. Yang, Region or global? A principle for negative sampling in graph-based recommendation, IEEE Transactions on Knowledge and Data Engineering (2022).</mixed-citation></ref>
    </ref-list>
  </back>
</article>