<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Efficient Patent Searching Using Graph Transformers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Krzysztof Daniell</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Igor Buzhinsky</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Björkqvist</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IPRally Technologies Oy</institution>
          ,
          <addr-line>Helsinki</addr-line>
          ,
          <country country="FI">Finland</country>
        </aff>
      </contrib-group>
      <fpage>51</fpage>
      <lpage>59</lpage>
      <abstract>
        <p>Finding relevant prior art is crucial when deciding whether to file a new patent application or invalidate an existing patent. However, searching for prior art is challenging due to the large number of patent documents and the need for nuanced comparisons to determine novelty. An accurate search engine is therefore invaluable for speeding up the process. We present a Graph Transformer-based dense retrieval method for patent searching where each invention is represented by a graph describing its features and their relationships. Our model processes these invention graphs and is trained using prior art citations from patent office examiners as relevance signals. Using graphs as input significantly improves the computational efficiency of processing long documents, while leveraging examiner citations allows the model to learn domain-specific similarities beyond simple text-based matching. The result is a search engine that emulates how professional patent examiners identify relevant documents. We compare our approach against publicly available text embedding models and show substantial improvements in both prior art retrieval quality and computational efficiency.</p>
      </abstract>
      <kwd-group>
        <kwd>patent search</kwd>
        <kwd>prior art search</kwd>
        <kwd>information retrieval</kwd>
        <kwd>dense retrieval</kwd>
        <kwd>graph neural networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Companies and individuals can protect their intellectual property by filing a patent application for
their inventions. When granted, a patent provides its holder exclusive rights to the invention for a
limited time. In exchange, the invention is publicly disclosed. The patent application process can be
quite costly and time-consuming, so applicants typically seek high certainty that the patent will eventually be
granted before initiating the application process. Since patents are granted only for novel inventions,
conducting a prior art search is important to avoid unnecessary costs and delays. Finding relevant
prior art is challenging due to the large number of existing patent documents and the detailed analysis
required to distinguish novelty-destroying prior art from merely similar inventions. An effective patent
search engine will both speed up the process and improve the results.</p>
      <p>In this work, we present an approach for patent searching using graph representations of inventions
as input to a Graph Transformer network. The graph of a patent document describes the core of the
invention disclosed in that document, namely the features and the relationships between them. This
representation condenses the original document, significantly reducing the computational resources
required to process the document. The graphs are used as input to a Graph Transformer model trained
on patent office examiner citations created during the application process for a patent. This enables
the model to capture relationships between similar inventions, even when described using different
terms. Furthermore, examiner citations help the model learn domain-specific terminology across
different technical domains. The result is a patent search engine that efficiently retrieves relevant
novelty-destroying prior art.</p>
      <p>6th Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech) 2025.
krzysztof@iprally.com (K. Daniell); igor@iprally.com (I. Buzhinsky); sebastian@iprally.com (S. Björkqvist).
ORCID: 0009-0006-5959-1804 (K. Daniell); 0000-0003-3713-6051 (I. Buzhinsky); 0009-0006-9039-8623 (S. Björkqvist).
© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>Example claim features (cf. Figure 1):
a motor;
an auger driven by the motor to rotate;
a handle device for a user to operate;
an auger housing for containing the auger; and
a frame for connecting the handle device and the auger housing;
wherein the auger housing is made of at least two different materials.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <sec id="sec-2-1">
        <title>2.1. Patent Search Methods</title>
        <p>
          Traditionally, patent searching has relied on Boolean search, where text-based matching is done using
Boolean operators like AND, OR, and NOT [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Multiple Boolean search tools for patent searching exist,
both paid [
          <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
          ] and free [
          <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
          ]. Performing high-quality Boolean searches often requires multiple
iterations and substantial domain expertise to identify the most relevant terminology [8].
        </p>
        <p>Machine learning methods have been explored for patent searching to address the limitations of
Boolean searching. Traditional approaches including TF-IDF [9, 10] and word embeddings [11] have
been applied. More recently, deep learning models such as BERT [12, 13, 14] and GPT-2 [15] have also
been explored. In addition, several commercial machine learning-based patent search tools are available
[16, 17, 18].</p>
        <p>In our previous work [19], we describe a graph-based patent search engine where a Tree-LSTM model
is trained for dense retrieval using patent office examiner citations as relevance signals. This paper
extends that work by replacing the Tree-LSTM model with a Graph Transformer and improving the
training procedure, as detailed in Section 3.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Graph Transformers</title>
        <p>Graph Transformers adapt the Transformer architecture to graph-structured data [20]. Early models,
such as Graphormer [21] and SAN [22], used full attention over all node pairs, which becomes
computationally prohibitive as graph size grows, a critical concern for large patent corpora. Sparse Graph
Transformers [23] address this by restricting attention to actual edges or small neighborhoods,
drastically reducing overhead while preserving the ability to model long-range dependencies. This is useful for patent
searching, where long documents revolving around a limited set of distinct features and relationships are
typical. Mapping these concepts to nodes, with edges only where genuine conceptual links exist, yields
a much sparser graph than a fully connected alternative.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Context-Based Representations</title>
        <p>A recent approach in language modeling is using concept-level representations instead of token-based
architectures. Large Concept Models [24] can be used to embed entire sentences—or “concepts”—into
a high-dimensional, language-agnostic space rather than embedding every token separately. This
perspective parallels our approach, where each key textual segment (e.g., a feature of the invention) is
treated as a “concept node.” We reduce computational overhead by applying sparse attention to these
conceptual links yet still preserve the broad relationships essential for performing patent searches.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Approach</title>
      <p>Our patent search engine consists of two main parts: We first convert each patent document into a
graph that describes the invention, as described in Section 3.1. We then use a Graph Transformer
model, described in Section 3.2, to embed the graph into a vector space for dense retrieval. The Graph
Transformer model is trained for patent searching using patent examiner citations as relevance signals,
as described in Sections 3.3 and 3.4.</p>
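      <p>The retrieval step itself is standard dense retrieval: candidate documents are embedded offline and the query embedding is matched against them at search time. The sketch below is our own illustration (the cosine similarity measure and function name are assumptions, not taken from the paper):</p>

```python
import numpy as np

def top_k(query_vec, candidate_matrix, k=3):
    """Rank candidate document embeddings by cosine similarity to the
    query embedding and return the indices of the top-k candidates."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_matrix / np.linalg.norm(candidate_matrix, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity per candidate
    return np.argsort(-sims)[:k]      # best-scoring candidates first
```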
      <p>The main diferences to our previous work [ 19] are replacing the Tree-LSTM model with a Graph
Transformer and enhancements to the training procedure.</p>
      <sec id="sec-3-1">
        <title>3.1. Graphs</title>
        <p>We convert each patent document into a graph to capture its core features and relationships while
avoiding the overhead of full-text processing. An example of a graph is seen in Figure 1. Each node
corresponds to a key feature of the invention (e.g., snowthrower or motor) or a text snippet indicating a
relationship (e.g., frame for connecting handle device and auger housing). These relationships include
hierarchical ones—such as part-of (meronym) or example-of (hyponym)—and functional ones, linking
specific features to describe how they interoperate. By focusing on the essential technical structure,
the graph becomes much smaller than the raw text while still conveying the essential features of the
invention. This method aligns with how professional examiners perform novelty evaluations: they
isolate core features and examine how they interrelate.</p>
        <p>The details of how the graphs are created are described in our previous work [19]. In short, we first
detect the features of the invention by doing a linguistic analysis with a natural language processing
model. To find the relationships between the features, we use a set of hand-crafted rules designed to
mirror the way patent professionals identify core inventive concepts and their interconnections from
patent text. These rules detect terms describing relationships (e.g., comprising, connecting, containing)
and use the output from the linguistic analysis phase to match the relationships with the correct features.</p>
        <p>For each patent document, we create three different graphs: one containing only the first independent
claim (later referred to as the first claim graph), another containing all the claims (all claims graph), and one
containing both the claims and the full description of the document (description graph).</p>
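        <p>As a minimal illustration (our own, not the authors' internal data structure), the snow thrower example could be stored as an adjacency mapping from each feature node to its related feature nodes; the exact node and edge set of Figure 1 is not reproduced here:</p>

```python
# Nodes are invention features; an edge records a relationship between them
# (part-of, functional, etc.). Feature names follow the snow thrower example.
invention_graph = {
    "snowthrower": ["motor", "auger", "handle device", "auger housing", "frame"],
    "motor": ["auger"],                           # the auger is driven by the motor
    "auger housing": ["auger"],                   # the housing contains the auger
    "frame": ["handle device", "auger housing"],  # the frame connects these parts
}
```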
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Model Architecture</title>
        <p>Our approach embeds each invention graph into a vector space for dense retrieval. Figure 2 illustrates
four key steps, which we describe in detail below.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Node Embedding Initialization</title>
          <p>Each token sequence (e.g. sentence or phrase) is tokenized using a BPE tokenizer [25] trained on patent
documents. We pre-train the token embeddings using FastText [26] to improve the convergence speed
of the model training. For each token sequence, we apply a Simple Word-Embedding-based Model
(SWEM) [27], which computes mean and max pooled embeddings combined with a linear projection.
This approach captures contextual information with a lower overhead than LSTMs or CNNs.</p>
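          <p>A minimal sketch of this pooling step, assuming the SWEM-concat variant (the paper says mean and max pooling are combined with a linear projection but does not spell out the combination):</p>

```python
import numpy as np

def swem_pool(token_embs, proj):
    """Pool a (num_tokens, d) matrix of token embeddings into one node
    embedding: concatenate mean- and max-pooled vectors, then project.
    proj has shape (2 * d, d_out)."""
    mean_vec = token_embs.mean(axis=0)
    max_vec = token_embs.max(axis=0)
    return np.concatenate([mean_vec, max_vec]) @ proj
```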
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Graph Transformer Layers</title>
          <p>
            Next, we feed the node embeddings into a Graph Transformer consisting of several layers that operate
on the invention graph. We use Query-Key normalization [28] to stabilize training and reduce gradient
variance. We also adopt a Pre-LayerNorm [
            <xref ref-type="bibr" rid="ref8">29</xref>
            ] architecture for consistent gradient flow. Each
feedforward sublayer uses Gated Linear Units with a GELU activation (GEGLU) [
            <xref ref-type="bibr" rid="ref9">30</xref>
            ]. We define adjacency
according to the edges in the invention graph, ensuring that sparse attention focuses only on closely
related nodes. This lowers computational costs while preserving crucial inter-node relationships [23].
          </p>
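          <p>A minimal single-head sketch of edge-restricted attention (our illustration; the actual layers additionally use Query-Key normalization, Pre-LayerNorm, GEGLU feedforward sublayers, and multiple heads):</p>

```python
import numpy as np

def sparse_graph_attention(x, adj, wq, wk, wv):
    """One attention step where node i attends only to its graph
    neighbours and itself, instead of all node pairs."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / np.sqrt(k.shape[1])
    mask = adj | np.eye(len(x), dtype=bool)    # allow self-attention
    scores = np.where(mask, scores, -np.inf)   # block non-edges
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v
```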
          <p>[Figure 2: the four-step model pipeline. Text sequences pass through the BPE tokenizer and embedding layer to produce token embeddings; SWEM pools them into node embeddings; the Graph Transformer refines the node embeddings using the graph structure; pooling yields the graph embedding; and a Mixture of Experts layer produces the final graph embedding.]</p>
        </sec>
        <sec id="sec-3-2-2a">
          <title>3.2.3. Pooling</title>
          <p>After the Graph Transformer layers refine each node representation, we assign each node a learned
importance weight. We then compute a graph embedding using a weighted sum of all node embeddings,
emphasizing the nodes most relevant to downstream tasks.</p>
        </sec>
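        <p>The weighted-sum pooling above can be sketched as follows; normalizing the learned importance scores with a softmax is our assumption:</p>

```python
import numpy as np

def attention_pool(node_embs, score_w):
    """Readout: score each node with a learned vector, softmax the scores
    into importance weights, and return the weighted sum of node embeddings."""
    scores = node_embs @ score_w           # one scalar score per node
    w = np.exp(scores - scores.max())
    w = w / w.sum()                        # importance weights sum to 1
    return w @ node_embs                   # graph embedding
```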
        <sec id="sec-3-2-3">
          <title>3.2.4. Dimensionality Reduction Layer</title>
          <p>
            Lastly, we use a densely gated Mixture of Experts (MoE) [
            <xref ref-type="bibr" rid="ref10">31</xref>
            ] network to project the graph embedding
to a lower dimension.
          </p>
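            <p>A densely gated MoE projection (all experts active, outputs mixed by softmax gate weights) can be sketched as below; the shapes and gating details are illustrative assumptions:</p>

```python
import numpy as np

def moe_project(g, gate_w, experts):
    """Project a graph embedding g to a lower dimension with a densely
    gated mixture of experts: every expert is applied and their outputs
    are combined using softmax gate weights."""
    logits = g @ gate_w                        # (num_experts,)
    gates = np.exp(logits - logits.max())
    gates = gates / gates.sum()
    outs = np.stack([g @ w for w in experts])  # (num_experts, d_out)
    return gates @ outs
```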
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Training Data</title>
        <p>The relevance signals used for training our model are citations extracted from patent examiner reports
of patent applications. This approach ensures high-quality, expert-curated training data, as examiner
citations highlight legally and technically relevant prior art, which is what we want our search engine to place
at the top of the search results.</p>
        <p>
          We use the following citation categories: novelty-destroying prior art (labeled X by the EPO [
          <xref ref-type="bibr" rid="ref11">32</xref>
          ]),
relevant prior art that does not invalidate novelty (A citation), and documents showing that the
invention follows in an obvious way from a combination of existing inventions (Y citation). We include
citations from more than 40 jurisdictions, with around 90% coming from US, EP, WO, JP, or CN. In
total, we utilize approximately 31.7M citations formed from around 8.7M applications and 14.2M cited
documents.
        </p>
        <p>Each citation is represented by a pair of graphs, where the citing graph is typically the first claim graph
and the cited graph is the description graph. At training time, we use the following data augmentation
and regularization techniques:</p>
        <p>1. Trivial citations: We create artificial citations from the first claim graph and the description graph
of the same patent. We observe that having such citations stabilizes the training.</p>
        <p>2. Graph type augmentation: For each sample, with a predefined probability (0.4 in our experiments),
we randomly replace either the citing graph or the cited graph (but not both) with the all claims graph.</p>
        <p>[Table 1 residue: the rows list the evaluated models (Our, base stage; Our, dimensionality reduction stage; Tree-LSTM, base stage; PaECTER1; Stella2; KaLM3; GTE-ModernBert4), grouped into dense retrieval models trained to embed patent documents and dense retrieval models trained to embed text in general, with chunk-and-sequence-length settings of N/A, N/A, N/A, 4×512, 3×2048, 1×4096, and 1×4096, respectively.]</p>
        <p>Non-intersection between training, validation, and test sets is achieved at the document level, and no
citations cross the different sets. When excluding a document from a set, we exclude all the documents
of the same patent family (i.e., other applications corresponding to the same invention).</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Training Procedure</title>
        <p>
          We train our model using the PyTorch [
          <xref ref-type="bibr" rid="ref12">33</xref>
          ] and DGL [
          <xref ref-type="bibr" rid="ref13">34</xref>
          ] libraries. We employ triplet loss [
          <xref ref-type="bibr" rid="ref14">35</xref>
          ] as our
loss function, assigning different margins to different citation categories. We use the AdamW [
          <xref ref-type="bibr" rid="ref15">36</xref>
          ]
optimizer, reducing the learning rate by a factor of two on plateaus. Negatives are obtained using online hard
negative mining [
          <xref ref-type="bibr" rid="ref16">37</xref>
          ] over the current batch. Batch creation accounts for graph sizes, making batch size
dynamic, with 2100–2260 anchors and 900–960 positives on average. To make batches harder, we group
samples into batches using International Patent Classification (IPC) [
          <xref ref-type="bibr" rid="ref17">38</xref>
          ] classes.
        </p>
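        <p>The per-category margin idea can be sketched as follows; the Euclidean distance and the concrete margin values are illustrative assumptions (in PyTorch, torch.nn.TripletMarginLoss implements the batched form):</p>

```python
import numpy as np

# Hypothetical margins: X (novelty-destroying) citations get the largest
# margin, A citations the smallest.
MARGINS = {"X": 1.0, "Y": 0.7, "A": 0.4}

def triplet_loss(anchor, positive, negative, category):
    """Triplet loss with a citation-category-dependent margin."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + MARGINS[category])
```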
        <p>The training has two stages: A base stage with an output dimension of 2048, and a dimensionality
reduction stage reducing the output dimension to 150 by fine-tuning the base stage and adding the
MoE layer described in Section 3.2.4. While the base stage achieves higher recall, the dimensionality
reduction stage strikes a better balance in terms of performance and vector storage cost.</p>
        <p>The model is evaluated thrice per epoch. The training of each stage stops when the top-3 X citation
recall (Recall@3, see Section 4.1) does not improve for three subsequent evaluation runs. When trained
on eight L4 GPUs, the training process for both stages combined required approximately 185k updates
(12 epochs) and took about 4.6 days.</p>
        <p>Footnotes for Table 1: 1. mpi-inno-comp/paecter; 2. NovaSearch/stella_en_400M_v5; 3. HIT-TMG/KaLM-embedding-multilingual-mini-v1; 4. Alibaba-NLP/gte-modernbert-base.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>To evaluate the performance of our model in patent retrieval, we measure its effectiveness in retrieving
novelty-destroying (X) citations of patent applications, as described in Section 4.1. The query is the
first independent claim of the application, while we use the full text (all claims and description) for the
search candidate documents. For our Graph Transformer model, we use the graphs corresponding to
the query and search candidate documents as input, and the other approaches use the original texts.
Any non-English documents are machine translated into English before being processed.</p>
      <p>We compare our model with four text embedding models (Section 4.2), our previous Tree-LSTM-based
approach (Section 4.3), and Okapi BM25 (Section 4.4).</p>
      <sec id="sec-4-1">
        <title>4.1. Evaluation Procedure</title>
        <p>
          To perform the evaluation, we use a test set containing about 161,000 search candidate documents
and around 96,000 queries that cite one of the candidate documents as an X citation. We populate the
search space by embedding each search candidate into a single vector, i.e., we do document retrieval,
not passage retrieval. For each query we then perform a nearest neighbor search among all the search
candidate documents and measure how often, on average, the document cited as X appears in the top
three results (Recall@3). As an auxiliary metric, we use nDCG [
          <xref ref-type="bibr" rid="ref18">39</xref>
          ], which scores relevant document
hits based on their positions. To compute nDCG, we use the top 150 search results (nDCG@150).
        </p>
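          <p>Concretely, the two metrics can be computed as below; the binary gain and log2 discount in nDCG are standard choices that the paper does not spell out:</p>

```python
import math

def recall_at_k(ranked_ids, cited_id, k=3):
    """1.0 if the X-cited document appears in the top-k results, else 0.0."""
    return 1.0 if cited_id in ranked_ids[:k] else 0.0

def ndcg_at_k(ranked_ids, relevant_ids, k=150):
    """Binary-relevance nDCG: discounted gain of relevant hits divided by
    the gain of the ideal ordering."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, d in enumerate(ranked_ids[:k]) if d in relevant_ids)
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(len(relevant_ids), k)))
    return dcg / ideal if ideal else 0.0
```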
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Comparison with Text Embedding Models</title>
        <p>
          Based on the MTEB leaderboard [
          <xref ref-type="bibr" rid="ref19">40</xref>
          ] scores, we selected three publicly available models to evaluate:
Stella [
          <xref ref-type="bibr" rid="ref20">41</xref>
          ], KaLM [
          <xref ref-type="bibr" rid="ref21">42</xref>
          ] and GTE-ModernBert [
          <xref ref-type="bibr" rid="ref22">43</xref>
          ]. We selected these instead of larger models since they
are of a similar size to our model. We also compare our approach with PaECTER [12], as it is specifically
trained using patent data.
        </p>
        <p>We tuned the sequence length of these models to achieve the best Recall@3 on the validation set
and then evaluated the models on the test set. To preserve more text, we applied the two models that
performed best on our evaluation sets (PaECTER and Stella) to several chunks of the input text and
averaged the embeddings.</p>
        <p>PaECTER was trained on patent titles concatenated with abstracts as both queries and search
candidates. However, we found that using the text of first claims as queries does not affect model performance,
while using the full text (truncated to fit the 512 token window) for candidates even improves it. Thus,
the only difference in inputs to PaECTER compared to other text embedding models is prepending
input texts with patent titles.</p>
        <p>Table 1 shows the chosen parameters of all models (the resulting sequence lengths and the maximum
number of input chunks, indicated by the digit before the “×” symbol), model sizes, and the
obtained metric values. The results show that our Graph Transformer-based approach outperforms the
other models on novelty-destroying patent document retrieval. The recall of PaECTER is about 31%
lower than that of our base stage model, while PaECTER performs about as well as Stella and better than the
other text embedding models.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Comparison with Tree-LSTM</title>
        <p>We also compare our approach with our previous approach based on Tree-LSTM [19], which was
re-trained using the same training data used to train the Graph Transformer. Table 1 shows that its
recall is 22% worse than that of our Graph Transformer base stage model, while it still outperforms PaECTER
and all other evaluated text embedding models.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Comparison with BM25</title>
        <p>
          BM25 [
          <xref ref-type="bibr" rid="ref23">44</xref>
          ] is a popular traditional word counting-based information retrieval approach that does not
use machine learning. We tuned the hyperparameters of the Okapi BM25 version of this approach (k1
and b) on 1,000 citations from our validation set (but preserving all available search candidates) and
then computed its metrics on the test set. As seen in Table 1, the recall of BM25 is less than half of that
of our base stage model and also trails all text embedding models we evaluated.
        </p>
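          <p>For reference, a minimal Okapi BM25 scorer over tokenized documents; k1 and b are the two tuned hyperparameters (the defaults shown here are common starting values, not the tuned ones):</p>

```python
import math

def bm25_score(query_terms, doc_terms, df, n_docs, avgdl, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a query, given document
    frequencies df, corpus size n_docs, and average document length avgdl."""
    dl = len(doc_terms)
    score = 0.0
    for t in set(query_terms):
        tf = doc_terms.count(t)
        if tf == 0 or t not in df:
            continue
        idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
    return score
```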
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Limitations of Comparisons</title>
        <p>Our comparison in Section 4.2 is limited since, apart from PaECTER, the other models were not trained
specifically for patent retrieval. On the other hand, we found that using a large batch size is crucial
for high recall when using online hard negative mining. If we were to fine-tune text embedding models,
we could not use nearly as large batches as with our approach. With a sequence length of 512 and the
GPU setup used for training our model, we could fit around 45–65 positives in a batch (depending on
the model), which is more than 13 times fewer than we achieve with our Graph Transformer approach.
This would also make the training process significantly slower and could impact the results due to the
truncated texts.</p>
        <p>We also note that this evaluation focuses on first-stage retrieval efficiency and effectiveness;
comparison with computationally intensive re-ranking models like cross-encoders was considered beyond the
scope of this specific study but represents a potential area for future investigation.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we presented an approach for patent searching that utilizes a Graph Transformer model
as a dense retriever. We compared the performance of our approach to existing text embedding models
and showed that our approach achieves a significantly higher recall on novelty-destroying citation
retrieval. Additionally, we demonstrated that our approach is more computationally eficient than
existing text-based Transformer models.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly for grammar and spelling checking, and ChatGPT-4o for paraphrasing, rewording, and improving the writing style. After using these tools, the authors reviewed and edited the content as needed and take full responsibility for the publication's content.
[8] A. Ali, A. Tufail, L. C. De Silva, P. E. Abas, Innovating patent retrieval: a comprehensive review of
techniques, trends, and challenges in prior art searches, Applied System Innovation 7 (2024) 91.
[9] B. Herbert, G. Szarvas, I. Gurevych, Prior art search using international patent classification codes
and all-claims-queries, in: C. Peters, G. M. Di Nunzio, M. Kurimo, T. Mandl, D. Mostefa, A. Peñas,
G. Roda (Eds.), Multilingual Information Access Evaluation I. Text Retrieval Experiments, Springer
Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 452–459.
[10] E. D’hondt, S. Verberne, CLEF-IP 2010: Prior art retrieval using the different sections in patent
documents (2010).
[11] L. Helmers, F. Horn, F. Biegler, T. Oppermann, K.-R. Müller, Automating the search for a patent’s
prior art with a full text similarity search, PloS one 14 (2019) e0212103.
[12] M. Ghosh, S. Erhardt, M. E. Rose, E. Buunk, D. Harhoff, PaECTER: Patent-level representation
learning using citation-informed transformers, 2024. URL: https://arxiv.org/abs/2402.19411.
arXiv:2402.19411.
[13] U. U. Acikalin, M. Kutlu, Patent search using triplet networks based fine-tuned SciBERT, 2022.</p>
      <p>URL: https://arxiv.org/abs/2207.11497. arXiv:2207.11497.
[14] M. Freunek, A. Bodmer, Transformer-based patent novelty search by training claims to their own
description, Applied Economics and Finance 8 (2021) 37. doi:10.11114/aef.v8i5.5182.
[15] J.-S. Lee, J. Hsiang, Prior art search and reranking for generated patent text, 2021. URL: https:
//arxiv.org/abs/2009.09132. arXiv:2009.09132.
[16] Ambercite, Ambercite, 2025. URL: https://www.ambercite.com/, [Online; accessed 22-Apr-2025].
[17] Amplified AI, Amplified, 2025. URL: https://www.amplified.ai/, [Online; accessed 22-Apr-2025].
[18] IPScreener, IPScreener, 2025. URL: https://ipscreener.com/, [Online; accessed 22-Apr-2025].
[19] S. Björkqvist, J. Kallio, Building a graph-based patent search engine, in: Proceedings of the 46th
International ACM SIGIR Conference on Research and Development in Information Retrieval,
2023, pp. 3300–3304.
[20] A. Shehzad, F. Xia, S. Abid, C. Peng, S. Yu, D. Zhang, K. Verspoor, Graph transformers: A survey,
2024. URL: https://arxiv.org/abs/2407.09777. arXiv:2407.09777.
[21] C. Ying, T. Cai, S. Luo, S. Zheng, G. Ke, D. He, Y. Shen, T.-Y. Liu, Do transformers really perform
bad for graph representation?, 2021. URL: https://arxiv.org/abs/2106.05234. arXiv:2106.05234.
[22] D. Kreuzer, D. Beaini, W. L. Hamilton, V. Létourneau, P. Tossou, Rethinking graph transformers
with spectral attention, 2021. URL: https://arxiv.org/abs/2106.03893. arXiv:2106.03893.
[23] V. P. Dwivedi, X. Bresson, A generalization of transformer networks to graphs, 2021. URL: https:
//arxiv.org/abs/2012.09699. arXiv:2012.09699.
[24] LCM team, L. Barrault, P.-A. Duquenne, M. Elbayad, A. Kozhevnikov, B. Alastruey, P. Andrews,
M. Coria, G. Couairon, M. R. Costa-jussà, D. Dale, H. Elsahar, K. Heffernan, J. M. Janeiro, T. Tran,
C. Ropers, E. Sánchez, R. S. Roman, A. Mourachko, S. Saleem, H. Schwenk, Large concept models:
Language modeling in a sentence representation space, 2024. URL: https://arxiv.org/abs/2412.08821.
arXiv:2412.08821.
[25] R. Sennrich, B. Haddow, A. Birch, Neural machine translation of rare words with subword units,
2016. URL: https://arxiv.org/abs/1508.07909. arXiv:1508.07909.
[26] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword information,
Transactions of the Association for Computational Linguistics 5 (2017) 135–146. URL: https:
//aclanthology.org/Q17-1010/. doi:10.1162/tacl_a_00051.
[27] D. Shen, G. Wang, W. Wang, M. R. Min, Q. Su, Y. Zhang, C. Li, R. Henao, L. Carin, Baseline
needs more love: On simple word-embedding-based models and associated pooling mechanisms,
in: I. Gurevych, Y. Miyao (Eds.), Proceedings of the 56th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics,
Melbourne, Australia, 2018, pp. 440–450. URL: https://aclanthology.org/P18-1041/. doi:10.18653/
v1/P18-1041.
[28] A. Henry, P. R. Dachapally, S. S. Pawar, Y. Chen, Query-key normalization for transformers,
in: T. Cohn, Y. He, Y. Liu (Eds.), Findings of the Association for Computational Linguistics:
EMNLP 2020, Association for Computational Linguistics, Online, 2020, pp. 4246–4253. URL: https://aclanthology.org/2020.findings-emnlp.379/. doi:10.18653/v1/2020.findings-emnlp.379.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N. S.</given-names>
            <surname>Clarke</surname>
          </string-name>
          ,
          <article-title>The basics of patent searching</article-title>
          ,
          <source>World Patent Information</source>
          <volume>54</volume>
          (
          <year>2018</year>
          )
          <fpage>S4</fpage>
          -
          <lpage>S10</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S017221901630103X. doi:10.1016/j.wpi.2017.02.006. Best of Search Matters.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Minesoft</surname>
          </string-name>
          , Patbase,
          <year>2025</year>
          . URL: https://minesoft.com/solutions/patent-intelligence/patbase/, [Online; accessed 22-Apr-2025].
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Patsnap</surname>
          </string-name>
          , Patsnap,
          <year>2025</year>
          . URL: https://www.patsnap.com, [Online; accessed 22-Apr-2025].
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Questel</surname>
          </string-name>
          , Orbit intelligence,
          <year>2025</year>
          . URL: https://www.questel.com/patent/ip-intelligence-software/orbit-intelligence/, [Online; accessed 22-Apr-2025].
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Google</surname>
          </string-name>
          , Google patents,
          <year>2025</year>
          . URL: https://patents.google.com/, [Online; accessed 22-Apr-2025].
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>EPO</given-names>
            ,
            <surname>Espacenet</surname>
          </string-name>
          ,
          <year>2025</year>
          . URL: https://worldwide.espacenet.com/, [Online; accessed 22-Apr-2025].
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7] USPTO, Patent public search, 2025. URL: https://www.uspto.gov/patents/search/patent-public-search, [Online; accessed 22-Apr-2025].
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [29] R. Xiong, Y. Yang, D. He, K. Zheng, S. Zheng, C. Xing, H. Zhang, Y. Lan, L. Wang, T. Liu, On layer normalization in the transformer architecture, in: H. Daumé III, A. Singh (Eds.), Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, PMLR, 2020, pp. 10524-10533. URL: https://proceedings.mlr.press/v119/xiong20b.html.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [30] N. Shazeer, GLU variants improve transformer, 2020. URL: https://arxiv.org/abs/2002.05202. arXiv:2002.05202.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [31] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, G. E. Hinton, Adaptive mixtures of local experts, Neural Computation 3 (1991) 79-87. doi:10.1162/neco.1991.3.1.79.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [32] EPO, Guidelines for Examination in the European Patent Office, 2024. URL: https://www.epo.org/en/legal/guidelines-epc/2024/b_x_9_2.html.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [33] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: an imperative style, high-performance deep learning library, Curran Associates Inc., Red Hook, NY, USA, 2019.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [34] M. Wang, D. Zheng, Z. Ye, Q. Gan, M. Li, X. Song, J. Zhou, C. Ma, L. Yu, Y. Gai, T. Xiao, T. He, G. Karypis, J. Li, Z. Zhang, Deep graph library: A graph-centric, highly-performant package for graph neural networks, 2020. URL: https://arxiv.org/abs/1909.01315. arXiv:1909.01315.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [35] F. Schroff, D. Kalenichenko, J. Philbin, FaceNet: A unified embedding for face recognition and clustering, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 815-823.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [36] I. Loshchilov, F. Hutter, Decoupled weight decay regularization, 2019. URL: https://arxiv.org/abs/1711.05101. arXiv:1711.05101.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [37] A. Shrivastava, A. Gupta, R. Girshick, Training region-based object detectors with online hard example mining, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 761-769.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [38] World Intellectual Property Organization (WIPO), International Patent Classification (IPC), 2025. URL: https://www.wipo.int/classifications/ipc/en/, version 2025.01, [Online; accessed 22-Apr-2025].
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [39] K. Järvelin, J. Kekäläinen, Cumulated gain-based evaluation of IR techniques, ACM Transactions on Information Systems (TOIS) 20 (2002) 422-446.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [40] MTEB Team, MTEB Leaderboard, https://huggingface.co/spaces/mteb/leaderboard, 2023. [Online; accessed 22-Apr-2025].
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [41] D. Zhang, J. Li, Z. Zeng, F. Wang, Jasper and Stella: distillation of SOTA embedding models, 2025. URL: https://arxiv.org/abs/2412.19048. arXiv:2412.19048.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [42] X. Hu, Z. Shan, X. Zhao, Z. Sun, Z. Liu, D. Li, S. Ye, X. Wei, Q. Chen, B. Hu, H. Wang, J. Yu, M. Zhang, KaLM-Embedding: Superior training data brings a stronger embedding model, 2025. URL: https://arxiv.org/abs/2501.01028. arXiv:2501.01028.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [43] X. Zhang, Y. Zhang, D. Long, W. Xie, Z. Dai, J. Tang, H. Lin, B. Yang, P. Xie, F. Huang, M. Zhang, W. Li, M. Zhang, mGTE: Generalized long-context text representation and reranking models for multilingual text retrieval, in: F. Dernoncourt, D. Preoţiuc-Pietro, A. Shimorina (Eds.), Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, Association for Computational Linguistics, Miami, Florida, USA, 2024, pp. 1393-1412. URL: https://aclanthology.org/2024.emnlp-industry.103/. doi:10.18653/v1/2024.emnlp-industry.103.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [44] S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, M. Gatford, et al., Okapi at TREC-3, NIST Special Publication SP 109 (1995) 109.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>