=Paper=
{{Paper
|id=Vol-3818/paper1
|storemode=property
|title=Augmenting Patent Summarization using Large Language Model with Knowledge Graph
|pdfUrl=https://ceur-ws.org/Vol-3818/paper1.pdf
|volume=Vol-3818
|authors=Shoon Lei Phyu,Murataly Uchkempirov,Mayesha Maliha Proma,Parag Kulkarni
|dblpUrl=https://dblp.org/rec/conf/lkm/PhyuUPK24
}}
==Augmenting Patent Summarization using Large Language Model with Knowledge Graph==
Shoon Lei Phyu, Murataly Uchkempirov, Mayesha Maliha Proma and Parag Kulkarni
Tokyo International University, 4-42-31 Higashi-Ikebukuro, Toshima, Tokyo, 170-0013 Japan
Abstract
The increasing quantity and intricacy of patent documents call for effective and precise summarization
techniques. The technical and exhaustive nature of patents often overwhelms traditional summarization
techniques and machine learning-based approaches. This paper explores the integration of large language
models (LLMs) with knowledge graphs (KGs) to improve patent document summarization. LLMs such as GPT-4
provide advanced language understanding and generation capabilities, but can struggle with
domain-specific content. Knowledge graphs offer a structured representation of knowledge that allows
domain-specific information to be included in the summarization process. The goal of this integration is to
enhance the readability and informativeness of patent summaries. We propose a framework for combining an
LLM with a KG and evaluate its performance against a baseline summarization technique using both ROUGE
scores and human evaluations. In the human evaluation, the KG-based summaries were preferred for most
patents, although the baseline achieved higher ROUGE scores, highlighting the potential of this approach to
produce more readable and informative summaries of complex technical documents. This research contributes
to the development of more robust artificial intelligence summarization systems capable of performing complex
data interpretation and decision making in specialized areas.
Keywords
Patent Summarization, Natural Language Processing, Knowledge Graph, Knowledge Graph Summarization
1. Introduction
Patent documents are a rich source of technical and legal information, but their complexity often
makes them difficult to understand quickly. Traditional summarization methods struggle with the
specificity and technicality of patent language. Recent advancements in large language models (LLMs)
and knowledge graphs (KGs) offer new opportunities for improving patent summarization. This paper
explores the augmentation of LLM-based patent summarization using knowledge graphs to generate
more concise and informative summaries. The primary focus is on leveraging Llama 3 for entity
and relationship extraction to build a patent knowledge graph and subsequently using it to generate
summaries. This approach aims to enhance the readability and informativeness of patent summaries
compared to traditional methods.
2. Literature Review
Automatic patent document summarization is a critical task that has received significant research
attention. Several methods have been proposed to solve this problem. For example, [1] presented a
non-ontology-based document summarization method that uses keyword recognition and significant
information density to extract paragraphs that best represent the patent text. [1, 2] presented a
new approach called PQPS, a state-of-the-art query-based patent summarizer that uses RBM and
Bi-LSTM models for patent summarization. [2, 3] used surface mining, NER, and machine learning
techniques to extract concepts and conceptual relationships from the claims section of patent documents.
Moreover, [4] described a summarization method for extracting surrogate patents to effectively support
patent mapping. [5, 4] examined the identification of contradictions in patents using a BERT-based
summarization method. Additionally, [6] discussed clustering patent generalizations into homogeneous
groups for knowledge generation. Moreover, [7] studied feature term extraction from patent documents.
[8] developed a multilingual patent text mining approach to compute relatedness scores of patent
documents. [9] used graph representations to improve abstractive summarization of multiple documents,
demonstrating the potential of graph-based methods in summarization tasks.
The First International OpenKG Workshop: Large Knowledge-Enhanced Models, August 03, 2024, Jeju Island, South Korea
* Corresponding author.
Email: s22167405@al.tiu.ac.jp (S. L. Phyu); s22177901@al.tiu.ac.jp (M. Uchkempirov); s22224532@al.tiu.ac.jp (M. M. Proma); pkulkarni@al.tiu.ac.jp (P. Kulkarni)
URL: https://rei-kunn.github.io/ (S. L. Phyu)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
Despite this progress, patent document summarization methods still have several shortcomings, which
calls for more sophisticated and efficient approaches. According to
[4], there is a growing demand for automated tools to support patent engineers and decision makers in
patent analysis. Traditional methods face problems such as difficulties in natural language translation
and the lack of parallel sets of bilingual documents [10]. Existing patent summarization strategies
often fail due to the unique characteristics of patents and the emphasis on specific sections, such as
claims, while ignoring important details from other sections [11]. In addition, the huge number of
patent documents leads to information overload, which requires more effective summarization and
analysis methods [12]. [13] surveyed graph-based representations for text analysis and
found that graph-based representations outperform traditional methods. The author also suggested a
novel approach to semi-graph construction and sentence extraction, which helps find linear and
non-linear relationships between features [14].
Despite advances in text summarization, automatically generated patent summaries still do not match
human-written ones [2]. In addition, the complexity and length of patents, as well as the requirement for
domain knowledge, contribute to their under-utilization [5]. Moreover, limitations of existing patent
analysis tools hinder the creation of impact technology patent maps, highlighting the need for improved
methods for efficiently extracting information from patent documents [15]. Problems associated with
cryptic descriptions in patent texts limit the effectiveness and comprehensiveness of patent analysis
[16].
Several studies show that integrating LLMs into patent document summarization provides significant
benefits in terms of accuracy and efficiency. LLMs have significantly impacted
the field of automatic summarization, especially for patent documents. The use of
LLMs in summarizing patent documents has various advantages and disadvantages, as follows:
• LLMs are excellent at capturing the semantic meaning of text, which allows them to create
contextually rich and coherent summaries [17],
• LLMs are adept at handling the intricacies of patent documents, which often contain technical
terminology and detailed information, resulting in more accurate and informative summaries
[18],
• LLMs demonstrate versatility in summarizing different types of documents, such as scientific
literature, radiology reports, and patient problem lists [18],
• LLMs have the ability to process voluminous input data, making them suitable for summarizing
extensive patent documents without leaving out important details [17],
• LLMs can be combined with knowledge graphs to improve generalization by integrating
domain-specific knowledge and relationships extracted from the graph.
At the same time, the readability of LLM-generated summaries may vary, making it difficult to
ensure that all users, including non-specialists, can easily understand them [Decker, 2023]. According to
Decker [2023], LLMs may have difficulty ensuring that their summaries are complete, especially when
gathering all the relevant details from lengthy and detailed patent documents. Bush [2024] points out
that LLMs can reflect biases present in the training data, potentially affecting the quality and objectivity
of the summaries produced. The interpretability of LLM-based summaries can also be a concern, because
understanding the rationale behind a particular summary generated by the model can be challenging
[19].
2.1. Integration of LLM and KG in patent document summarization
As a solution to these problems, this paper proposes an integrated approach to summarizing patent
documents using an LLM with knowledge graphs. Knowledge graphs have become integral to various
applications, gaining popularity particularly after the comprehensive representation by [20]. The
concept of the "Knowledge Graph" dates back to 1972 with the work of E. W. Schneider. Today, knowledge
graphs are widely utilized in implementations such as the Google Knowledge Graph, Airbnb, and other
social application analyses.
Despite their broad applications, there has been limited research focusing specifically on patent
datasets. By combining the language understanding of LLMs with the structured representation of
knowledge in the form of graphs, it is possible to create more accurate, informative, and contextually rich
summaries of patent documents. Large language models, such as pre-trained encoders [21], have shown
promise in text summarization tasks. These models can effectively capture the semantic meaning of text
and generate coherent summaries. Knowledge graphs play a critical role in organizing and presenting
structured information. Research has shown that incorporating knowledge graphs into language models
can improve the representation of domain information [22]. Research on question-answering systems
has demonstrated the effectiveness of combining large language models with knowledge graphs to
improve answer selection [23]. By using semantic relationships encoded in knowledge graphs, the
system can better understand and extract relevant information to answer questions. This approach can
be extended to patent document summarization to extract key ideas and relationships embedded in
patent texts. [24] proposes a framework for predictive summarization that highlights possible future areas
of patent applications by analyzing past data in the legal domain.
Moreover, leveraging external knowledge from open source knowledge graphs has been shown to
improve the performance of document summarization models [25]. By incorporating external knowledge
into the summarization process, the model gains access to a broader context and can generate more
informative and complete summaries. Several studies have explored the integration of knowledge
graphs into patent-related tasks, demonstrating their potential benefits. [26] provided a comprehensive
overview of knowledge graph embedding approaches and applications, highlighting the importance
of structured knowledge representation. [27] discusses the role of knowledge graphs in integrating
health information and their application in healthcare, highlighting the benefits of incorporating
knowledge graphs into data analytics. [28] implement knowledge proximity in a US patent database
using a knowledge graph called “PatNet” built from patent metadata. Chen and Deng [2023] propose an
accountable patent recommendation method using a knowledge graph to demonstrate the practical
application of structured knowledge in patent-related problems. [29] extract knowledge graphs from
patent applications, filling the gap in knowledge extraction from patent documents. [30] use text mining
and machine learning techniques to visualize processes in utility chemical patents, demonstrating the
utility of knowledge graphs in patent analysis. Additionally, [31] extend patent searching to include
text embedding and knowledge graphs, emphasizing the importance of capturing various aspects of
knowledge in patent documents.
In [3], the researchers highlighted that patent summarizations could be significantly enhanced using
conceptual knowledge graphs. Similarly, [32] demonstrated the implementation of knowledge graphs
using Open-IE to produce summaries through Cloze Reward. This paper builds on these studies by
integrating language models to generate human-readable summaries from patent documents. LAUREN,
as detailed by [33], focuses on summarizing knowledge graphs specifically for question answering
systems. This approach leverages graph-based techniques to resize knowledge graphs, making them
more manageable and tailored for answering user queries effectively. The summarization process
ensures that the key information necessary for answering questions is retained while extraneous details
are minimized. [34] addresses the summarization of knowledge graphs for search purposes. Their
work proposes a method to compute a summarization set that provides maximal descriptive power
with minimal redundancy. This approach enhances the efficiency of knowledge graph searches by
presenting users with the most relevant information. [35] discuss the concept of adaptive summarization
to create personalized knowledge graphs. This method adapts the summarization process to the user’s
specific information needs, ensuring that the resulting knowledge graph is both concise and highly
relevant. The adaptability of this method makes it particularly useful for applications requiring tailored
knowledge delivery. The GATES framework, presented by [36], utilizes graph attention networks for
entity summarization within knowledge graphs. This technique addresses the challenge of summarizing
entities by focusing on the most significant relationships and attributes. The use of attention mechanisms
allows for the generation of summaries that are both comprehensive and focused on the most pertinent
aspects of the knowledge graph.
Figure 1: The proposed system architecture for producing KG Summary
These studies collectively advance the field of knowledge graph summarization by addressing various
challenges such as customization, scalability, and efficiency. They provide a solid foundation for
further research and development in creating concise, informative, and user-tailored knowledge graph
summaries.
3. Methodology
3.1. Building the Patent Knowledge Graph
The initial step involves constructing the knowledge graph by creating triplets that represent entities
and relationships extracted from patent documents. This requires identifying subjects, actions, and
objects within the text. Patent documents, characterized by complex sentence structures and noun-
heavy content, present a challenge for entity extraction. Preliminary attempts using BERT models
proved inadequate for the patent dataset.
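As a minimal sketch of the triplet structure described above, the following Python snippet parses extractor output into (subject, relation, object) triplets. The pipe-separated line format, the `Triplet` type, and the sample text are assumptions made for illustration; the paper does not specify the output format of the extractor.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """One knowledge-graph edge: subject --relation--> object."""
    subject: str
    relation: str
    obj: str


def parse_triplets(raw: str) -> list[Triplet]:
    """Parse lines of the assumed form 'subject | relation | object'.

    Lines without exactly three non-empty fields are skipped, and
    duplicates are dropped while keeping first-seen order.
    """
    seen: set[Triplet] = set()
    triplets: list[Triplet] = []
    for line in raw.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 3 or not all(parts):
            continue
        triplet = Triplet(*parts)
        if triplet not in seen:
            seen.add(triplet)
            triplets.append(triplet)
    return triplets


# Hypothetical extractor output, not taken from a real patent.
sample_output = """
sensor | measures | temperature
controller | receives signal from | sensor
this line is not a triplet
sensor | measures | temperature
"""
triplets = parse_triplets(sample_output)
```

Skipping malformed lines instead of raising keeps one bad line from discarding an otherwise usable extraction, at the cost of silently losing that line.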
To address this, we employ the pre-trained Llama 3 language model¹ with 8.0B parameters, which
effectively identifies entities and relationships from the text, specifically from the description and claims
sections of patents. The description section provides key terms, while the claims section offers detailed
information about the invention, both crucial for generating a comprehensive summary. By excluding
other sections, we aim to minimize the hallucination effect of the language model. The extracted data is
then used to construct a knowledge graph representing the core elements of the patent.
¹ The pre-trained llama3 model is available to download online at: https://ollama.com/library/llama3
Figure 2: Knowledge Graph of Patent Number US14996316 after construction in the Neo4j database
The knowledge graph is created in the Neo4j database using Cypher queries with the
relationships and entities extracted in the previous step, ensuring a structured and accessible representation of
the patent, as shown in Figure 2.
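The graph-construction step can be sketched as generating one parameterized Cypher `MERGE` statement per triplet. This is an illustration under assumptions, not the authors' code: the `Entity` label, the `name` property, and the helper names are hypothetical. Because Cypher does not allow relationship types to be passed as query parameters, the relation text is normalized into an identifier before being placed in the query string.

```python
import re


def relation_type(relation: str) -> str:
    """Turn free text such as 'receives signal from' into RECEIVES_SIGNAL_FROM."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", relation).strip("_").upper()
    if not cleaned:
        return "RELATED_TO"  # fallback when nothing usable remains
    if cleaned[0].isdigit():
        cleaned = "REL_" + cleaned  # unescaped identifiers cannot start with a digit
    return cleaned


def merge_statement(subject: str, relation: str, obj: str) -> tuple[str, dict[str, str]]:
    """Return a Cypher query and its parameters for one triplet.

    MERGE creates each node and the relationship only if they do not
    already exist, so re-running the same triplet is idempotent.
    """
    query = (
        "MERGE (s:Entity {name: $subject}) "
        "MERGE (o:Entity {name: $object}) "
        f"MERGE (s)-[:{relation_type(relation)}]->(o)"
    )
    return query, {"subject": subject, "object": obj}


query, params = merge_statement("controller", "receives signal from", "sensor")
```

With the official Neo4j Python driver, the pair could then be executed with `session.run(query, params)`; entity names are passed as parameters rather than interpolated so that quotes in patent text cannot break the query.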
3.2. Generating Summaries from the Knowledge Graph
With the knowledge graph established, the next step involves generating summaries using the language
model (llama-3:8.0B). This involves transforming the structured data from the knowledge graph into
a human-readable format that emphasizes conciseness and informativeness. To obtain an effective
abstractive summary, it is essential to instruct the language model with a custom prompt. This step
ensures the generation of a coherent summary rather than a disjointed question-and-answer format.
Additionally, Cypher queries are utilized to retrieve all relevant nodes and relationships from the
knowledge graph, facilitating the creation of a comprehensive summary.
This methodology leverages the strengths of both knowledge graphs and advanced language models,
aiming to produce high-quality, informative summaries of patent documents, thereby enhancing the
accessibility and understanding of patent information.
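To make the retrieval and prompting step concrete, the sketch below turns rows retrieved from the graph into a custom prompt that asks for one coherent paragraph. The Cypher text, the `name` property, the row keys, and the prompt wording are all assumptions for illustration; the paper does not publish its actual query or prompt.

```python
# Assumed retrieval query: return every relationship with its end nodes.
RETRIEVE_ALL = (
    "MATCH (s)-[r]->(o) "
    "RETURN s.name AS subject, type(r) AS relation, o.name AS object"
)


def format_facts(rows: list[dict[str, str]]) -> str:
    """Render each retrieved row as one plain-language bullet."""
    lines = []
    for row in rows:
        relation = row["relation"].replace("_", " ").lower()
        lines.append(f"- {row['subject']} {relation} {row['object']}")
    return "\n".join(lines)


def build_summary_prompt(rows: list[dict[str, str]]) -> str:
    """Build a prompt that requests a single abstractive paragraph."""
    return (
        "You are summarizing a patent from facts stored in a knowledge graph.\n"
        "Write one coherent paragraph. Do not use a question-and-answer format "
        "and do not add facts that are not listed.\n\n"
        f"Facts:\n{format_facts(rows)}\n\nSummary:"
    )


# Hypothetical rows in the shape the query above would return.
rows = [
    {"subject": "controller", "relation": "RECEIVES_SIGNAL_FROM", "object": "sensor"},
    {"subject": "sensor", "relation": "MEASURES", "object": "temperature"},
]
prompt = build_summary_prompt(rows)
```

The explicit instruction against a question-and-answer format mirrors the concern raised above about disjointed output.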
3.2.1. Baseline Method: Language Model Summarization
The performance of the proposed method is compared against a baseline summarizing technique that
solely utilizes a language model for document summarization. The baseline approach is straightforward:
the language model is employed to summarize the content of the patent documents to generate a relevant
summarization. We defined this baseline to suit the summarization of one document rather than multiple
documents. While designing it, we noticed that multi-document summarization is possible with a
retrieval-augmented generation (RAG) architecture, which we leave for future research. In the current baseline, the
user input is the patent document itself, which is then processed to retrieve relevant sections within the
document (such as the description and claims). These sections are fed into the language model, which
generates a cohesive summary of the patent.
Our adaptation involves ensuring that the retrieval mechanism focuses on internal sections of the
patent document, effectively treating it as a self-contained database. This approach leverages the
structure of the patent document to enhance the quality of the generated summary. This summary is
intended to capture the essence of the invention described in the patent document. The simplicity of
this approach allows for a clear comparison with more complex methods.
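The baseline's section selection can be sketched as follows. The record keys (`description`, `claims`), the character limit, and the prompt wording are assumptions made for illustration; they are not taken from the paper.

```python
def select_sections(
    patent: dict[str, str],
    sections: tuple[str, ...] = ("description", "claims"),
    max_chars: int = 6000,  # assumed cap to keep the prompt within a context window
) -> str:
    """Concatenate the requested sections of one patent record."""
    chunks = []
    for name in sections:
        text = patent.get(name, "").strip()
        if text:
            chunks.append(f"[{name.upper()}]\n{text[:max_chars]}")
    if not chunks:
        raise ValueError("patent record has none of the requested sections")
    return "\n\n".join(chunks)


def build_baseline_prompt(patent: dict[str, str]) -> str:
    """Prompt for the LLM-only baseline: sections in, one paragraph out."""
    return (
        "Summarize the following patent in one coherent paragraph.\n\n"
        + select_sections(patent)
    )


# Hypothetical record, not taken from HUPD.
patent = {
    "title": "Temperature sensing device",
    "abstract": "ignored by the baseline",
    "description": "A sensor device.",
    "claims": "1. A device comprising a sensor.",
}
baseline_prompt = build_baseline_prompt(patent)
```

Treating the patent as a self-contained store of named sections means the "retrieval" here is a dictionary lookup rather than a similarity search.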
Algorithm 1: Summarizing Patent Documents Using Knowledge RAG and Llama3
Input: HUPD dataset, Neo4j credentials, patent number.
Output: Structured summary of the patent document.
Load HUPD dataset
Filter patents with required labels
Initialize Llama3 model
Define prompt to extract entities/relationships
Process text with Llama3
if extraction fails then
    return Error
end if
Parse and sanitize data
if parsing fails then
    return Error
end if
Create Neo4j knowledge graph
Define prompt for summary generation
Execute Cypher query to retrieve data
Format data for input
if formatting fails then
    return Error
end if
Generate summary using Llama3
if generation fails then
    return Error
end if
Evaluate summary using ROUGE metric
return Structured summary
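The control flow of Algorithm 1 can be sketched in Python with each stage passed in as a callable, so the skeleton runs without a model or a database. The function names, the `PipelineError` type, and the stub stages are hypothetical; in practice the callables would wrap Llama 3 and Neo4j.

```python
from typing import Callable


class PipelineError(RuntimeError):
    """Raised when one stage of the summarization pipeline fails."""


def run_stage(name: str, func: Callable, *args, allow_empty: bool = False):
    """Run one stage; turn an exception or an empty result into PipelineError."""
    try:
        result = func(*args)
    except Exception as exc:
        raise PipelineError(f"{name} failed: {exc}") from exc
    if not allow_empty and not result:
        raise PipelineError(f"{name} returned no data")
    return result


def summarize_patent(text, extract, parse, store, retrieve, generate) -> str:
    """Extraction -> parsing -> graph construction -> retrieval -> generation."""
    raw = run_stage("extraction", extract, text)
    triplets = run_stage("parsing", parse, raw)
    run_stage("graph construction", store, triplets, allow_empty=True)
    facts = run_stage("retrieval", retrieve)
    return run_stage("generation", generate, facts)


# Stub stages standing in for the LLM and the graph database.
graph: list[tuple[str, ...]] = []
summary = summarize_patent(
    "A sensor measures temperature.",
    extract=lambda text: "sensor | measures | temperature",
    parse=lambda raw: [tuple(p.strip() for p in raw.split("|"))],
    store=graph.extend,
    retrieve=lambda: list(graph),
    generate=lambda facts: f"{len(facts)} fact(s) summarized",
)
```

Raising a named exception at the first failing stage corresponds to the "return Error" branches of the algorithm while preserving which stage failed.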
4. Dataset
We utilized the HUPD (Harvard USPTO Patent Dataset), which provides an extensive range of
raw patent claim texts. This dataset was chosen over other prominent datasets such as BIGPATENT²,
CLEF-IP 2011, and USPTO-2M for several key reasons. First, HUPD houses approximately 4.5 million
patent documents, significantly more than BIGPATENT’s 1.3 million [37]. This expansive volume offers
a richer and more varied base for analysis. Second, HUPD encompasses 34 distinct sections, including
claims, offering a more comprehensive view compared to BIGPATENT’s limitation to just 4 sections.
Lastly, unlike the pre-tokenized texts in BIGPATENT, HUPD’s provision of raw texts affords greater
flexibility and depth in our analysis, allowing for a more tailored approach to data processing [38].
² The BIGPATENT dataset is available to download online at: https://evasharma.github.io/bigpatent/
To test our proposed method, we randomly selected approximately 10 documents from the ’G’
category of the CPC code (Physics) within the HUPD dataset as representative samples for our study.
We generated both baseline summaries and knowledge graph summaries for each patent document and
used ROUGE scores to track the performance metrics. Additionally, we employed human evaluation to
assess the real-world clarity of the generated summaries.
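The sampling step can be sketched as a filter on the CPC section letter followed by a seeded random draw. The `cpc_label` field name, the seed, and the synthetic records are assumptions for illustration; the actual field name in a given HUPD release should be checked.

```python
import random


def sample_physics_patents(records: list[dict], k: int = 10, seed: int = 0) -> list[dict]:
    """Randomly pick k records whose CPC label starts with section 'G' (Physics)."""
    physics = [r for r in records if r.get("cpc_label", "").startswith("G")]
    if len(physics) < k:
        raise ValueError(f"only {len(physics)} section-G records available, need {k}")
    # A dedicated Random instance keeps the draw reproducible without
    # touching the global random state.
    return random.Random(seed).sample(physics, k)


# Synthetic records standing in for HUPD entries.
records = [
    {"patent_number": f"US{i:08d}", "cpc_label": "G06F" if i % 2 == 0 else "H04L"}
    for i in range(40)
]
sample = sample_physics_patents(records)
```

Fixing the seed makes the selection repeatable, which matters when the same ten documents must be used for both the ROUGE and the human evaluation.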
5. Evaluation
This section presents the results of our proposed method compared to the baseline summarization
technique. The proposed method was evaluated using ROUGE scores to measure the quality of the
generated summaries. The evaluation metrics included ROUGE-1, ROUGE-2, and ROUGE-L, which
assess the overlap of n-grams and longest common subsequences between the generated and reference
summaries. Moreover, we evaluated the performance with human evaluations to assess the quality and
understandability of the summaries generated.
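As an illustration of the longest-common-subsequence idea behind ROUGE-L, the sketch below computes LCS-based precision, recall, and F1 for a single pair of texts. It is a simplified version under assumptions (lowercasing, whitespace tokenization, no stemming) and is not the evaluation code used in the paper; the example sentences are invented.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, using one rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> tuple[float, float, float]:
    """Return (precision, recall, F1) based on the LCS of the token sequences."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return precision, recall, 2 * precision * recall / (precision + recall)


p, r, f1 = rouge_l(
    "the sensor measures the temperature",
    "the sensor records the ambient temperature",
)
# LCS is "the sensor the temperature" (4 tokens): p = 4/5, r = 4/6, f1 = 8/11
```

Unlike n-gram overlap, the LCS rewards words that appear in the same order even when other words sit between them.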
5.1. Quantitative Result
The ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metric, introduced by Chin-Yew Lin
in their pivotal work, is a cornerstone in the field of natural language processing, particularly for
evaluating automatic summarization and machine translation [39]. ROUGE assesses the quality of a
summary by comparing it with reference summaries.
ROUGE scores are often reported in terms of precision, recall, and F1 score, providing a comprehensive
evaluation framework:
• Precision measures the fraction of n-grams in the generated summary that are present in the
reference summary.
• Recall assesses the fraction of n-grams in the reference summary that appear in the generated
summary.
• F1 Score is the harmonic mean of precision and recall, offering a balanced metric.
The formulas for these metrics are as follows:
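\[
\text{Precision} = \frac{\text{Number of overlapping n-grams}}{\text{Total n-grams in the generated summary}} \tag{1}
\]
\[
\text{Recall} = \frac{\text{Number of overlapping n-grams}}{\text{Total n-grams in the reference summary}} \tag{2}
\]
\[
F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}
\]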
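The three formulas above can be sketched directly in Python for ROUGE-N. The tokenization (lowercasing and whitespace splitting) and the example sentences are assumptions made for illustration; published ROUGE implementations typically add stemming and other preprocessing, so their scores can differ.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> tuple[float, float, float]:
    """Return (precision, recall, F1) following Equations (1) to (3)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram, so a repeated
    # n-gram is only credited as often as it occurs in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return precision, recall, 2 * precision * recall / (precision + recall)


generated = "the sensor measures the temperature"
reference = "the sensor records the ambient temperature"
rouge1 = rouge_n(generated, reference, n=1)  # 4 shared unigrams: (4/5, 4/6, 8/11)
rouge2 = rouge_n(generated, reference, n=2)  # 1 shared bigram:   (1/4, 1/5, 2/9)
```

In the unigram case the generated summary has 5 tokens and the reference 6, with 4 in common, so precision is 0.8, recall is about 0.667, and F1 is about 0.727.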
Table 1 shows the ROUGE F1 scores of the baseline and the proposed method for patent US14996316. The
baseline scores higher than the proposed method on all three metrics (for example, 0.46 versus 0.34 on
ROUGE-1).
Table 1: ROUGE F1 Scores Comparison of Patent Number US14996316
Method     ROUGE-1    ROUGE-2    ROUGE-L
Baseline   0.461538   0.235294   0.461538
KG Sum     0.343434   0.136054   0.323232
5.2. Human Evaluation Result
The human evaluation of our summarization method was a critical component of our research. We
aimed to determine the effectiveness of our Knowledge Graph (KG) based summaries in terms of
understandability and informativeness. To this end, we conducted an evaluation involving 12 respondents
who reviewed summaries of 10 different patent documents.
We selected 10 patent documents from the Harvard USPTO Patent Dataset [38] to ensure a diverse
and representative sample. For each patent, three types of summaries were generated: the original
summary, a baseline summary, and the KG-based summary. Each respondent was presented with these
three summaries for each patent document without being informed of their origins to prevent any bias.
5.2.1. Voting Process
Respondents were asked to vote for the summary they found easiest to understand and most informative.
The summaries were presented in a randomized order to eliminate order effects. The voting was
conducted using a simple voting system where respondents chose one summary out of the three options.
Two voting criteria were defined: understandability (how easily the summary could be
comprehended) and informativeness (how well the summary conveyed the essential information of the
patent document).
To ensure the integrity of the evaluation, respondents were chosen from a pool of individuals with
no prior experience in reading or writing patent documents. This choice was made to simulate the
perspective of an average reader who might not have specialized knowledge.
Table 2: The voting result of Human Evaluation
Patent       KGSUM   Original   BaselineSUM   Result
US14997116   9       0          3             KGSUM
US14986817   5       0          7             BaselineSUM
US14990669   8       1          0             KGSUM
US14998490   2       1          4             BaselineSUM
US14990535   11      1          0             KGSUM
US14989021   7       2          3             KGSUM
US15000497   5       1          6             BaselineSUM
US14994324   6       2          4             KGSUM
US15001450   6       3          3             KGSUM
US14996316   8       2          2             KGSUM
The human-evaluation results are summarized in Table 2. The KG-based summaries received the
highest number of votes for 7 of the 10 patents, indicating a clear preference for their comprehensibility and
informativeness over the other two types of summaries.
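The per-patent result and the overall totals can be recomputed from the vote counts with a few lines of Python. The counts below are copied from Table 2; the variable and function names are illustrative.

```python
from collections import Counter

# Vote counts per patent, copied from Table 2.
VOTES = {
    "US14997116": {"KGSUM": 9, "Original": 0, "BaselineSUM": 3},
    "US14986817": {"KGSUM": 5, "Original": 0, "BaselineSUM": 7},
    "US14990669": {"KGSUM": 8, "Original": 1, "BaselineSUM": 0},
    "US14998490": {"KGSUM": 2, "Original": 1, "BaselineSUM": 4},
    "US14990535": {"KGSUM": 11, "Original": 1, "BaselineSUM": 0},
    "US14989021": {"KGSUM": 7, "Original": 2, "BaselineSUM": 3},
    "US15000497": {"KGSUM": 5, "Original": 1, "BaselineSUM": 6},
    "US14994324": {"KGSUM": 6, "Original": 2, "BaselineSUM": 4},
    "US15001450": {"KGSUM": 6, "Original": 3, "BaselineSUM": 3},
    "US14996316": {"KGSUM": 8, "Original": 2, "BaselineSUM": 2},
}


def winner(counts: dict[str, int]) -> str:
    """Summary type with the most votes for one patent."""
    return max(counts, key=counts.get)


# Number of patents won by each summary type.
wins = Counter(winner(counts) for counts in VOTES.values())

# Total votes per summary type across all patents.
totals = Counter()
for counts in VOTES.values():
    totals.update(counts)
```

On these counts KGSUM wins 7 patents and BaselineSUM wins 3, and KGSUM collects 67 of the 112 recorded votes. The rows for US14990669 and US14998490 sum to fewer than 12 votes as printed, which is why the total is below 120.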
5.3. Discussion of Results
The results from the ROUGE F1 score evaluations indicate that the baseline method outperforms
the proposed Knowledge Graph (KG)-based summarization technique in terms of automated metrics.
ROUGE scores measure the overlap between the generated summaries and reference summaries,
focusing on precision, recall, and F1 score.
However, the positive feedback from human evaluations suggests that the proposed KG-based
method generates clearer, more coherent, and more informative summaries. This contrast highlights
the importance of human evaluations in assessing summarization quality, as they capture aspects of
readability and informativeness that automated metrics might miss.
Notably, for Patent Number "US14990535," the baseline method failed to produce a summary, likely
due to hallucination effects commonly observed in large language models (LLMs). In contrast, the
KG-based method successfully generated a precise and informative summary from the original patent
document, demonstrating its robustness.
It is important to note that our primary focus is on understandability and informativeness as perceived
by human readers, rather than solely relying on automated metrics like ROUGE scores. This approach
ensures that the summaries effectively convey the essence of the inventions described in the patents.
The integration of knowledge graphs and the adaptation of the RAG architecture have proven effective
in enhancing the readability and informativeness of patent document summarization. These findings
underscore the value of human-centered evaluations in developing summarization tools.
6. Limitations
Despite the promising results, our research faced several challenges and limitations. The primary
limitation is the reliance on a large language model (Llama 3) for both entity recognition
and summary generation, which introduces potential vulnerabilities. If the LLM fails to accurately produce
entities, the subsequent knowledge graph cannot be generated correctly. Any errors in the entity
recognition process can lead to inaccuracies in the final summary. This dependency on LLMs introduces
a potential source of error that can affect the overall performance of the summarization method.
Additionally, the LLM used for generating summaries, specifically LLaMA3, may produce sub-optimal
results if the prompts are not clearly defined or easy to understand. This can result in summaries that do
not accurately reflect the content of the original document. Another limitation is the language support;
currently, our method is only capable of handling English documents, which restricts its applicability
to patents written in other languages.
Moreover, our approach is designed for single-document summarization. This limits its effectiveness
when dealing with multiple documents, which is a common scenario in patent analysis and other legal
contexts. Addressing this limitation is a key area for future research.
Nevertheless, the human evaluation results suggest that despite the lower ROUGE scores, the
summaries generated by the proposed method are more understandable and informative to readers unfamiliar
with patent documents. This highlights the importance of human-centered evaluation metrics in
assessing the effectiveness of summarization methods, as automated metrics like ROUGE may not fully
capture the nuances of human understanding.
7. Future Work
To address the limitations identified, future work will focus on several key areas of improvement:
Enhanced Entity Recognition: We plan to develop a dedicated entity recognition model trained
on annotated patent data. By creating a custom model tailored to the specificities of patent documents,
we aim to improve the accuracy of entity recognition and, consequently, the quality of the knowledge
graphs and summaries.
Improved LLM Training: We will train the LLM, such as LLaMA3, on a specialized legal database
to enhance its performance in summarization tasks. This targeted training will help mitigate issues
arising from unclear prompts and improve the model’s ability to generate accurate and informative
summaries.
Multilingual Support: Expanding our model to support multiple languages is a priority. This
involves training the entity recognition and summarization models on multilingual datasets to ensure
accurate performance across different languages.
Multi-Document Summarization: Extending our method to handle multi-document
summarization will significantly increase its applicability. We plan to extend the architecture to aggregate
and summarize information from multiple documents, which is particularly important in the legal and
patent analysis fields.
By addressing these areas, we aim to refine our summarization approach, making it more robust,
versatile, and applicable to a wider range of documents and languages. This will enhance its utility in
practical, real-world scenarios, particularly in legal and technical domains.
8. Conclusion
In this paper, we introduced a novel summarization approach that integrates knowledge graphs and
large language models (LLMs) to enhance the readability and informativeness of patent document
summaries. Our main contribution lies in generating summaries that are easier to understand while
preserving the key aspects of the inventions described in the patent documents. Additionally, our research
contributes to legal technology by analyzing entities in the patent domain and producing
structured data through knowledge graphs, combined with the contextual understanding of LLMs, for
automatic summarization tasks. We emphasize the importance of human-centered evaluations and
contextual understandability over automated evaluations when producing summaries. Despite the
limitations, our research lays a foundation for future advancements in patent summarization technology,
aiming to develop a more robust tool for analyzing complex technical documents in the legal domain.
Acknowledgments
First and foremost, we would like to express our gratitude to the creators of the Harvard USPTO
Patent Dataset (HUPD) for making the dataset publicly available, which was instrumental in
our study. We also appreciate the developers of the Llama 3 model, which we utilized for testing the
knowledge graph summarization on the legal dataset.
Lastly, we extend our sincere appreciation to the respondents who participated in the human evaluation of
the summaries. Their insights and feedback were invaluable in assessing the quality and readability of
our summarization methods.
References
[1] A. J. Trappey, C. V. Trappey, C.-Y. Wu, Automatic patent document summarization for collaborative
knowledge systems and services, Journal of Systems Science and Systems Engineering 18 (2009)
71–94. URL: https://doi.org/10.1007/s11518-009-5100-7. doi:10.1007/s11518-009-5100-7.
[2] G. Kumaravel, S. Sankaranarayanan, PQPS: Prior-art query-based patent summarizer using RBM
and Bi-LSTM, Mobile Information Systems 2021 (2021) 2497770. URL: https://doi.org/10.1155/2021/2497770.
doi:10.1155/2021/2497770.
[3] P. Rk, S. Lalithadevi, Patent document summarization using conceptual graphs, SSRN Electronic
Journal (2017). doi:10.2139/ssrn.3441645.
[4] Y.-H. Tseng, Y.-W. Wang, Y.-I. Lin, C.-H. Lin, D.-R. Juang, Patent surrogate extraction and evaluation
in the context of patent mapping, Journal of Information Science 33 (2007) 718–736. URL:
https://doi.org/10.1177/0165551507077406. doi:10.1177/0165551507077406.
[5] G. Guarino, A. Samet, A. Nafi, D. Cavallucci, Summatriz: Summarization networks for mining
patent contradiction, in: 2020 19th IEEE International Conference on Machine Learning
and Applications (ICMLA), 2020, pp. 979–986. URL: https://doi.org/10.1109/ICMLA51294.2020.00159.
doi:10.1109/ICMLA51294.2020.00159.
[6] A. J. Trappey, C. V. Trappey, An r&d knowledge management method for patent document
summarization, Industrial Management & Data Systems 108 (2008) 245–257. URL: https://doi.org/
10.1108/02635570810847608. doi:10.1108/02635570810847608.
[7] K. Takano, M. Tanaka, H. Sakai, R. Kitajima, T. Ota, C. Tanabe, H. Sakaji, Extracting characteristic
terms from patent documents, International Journal of Smart Computing and Artificial Intelligence
4 (2020) 19–38. URL: https://doi.org/10.52731/ijscai.v4.i2.568. doi:10.52731/ijscai.v4.i2.568.
[8] C.-H. Lee, H.-C. Yang, C.-H. Wu, Y.-J. Li, A multilingual patent text-mining approach for computing
relatedness evaluation of patent documents, in: 2009 Fifth International Conference on Intelligent
Information Hiding and Multimedia Signal Processing, 2009, pp. 612–615. doi:10.1109/IIH-MSP.
2009.162.
[9] W. Li, X. Xiao, J. Liu, H. Wu, H. Wang, J. Du, Leveraging graph to improve abstractive multi-
document summarization, in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.), Proceedings of the
58th Annual Meeting of the Association for Computational Linguistics, Association for Computa-
tional Linguistics, Online, 2020, pp. 6232–6243. URL: https://aclanthology.org/2020.acl-main.555.
doi:10.18653/v1/2020.acl-main.555.
[10] Y.-L. Chen, Y.-T. Chiu, Cross-language patent matching via an international patent
classification-based concept bridge, Journal of Information Science 39 (2013) 737–
753. URL: https://doi.org/10.1177/0165551513494641. doi:10.1177/0165551513494641.
arXiv:https://doi.org/10.1177/0165551513494641.
[11] J. Codina-Filbà, N. Bouayad-Agha, A. Burga, G. Casamayor, S. Mille, A. Müller, H. Saggion, L. Wan-
ner, Using genre-specific features for patent summaries, Information Processing and Management
53 (2017) 151–174. URL: https://www.sciencedirect.com/science/article/pii/S0306457316302825.
doi:https://doi.org/10.1016/j.ipm.2016.07.002.
[12] H.-F. Wang, Y.-H. Chang, P.-Y. Hsin, Constructing patent maps using text mining to sustainably
detect potential technological opportunities, Sustainability 10 (2018) 3729. URL: https://doi.org/10.
3390/su10103729. doi:10.3390/su10103729.
[13] S. Sonawane, P. Kulkarni, Graph based representation and analysis of text document: A survey
of techniques, International Journal of Computer Applications 96 (2014) 1–8. doi:10.5120/
16899-6972.
[14] S. Sonawane, P. Kulkarni, C. Deshpande, B. Athawale, Extractive summarization using semigraph
(essg), Evolving Systems 10 (2019) 409–424. URL: https://doi.org/10.1007/s12530-018-9246-8.
doi:10.1007/s12530-018-9246-8.
[15] H. Nonaka, A. Kobayashi, H. Sakaji, Y. Suzuki, H. Sakai, S. Masuyama, Extraction of the effect
and the technology terms from a patent document, in: Proceedings of the 2010 International
Conference on Computational Intelligence and Education (ICCIE), 2010. URL: https://doi.org/10.
1109/iccie.2010.5668167. doi:10.1109/iccie.2010.5668167.
[16] J. Ma, A. L. Porter, Analyzing patent topical information to identify technology pathways
and potential opportunities, Scientometrics 102 (2014) 811–827. URL: https://doi.org/10.1007/
s11192-014-1392-6. doi:10.1007/s11192-014-1392-6.
[17] S. Bao, Chain-of-event prompting for multi-document summarization by large language models,
International Journal of Web Information Systems 20 (2024) 229–247. URL: https://doi.org/10.1108/
ijwis-12-2023-0249. doi:10.1108/ijwis-12-2023-0249.
[18] C. Williams, Evaluating large language models for drafting emergency department discharge
summaries (2024). URL: https://doi.org/10.1101/2024.04.03.24305088. doi:10.1101/2024.04.03.
24305088.
[19] S. O’Neil, Phenomics assistant: an interface for llm-based biomedical knowledge graph exploration
(2024). URL: https://doi.org/10.1101/2024.01.31.578275. doi:10.1101/2024.01.31.578275.
[20] A. Hogan, E. Blomqvist, M. Cochez, C. D’amato, G. D. Melo, C. Gutierrez, S. Kirrane, J. E. L. Gayo,
R. Navigli, S. Neumaier, A.-C. N. Ngomo, A. Polleres, S. M. Rashid, A. Rula, L. Schmelzeisen,
J. Sequeda, S. Staab, A. Zimmermann, Knowledge graphs, ACM Computing Surveys 54 (2021)
1–37. URL: http://dx.doi.org/10.1145/3447772. doi:10.1145/3447772.
[21] Y. Liu, M. Lapata, Text summarization with pretrained encoders, in: K. Inui, J. Jiang, V. Ng,
X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language
Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-
IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 3730–3740. URL:
https://aclanthology.org/D19-1387. doi:10.18653/v1/D19-1387.
[22] W. Liu, P. Zhou, Z. Zhao, Z. Wang, Q. Ju, H. Deng, P. Wang, K-bert: enabling language
representation with knowledge graph, in: Proceedings of the AAAI Conference on Artifi-
cial Intelligence, volume 34, 2020, pp. 2901–2908. URL: https://doi.org/10.1609/aaai.v34i03.5681.
doi:10.1609/aaai.v34i03.5681.
[23] N. Hu, Y. Wu, G. Qi, D. Min, J. Chen, J. Pan, A. Zafar, An empirical study of pre-trained language
models in simple knowledge graph question answering (2022). URL: https://doi.org/10.21203/rs.3.
rs-2184834/v1. doi:10.21203/rs.3.rs-2184834/v1.
[24] M. Uchkempirov, P. Shoon Lei, L. R. Aadhya, P. Kulkarni, A survey of automatic text
summarization systems, applications, challenges, and future trends, International Journal of Com-
puter Applications 186 (2024) 8–15. URL: https://ijcaonline.org/archives/volume186/number23/
a-survey-of-automatic-text-summarization-systems-applications-challenges-and-future-trends/.
doi:10.5120/ijca2024923672.
[25] T. Tang, T. Yuan, X. Tang, D. Chen, Incorporating external knowledge into unsupervised graph
model for document summarization, Electronics 9 (2020). URL: https://www.mdpi.com/2079-9292/
9/9/1520. doi:10.3390/electronics9091520.
[26] Q. Wang, Z. Mao, B. Wang, L. Guo, Knowledge graph embedding: a survey of approaches and
applications, IEEE Transactions on Knowledge and Data Engineering 29 (2017) 2724–2743. URL:
https://doi.org/10.1109/tkde.2017.2754499. doi:10.1109/tkde.2017.2754499.
[27] A. Akinnubi, Knowledge graph: a survey (2023). URL: https://doi.org/10.31219/osf.io/8xm2g.
doi:10.31219/osf.io/8xm2g.
[28] G. Li, L. Siddharth, J. Luo, Embedding knowledge graph of patent metadata to measure knowledge
proximity, Journal of the Association for Information Science and Technology 74 (2023) 476–490.
URL: https://doi.org/10.1002/asi.24736. doi:10.1002/asi.24736.
[29] L. Siddharth, L. Blessing, K. Wood, J. Luo, Engineering knowledge graph from patent database,
Journal of Computing and Information Science in Engineering 22 (2021). URL: https://doi.org/10.
1115/1.4052293. doi:10.1115/1.4052293.
[30] A. J. C. Trappey, C. V. Trappey, C.-W. Liang, H.-S. Lin, Ip analytics and machine learning applied
to create process visualization graphs for chemical utility patents, Processes 9 (2021) 1342. URL:
https://doi.org/10.3390/pr9081342. doi:10.3390/pr9081342.
[31] W. Du, Q. Yan, W. Zhang, J. Ma, Leveraging online behaviors for interpretable knowledge-
aware patent recommendation, Internet Research 32 (2021) 568–587. URL: https://doi.org/10.1108/
intr-08-2020-0473. doi:10.1108/intr-08-2020-0473.
[32] L. Huang, L. Wu, L. Wang, Knowledge graph-augmented abstractive summarization with semantic-
driven cloze reward, in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.), Proceedings of the 58th
Annual Meeting of the Association for Computational Linguistics, Association for Computational
Linguistics, Online, 2020, pp. 5094–5107. URL: https://aclanthology.org/2020.acl-main.457. doi:10.
18653/v1/2020.acl-main.457.
[33] R. Jalota, D. Vollmers, D. Moussallem, A.-C. N. Ngomo, Lauren - knowledge graph summarization
for question answering, in: 2021 IEEE 15th International Conference on Semantic Computing
(ICSC), 2021, pp. 221–226. doi:10.1109/ICSC50631.2021.00047.
[34] Q. Song, Y. Wu, P. Lin, L. X. Dong, H. Sun, Mining summaries for knowledge graph search,
IEEE Transactions on Knowledge and Data Engineering 30 (2018) 1887–1900. doi:10.1109/TKDE.
2018.2807442.
[35] de, T. Safavi, Adaptive personalized knowledge graph summarization, 2018. URL: https://api.
semanticscholar.org/CorpusID:52510379.
[36] A. F. Firmansyah, D. Moussallem, A.-C. N. Ngomo, Gates: Using graph attention networks
for entity summarization, in: Proceedings of the 11th Knowledge Capture Conference, K-CAP
’21, Association for Computing Machinery, New York, NY, USA, 2021, p. 73–80. URL: https:
//doi.org/10.1145/3460210.3493574. doi:10.1145/3460210.3493574.
[37] A. C. Sharma, C. Li, J. Wang, C.-Y. Huang, Z. Hu, Bigpatent: A large-scale dataset for abstractive
and coherent summarization, in: Proceedings of the 57th Annual Meeting of the Association
for Computational Linguistics, 2019, pp. 2204–2213. URL: https://doi.org/10.18653/v1/P19-1213.
doi:10.18653/v1/P19-1213.
[38] M. Suzgun, L. Melas-Kyriazi, S. K. Sarkar, S. D. Kominers, S. M. Shieber, The harvard uspto patent
dataset: A large-scale, well-structured, and multi-purpose corpus of patent applications, 2022. URL:
https://arxiv.org/abs/2207.04043.
[39] C.-Y. Lin, ROUGE: A package for automatic evaluation of summaries, in: Text Summarization
Branches Out, Association for Computational Linguistics, Barcelona, Spain, 2004, pp. 74–81. URL:
https://aclanthology.org/W04-1013.
A. Summaries for Evaluation
This appendix presents three summaries of the same patent: the Original Summary, the KG Summary
(our approach), and the Baseline Summary. Human evaluators compared these summaries against each
other and voted on which one was easiest to understand. Below are the summaries evaluated for
Patent No: US14997116.
Original Summary
Systems, methods, and non-transitory computer readable media are configured to receive a
resume corpus. A machine learning model is trained based on terms from the resume corpus. A
job title for a user is determined based on profile information provided to the model.
KG Summary
A machine learning model was trained to analyze profile information from job applicants, which
was provided to the model via a computer system. The computer system received resumes as
input data and used this information to train the model. This training process enables the model
to learn patterns and correlations between resume data and job title. The organization recruited
for by a specific job title, leveraging the trained machine learning model to improve its hiring
processes.
Baseline Summary
The invention relates to a system and method for determining a job title for a user based on their
profile information. A machine learning model is trained using terms from a resume corpus, and
then the user’s profile information is provided to the model to determine their most likely job
title. The model uses techniques such as vector space creation to represent terms in the resume
corpus, and anchor points are used to identify the closest match for each term.
The other patent summaries used in the evaluation and generated by our approach are available in the
online resource. Because our approach uses prompts to generate the nodes and entities from the
original documents, we also provide the full prompts, together with the feature processing functions
applied before graph construction.
B. Online Resources
The code for this paper is available at:
• GitHub