Augmenting Patent Summarization using Large Language Model with Knowledge Graph

Shoon Lei Phyu1, Murataly Uchkempirov1, Mayesha Maliha Proma1 and Parag Kulkarni1
1 Tokyo International University, 4-42-31 Higashi-Ikebukuro, Toshima, Tokyo, 170-0013 Japan

Abstract
The increasing quantity and intricacy of patent documents necessitate the development of effective and precise summarization techniques. The scientific and comprehensive nature of patents often exceeds the capabilities of traditional summarization techniques and machine learning-based approaches. This paper explores the integration of large language models (LLMs) with knowledge graphs (KGs) to improve patent document summarization. LLMs such as GPT-4 provide advanced language understanding and generation capabilities but can struggle with domain-specific content. Knowledge graphs offer a structured representation of knowledge that allows domain-specific information to be incorporated into the summarization process. The goal of this integration is to enhance the readability and informativeness of patent summaries. We propose a framework for combining an LLM with a KG and evaluate its performance against a traditional baseline summarization technique using both ROUGE scores and human evaluations. The results demonstrate a significant improvement in the quality of patent summaries, highlighting the potential of this approach to produce more informative and accurate summaries of complex technical documents. This research contributes to the development of more robust artificial intelligence summarization systems capable of complex data interpretation and decision making in specialized areas.

Keywords
Patent Summarization, Natural Language Processing, Knowledge Graph, Knowledge Graph Summarization

1. Introduction
Patent documents are a rich source of technical and legal information, but their complexity often makes them difficult to understand quickly.
Traditional summarization methods struggle with the specificity and technicality of patent language. Recent advancements in large language models (LLMs) and knowledge graphs (KGs) offer new opportunities for improving patent summarization. This paper explores the augmentation of LLM-based patent summarization using knowledge graphs to generate more concise and knowledgeable summaries. The primary focus is on leveraging Llama 3 for entity and relationship extraction to build a patent knowledge graph and subsequently using it to generate summaries. This approach aims to enhance the readability and informativeness of patent summaries compared to traditional methods.

The First International OpenKG Workshop: Large Knowledge-Enhanced Models, August 03, 2024, Jeju Island, South Korea. Published in CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.
Corresponding author: s22167405@al.tiu.ac.jp (S. L. Phyu); s22177901@al.tiu.ac.jp (M. Uchkempirov); s22224532@al.tiu.ac.jp (M. M. Proma); pkulkarni@al.tiu.ac.jp (P. Kulkarni); https://rei-kunn.github.io/ (S. L. Phyu)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2. Literature Review
Automatic patent document summarization is a critical task that has received significant research attention. Several methods have been proposed to solve this problem. For example, [1] presented a non-ontology-based document summarization method that uses keyword recognition and significant information density to extract paragraphs that best represent the patent text. [2] presented PQPS, a state-of-the-art query-based patent summarizer that uses RBM and Bi-LSTM models for patent summarization. [3] used surface mining, NER, and machine learning techniques to extract concepts and conceptual relationships from the claims section of patent documents. Moreover, [4] described a summarization method for extracting surrogate patents to effectively support patent mapping. [5] examined the identification of conflicting proposals in patents using the BERT summarization method. Additionally, [6] discussed clustering patent generalizations into homogeneous groups for knowledge generation. Moreover, [7] studied feature term extraction from patent documents. [8] developed a multilingual patent text-mining approach to compute relatedness scores of patent documents. [9] used graph representations to improve abstractive summarization of multiple documents, demonstrating the potential of graph-based methods in summarization tasks. Despite progress in this area, several shortcomings in patent document summarization methods have been identified, highlighting the need for more sophisticated and efficient approaches. According to [4], there is a growing demand for automated tools to support patent engineers and decision makers in patent analysis. Traditional methods face problems such as difficulties in natural language translation and the lack of parallel sets of bilingual documents [10]. Existing patent summarization strategies often fail due to the unique characteristics of patents and the emphasis on specific sections, such as claims, while ignoring important details from other sections [11]. In addition, the huge number of patent documents leads to information overload, which requires more effective summarization and analysis methods [12]. [13] provided a survey on graph-based representation for text analysis and found that graph-based methods show superiority over traditional ones. The authors also suggested a novel approach to semigraph construction and sentence extraction, which helps find linear and non-linear relationships between features [14].
Despite advances in text summarization, annotations generated for patent documents still do not match human annotations [2]. In addition, the complexity and length of patents, as well as the requirement for domain knowledge, contribute to their under-utilization [5]. Moreover, limitations of existing patent analysis tools hinder the creation of impact technology patent maps, highlighting the need for improved methods for efficiently extracting information from patent documents [15]. Problems associated with cryptic descriptions in patent texts limit the effectiveness and comprehensiveness of patent analysis [16]. Several studies show that integrating LLMs into patent document summarization provides significant benefits in terms of accuracy and efficiency. Large language models (LLMs) have significantly impacted the field of automatic summarization, especially in the context of patent documents. The use of LLMs in summarizing patent documents has various advantages and disadvantages, as follows:
• LLMs are excellent at capturing the semantic meaning of text, which allows them to create contextually rich and coherent summaries [17].
• LLMs are adept at handling the intricacies of patent documents, which often contain technical terminology and detailed information, resulting in more accurate and informative summaries [18].
• LLMs demonstrate versatility in summarizing different types of documents, such as scientific literature, radiology reports, and patient problem lists [18].
• LLMs have the ability to process voluminous input data, making them suitable for summarizing extensive patent documents without leaving out important details [17].
• LLMs can be combined with knowledge graphs to improve summarization by integrating domain-specific knowledge and relationships extracted from the graph.
At the same time, the readability of summaries created using LLMs may vary, making it difficult to ensure that all users, including non-specialists, can easily understand them
[Decker, 2023]. According to Decker [2023], LLM-based summarizers may have difficulty ensuring that their summaries are complete, especially when gathering all the relevant details from lengthy and detailed patent documents. Bush [2024] points out that LLMs can reflect biases present in the training data, potentially affecting the quality and objectivity of the summaries produced. The interpretability of LLM-based summaries can also be a concern, because understanding the rationale behind a particular summary generated by the model can be challenging [19].

2.1. Integration of LLM and KG in patent document summarization
As a solution to these problems, this paper proposes an integrated approach to summarizing patent documents using LLMs with knowledge graphs. Knowledge graphs have become integral to various applications, gaining popularity particularly after the comprehensive survey by [20]. The concept of a "Knowledge Graph" dates back to 1972 with the work of E. W. Schneider. Today, knowledge graphs are widely utilized in implementations such as the Google Knowledge Graph, Airbnb, and other social-application analyses. Despite their broad applications, there has been limited research focusing specifically on patent datasets. By combining the language-understanding power of LLMs with the structured representation of knowledge in the form of graphs, one can create more accurate, informative, and contextually rich summaries of patent documents. Large language models, such as pre-trained encoders [21], have shown promise in text summarization tasks. These models can effectively capture the semantic meaning of text and generate coherent summaries. Knowledge graphs play a critical role in organizing and presenting structured information. Research has shown that incorporating knowledge graphs into language models can improve the representation of domain information [22].
Research on question-answering systems has demonstrated the effectiveness of combining large language models with knowledge graphs to improve answer selection [23]. By using semantic relationships encoded in knowledge graphs, a system can better understand and extract relevant information to answer questions. This approach can be extended to patent document summarization to extract key ideas and relationships embedded in patent texts. [24] proposes a framework for predictive summarization that highlights possible future areas of patent application by analyzing past data in the legal domain. Moreover, leveraging external knowledge from open-source knowledge graphs has been shown to improve the performance of document summarization models [25]. By incorporating external knowledge into the summarization process, the model gains access to a broader context and can generate more informative and complete summaries. Several studies have explored the integration of knowledge graphs into patent-related tasks, demonstrating their potential benefits. [26] provided a comprehensive overview of knowledge graph embedding approaches and applications, highlighting the importance of structured knowledge representation. [27] discusses the role of knowledge graphs in integrating health information and their application in healthcare, highlighting the benefits of incorporating knowledge graphs into data analytics. [28] implement knowledge proximity in a US patent database using a knowledge graph called "PatNet" built from patent metadata. Chen and Deng [2023] propose an accountable patent recommendation method using a knowledge graph, demonstrating the practical application of structured knowledge to patent-related problems. [29] extract knowledge graphs from patent applications, filling the gap in knowledge extraction from patent documents.
[30] use text mining and machine learning techniques to visualize processes in utility chemical patents, demonstrating the utility of knowledge graphs in patent analysis. Additionally, [31] extend patent searching to include text embedding and knowledge graphs, emphasizing the importance of capturing various aspects of knowledge in patent documents. In [3], the researchers highlighted that patent summarizations could be significantly enhanced using conceptual knowledge graphs. Similarly, [32] demonstrated the implementation of knowledge graphs using Open-IE to produce summaries through Cloze Reward. This paper builds on these studies by integrating language models to generate human-readable summaries from patent documents. LAUREN, as detailed by [33], focuses on summarizing knowledge graphs specifically for question answering systems. This approach leverages graph-based techniques to resize knowledge graphs, making them more manageable and tailored for answering user queries effectively. The summarization process ensures that the key information necessary for answering questions is retained while extraneous details are minimized. [34] addresses the summarization of knowledge graphs for search purposes. Their work proposes a method to compute a summarization set that provides maximal descriptive power with minimal redundancy. This approach enhances the efficiency of knowledge graph searches by presenting users with the most relevant information.

Figure 1: The proposed system architecture for producing KG Summary

[35] discuss the concept of adaptive summarization to create personalized knowledge graphs. This method adapts the summarization process to the user's specific information needs, ensuring that the resulting knowledge graph is both concise and highly relevant. The adaptability of this method makes it particularly useful for applications requiring tailored knowledge delivery.
The GATES framework, presented by [36], utilizes graph attention networks for entity summarization within knowledge graphs. This technique addresses the challenge of summarizing entities by focusing on the most significant relationships and attributes. The use of attention mechanisms allows for the generation of summaries that are both comprehensive and focused on the most pertinent aspects of the knowledge graph. These studies collectively advance the field of knowledge graph summarization by addressing various challenges such as customization, scalability, and efficiency. They provide a solid foundation for further research and development in creating concise, informative, and user-tailored knowledge graph summaries.

3. Methodology
3.1. Building the Patent Knowledge Graph
The initial step involves constructing the knowledge graph by creating triplets that represent entities and relationships extracted from patent documents. This requires identifying subjects, actions, and objects within the text. Patent documents, characterized by complex sentence structures and noun-heavy content, present a challenge for entity extraction. Preliminary attempts using BERT models proved inadequate for the patent dataset. To address this, we employ the pre-trained Llama 3 language model1 with 8.0B parameters, which effectively identifies entities and relationships in the text, specifically in the description and claims sections of patents. The description section provides key terms, while the claims section offers detailed information about the invention, both crucial for generating a comprehensive summary. By excluding other sections, we aim to minimize the hallucination effect of the language model. The extracted data is then used to construct a knowledge graph representing the core elements of the patent.
1 The pre-trained Llama 3 model is available to download online at: https://ollama.com/library/llama3

Figure 2: Knowledge Graph of Patent Number US14996316 after construction in the Neo4j database

The knowledge graph is created in the Neo4j database using Cypher queries built from the relationships and entities extracted in the previous step, ensuring a structured and accessible representation of the patent, as shown in Figure 2.

3.2. Generating Summaries from the Knowledge Graph
With the knowledge graph established, the next step involves generating summaries using the language model (Llama 3, 8.0B). This involves transforming the structured data from the knowledge graph into a human-readable format that emphasizes conciseness and informativeness. To obtain an effective abstractive summary, it is essential to instruct the language model with a custom prompt. This step ensures the generation of a coherent summary rather than a disjointed question-and-answer format. Additionally, Cypher queries are utilized to retrieve all relevant nodes and relationships from the knowledge graph, facilitating the creation of a comprehensive summary. This methodology leverages the strengths of both knowledge graphs and advanced language models, aiming to produce high-quality, informative summaries of patent documents, thereby enhancing the accessibility and understanding of patent information.

3.2.1. Baseline Method: Language Model Summarization
The performance of the proposed method is compared against a baseline summarization technique that solely utilizes a language model for document summarization. The baseline approach is straightforward: the language model is employed to summarize the content of the patent document and generate a relevant summary. We defined this baseline to suit the summarization of one document rather than multiple documents.
In this step, we note that multi-document summarization could be supported using a RAG architecture, which we aim to explore in further research. Nevertheless, in the current baseline, the user input is the patent document itself, which is then processed to retrieve relevant sections within the document (such as the description and claims). These sections are fed into the language model, which generates a cohesive summary of the patent. Our adaptation involves ensuring that the retrieval mechanism focuses on internal sections of the patent document, effectively treating it as a self-contained database. This approach leverages the structure of the patent document to enhance the quality of the generated summary. This summary is intended to capture the essence of the invention described in the patent document. The simplicity of this approach allows for a clear comparison with more complex methods.

Algorithm 1 Summarizing Patent Documents Using Knowledge RAG and Llama 3
Input: HUPD dataset, Neo4j credentials, patent number.
Output: A structured summary of the patent document.
1: Load HUPD dataset
2: Filter patents with required labels
3: Initialize Llama 3 model
4: Define prompt to extract entities/relationships
5: Process text with Llama 3
6: if extraction fails then
7:     return Error
8: end if
9: Parse and sanitize data
10: if parsing fails then
11:     return Error
12: end if
13: Create Neo4j knowledge graph
14: Define prompt for summary generation
15: Execute Cypher query to retrieve data
16: Format data for input
17: if formatting fails then
18:     return Error
19: end if
20: Generate summary using Llama 3
21: if generation fails then
22:     return Error
23: end if
24: Evaluate summary using ROUGE metric
25: return Structured summary

4. Dataset
We utilized the HUPD (Harvard University Patent Dataset), which provides an extensive range of raw patent claim texts.
This dataset was chosen over other prominent datasets such as BIGPATENT2, CLEF-IP 2011, and USPTO-2M for several key reasons. Firstly, HUPD houses approximately 4.5 million patent documents, significantly more than BIGPATENT's 1.3 million [37]. This expansive volume offers a richer and more varied base for analysis. Secondly, HUPD encompasses 34 distinct sections, including claims, offering a more comprehensive view compared to BIGPATENT's limitation to just 4 sections. Lastly, unlike the pre-tokenized texts in BIGPATENT, HUPD's provision of raw texts affords greater flexibility and depth in our analysis, allowing for a more tailored approach to data processing [38].

2 The BIGPATENT dataset is available to download online at: https://evasharma.github.io/bigpatent/

To test our proposed method, we randomly selected approximately 10 documents from the 'G' category of the CPC code (Physics) within the HUPD dataset as representative samples for our study. We generated both baseline summaries and knowledge graph summaries for each patent document and used ROUGE scores to track the performance metrics. Additionally, we employed human evaluation to assess the real-world clarity of the generated summaries.

5. Evaluation
This section presents the results of our proposed method compared to the baseline summarization technique. The proposed method was evaluated using ROUGE scores to measure the quality of the generated summaries. The evaluation metrics included ROUGE-1, ROUGE-2, and ROUGE-L, which assess the overlap of n-grams and longest common subsequences between the generated and reference summaries. Moreover, we evaluated performance with human evaluations to assess the quality and understandability of the generated summaries.

5.1. Quantitative Result
The ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metric, introduced by Chin-Yew Lin in his pivotal work, is a cornerstone in the field of natural language processing, particularly for evaluating automatic summarization and machine translation [39]. ROUGE assesses the quality of a summary by comparing it with reference summaries. ROUGE scores are often reported in terms of precision, recall, and F1 score, providing a comprehensive evaluation framework:
• Precision measures the fraction of n-grams in the generated summary that are present in the reference summary.
• Recall assesses the fraction of n-grams in the reference summary that appear in the generated summary.
• F1 Score is the harmonic mean of precision and recall, offering a balanced metric.
The formulas for these metrics are as follows:

Precision = (Number of overlapping n-grams) / (Total n-grams in the generated summary)  (1)

Recall = (Number of overlapping n-grams) / (Total n-grams in the reference summary)  (2)

F1 = 2 × (Precision × Recall) / (Precision + Recall)  (3)

Table 1 shows the ROUGE F1 scores for both the baseline and the proposed method. The baseline method outperforms the proposed method across all ROUGE metrics, indicating a higher F1 score for summarization.

Table 1: ROUGE F1 Scores Comparison for Patent Number US14996316
Method     ROUGE-1    ROUGE-2    ROUGE-L
Baseline   0.461538   0.235294   0.461538
KG Sum     0.343434   0.136054   0.323232

5.2. Human Evaluation Result
The human evaluation of our summarization method was a critical component of our research. We aimed to determine the effectiveness of our Knowledge Graph (KG) based summaries in terms of understandability and informativeness. To this end, we conducted an evaluation involving 12 respondents who reviewed summaries of 10 different patent documents. We selected 10 patent documents from the Harvard University Patent dataset [38] to ensure a diverse and representative sample.
For each patent, three types of summaries were generated: the original summary, a baseline summary, and the KG-based summary. Each respondent was presented with these three summaries for each patent document without being informed of their origins, to prevent any bias.

5.2.1. Voting Process
Respondents were asked to vote for the summary they found easiest to understand and most informative. The summaries were presented in a randomized order to eliminate order effects. The voting was conducted using a simple voting system where respondents chose one summary out of the three options. The criteria for voting were clearly defined: understandability (how easily the summary could be comprehended) and informativeness (how well the summary conveyed the essential information of the patent document). To ensure the integrity of the evaluation, respondents were chosen from a pool of individuals with no prior experience in reading or writing patent documents. This choice was made to simulate the perspective of an average reader who might not have specialized knowledge.

Table 2: The voting result of the human evaluation
Patent        KGSUM   Original   BaselineSUM   Result
US14997116    9       0          3             KGSUM
US14986817    5       0          7             BaselineSUM
US14990669    8       1          0             KGSUM
US14998490    2       1          4             BaselineSUM
US14990535    11      1          0             KGSUM
US14989021    7       2          3             KGSUM
US15000497    5       1          6             BaselineSUM
US14994324    6       2          4             KGSUM
US15001450    6       3          3             KGSUM
US14996316    8       2          2             KGSUM

The human-evaluation results are summarized in Table 2. The KG-based summaries received the highest number of votes in most cases, indicating a clear preference for their comprehensibility and informativeness over the other two types of summaries.

5.3. Discussion of Results
The results from the ROUGE F1 score evaluations indicate that the baseline method outperforms the proposed Knowledge Graph (KG)-based summarization technique in terms of automated metrics.
ROUGE scores measure the overlap between the generated summaries and reference summaries, focusing on precision, recall, and F1 score. However, the positive feedback from human evaluations suggests that the proposed KG-based method generates clearer, more coherent, and more informative summaries. This contrast highlights the importance of human evaluations in assessing summarization quality, as they capture aspects of readability and informativeness that automated metrics might miss. Notably, for Patent Number "US14990535," the baseline method failed to produce a summary, likely due to hallucination effects commonly observed in large language models (LLMs). In contrast, the KG-based method successfully generated a precise and informative summary from the original patent document, demonstrating its robustness. It is important to note that our primary focus is on understandability and informativeness as perceived by human readers, rather than solely relying on automated metrics like ROUGE scores. This approach ensures that the summaries effectively convey the essence of the inventions described in the patents. The integration of knowledge graphs and the adaptation of the RAG architecture have proven effective in enhancing the readability and informativeness of patent document summarization. These findings underscore the value of human-centered evaluations in developing summarization tools.

6. Limitations
Despite the promising results, our research faced several challenges and limitations. The primary limitation stems from the reliance on the large language model (LLaMA 3) for both entity recognition and summary generation, which introduces potential vulnerabilities. If the LLM fails to accurately extract entities, the subsequent knowledge graph cannot be generated correctly, and any errors in the entity recognition process can lead to inaccuracies in the final summary.
This dependency on LLMs introduces a potential source of error that can affect the overall performance of the summarization method. Additionally, the LLM used for generating summaries, specifically LLaMA 3, may produce sub-optimal results if the prompts are not clearly defined or easy to understand. This can result in summaries that do not accurately reflect the content of the original document. Another limitation is language support; currently, our method is only capable of handling English documents, which restricts its applicability to patents written in other languages. Moreover, our approach is designed for single-document summarization. This limits its effectiveness when dealing with multiple documents, which is a common scenario in patent analysis and other legal contexts. Addressing this limitation is a key area for future research. Nevertheless, the human evaluation results suggest that despite the lower ROUGE scores, the summaries generated by the proposed method are more understandable and informative to readers unfamiliar with patent documents. This highlights the importance of human-centered evaluation metrics in assessing the effectiveness of summarization methods, as automated metrics like ROUGE may not fully capture the nuances of human understanding.

7. Future Work
To address the limitations identified, future work will focus on several key areas of improvement:
Enhanced Entity Recognition: We plan to develop a dedicated entity recognition model trained on annotated patent data. By creating a custom model tailored to the specificities of patent documents, we aim to improve the accuracy of entity recognition and, consequently, the quality of the knowledge graphs and summaries.
Improved LLM Training: We will train the LLM, such as LLaMA 3, on a specialized legal database to enhance its performance in summarization tasks.
This targeted training will help mitigate issues arising from unclear prompts and improve the model's ability to generate accurate and informative summaries.
Multilingual Support: Expanding our model to support multiple languages is a priority. This involves training the entity recognition and summarization models on multilingual datasets to ensure accurate performance across different languages.
Multi-Document Summarization: Extending our method to handle multi-document summarization will significantly increase its applicability. We plan to extend the architecture to aggregate and summarize information from multiple documents, which is particularly important in the legal and patent analysis fields.
By addressing these areas, we aim to refine our summarization approach, making it more robust, versatile, and applicable to a wider range of documents and languages. This will enhance its utility in practical, real-world scenarios, particularly in legal and technical domains.

8. Conclusion
In this paper, we introduced a novel summarization approach that integrates knowledge graphs and large language models (LLMs) to enhance the readability and informativeness of patent document summaries. Our main contribution lies in generating summaries that are easier to understand while preserving the key inventions from the patent documents. Additionally, our research contributes towards advancing legal technology by analyzing entities in the patent domain and producing structured data through knowledge graphs, combined with the contextual understanding of LLMs, for automatic summarization tasks. We emphasize the importance of human-centered evaluations and contextual understandability over automated evaluations when producing summaries. Despite the limitations, our research lays a foundation for future advancements in patent summarization technology, aiming to develop a more robust tool for analyzing complex technical documents in the legal domain.
Acknowledgments First and foremost, we would like to express our gratitude to the creators of the Harvard University Patent Dataset (HUPD) for providing the legal dataset as publicly available, which was instrumental in our study. We also appreciate the developers of the Llama 3 model, which we utilized for testing the knowledge graph summarization on the legal dataset. Lastly, a big token of appreciations to the testimonies who participated in the human evaluation of the summaries. Their insights and feedback were invaluable in assessing the quality and readability of our summarization methods. References [1] A. J. Trappey, C. V. Trappey, C.-Y. Wu, Automatic patent document summarization for collaborative knowledge systems and services, Journal of Systems Science and Systems Engineering 18 (2009) 71–94. URL: https://doi.org/10.1007/s11518-009-5100-7. doi:10.1007/s11518-009-5100-7. [2] G. Kumaravel, S. Sankaranarayanan, Pqps: Prior-art query-based patent summarizer using rbm and bi-lstm, Mobile Information Systems 2021 (2021) 2497770. URL: https://doi.org/10.1155/2021/ 2497770. doi:10.1155/2021/2497770. [3] P. Rk, S. Lalithadevi, Patent document summarization using conceptual graphs, SSRN Electronic Journal (2017). doi:10.2139/ssrn.3441645. [4] Y.-H. Tseng, Y.-W. Wang, Y.-I. Lin, C.-H. Lin, D.-R. Juang, Patent surrogate extraction and evaluation in the context of patent mapping, Journal of Information Science 33 (2007) 718–736. URL: https: //doi.org/10.1177/0165551507077406. doi:10.1177/0165551507077406. [5] G. Guarino, A. Samet, A. Nafi, D. Cavallucci, Summatriz : Summarization networks for min- ing patent contradiction, in: 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA), 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA), 2020, pp. 979–986. URL: https://doi.org/10.1109/ICMLA51294.2020.00159. doi:10.1109/ICMLA51294.2020.00159. [6] A. J. Trappey, C. V. 
Trappey, An R&D knowledge management method for patent document summarization, Industrial Management & Data Systems 108 (2008) 245–257. URL: https://doi.org/10.1108/02635570810847608. doi:10.1108/02635570810847608.
[7] K. Takano, M. Tanaka, H. Sakai, R. Kitajima, T. Ota, C. Tanabe, H. Sakaji, Extracting characteristic terms from patent documents, International Journal of Smart Computing and Artificial Intelligence 4 (2020) 19–38. URL: https://doi.org/10.52731/ijscai.v4.i2.568. doi:10.52731/ijscai.v4.i2.568.
[8] C.-H. Lee, H.-C. Yang, C.-H. Wu, Y.-J. Li, A multilingual patent text-mining approach for computing relatedness evaluation of patent documents, in: 2009 Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2009, pp. 612–615. doi:10.1109/IIH-MSP.2009.162.
[9] W. Li, X. Xiao, J. Liu, H. Wu, H. Wang, J. Du, Leveraging graph to improve abstractive multi-document summarization, in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 6232–6243. URL: https://aclanthology.org/2020.acl-main.555. doi:10.18653/v1/2020.acl-main.555.
[10] Y.-L. Chen, Y.-T. Chiu, Cross-language patent matching via an international patent classification-based concept bridge, Journal of Information Science 39 (2013) 737–753. URL: https://doi.org/10.1177/0165551513494641. doi:10.1177/0165551513494641.
[11] J. Codina-Filbà, N. Bouayad-Agha, A. Burga, G. Casamayor, S. Mille, A. Müller, H. Saggion, L. Wanner, Using genre-specific features for patent summaries, Information Processing and Management 53 (2017) 151–174. URL: https://www.sciencedirect.com/science/article/pii/S0306457316302825. doi:10.1016/j.ipm.2016.07.002.
[12] H.-F. Wang, Y.-H. Chang, P.-Y.
Hsin, Constructing patent maps using text mining to sustainably detect potential technological opportunities, Sustainability 10 (2018) 3729. URL: https://doi.org/10.3390/su10103729. doi:10.3390/su10103729.
[13] S. Sonawane, P. Kulkarni, Graph based representation and analysis of text document: A survey of techniques, International Journal of Computer Applications 96 (2014) 1–8. doi:10.5120/16899-6972.
[14] S. Sonawane, P. Kulkarni, C. Deshpande, B. Athawale, Extractive summarization using semigraph (ESSG), Evolving Systems 10 (2019) 409–424. URL: https://doi.org/10.1007/s12530-018-9246-8. doi:10.1007/s12530-018-9246-8.
[15] H. Nonaka, A. Kobayashi, H. Sakaji, Y. Suzuki, H. Sakai, S. Masuyama, Extraction of the effect and the technology terms from a patent document, in: Proceedings of the 2010 International Conference on Computational Intelligence and Education (ICCIE), 2010. URL: https://doi.org/10.1109/iccie.2010.5668167. doi:10.1109/iccie.2010.5668167.
[16] J. Ma, A. L. Porter, Analyzing patent topical information to identify technology pathways and potential opportunities, Scientometrics 102 (2014) 811–827. URL: https://doi.org/10.1007/s11192-014-1392-6. doi:10.1007/s11192-014-1392-6.
[17] S. Bao, Chain-of-event prompting for multi-document summarization by large language models, International Journal of Web Information Systems 20 (2024) 229–247. URL: https://doi.org/10.1108/ijwis-12-2023-0249. doi:10.1108/ijwis-12-2023-0249.
[18] C. Williams, Evaluating large language models for drafting emergency department discharge summaries (2024). URL: https://doi.org/10.1101/2024.04.03.24305088. doi:10.1101/2024.04.03.24305088.
[19] S. O'Neil, Phenomics Assistant: an interface for LLM-based biomedical knowledge graph exploration (2024). URL: https://doi.org/10.1101/2024.01.31.578275. doi:10.1101/2024.01.31.578275.
[20] A. Hogan, E. Blomqvist, M. Cochez, C. D'amato, G. D. Melo, C. Gutierrez, S. Kirrane, J. E. L. Gayo, R. Navigli, S. Neumaier, A.-C. N. Ngomo, A.
Polleres, S. M. Rashid, A. Rula, L. Schmelzeisen, J. Sequeda, S. Staab, A. Zimmermann, Knowledge graphs, ACM Computing Surveys 54 (2021) 1–37. URL: http://dx.doi.org/10.1145/3447772. doi:10.1145/3447772.
[21] Y. Liu, M. Lapata, Text summarization with pretrained encoders, in: K. Inui, J. Jiang, V. Ng, X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 3730–3740. URL: https://aclanthology.org/D19-1387. doi:10.18653/v1/D19-1387.
[22] W. Liu, P. Zhou, Z. Zhao, Z. Wang, Q. Ju, H. Deng, P. Wang, K-BERT: enabling language representation with knowledge graph, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 2020, pp. 2901–2908. URL: https://doi.org/10.1609/aaai.v34i03.5681. doi:10.1609/aaai.v34i03.5681.
[23] N. Hu, Y. Wu, G. Qi, D. Min, J. Chen, J. Pan, A. Zafar, An empirical study of pre-trained language models in simple knowledge graph question answering (2022). URL: https://doi.org/10.21203/rs.3.rs-2184834/v1. doi:10.21203/rs.3.rs-2184834/v1.
[24] M. Uchkempirov, P. Shoon Lei, L. R. Aadhya, P. Kulkarni, A survey of automatic text summarization systems, applications, challenges, and future trends, International Journal of Computer Applications 186 (2024) 8–15. URL: https://ijcaonline.org/archives/volume186/number23/a-survey-of-automatic-text-summarization-systems-applications-challenges-and-future-trends/. doi:10.5120/ijca2024923672.
[25] T. Tang, T. Yuan, X. Tang, D. Chen, Incorporating external knowledge into unsupervised graph model for document summarization, Electronics 9 (2020). URL: https://www.mdpi.com/2079-9292/9/9/1520. doi:10.3390/electronics9091520.
[26] Q. Wang, Z. Mao, B. Wang, L.
Guo, Knowledge graph embedding: a survey of approaches and applications, IEEE Transactions on Knowledge and Data Engineering 29 (2017) 2724–2743. URL: https://doi.org/10.1109/tkde.2017.2754499. doi:10.1109/tkde.2017.2754499.
[27] A. Akinnubi, Knowledge graph: a survey (2023). URL: https://doi.org/10.31219/osf.io/8xm2g. doi:10.31219/osf.io/8xm2g.
[28] G. Li, L. Siddharth, J. Luo, Embedding knowledge graph of patent metadata to measure knowledge proximity, Journal of the Association for Information Science and Technology 74 (2023) 476–490. URL: https://doi.org/10.1002/asi.24736. doi:10.1002/asi.24736.
[29] L. Siddharth, L. Blessing, K. Wood, J. Luo, Engineering knowledge graph from patent database, Journal of Computing and Information Science in Engineering 22 (2021). URL: https://doi.org/10.1115/1.4052293. doi:10.1115/1.4052293.
[30] A. J. C. Trappey, C. V. Trappey, C.-W. Liang, H.-S. Lin, IP analytics and machine learning applied to create process visualization graphs for chemical utility patents, Processes 9 (2021) 1342. URL: https://doi.org/10.3390/pr9081342. doi:10.3390/pr9081342.
[31] W. Du, Q. Yan, W. Zhang, J. Ma, Leveraging online behaviors for interpretable knowledge-aware patent recommendation, Internet Research 32 (2021) 568–587. URL: https://doi.org/10.1108/intr-08-2020-0473. doi:10.1108/intr-08-2020-0473.
[32] L. Huang, L. Wu, L. Wang, Knowledge graph-augmented abstractive summarization with semantic-driven cloze reward, in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 5094–5107. URL: https://aclanthology.org/2020.acl-main.457. doi:10.18653/v1/2020.acl-main.457.
[33] R. Jalota, D. Vollmers, D. Moussallem, A.-C. N. Ngomo, LAUREN - knowledge graph summarization for question answering, in: 2021 IEEE 15th International Conference on Semantic Computing (ICSC), 2021, pp. 221–226.
doi:10.1109/ICSC50631.2021.00047.
[34] Q. Song, Y. Wu, P. Lin, L. X. Dong, H. Sun, Mining summaries for knowledge graph search, IEEE Transactions on Knowledge and Data Engineering 30 (2018) 1887–1900. doi:10.1109/TKDE.2018.2807442.
[35] T. Safavi, Adaptive personalized knowledge graph summarization, 2018. URL: https://api.semanticscholar.org/CorpusID:52510379.
[36] A. F. Firmansyah, D. Moussallem, A.-C. N. Ngomo, GATES: Using graph attention networks for entity summarization, in: Proceedings of the 11th Knowledge Capture Conference, K-CAP '21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 73–80. URL: https://doi.org/10.1145/3460210.3493574. doi:10.1145/3460210.3493574.
[37] E. Sharma, C. Li, L. Wang, BIGPATENT: A large-scale dataset for abstractive and coherent summarization, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 2204–2213. URL: https://doi.org/10.18653/v1/P19-1213. doi:10.18653/v1/P19-1213.
[38] M. Suzgun, L. Melas-Kyriazi, S. K. Sarkar, S. D. Kominers, S. M. Shieber, The Harvard USPTO patent dataset: A large-scale, well-structured, and multi-purpose corpus of patent applications, 2022. URL: https://arxiv.org/abs/2207.04043.
[39] C.-Y. Lin, ROUGE: A package for automatic evaluation of summaries, in: Text Summarization Branches Out, Association for Computational Linguistics, Barcelona, Spain, 2004, pp. 74–81. URL: https://aclanthology.org/W04-1013.

A. Summaries for Evaluation

In this appendix, we present the summaries titled "Original Summary", "KG Summary" (our approach), and "Baseline Summary". These summaries were shown side by side to the human evaluators, who voted on which summary was easier to understand. Below are the summaries evaluated for Patent No. US14997116.

Original Summary

Systems, methods, and non-transitory computer readable media are configured to receive a resume corpus.
A machine learning model is trained based on terms from the resume corpus. A job title for a user is determined based on profile information provided to the model.

KG Summary

A machine learning model was trained to analyze profile information from job applicants, which was provided to the model via a computer system. The computer system received resumes as input data and used this information to train the model. This training process enables the model to learn patterns and correlations between resume data and job title. The organization recruited for by a specific job title, leveraging the trained machine learning model to improve its hiring processes.

Baseline Summary

The invention relates to a system and method for determining a job title for a user based on their profile information. A machine learning model is trained using terms from a resume corpus, and then the user's profile information is provided to the model to determine their most likely job title. The model uses techniques such as vector space creation to represent terms in the resume corpus, and anchor points are used to identify the closest match for each term.

The other patent summaries used in our evaluation and generated by our approach are presented in the online resource. As our approach uses prompts to generate the nodes and entities from the original documents, we also provide the full prompts we used, as well as the feature-processing functions applied before graph construction.

B. Online Resources

The code for this paper is available at:
• GitHub
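The prompt-driven extraction of nodes and entities mentioned above can be sketched as follows. This is a minimal illustration, not our exact pipeline: the prompt wording, the pipe-separated output convention, and the `parse_triples` helper are assumptions for the example (our actual prompts and feature-processing functions are in the online resource), and the call to a local Llama 3 instance is omitted.

```python
# Minimal sketch of prompt-based triple extraction for a patent knowledge graph.
# The prompt template and output format below are illustrative assumptions.

# Hypothetical prompt asking the model for one triple per line.
EXTRACTION_PROMPT = (
    "Extract entities and relationships from the patent text below.\n"
    "Return one triple per line as: subject | relation | object.\n\n{text}"
)

def parse_triples(llm_output: str) -> list[tuple[str, str, str]]:
    """Parse 'subject | relation | object' lines into graph triples,
    skipping malformed or empty lines."""
    triples = []
    for line in llm_output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples

# Example: parsing a hypothetical model response.
response = (
    "machine learning model | trained on | resume corpus\n"
    "model | determines | job title"
)
for subj, rel, obj in parse_triples(response):
    print(f"({subj}) -[{rel}]-> ({obj})")
```

The parsed triples would then be inserted as nodes and edges of the knowledge graph before the summary-generation step.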