<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Augmenting Patent Summarization using Large Language Model with Knowledge Graph</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Shoon</forename><forename type="middle">Lei</forename><surname>Phyu</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Tokyo International University</orgName>
								<address>
									<addrLine>4-42-31 Higashi-Ikebukuro</addrLine>
									<postCode>170-0013</postCode>
									<settlement>Toshima, Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Murataly</forename><surname>Uchkempirov</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Tokyo International University</orgName>
								<address>
									<addrLine>4-42-31 Higashi-Ikebukuro</addrLine>
									<postCode>170-0013</postCode>
									<settlement>Toshima, Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mayesha</forename><forename type="middle">Maliha</forename><surname>Proma</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Tokyo International University</orgName>
								<address>
									<addrLine>4-42-31 Higashi-Ikebukuro</addrLine>
									<postCode>170-0013</postCode>
									<settlement>Toshima, Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Parag</forename><surname>Kulkarni</surname></persName>
							<email>pkulkarni@al.tiu.ac.jp</email>
							<affiliation key="aff0">
								<orgName type="institution">Tokyo International University</orgName>
								<address>
									<addrLine>4-42-31 Higashi-Ikebukuro</addrLine>
									<postCode>170-0013</postCode>
									<settlement>Toshima, Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Augmenting Patent Summarization using Large Language Model with Knowledge Graph</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">58B7489EE99782BC9C1D344450641035</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:28+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Patent Summarization</term>
					<term>Natural Language Processing</term>
					<term>Knowledge Graph</term>
					<term>Knowledge Graph Summarization</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The increasing quantity and intricacy of patent documents call for effective and precise summarization techniques. The technical and comprehensive nature of patents often exceeds what traditional summarization techniques and machine learning-based approaches can handle. This paper explores the integration of large language models (LLMs) with knowledge graphs (KGs) to improve patent document summarization. LLMs such as GPT-4 provide advanced language understanding and generation capabilities, but can struggle with domain-specific content. Knowledge graphs offer a structured representation of knowledge that allows domain-specific information to be included in the summarization process. The goal of this integration is to enhance the readability and informativeness of patent summaries. We then propose a framework for combining LLMs with KGs and evaluate its performance against a traditional baseline summarization technique using both ROUGE scores and human evaluations. The results demonstrate a significant improvement in the quality of patent summaries, highlighting the potential of this approach to produce more informative and accurate overviews of complex technical documents. This research contributes to the development of more robust artificial intelligence summarization systems capable of performing complex data interpretation and decision making in specialized areas.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Patent documents are a rich source of technical and legal information, but their complexity often makes them difficult to understand quickly. Traditional summarization methods struggle with the specificity and technicality of patent language. Recent advancements in large language models (LLMs) and knowledge graphs (KGs) offer new opportunities for improving patent summarization. This paper explores the augmentation of LLM-based patent summarization using knowledge graphs to generate more concise and knowledgeable summaries. The primary focus is on leveraging Llama 3 for entity and relationship extraction to build a patent knowledge graph and subsequently using it to generate summaries. This approach aims to enhance the readability and informativeness of patent summaries compared to traditional methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Literature Review</head><p>Automatic patent document summarization is a critical task that has received significant research attention. Several methods have been proposed to solve this problem. For example, <ref type="bibr" target="#b0">[1]</ref> presented a non-ontology-based document summarization method that uses keyword recognition and significant information density to extract paragraphs that best represent the patent text. <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref> presented a new approach called PQPS, a state-of-the-art query-based patent summarizer that uses RBM and Bi-LSTM models for patent summarization. <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref> used surface mining, NER, and machine learning techniques to extract concepts and conceptual relationships from the claims section of patent documents. Moreover, <ref type="bibr" target="#b3">[4]</ref> described a summarization method for extracting surrogate patents to effectively support patent mapping. <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b3">4]</ref> examined the identification of conflicting proposals in patents using the BERT summarization method. Additionally, <ref type="bibr" target="#b5">[6]</ref> discussed clustering patent generalizations into homogeneous groups for knowledge generation. Moreover, <ref type="bibr" target="#b6">[7]</ref> studied feature term extraction from patent documents. <ref type="bibr" target="#b7">[8]</ref> developed a multilingual patent text mining approach to compute relatedness scores of patent documents. 
<ref type="bibr" target="#b8">[9]</ref> used graph representations to improve abstractive summarization of multiple documents, demonstrating the potential of graph-based methods in summarization tasks.</p><p>Despite progress in this area, several shortcomings in patent document summarization methods have been identified, highlighting the need for more sophisticated and efficient methods to address them. According to <ref type="bibr" target="#b3">[4]</ref>, there is a growing demand for automated tools to support patent engineers and decision makers in patent analysis. Traditional methods face problems such as difficulties in natural language translation and the lack of parallel sets of bilingual documents <ref type="bibr" target="#b9">[10]</ref>. Existing patent summarization strategies often fail due to the unique characteristics of patents and the emphasis on specific sections, such as claims, while ignoring important details from other sections <ref type="bibr" target="#b11">[11]</ref>. In addition, the huge number of patent documents leads to information overload, which requires more effective summarization and analysis methods <ref type="bibr" target="#b12">[12]</ref>. <ref type="bibr" target="#b13">[13]</ref> provided a survey on graph-based representations for text analysis and found that they show superiority over traditional methods. The author also suggested a novel approach to semi-graph construction and sentence extraction, which helps to find linear and non-linear relationships between features <ref type="bibr" target="#b14">[14]</ref>.</p><p>Despite advances in text summarization, annotations generated for patent documents still do not match human annotations <ref type="bibr" target="#b1">[2]</ref>. 
In addition, the complexity and length of patents, as well as the requirement for domain knowledge, contribute to their under-utilization <ref type="bibr" target="#b4">[5]</ref>. Moreover, limitations of existing patent analysis tools hinder the creation of impact technology patent maps, highlighting the need for improved methods for efficiently extracting information from patent documents <ref type="bibr" target="#b15">[15]</ref>. Problems associated with cryptic descriptions in patent texts limit the effectiveness and comprehensiveness of patent analysis <ref type="bibr" target="#b16">[16]</ref>.</p><p>Several studies show that integrating LLMs into patent document summarization provides significant benefits in terms of accuracy and efficiency. Large language models (LLMs) have significantly impacted the field of automatic summarization, especially in the context of patent document summarization. The use of LLMs in summarizing patent documents has various advantages and disadvantages, as follows:</p><p>• LLMs are excellent at capturing the semantic meaning of text, which allows them to create contextually rich and coherent summaries <ref type="bibr" target="#b17">[17]</ref>, • LLMs are adept at handling the intricacies of patent documents, which often contain technical terminology and detailed information, resulting in more accurate and informative summaries <ref type="bibr" target="#b18">[18]</ref>, • LLMs demonstrate versatility in summarizing different types of documents, such as scientific literature, radiology reports, and patient problem lists <ref type="bibr" target="#b18">[18]</ref>, • LLMs have the ability to process voluminous input data, making them suitable for summarizing extensive patent documents without leaving out important details <ref type="bibr" target="#b17">[17]</ref>, • LLMs can be combined with knowledge graphs to improve summarization by integrating domain-specific knowledge and relationships extracted from the graph.</p><p>At the same time, the 
readability of summaries created using LLMs may vary, making it difficult to ensure that all users, including non-specialists, can easily understand them <ref type="bibr">[Decker, 2023]</ref>. According to Decker [2023], LLM-generated summaries may also struggle with completeness, especially in gathering all the relevant details from lengthy and detailed patent documents. Bush <ref type="bibr">[2024]</ref> points out that LLMs can reflect biases present in the training data, potentially affecting the quality and objectivity of the summaries produced. The interpretability of LLM-based summaries can also be a concern, because understanding the rationale behind a particular summary generated by the model can be challenging <ref type="bibr" target="#b19">[19]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Integration of LLM and KG in patent document summarization</head><p>As a solution to these problems, this paper proposes an integrated approach to summarizing patent documents using LLMs with knowledge graphs. Knowledge graphs have become integral to various applications, gaining popularity particularly after the comprehensive representation by <ref type="bibr" target="#b20">[20]</ref>. The concept of "Knowledge Graph" dates back to 1972 with the work of E. W. Schneider. Today, knowledge graphs are widely utilized in implementations such as the Google Knowledge Graph, Airbnb, and other social application analyses. Despite their broad applications, there has been limited research focusing specifically on patent datasets. By combining the language-understanding capabilities of LLMs with the structured representation of knowledge in graphs, one can create more accurate, informative, and contextually rich summaries of patent documents. Large language models, such as pre-trained encoders <ref type="bibr" target="#b21">[21]</ref>, have shown promise in text summarization tasks. These models can effectively capture the semantic meaning of text and generate coherent summaries. Knowledge graphs play a critical role in organizing and presenting structured information. Research has shown that incorporating knowledge graphs into language models can improve the representation of domain information <ref type="bibr" target="#b22">[22]</ref>. Research on question-answering systems has demonstrated the effectiveness of combining large language models with knowledge graphs to improve answer selection <ref type="bibr" target="#b23">[23]</ref>. By using semantic relationships encoded in knowledge graphs, the system can better understand and extract relevant information to answer questions. This approach can be extended to patent document summarization to extract key ideas and relationships embedded in patent texts. 
<ref type="bibr" target="#b24">[24]</ref> proposes a framework for predictive summarization that highlights possible future areas of patent application by analyzing past data in the legal domain.</p><p>Moreover, leveraging external knowledge from open source knowledge graphs has been shown to improve the performance of document summarization models <ref type="bibr" target="#b25">[25]</ref>. By incorporating external knowledge into the summarization process, the model gains access to a broader context and can generate more informative and complete summaries. Several studies have explored the integration of knowledge graphs into patent-related tasks, demonstrating their potential benefits. <ref type="bibr" target="#b26">[26]</ref> provided a comprehensive overview of knowledge graph embedding approaches and applications, highlighting the importance of structured knowledge representation. <ref type="bibr" target="#b27">[27]</ref> discusses the role of knowledge graphs in integrating health information and their application in healthcare, highlighting the benefits of incorporating knowledge graphs into data analytics. <ref type="bibr" target="#b28">[28]</ref> implement knowledge proximity in a US patent database using a knowledge graph called "PatNet" built from patent metadata. <ref type="bibr">Chen and Deng [2023]</ref> propose an accountable patent recommendation method using a knowledge graph to demonstrate the practical application of structured knowledge in patent-related problems. <ref type="bibr" target="#b29">[29]</ref> extract knowledge graphs from patent applications, filling the gap in knowledge extraction from patent documents. <ref type="bibr" target="#b30">[30]</ref> use text mining and machine learning techniques to visualize processes in utility chemical patents, demonstrating the utility of knowledge graphs in patent analysis. 
Additionally, <ref type="bibr" target="#b31">[31]</ref> extend patent searching to include text embedding and knowledge graphs, emphasizing the importance of capturing various aspects of knowledge in patent documents.</p><p>In <ref type="bibr" target="#b2">[3]</ref>, the researchers highlighted that patent summarizations could be significantly enhanced using conceptual knowledge graphs. Similarly, <ref type="bibr" target="#b32">[32]</ref> demonstrated the implementation of knowledge graphs using Open-IE to produce summaries through Cloze Reward. This paper builds on these studies by integrating language models to generate human-readable summaries from patent documents. LAUREN, as detailed by <ref type="bibr" target="#b33">[33]</ref>, focuses on summarizing knowledge graphs specifically for question answering systems. This approach leverages graph-based techniques to resize knowledge graphs, making them more manageable and tailored for answering user queries effectively. The summarization process ensures that the key information necessary for answering questions is retained while extraneous details are minimized. <ref type="bibr" target="#b34">[34]</ref> addresses the summarization of knowledge graphs for search purposes. Their work proposes a method to compute a summarization set that provides maximal descriptive power with minimal redundancy. This approach enhances the efficiency of knowledge graph searches by presenting users with the most relevant information. <ref type="bibr" target="#b35">[35]</ref> discuss the concept of adaptive summarization to create personalized knowledge graphs. This method adapts the summarization process to the user's specific information needs, ensuring that the resulting knowledge graph is both concise and highly relevant. The adaptability of this method makes it particularly useful for applications requiring tailored knowledge delivery. 
The GATES framework, presented by <ref type="bibr" target="#b36">[36]</ref>, utilizes graph attention networks for entity summarization within knowledge graphs. This technique addresses the challenge of summarizing entities by focusing on the most significant relationships and attributes. The use of attention mechanisms allows for the generation of summaries that are both comprehensive and focused on the most pertinent aspects of the knowledge graph.</p><p>These studies collectively advance the field of knowledge graph summarization by addressing various challenges such as customization, scalability, and efficiency. They provide a solid foundation for further research and development in creating concise, informative, and user-tailored knowledge graph summaries.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Building the Patent Knowledge Graph</head><p>The initial step involves constructing the knowledge graph by creating triplets that represent entities and relationships extracted from patent documents. This requires identifying subjects, actions, and objects within the text. Patent documents, characterized by complex sentence structures and noun-heavy content, present a challenge for entity extraction. Preliminary attempts using BERT models proved inadequate for the patent dataset.</p><p>To address this, we employ the llama-3 pre-trained language model<ref type="foot" target="#foot_0">1</ref> with 8.0B parameters, which effectively identifies entities and relationships in the text, specifically in the description and claims sections of patents. The description section provides key terms, while the claims section offers detailed information about the invention, both crucial for generating a comprehensive summary. By excluding other sections, we aim to minimize the hallucination effect of the language model. The extracted data is then used to construct a knowledge graph representing the core elements of the patent. The knowledge graph is created in the Neo4j database using Cypher queries with the relationships and entities extracted in the previous step, ensuring a structured and accessible representation of the patent as shown in Figure <ref type="figure" target="#fig_1">2</ref>.</p></div>
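The extract-then-load step can be sketched in Python. The `(subject; relation; object)` line format, the `Entity` node label, and the helper names below are illustrative assumptions rather than the authors' exact implementation; in practice the raw output would come from a llama-3 call and the Cypher statements would be executed through the Neo4j driver.

```python
import re

# Assumed convention: the extraction prompt asks llama-3 to emit one
# "(subject; relation; object)" triplet per line.
TRIPLET_RE = re.compile(r"\(\s*([^;]+);\s*([^;]+);\s*([^)]+)\)")

def parse_triplets(llm_output):
    """Parse the model's response into (subject, relation, object) tuples."""
    triplets = []
    for line in llm_output.splitlines():
        m = TRIPLET_RE.match(line.strip())
        if m:
            triplets.append(tuple(part.strip() for part in m.groups()))
    return triplets

def to_cypher(triplets):
    """Build parameterized Cypher MERGE statements for Neo4j.  The
    'Entity' label and 'REL' relationship type are illustrative choices."""
    query = (
        "MERGE (a:Entity {name: $s}) "
        "MERGE (b:Entity {name: $o}) "
        "MERGE (a)-[:REL {type: $r}]->(b)"
    )
    return [(query, {"s": s, "r": r, "o": o}) for s, r, o in triplets]
```

With a live database, each `(query, params)` pair would be passed to the `neo4j` Python driver's `session.run`; `MERGE` (rather than `CREATE`) keeps repeated entities as single nodes.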
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Generating Summaries from the Knowledge Graph</head><p>With the knowledge graph established, the next step involves generating summaries using the language model (llama-3:8.0B). This involves transforming the structured data from the knowledge graph into a human-readable format that emphasizes conciseness and informativeness. To obtain an effective abstractive summary, it is essential to instruct the language model with a custom prompt. This step ensures the generation of a coherent summary rather than a disjointed question-and-answer format. Additionally, Cypher queries are utilized to retrieve all relevant nodes and relationships from the knowledge graph, facilitating the creation of a comprehensive summary.</p><p>This methodology leverages the strengths of both knowledge graphs and advanced language models, aiming to produce high-quality, informative summaries of patent documents, thereby enhancing the accessibility and understanding of patent information.</p></div>
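A minimal sketch of the graph-to-summary step, under the assumption that the whole patent graph is fetched with a single Cypher query and serialized into a custom prompt; the query shape and prompt wording below are illustrative, not the authors' exact ones.

```python
# Assumed Cypher query: pull every triplet from the patent's graph.
FETCH_TRIPLETS = (
    "MATCH (a)-[r]->(b) "
    "RETURN a.name AS subject, r.type AS relation, b.name AS object"
)

def build_summary_prompt(triplets):
    """Serialize knowledge-graph triplets into a custom prompt that
    instructs the model to write one coherent paragraph rather than a
    disjointed question-and-answer list."""
    facts = "\n".join(f"- {s} {r} {o}" for s, r, o in triplets)
    return (
        "You are summarizing a patent from its knowledge graph.\n"
        "Using only the facts below, write one concise, readable "
        "paragraph. Do not use a question-and-answer format.\n"
        f"{facts}"
    )
```

The resulting string would then be sent to llama-3:8.0B as the summarization prompt.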
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">Baseline Method: Language Model Summarization</head><p>The performance of the proposed method is compared against a baseline summarization technique that solely utilizes a language model for document summarization. The baseline approach is straightforward: the language model is employed to summarize the content of the patent document to generate a relevant summary. We defined this baseline for the summarization of one document rather than multiple documents. We note that multi-document summarization could be achieved using a RAG architecture, which we aim to pursue in future research. Nevertheless, in this current baseline, the user input is the patent document itself, which is then processed to retrieve relevant sections within the document (such as the description and claims). These sections are fed into the language model, which generates a cohesive summary of the patent.</p><p>Our adaptation involves ensuring that the retrieval mechanism focuses on internal sections of the patent document, effectively treating it as a self-contained database. This approach leverages the structure of the patent document to enhance the quality of the generated summary. This summary is intended to capture the essence of the invention described in the patent document. The simplicity of this approach allows for a clear comparison with more complex methods.</p></div>
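The baseline can be sketched as a single prompt over the retrieved description and claims sections. The field names follow an HUPD-style record layout and are assumptions for illustration.

```python
def baseline_prompt(patent):
    """Build the baseline single-document prompt from the description
    and claims sections only, treating the patent as a self-contained
    database of sections.  'description' and 'claims' are assumed
    field names in the patent record."""
    description = patent.get("description", "")
    claims = patent.get("claims", "")
    return (
        "Summarize the following patent in one cohesive paragraph that "
        "captures the essence of the invention.\n\n"
        f"DESCRIPTION:\n{description}\n\n"
        f"CLAIMS:\n{claims}"
    )
```

The same language model (llama-3:8.0B) would consume this prompt directly, with no knowledge-graph step in between.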
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Algorithm 1 Summarizing Patent Documents Using Knowledge RAG and Llama3</head><p>Input: HUPD dataset, Neo4j credentials, patent number.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Output:</head><p>Generation of the structured summary of the patent document. </p></div>
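Since the body of Algorithm 1 is not reproduced above, the following is a hedged end-to-end reconstruction from the methodology text. The `llm` callable stands in for a llama-3 call, the Neo4j load step is reduced to a comment, and the helper logic and prompt wording are assumptions.

```python
def summarize_with_kg(patent, llm):
    """Sketch of Algorithm 1: extract triplets with the LLM, (load them
    into Neo4j,) then generate a structured summary from the graph.
    `llm` is any prompt -> text callable, injected so the pipeline can
    be exercised with a stub."""
    # Step 1: ask the LLM for "(subject; relation; object)" triplets
    # from the description and claims sections only.
    extraction = llm(
        "List (subject; relation; object) triplets, one per line, for "
        "this patent:\n"
        + patent.get("description", "") + "\n" + patent.get("claims", "")
    )
    triplets = []
    for line in extraction.splitlines():
        parts = [p.strip() for p in line.strip().strip("()").split(";")]
        if len(parts) == 3:
            triplets.append(tuple(parts))
    # Step 2 (omitted here): MERGE the triplets into Neo4j via Cypher.
    # Step 3: serialize the graph and ask for one coherent paragraph.
    facts = "\n".join(" ".join(t) for t in triplets)
    return llm("Write one readable paragraph from these facts:\n" + facts)
```

In real use `llm` would wrap a llama-3:8.0B call; a stub suffices to check the control flow.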
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Dataset</head><p>We utilized the HUPD (Harvard University Patent Dataset), which provides an extensive range of raw patent claim texts. This dataset was chosen over other prominent datasets such as BIGPATENT<ref type="foot" target="#foot_1">2</ref> , CLEF-IP 2011, and USPTO-2M for several key reasons. Firstly, HUPD houses approximately 4.5 million patent documents, significantly more than BIGPATENT's 1.3 million <ref type="bibr" target="#b37">[37]</ref>. This expansive volume offers a richer and more varied base for analysis. Secondly, HUPD encompasses 34 distinct sections, including claims, offering a more comprehensive view compared to BIGPATENT's limitation to just 4 sections. Lastly, unlike the pre-tokenized texts in BIGPATENT, HUPD's provision of raw texts affords greater flexibility and depth in our analysis, allowing for a more tailored approach to data processing <ref type="bibr" target="#b38">[38]</ref>.</p><p>To test our proposed method, we randomly selected 10 documents from the 'G' category of the CPC code (Physics) within the HUPD dataset as representative samples for our study. We generated both baseline summaries and knowledge graph summaries for each patent document and used ROUGE scores as performance metrics. Additionally, we employed human evaluation to assess the real-world clarity of the generated summaries.</p></div>
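The sampling step can be illustrated as follows. The `cpc_label` field name and the fixed seed are assumptions about the record layout rather than HUPD's exact schema; in practice the records would first be loaded, e.g. from the Hugging Face hub.

```python
import random

def sample_physics_patents(records, k=10, seed=0):
    """Randomly draw k patent records whose CPC code falls in section
    'G' (Physics), mirroring the paper's sampling setup."""
    physics = [r for r in records if str(r.get("cpc_label", "")).startswith("G")]
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    return rng.sample(physics, min(k, len(physics)))
```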
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Evaluation</head><p>This section presents the results of our proposed method compared to the baseline summarization technique. The proposed method was evaluated using ROUGE scores to measure the quality of the generated summaries. The evaluation metrics included ROUGE-1, ROUGE-2, and ROUGE-L, which assess the overlap of n-grams and longest common subsequences between the generated and reference summaries. Moreover, we evaluated the performance with human evaluations to assess the quality and understandability of the summaries generated.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Quantitative Result</head><p>The ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metric, introduced by Chin-Yew Lin in his pivotal work, is a cornerstone in the field of natural language processing, particularly for evaluating automatic summarization and machine translation <ref type="bibr" target="#b39">[39]</ref>. ROUGE assesses the quality of a summary by comparing it with reference summaries.</p><p>ROUGE scores are often reported in terms of precision, recall, and F1 score, providing a comprehensive evaluation framework:</p><p>• Precision measures the fraction of n-grams in the generated summary that are present in the reference summary. • Recall assesses the fraction of n-grams in the reference summary that appear in the generated summary. • F1 Score is the harmonic mean of precision and recall, offering a balanced metric.</p><p>The formulas for these metrics are as follows:</p><p>Precision = (number of overlapping n-grams) / (total n-grams in the generated summary) (1)</p><p>Recall = (number of overlapping n-grams) / (total n-grams in the reference summary) ( <ref type="formula">2</ref>)</p><formula xml:id="formula_1">F1 = (2 × Precision × Recall) / (Precision + Recall)<label>(3)</label></formula><p>Table <ref type="table" target="#tab_1">1</ref> shows the ROUGE F1 scores for both the baseline and the proposed method. The baseline method outperforms the proposed method across all ROUGE metrics, indicating a higher F1 score for summarization. </p></div>
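The precision, recall, and F1 definitions above can be computed directly. A minimal pure-Python ROUGE-n sketch (whitespace tokenization only, without the stemming and other options of the full ROUGE toolkit):

```python
from collections import Counter

def rouge_n(generated, reference, n=1):
    """Return (precision, recall, F1) for n-gram overlap between a
    generated summary and a reference summary."""
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    gen, ref = ngrams(generated), ngrams(reference)
    overlap = sum((gen & ref).values())  # clipped n-gram match count
    precision = overlap / sum(gen.values()) if gen else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For reported scores, a maintained implementation such as Google's `rouge_score` package would normally be preferred over this sketch.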
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Human Evaluation Result</head><p>The human evaluation of our summarization method was a critical component of our research. We aimed to determine the effectiveness of our Knowledge Graph (KG) based summaries in terms of understandability and informativeness. To this end, we conducted an evaluation involving 12 respondents who reviewed summaries of 10 different patent documents. We selected 10 patent documents from the Harvard University Patent dataset <ref type="bibr" target="#b38">[38]</ref> to ensure a diverse and representative sample. For each patent, three types of summaries were generated: the original summary, a baseline summary, and the KG-based summary. Each respondent was presented with these three summaries for each patent document without being informed of their origins to prevent any bias.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.1.">Voting Process</head><p>Respondents were asked to vote for the summary they found easiest to understand and most informative. The summaries were presented in a randomized order to eliminate order effects. The voting was conducted using a simple voting system where respondents chose one summary out of the three options. The criteria for voting were clearly defined: understandability, how easily the summary could be comprehended, and informativeness, how well the summary conveyed the essential information of the patent document.</p><p>To ensure the integrity of the evaluation, respondents were chosen from a pool of individuals with no prior experience in reading or writing patent documents. This choice was made to simulate the perspective of an average reader who might not have specialized knowledge. The human-evaluation results are summarized in Table <ref type="table" target="#tab_2">2</ref>. The KG-based summaries received the highest number of votes in most cases, indicating a clear preference for their comprehensibility and informativeness over the other two types of summaries.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Discussion of Results</head><p>The results from the ROUGE F1 score evaluations indicate that the baseline method outperforms the proposed Knowledge Graph (KG)-based summarization technique in terms of automated metrics. ROUGE scores measure the overlap between the generated summaries and reference summaries, focusing on precision, recall, and F1 score.</p><p>However, the positive feedback from human evaluations suggests that the proposed KG-based method generates clearer, more coherent, and more informative summaries. This contrast highlights the importance of human evaluations in assessing summarization quality, as they capture aspects of readability and informativeness that automated metrics might miss.</p><p>Notably, for Patent Number "US14990535", the baseline method failed to produce a summary, likely due to hallucination effects commonly observed in large language models (LLMs). In contrast, the KG-based method successfully generated a precise and informative summary from the original patent document, demonstrating its robustness.</p><p>It is important to note that our primary focus is on understandability and informativeness as perceived by human readers, rather than solely relying on automated metrics like ROUGE scores. This approach ensures that the summaries effectively convey the essence of the inventions described in the patents.</p><p>The integration of knowledge graphs and the adaptation of the RAG architecture have proven effective in enhancing the readability and informativeness of patent document summarization. These findings underscore the value of human-centered evaluations in developing summarization tools.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Limitations</head><p>Despite the promising results, our research faced several challenges and limitations. The primary limitation stems from the reliance on a large language model (LLaMA3) for both entity recognition and summary generation, which introduces potential vulnerabilities. If the LLM fails to accurately produce entities, the subsequent knowledge graph cannot be generated correctly. Any errors in the entity recognition process can lead to inaccuracies in the final summary. This dependency on LLMs introduces a potential source of error that can affect the overall performance of the summarization method.</p><p>Additionally, the LLM used for generating summaries, specifically LLaMA3, may produce sub-optimal results if the prompts are not clearly defined or easy to understand. This can result in summaries that do not accurately reflect the content of the original document. Another limitation is the language support; currently, our method is only capable of handling English documents, which restricts its applicability to patents written in other languages.</p><p>Moreover, our approach is designed for single-document summarization. This limits its effectiveness when dealing with multiple documents, which is a common scenario in patent analysis and other legal contexts. Addressing this limitation is a key area for future research.</p><p>Nevertheless, the human evaluation results suggest that despite the lower ROUGE scores, the summaries generated by the proposed method are more understandable and informative to readers unfamiliar with patent documents. This highlights the importance of human-centered evaluation metrics in assessing the effectiveness of summarization methods, as automated metrics like ROUGE may not fully capture the nuances of human understanding.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Future Work</head><p>To address the limitations identified, future work will focus on several key areas of improvement:</p><p>Enhanced Entity Recognition: We plan to develop a dedicated entity recognition model trained on annotated patent data. By creating a custom model tailored to the specifics of patent documents, we aim to improve the accuracy of entity recognition and, consequently, the quality of the knowledge graphs and summaries.</p><p>Improved LLM Training: We will fine-tune the LLM (LLaMA3) on a specialized legal database to enhance its performance on summarization tasks. This targeted training will help mitigate issues arising from unclear prompts and improve the model's ability to generate accurate and informative summaries.</p><p>Multilingual Support: Expanding our model to support multiple languages is a priority. This involves training the entity recognition and summarization models on multilingual datasets to ensure accurate performance across languages.</p><p>Multi-Document Summarization: Extending our method to handle multi-document summarization will significantly broaden its applicability. We plan to extend the architecture to aggregate and summarize information from multiple documents, which is particularly important in the legal and patent analysis fields.</p><p>By addressing these areas, we aim to make our summarization approach more robust, versatile, and applicable to a wider range of documents and languages, enhancing its utility in practical, real-world scenarios, particularly in legal and technical domains.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.">Conclusion</head><p>In this paper, we introduced a novel summarization approach that integrates knowledge graphs and large language models (LLMs) to enhance the readability and informativeness of patent document summaries. Our main contribution lies in generating summaries that are easier to understand while preserving the key inventions from the patent documents. Additionally, our research contributes to advancing legal technology by analyzing entities in the patent domain and producing structured data through knowledge graphs, combined with the contextual understanding of LLMs, for automatic summarization tasks. We emphasize the importance of human-centered evaluation and contextual understandability over automated evaluation when producing summaries. Despite the limitations, our research lays a foundation for future advancements in patent summarization technology, aiming to develop a more robust tool for analyzing complex technical documents in the legal domain.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: The proposed system architecture for producing KG Summary</figDesc><graphic coords="4,72.00,65.61,451.26,253.83" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Knowledge Graph of Patent Number US14996316 after constructing in the Neo4J Database</figDesc><graphic coords="5,93.32,65.60,406.12,321.59" type="bitmap" /></figure>
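A graph like the one in Figure 2 can be built in Neo4j from extracted (subject, relation, object) triples. The sketch below shows one minimal way to turn such triples into Cypher MERGE statements; the example triples, the `Entity` label, and the string-based Cypher generation are illustrative assumptions, not the paper's exact pipeline (production code would use the official Neo4j driver with parameterized queries):

```python
def triples_to_cypher(triples):
    """Convert (subject, relation, object) triples into Cypher MERGE
    statements that build a knowledge graph in Neo4j.

    Illustrative only: real code should pass values as query
    parameters through the Neo4j driver, not interpolate strings."""
    statements = []
    for subj, rel, obj in triples:
        # Relationship types must be valid Cypher identifiers,
        # so normalize to upper case with underscores.
        rel_type = "".join(c if c.isalnum() else "_" for c in rel.strip().upper())
        s, o = subj.replace("'", "\\'"), obj.replace("'", "\\'")
        statements.append(
            f"MERGE (a:Entity {{name: '{s}'}}) "
            f"MERGE (b:Entity {{name: '{o}'}}) "
            f"MERGE (a)-[:{rel_type}]->(b)"
        )
    return statements

# Hypothetical triples such as an LLM might extract from a patent
# abstract (not taken verbatim from patent US14996316).
example = [
    ("machine learning model", "trained on", "resume corpus"),
    ("computer system", "provides", "profile information"),
]
```

Using MERGE rather than CREATE keeps the graph deduplicated: an entity mentioned in several sentences maps to a single node, which is what makes the resulting graph useful as a condensed view of the document.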
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>ROUGE F1 Scores Comparison of Patent Number US14996316</figDesc><table><row><cell>Method</cell><cell>ROUGE-1</cell><cell>ROUGE-2</cell><cell>ROUGE-L</cell></row><row><cell>Baseline</cell><cell>0.461538</cell><cell>0.235294</cell><cell>0.461538</cell></row><row><cell>KG Sum</cell><cell>0.343434</cell><cell>0.136054</cell><cell>0.323232</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>The voting result of Human Evaluation</figDesc><table><row><cell>Patent</cell><cell>KGSUM</cell><cell>Original</cell><cell>BaselineSUM</cell><cell>Result</cell></row><row><cell>US14997116</cell><cell>9</cell><cell>0</cell><cell>3</cell><cell>KGSUM</cell></row><row><cell>US14986817</cell><cell>5</cell><cell>0</cell><cell>7</cell><cell>BaselineSUM</cell></row><row><cell>US14990669</cell><cell>8</cell><cell>1</cell><cell>0</cell><cell>KGSUM</cell></row><row><cell>US14998490</cell><cell>2</cell><cell>1</cell><cell>4</cell><cell>BaselineSUM</cell></row><row><cell>US14990535</cell><cell>11</cell><cell>1</cell><cell>0</cell><cell>KGSUM</cell></row><row><cell>US14989021</cell><cell>7</cell><cell>2</cell><cell>3</cell><cell>KGSUM</cell></row><row><cell>US15000497</cell><cell>5</cell><cell>1</cell><cell>6</cell><cell>BaselineSUM</cell></row><row><cell>US14994324</cell><cell>6</cell><cell>2</cell><cell>4</cell><cell>KGSUM</cell></row><row><cell>US15001450</cell><cell>6</cell><cell>3</cell><cell>3</cell><cell>KGSUM</cell></row><row><cell>US14996316</cell><cell>8</cell><cell>2</cell><cell>2</cell><cell>KGSUM</cell></row></table></figure>
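The decision rule behind Table 2 appears to be a simple plurality vote per patent. The sketch below reproduces the Result column from the vote counts (counts copied from Table 2; treating the per-patent winner as the summary with the most votes is our reading of the table, not an algorithm stated in the paper):

```python
# Vote counts per patent as (KGSUM, Original, BaselineSUM), from Table 2.
votes = {
    "US14997116": (9, 0, 3),
    "US14986817": (5, 0, 7),
    "US14990669": (8, 1, 0),
    "US14998490": (2, 1, 4),
    "US14990535": (11, 1, 0),
    "US14989021": (7, 2, 3),
    "US15000497": (5, 1, 6),
    "US14994324": (6, 2, 4),
    "US15001450": (6, 3, 3),
    "US14996316": (8, 2, 2),
}
LABELS = ("KGSUM", "Original", "BaselineSUM")

def winner(counts):
    # Plurality vote: the summary with the most votes wins the patent.
    return LABELS[max(range(3), key=lambda i: counts[i])]

results = {patent: winner(c) for patent, c in votes.items()}
kg_wins = sum(1 for r in results.values() if r == "KGSUM")
```

With these counts, the KG-based summary is preferred on 7 of the 10 patents, matching the Result column of Table 2.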
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">The pre-trained LLaMA3 model is available for download at: https://ollama.com/library/llama3</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">The BIGPATENT dataset is available for download at: https://evasharma.github.io/bigpatent/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>First and foremost, we would like to express our gratitude to the creators of the Harvard University Patent Dataset (HUPD) for making the legal dataset publicly available; it was instrumental in our study. We also appreciate the developers of the Llama 3 model, which we used to test knowledge graph summarization on the legal dataset.</p><p>Lastly, we extend our sincere appreciation to the evaluators who participated in the human evaluation of the summaries. Their insights and feedback were invaluable in assessing the quality and readability of our summarization methods.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Summaries for Evaluation</head><p>This appendix presents the summaries titled "Original Summary", "KG Summary" (our approach), and "Baseline Summary". These summaries were shown to the evaluators, who voted on which summary was easier to understand. Below are the summaries evaluated for Patent No. US14997116.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Original Summary</head><p>Systems, methods, and non-transitory computer readable media are configured to receive a resume corpus. A machine learning model is trained based on terms from the resume corpus. A job title for a user is determined based on profile information provided to the model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>KG Summary</head><p>A machine learning model was trained to analyze profile information from job applicants, which was provided to the model via a computer system. The computer system received resumes as input data and used this information to train the model. This training process enables the model to learn patterns and correlations between resume data and job title. The organization recruited for by a specific job title, leveraging the trained machine learning model to improve its hiring processes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Baseline Summary</head><p>The invention relates to a system and method for determining a job title for a user based on their profile information. A machine learning model is trained using terms from a resume corpus, and then the user's profile information is provided to the model to determine their most likely job title. The model uses techniques such as vector space creation to represent terms in the resume corpus, and anchor points are used to identify the closest match for each term.</p><p>The other patent summaries used in the evaluation, all generated by our approach, are presented in the online resource. Because our approach uses prompts to generate the nodes and entities from the original documents, we also provide the full prompts and the feature-processing functions applied before graph construction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Online Resources</head><p>The code for this paper is available at:</p><p>• GitHub</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Automatic patent document summarization for collaborative knowledge systems and services</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Trappey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">V</forename><surname>Trappey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-Y</forename><surname>Wu</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11518-009-5100-7</idno>
		<ptr target="https://doi.org/10.1007/s11518-009-5100-7" />
	</analytic>
	<monogr>
		<title level="j">Journal of Systems Science and Systems Engineering</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="71" to="94" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Pqps: Prior-art query-based patent summarizer using rbm and bi-lstm</title>
		<author>
			<persName><forename type="first">G</forename><surname>Kumaravel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sankaranarayanan</surname></persName>
		</author>
		<idno type="DOI">10.1155/2021/2497770</idno>
		<ptr target="https://doi.org/10.1155/2021/2497770" />
	</analytic>
	<monogr>
		<title level="j">Mobile Information Systems</title>
		<imprint>
			<biblScope unit="page">2497770</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Patent document summarization using conceptual graphs</title>
		<author>
			<persName><forename type="first">P</forename><surname>Rk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lalithadevi</surname></persName>
		</author>
		<idno type="DOI">10.2139/ssrn.3441645</idno>
	</analytic>
	<monogr>
		<title level="j">SSRN Electronic Journal</title>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Patent surrogate extraction and evaluation in the context of patent mapping</title>
		<author>
			<persName><forename type="first">Y.-H</forename><surname>Tseng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-W</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-I</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-H</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D.-R</forename><surname>Juang</surname></persName>
		</author>
		<idno type="DOI">10.1177/0165551507077406</idno>
		<ptr target="https://doi.org/10.1177/0165551507077406" />
	</analytic>
	<monogr>
		<title level="j">Journal of Information Science</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="718" to="736" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Summatriz : Summarization networks for mining patent contradiction</title>
		<author>
			<persName><forename type="first">G</forename><surname>Guarino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Samet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nafi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cavallucci</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICMLA51294.2020.00159</idno>
		<ptr target="https://doi.org/10.1109/ICMLA51294.2020.00159" />
	</analytic>
	<monogr>
		<title level="m">19th IEEE International Conference on Machine Learning and Applications (ICMLA)</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="979" to="986" />
		</imprint>
	</monogr>
	<note>19th IEEE International Conference on Machine Learning and Applications (ICMLA)</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">An r&amp;d knowledge management method for patent document summarization</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Trappey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">V</forename><surname>Trappey</surname></persName>
		</author>
		<idno type="DOI">10.1108/02635570810847608</idno>
		<ptr target="https://doi.org/10.1108/02635570810847608" />
	</analytic>
	<monogr>
		<title level="j">Industrial Management &amp; Data Systems</title>
		<imprint>
			<biblScope unit="volume">108</biblScope>
			<biblScope unit="page" from="245" to="257" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Extracting characteristic terms from patent documents</title>
		<author>
			<persName><forename type="first">K</forename><surname>Takano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tanaka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Sakai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kitajima</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ota</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Tanabe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Sakaji</surname></persName>
		</author>
		<idno type="DOI">10.52731/ijscai.v4.i2.568</idno>
		<ptr target="https://doi.org/10.52731/ijscai.v4.i2.568" />
	</analytic>
	<monogr>
		<title level="j">International Journal of Smart Computing and Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="19" to="38" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A multilingual patent text-mining approach for computing relatedness evaluation of patent documents</title>
		<author>
			<persName><forename type="first">C.-H</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-C</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-H</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-J</forename><surname>Li</surname></persName>
		</author>
		<idno type="DOI">10.1109/IIH-MSP.2009.162</idno>
	</analytic>
	<monogr>
		<title level="m">Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing</title>
				<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="612" to="615" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Leveraging graph to improve abstractive multidocument summarization</title>
		<author>
			<persName><forename type="first">W</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.555</idno>
		<ptr target="https://aclanthology.org/2020.acl-main.555" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Chai</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Schluter</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Tetreault</surname></persName>
		</editor>
		<meeting>the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="6232" to="6243" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Cross-language patent matching via an international patent classification-based concept bridge</title>
		<author>
			<persName><forename type="first">Y.-L</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-T</forename><surname>Chiu</surname></persName>
	</author>
		<idno type="DOI">10.1177/0165551513494641</idno>
		<ptr target="https://doi.org/10.1177/0165551513494641" />
	</analytic>
	<monogr>
		<title level="j">Journal of Information Science</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="page">737</biblScope>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>


<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Using genre-specific features for patent summaries</title>
		<author>
			<persName><forename type="first">J</forename><surname>Codina-Filbà</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Bouayad-Agha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Burga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Casamayor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mille</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Saggion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wanner</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.ipm.2016.07.002</idno>
		<ptr target="https://doi.org/10.1016/j.ipm.2016.07.002" />
	</analytic>
	<monogr>
		<title level="j">Information Processing and Management</title>
		<imprint>
			<biblScope unit="volume">53</biblScope>
			<biblScope unit="page" from="151" to="174" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Constructing patent maps using text mining to sustainably detect potential technological opportunities</title>
		<author>
			<persName><forename type="first">H.-F</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-H</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P.-Y</forename><surname>Hsin</surname></persName>
		</author>
		<idno type="DOI">10.3390/su10103729</idno>
		<ptr target="https://doi.org/10.3390/su10103729" />
	</analytic>
	<monogr>
		<title level="j">Sustainability</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">3729</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Graph based representation and analysis of text document: A survey of techniques</title>
		<author>
			<persName><forename type="first">S</forename><surname>Sonawane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kulkarni</surname></persName>
		</author>
		<idno type="DOI">10.5120/16899-6972</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Applications</title>
		<imprint>
			<biblScope unit="volume">96</biblScope>
			<biblScope unit="page" from="1" to="8" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Extractive summarization using semigraph (essg)</title>
		<author>
			<persName><forename type="first">S</forename><surname>Sonawane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kulkarni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Deshpande</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Athawale</surname></persName>
		</author>
		<idno type="DOI">10.1007/s12530-018-9246-8</idno>
		<ptr target="https://doi.org/10.1007/s12530-018-9246-8" />
	</analytic>
	<monogr>
		<title level="j">Evolving Systems</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="409" to="424" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Extraction of the effect and the technology terms from a patent document</title>
		<author>
			<persName><forename type="first">H</forename><surname>Nonaka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kobayashi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Sakaji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Suzuki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Sakai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Masuyama</surname></persName>
		</author>
		<idno type="DOI">10.1109/iccie.2010.5668167</idno>
		<ptr target="https://doi.org/10.1109/iccie.2010.5668167" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2010 International Conference on Computational Intelligence and Education (ICCIE)</title>
				<meeting>the 2010 International Conference on Computational Intelligence and Education (ICCIE)</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Analyzing patent topical information to identify technology pathways and potential opportunities</title>
		<author>
			<persName><forename type="first">J</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">L</forename><surname>Porter</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11192-014-1392-6</idno>
		<ptr target="https://doi.org/10.1007/s11192-014-1392-6" />
	</analytic>
	<monogr>
		<title level="j">Scientometrics</title>
		<imprint>
			<biblScope unit="volume">102</biblScope>
			<biblScope unit="page" from="811" to="827" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Chain-of-event prompting for multi-document summarization by large language models</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bao</surname></persName>
		</author>
		<idno type="DOI">10.1108/ijwis-12-2023-0249</idno>
		<ptr target="https://doi.org/10.1108/ijwis-12-2023-0249" />
	</analytic>
	<monogr>
		<title level="j">International Journal of Web Information Systems</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="229" to="247" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Williams</surname></persName>
		</author>
		<idno type="DOI">10.1101/2024.04.03.24305088</idno>
		<ptr target="https://doi.org/10.1101/2024.04.03.24305088" />
		<title level="m">Evaluating large language models for drafting emergency department discharge summaries</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>O'Neil</surname></persName>
		</author>
		<idno type="DOI">10.1101/2024.01.31.578275</idno>
		<ptr target="https://doi.org/10.1101/2024.01.31.578275" />
		<title level="m">Phenomics assistant: an interface for llm-based biomedical knowledge graph exploration</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Knowledge graphs</title>
		<author>
			<persName><forename type="first">A</forename><surname>Hogan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Blomqvist</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cochez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>D'amato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">D</forename><surname>Melo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gutierrez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kirrane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E L</forename><surname>Gayo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Navigli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Neumaier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A.-C</forename><forename type="middle">N</forename><surname>Ngomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Polleres</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Rashid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rula</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Schmelzeisen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sequeda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Staab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zimmermann</surname></persName>
		</author>
		<idno type="DOI">10.1145/3447772</idno>
		<ptr target="https://doi.org/10.1145/3447772" />
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">54</biblScope>
			<biblScope unit="page" from="1" to="37" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Text summarization with pretrained encoders</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lapata</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D19-1387</idno>
		<ptr target="https://aclanthology.org/D19-1387" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">K</forename><surname>Inui</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Jiang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Ng</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">X</forename><surname>Wan</surname></persName>
		</editor>
		<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3730" to="3740" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">K-bert: enabling language representation with knowledge graph</title>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Ju</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.1609/aaai.v34i03.5681</idno>
		<ptr target="https://doi.org/10.1609/aaai.v34i03.5681" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI Conference on Artificial Intelligence</title>
				<meeting>the AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="2901" to="2908" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">An empirical study of pre-trained language models in simple knowledge graph question answering</title>
		<author>
			<persName><forename type="first">N</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Qi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Min</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zafar</surname></persName>
		</author>
		<idno type="DOI">10.21203/rs.3.rs-2184834/v1</idno>
		<ptr target="https://doi.org/10.21203/rs.3.rs-2184834/v1" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">A survey of automatic text summarization systems, applications, challenges, and future trends</title>
		<author>
			<persName><forename type="first">M</forename><surname>Uchkempirov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">L</forename><surname>Phyu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kulkarni</surname></persName>
		</author>
		<idno type="DOI">10.5120/ijca2024923672</idno>
		<ptr target="https://ijcaonline.org/archives/volume186/number23/a-survey-of-automatic-text-summarization-systems-applications-challenges-and-future-trends/" />
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Applications</title>
		<imprint>
			<biblScope unit="volume">186</biblScope>
			<biblScope unit="page" from="8" to="15" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Incorporating external knowledge into unsupervised graph model for document summarization</title>
		<author>
			<persName><forename type="first">T</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<idno type="DOI">10.3390/electronics9091520</idno>
		<ptr target="https://www.mdpi.com/2079-9292/9/9/1520" />
	</analytic>
	<monogr>
		<title level="j">Electronics</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Knowledge graph embedding: a survey of approaches and applications</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Mao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Guo</surname></persName>
		</author>
		<idno type="DOI">10.1109/tkde.2017.2754499</idno>
		<ptr target="https://doi.org/10.1109/tkde.2017.2754499" />
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="2724" to="2743" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<title level="m" type="main">Knowledge graph: a survey</title>
		<author>
			<persName><forename type="first">A</forename><surname>Akinnubi</surname></persName>
		</author>
		<idno type="DOI">10.31219/osf.io/8xm2g</idno>
		<ptr target="https://doi.org/10.31219/osf.io/8xm2g" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Embedding knowledge graph of patent metadata to measure knowledge proximity</title>
		<author>
			<persName><forename type="first">G</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Siddharth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Luo</surname></persName>
		</author>
		<idno type="DOI">10.1002/asi.24736</idno>
		<ptr target="https://doi.org/10.1002/asi.24736" />
	</analytic>
	<monogr>
		<title level="j">Journal of the Association for Information Science and Technology</title>
		<imprint>
			<biblScope unit="volume">74</biblScope>
			<biblScope unit="page" from="476" to="490" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Engineering knowledge graph from patent database</title>
		<author>
			<persName><forename type="first">L</forename><surname>Siddharth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Blessing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Wood</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Luo</surname></persName>
		</author>
		<idno type="DOI">10.1115/1.4052293</idno>
		<ptr target="https://doi.org/10.1115/1.4052293" />
	</analytic>
	<monogr>
		<title level="j">Journal of Computing and Information Science in Engineering</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">IP analytics and machine learning applied to create process visualization graphs for chemical utility patents</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J C</forename><surname>Trappey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">V</forename><surname>Trappey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-W</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-S</forename><surname>Lin</surname></persName>
		</author>
		<idno type="DOI">10.3390/pr9081342</idno>
		<ptr target="https://doi.org/10.3390/pr9081342" />
	</analytic>
	<monogr>
		<title level="j">Processes</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page">1342</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Leveraging online behaviors for interpretable knowledge-aware patent recommendation</title>
		<author>
			<persName><forename type="first">W</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ma</surname></persName>
		</author>
		<idno type="DOI">10.1108/intr-08-2020-0473</idno>
		<ptr target="https://doi.org/10.1108/intr-08-2020-0473" />
	</analytic>
	<monogr>
		<title level="j">Internet Research</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="page" from="568" to="587" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Knowledge graph-augmented abstractive summarization with semantic-driven cloze reward</title>
		<author>
			<persName><forename type="first">L</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.457</idno>
		<ptr target="https://aclanthology.org/2020.acl-main.457" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Chai</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Schluter</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Tetreault</surname></persName>
		</editor>
		<meeting>the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="5094" to="5107" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">LAUREN - knowledge graph summarization for question answering</title>
		<author>
			<persName><forename type="first">R</forename><surname>Jalota</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Vollmers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Moussallem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A.-C</forename><surname>Ngonga Ngomo</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICSC50631.2021.00047</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE 15th International Conference on Semantic Computing (ICSC)</title>
				<imprint>
			<date type="published" when="2021">2021. 2021</date>
			<biblScope unit="page" from="221" to="226" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Mining summaries for knowledge graph search</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">X</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Sun</surname></persName>
		</author>
		<idno type="DOI">10.1109/TKDE.2018.2807442</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="1887" to="1900" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<monogr>
		<title level="m" type="main">Adaptive personalized knowledge graph summarization</title>
		<author>
			<persName><forename type="first">T</forename><surname>Safavi</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:52510379" />
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">Gates: Using graph attention networks for entity summarization</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">F</forename><surname>Firmansyah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Moussallem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A.-C</forename><surname>Ngonga Ngomo</surname></persName>
		</author>
		<idno type="DOI">10.1145/3460210.3493574</idno>
		<ptr target="https://doi.org/10.1145/3460210.3493574" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 11th Knowledge Capture Conference, K-CAP &apos;21</title>
				<meeting>the 11th Knowledge Capture Conference, K-CAP &apos;21<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="73" to="80" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">BIGPATENT: A large-scale dataset for abstractive and coherent summarization</title>
		<author>
			<persName><forename type="first">E</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/P19-1213</idno>
		<ptr target="https://doi.org/10.18653/v1/P19-1213" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 57th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="2204" to="2213" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Suzgun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Melas-Kyriazi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Sarkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">D</forename><surname>Kominers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Shieber</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2207.04043" />
		<title level="m">The Harvard USPTO patent dataset: A large-scale, well-structured, and multi-purpose corpus of patent applications</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<analytic>
		<title level="a" type="main">ROUGE: A package for automatic evaluation of summaries</title>
		<author>
			<persName><forename type="first">C.-Y</forename><surname>Lin</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/W04-1013" />
	</analytic>
	<monogr>
		<title level="m">Text Summarization Branches Out, Association for Computational Linguistics</title>
				<meeting><address><addrLine>Barcelona, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="74" to="81" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
