=Paper=
{{Paper
|id=Vol-3604/paper4
|storemode=property
|title=Automatic Generation of Explanatory Text from Flowchart Images in Patents
|pdfUrl=https://ceur-ws.org/Vol-3604/paper4.pdf
|volume=Vol-3604
|authors=Hidetsugu Nanba,Shohei Kubo,Satoshi Fukuda
|dblpUrl=https://dblp.org/rec/conf/patentsemtech/NanbaKF23
}}
==Automatic Generation of Explanatory Text from Flowchart Images in Patents==
Hidetsugu Nanba, Shohei Kubo and Satoshi Fukuda
Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan

PatentSemTech'23: 4th Workshop on Patent Text Mining and Semantic Technologies, July 27, 2023, Taipei, Taiwan.

Abstract

This paper addresses the automatic generation of explanatory text from flowchart images in patents. The construction of an explanatory text generator consists of four steps: (1) automatic recognition of flowchart images among patent images, (2) extraction of text strings from the flowchart images, (3) creation of data for machine learning, and (4) construction of an explanatory text generator using T5. In this study, a benchmark consisting of 7,099 images was constructed for determining whether an image in a patent is a flowchart. Furthermore, an explanatory text generator was constructed using 11,188 flowchart image-explanatory text pairs. The experimental results showed that a precision of 0.9645 was achieved for the recognition of flowchart images. High-quality explanatory text could be generated from flowchart images, although some issues remain for flowcharts with complex shapes.

Keywords: Flowchart, Image recognition, Text generation, Character recognition, Patent

1. Introduction

A procedural text is a description of a set of procedures for achieving a particular objective. Our goal is to automatically extract knowledge about series of procedures in a wide range of fields from texts and to systematize it. Here, we describe the automatic generation of explanatory text from flowchart images in patents.

In automatically generating explanatory text for flowchart images, we focus on the abstract and the selected figure of a patent. The selected figure enables a reader to grasp the outline of the invention quickly and accurately: the applicant usually selects, from among the diagrams in the patent, the one they consider necessary for understanding the contents of the abstract. If a classifier that automatically determines whether an image in a patent is a flowchart is constructed, and only those selected figures that are flowcharts are extracted, a large number of pairs of flowcharts and their explanatory texts (i.e., patent abstracts) can be collected automatically. Furthermore, using these pairs, we believe it is possible to construct a system that automatically generates explanatory text from flowchart images using machine learning.

The contributions of this paper are as follows:

* To determine whether an image in a patent is a configuration diagram, flowchart, or table, we constructed a benchmark consisting of 7,099 images. Using this benchmark, we achieved a classification precision of 0.9645.
* We constructed 11,188 pairs of flowchart images and their descriptions automatically.
* Using these pairs, we constructed a system that automatically generates explanatory text from flowchart images through machine learning.

2. Related Work

2.1. Flowchart Analysis

Services that share flowcharts, such as myExperiment and SHIWA, have appeared in recent years, which has led to a demand for techniques that search for similarities between flowcharts [1]. A related research project in flowchart image analysis is CLEF-IP, a task targeting patents [2]. The Conference and Labs of the Evaluation Forum (CLEF) is a workshop on information retrieval held mainly in Europe. The CLEF-IP task recognizes shapes, detects text, edges, and nodes that are elements of flowcharts, and recognizes flowcharts. Herrera-Cámara also worked on recognizing flowchart images [3]. In addition, Sethi et al. identified flowcharts among the diagram images in deep learning papers and further analyzed the flowcharts to build a system that outputs source code in Keras and Caffe [4]. Our research differs from theirs in that we take a flowchart image as input and output its description as natural language sentences. We considered using resources such as the CLEF-IP data for our work, but as they are too small to serve as training data for the generation of explanatory texts, this study started with the creation of training data.

2.2. Generating Text from Figures

Chart-to-text is the task of generating natural language sentences that describe the important information in charts and tables. Zhu et al. [5] addressed this problem by building a system, AutoChart. A human and machine evaluation of the generated text and charts demonstrates that the generated text is informative, coherent, and relevant to the corresponding charts [6]. Tan and colleagues [7] generated sentences from pie charts, bar graphs, and line graphs in scientific papers, while Kantharaj and colleagues [8] generated sentences from bar and line charts, mainly describing economic, market, and social issues, using generators such as T5 [9], BART, and GPT2. Instead of graphs, this study takes flowchart images as input, and the goal is to automatically generate explanatory text from these flowcharts.

3. Automatic Generation of Explanatory Text from Flowchart Images

The construction of the explanatory text generator consists of the following four steps: (Step 1) automatic recognition of flowchart images; (Step 2) extraction of character strings from the flowchart images; (Step 3) creation of data for machine learning; and (Step 4) construction of an explanatory text generator using T5. Each step is described below.

(Step 1) Automatic recognition of flowchart images

Convolutional neural networks (CNNs) are used to recognize flowchart images in patents. Our method uses seven CNN models pretrained on the large image data set ImageNet, constructs a learning model by fine-tuning them, and verifies their effectiveness through the experiments described in Section 4.
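As an illustration of Step 1, the following is a minimal sketch of how one of the pretrained CNNs (here DenseNet121) could be fine-tuned as a flowchart/non-flowchart classifier with Keras. The directory layout, image size, optimizer settings, and number of epochs are illustrative assumptions; the paper does not specify them.

<syntaxhighlight lang="python">
# Minimal sketch (not the authors' code): fine-tuning an ImageNet-pretrained
# DenseNet121 as a flowchart / non-flowchart classifier with Keras.
# Directory layout, image size, and training settings are illustrative assumptions.
from tensorflow import keras

IMG_SIZE = (224, 224)  # assumed input resolution

# Assumed layout: patent_images/{train,val}/{flowchart,other}/*.png
train_ds = keras.utils.image_dataset_from_directory(
    "patent_images/train", image_size=IMG_SIZE, batch_size=32, label_mode="binary")
val_ds = keras.utils.image_dataset_from_directory(
    "patent_images/val", image_size=IMG_SIZE, batch_size=32, label_mode="binary")

# ImageNet-pretrained backbone without its classification head.
base = keras.applications.DenseNet121(
    include_top=False, weights="imagenet", input_shape=IMG_SIZE + (3,), pooling="avg")
base.trainable = False  # train only the new classification head at first

inputs = keras.Input(shape=IMG_SIZE + (3,))
x = keras.applications.densenet.preprocess_input(inputs)
x = base(x)
outputs = keras.layers.Dense(1, activation="sigmoid")(x)  # flowchart vs. not
model = keras.Model(inputs, outputs)

model.compile(optimizer=keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy",
              metrics=[keras.metrics.Precision(), keras.metrics.Recall()])
model.fit(train_ds, validation_data=val_ds, epochs=5)
</syntaxhighlight>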
(Step 2) Extraction of character strings from the flowchart images

The optical character recognition (OCR) function of Google Cloud Vision (https://cloud.google.com/vision) is used to extract text strings from flowcharts. An example of a flowchart image and its character recognition result are shown in Figures 1 and 2, respectively. Here, "\n" indicates a line break.

Figure 1: Example of a Flowchart Image Included in a Patent

Figure 2: Character Recognition Result for the Image in Figure 1

[original]
半導体装置\n の製造方法\n 半導体層の上にフォトレジストを塗布\n 第1波長の紫外線を用いて\n フォトレジストを露光した後に\n フォトレジストを現像することによって、\n 開口部の内側へと迫り出した\n ネガ型レジストパターンを形成\n 第1波長より短い第2波長の紫外線を\n ネガ型レジストパターンに照射することによって、\n ネガ型レジストパターンを硬化\n (照射工程)\n ネガ型レジストパターンが形成された\n 半導体層の上に金属膜を形成\n ネガ型レジストパターンを半導体層から除去\n 完成\n P110\n P120\n P130\n P140\n P150

[translation]
Semiconductor devices\n Manufacturing method for\n photoresist is applied over the semiconductor layer\n using ultraviolet light of the first wavelength\n After exposing the photoresist\n by developing the photoresist,\n The photoresist is developed to form a negative resist pattern that extends into the aperture.\n Negative resist pattern is formed.\n By exposing the negative resist pattern to ultraviolet rays of the second wavelength, which is shorter than the first wavelength\n by irradiating the negative resist pattern,\n curing the negative resist pattern by irradiating it with ultraviolet light of the second wavelength, which is shorter than the first wavelength.\n (Irradiation process)\n The negative resist pattern is formed\n Metal film is formed on top of the semiconductor layer\n Negative resist pattern is removed from the semiconductor layer\n Completion\n P110\n P120\n P130\n P140\n P150
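The OCR call in Step 2 that produces output like Figure 2 can be sketched as follows. This is a minimal illustration using the google-cloud-vision Python client, not the authors' actual code; the file name is a placeholder and credentials are assumed to be configured separately.

<syntaxhighlight lang="python">
# Minimal sketch: extracting text from a flowchart image with the
# Google Cloud Vision OCR API (text_detection). The file path is a placeholder
# and credentials are assumed to be set via GOOGLE_APPLICATION_CREDENTIALS.
from google.cloud import vision

def extract_flowchart_text(image_path: str) -> str:
    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    # full_text_annotation.text holds the recognized strings,
    # with "\n" marking line breaks as in Figure 2.
    return response.full_text_annotation.text

print(extract_flowchart_text("flowchart.png"))
</syntaxhighlight>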
(Step 3) Creation of data for machine learning

We build the explanatory text generator by machine learning, using pairs of character recognition results and explanatory texts obtained from a large number of flowchart images. In this process, pairs with large differences between the character recognition result and the manually written explanatory text (the patent abstract) are considered inappropriate as training data and are excluded. Concretely, we calculate the similarity between the character recognition result and the explanatory text of each flowchart image using Gestalt pattern matching [10] and keep only the pairs whose similarity is above a threshold value. (Figures 2 and 3 show an example of a character recognition result and a manually written explanatory text (patent abstract); in this case, the similarity between them is high enough that the pair is used as machine learning data.)
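Gestalt pattern matching (the Ratcliff/Obershelp algorithm) [10] is what Python's difflib.SequenceMatcher implements, so the Step 3 filter can be sketched as below. Treating the OCR output and the abstract as plain strings is an assumption; the 0.1 threshold follows the value reported in Section 4.2.

<syntaxhighlight lang="python">
# Minimal sketch of the Step 3 filter: keep only (OCR text, abstract) pairs whose
# Gestalt pattern matching similarity is at least a threshold.
# difflib.SequenceMatcher implements the Ratcliff/Obershelp (Gestalt) algorithm [10].
from difflib import SequenceMatcher

THRESHOLD = 0.1  # threshold value reported in Section 4.2

def gestalt_similarity(ocr_text: str, abstract: str) -> float:
    return SequenceMatcher(None, ocr_text, abstract).ratio()

def filter_pairs(pairs):
    """pairs: iterable of (ocr_text, abstract) tuples; returns the retained pairs."""
    return [(ocr, abs_) for ocr, abs_ in pairs
            if gestalt_similarity(ocr, abs_) >= THRESHOLD]
</syntaxhighlight>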
(Step 4) Construction of an explanatory text generator using T5

We build the explanatory text generator with the language model T5. For the flowchart image in Figure 1, the input to T5 is the character recognition result in Figure 2 and the target output is the explanatory text in Figure 3.

Figure 3: Explanatory Text (Patent Abstract) Corresponding to the Image in Figure 1

[original]
半導体装置の製造方法は、半導体層の上にフォトレジストを塗布する工程と;第1波長の紫外線を用いてフォトレジストを露光した後にフォトレジストを現像することによって、開口部の内側へと迫り出したネガ型レジストパターンを、形成する工程と;第1波長より短い第2波長の紫外線をネガ型レジストパターンに照射することによって、ネガ型レジストパターンを硬化させる照射工程と;照射工程を行った後、ネガ型レジストパターンの開口部から露出する半導体層の上に、ニッケル(Ni)から主に成る金属膜を形成する工程と;ネガ型レジストパターンを半導体層から除去する工程とを備える。

[translation]
The method of manufacturing a semiconductor device comprises the steps of: applying a photoresist onto a semiconductor layer; forming a negative resist pattern, which is pressed inwards into an aperture, by developing the photoresist after exposing the photoresist using ultraviolet light of a first wavelength; an irradiation step of hardening the negative resist pattern by irradiating it with ultraviolet light of a second wavelength shorter than the first wavelength; forming, after the irradiation step, a metal film mainly comprising nickel (Ni) on the semiconductor layer exposed from the opening of the negative resist pattern; and removing the negative resist pattern from the semiconductor layer.
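As a minimal sketch of Step 4, the snippet below shows how a fine-tuned T5 model could generate an explanatory text from the OCR string of a flowchart using the Hugging Face transformers library. The checkpoint name and the beam size are placeholders (the paper does not name the pretrained Japanese T5 checkpoint it fine-tuned); the length limits follow the hyperparameters reported in Section 4.2.

<syntaxhighlight lang="python">
# Minimal sketch: generating explanatory text from a flowchart's OCR string with T5.
# "t5-checkpoint-name" is a placeholder; the paper does not identify the
# pretrained (Japanese) T5 checkpoint that was fine-tuned.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "t5-checkpoint-name"  # placeholder for the fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def generate_explanation(ocr_text: str) -> str:
    # Max input/target lengths follow the hyperparameters in Section 4.2;
    # the beam size is an illustrative choice.
    inputs = tokenizer(ocr_text, max_length=280, truncation=True, return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=256, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
</syntaxhighlight>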
4. Experiments

We performed experiments to confirm the effectiveness of our method.

4.1. Automatic Recognition of Flowchart Images

Data
Using 7,099 randomly selected images from the 2018 edition of the Japanese Patent Public Gazette, we manually identified whether each image was a flowchart, obtaining 1,120 flowcharts out of the 7,099 cases.

Alternative methods
As a baseline method, we used Keras, a deep learning library, to build a CNN with three Conv2D layers and two MaxPooling2D layers. As comparison methods, we used seven CNN models pretrained on the large image data set ImageNet: VGG16, VGG19, ResNet50, InceptionV3, MobileNet, DenseNet169, and DenseNet121.

Evaluation
The seven methods and the baseline method were evaluated using Precision, Recall, and F-measure.

Results
The experimental results are shown in Table 1. Among the compared methods, DenseNet121 was the most accurate in detecting flowcharts in terms of Precision. The results from DenseNet121 were used in the subsequent experiments.

Table 1: Flowchart Recognition Results with Eight Models

Model         Precision   Recall    F-measure
Baseline      0.8508      0.8902    0.8701
VGG16         0.8750      0.9711    0.9205
VGG19         0.9227      0.9653    0.9435
ResNet50      0.8698      0.9653    0.9151
InceptionV3   0.9422      0.9422    0.9422
MobileNet     0.9326      0.9595    0.9459
DenseNet169   0.9593      0.9538    0.9565
DenseNet121   0.9645      0.9422    0.9532

4.2. Automatic Generation of Explanatory Text from Flowchart Images

Data
Among the Japanese patents published from 2010 to 2019, 11,188 patents that included flowcharts and whose character recognition results had a Gestalt pattern matching similarity of at least 0.1 with their abstracts were used in our experiments. Of these patents, 90% were used as training data and the remainder as validation and evaluation data.

Hyperparameters
The following hyperparameters were used when generating explanatory texts with T5:

* Max input length: 280
* Max target length: 256
* Train batch size: 8
* Eval batch size: 8
* Number of training epochs: 6

Evaluation
Our method was evaluated using the following measures (a minimal scoring sketch follows the list):

* ROUGE-N: the most basic measure, which computes the degree of N-gram overlap. Here, N = 1 and 2 were used for evaluation (https://github.com/pltrdy/rouge).
* ROUGE-L: evaluates the longest common subsequence between the generated summary and the manually written summary.
* BERTScore [11]: an automatic evaluation metric based on the language model BERT [12], which calculates the similarity between texts using vector representations obtained from pretrained BERT.
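The scoring itself can be sketched with the rouge package linked above and the bert_score package. This is a minimal illustration; the placeholder strings and the use of bert_score's default multilingual model via lang="ja" are assumptions, since the paper does not state the exact BERT model used.

<syntaxhighlight lang="python">
# Minimal sketch: scoring generated explanatory texts against reference abstracts
# with ROUGE-1/2/L (https://github.com/pltrdy/rouge) and BERTScore [11].
from rouge import Rouge
from bert_score import score as bert_score

generated = ["the method comprises applying a photoresist ..."]   # model outputs (placeholders)
references = ["the method of manufacturing a semiconductor ..."]  # patent abstracts (placeholders)

# Recall / precision / F-measure for ROUGE-1, ROUGE-2, and ROUGE-L, as in Table 2.
rouge_scores = Rouge().get_scores(generated, references, avg=True)
print(rouge_scores["rouge-1"], rouge_scores["rouge-2"], rouge_scores["rouge-l"])

# BERTScore; lang="ja" selects the library's default multilingual model for
# Japanese texts (the exact BERT model used in the paper is not stated).
P, R, F1 = bert_score(generated, references, lang="ja")
print(float(P.mean()), float(R.mean()), float(F1.mean()))
</syntaxhighlight>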
Results
Table 2 shows the Recall, Precision, and F-measure results for ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore.

Table 2: Evaluation Results for the Generation of Explanatory Texts

Metric      Recall   Precision   F-measure
ROUGE-1     0.47     0.72        0.55
ROUGE-2     0.26     0.46        0.32
ROUGE-L     0.41     0.64        0.49
BERTScore   0.74     0.77        0.75

Discussion
For flowcharts with simple shapes and no branches (see Figure 4), we obtained good results. Figures 5 and 6 show the explanatory text generated by our method and the patent summary (correct answer), respectively.

Figure 4: Example of a Target Image for Generation

Figure 5: Explanatory Text Automatically Generated from the Image in Figure 4

[original]
測定治具を取り付ける取付工程と、複数のガス供給箇所にて取付工程及び湿度測定工程を実行する湿度測定工程と、複数のガス供給箇所での湿度測定工程での測定結果に基づいて、水差発生箇所を推定する水差推定工程と、を含む。

[translation]
The process includes a mounting process to install the measurement jig, a humidity measurement process to perform the mounting and humidity measurement processes at multiple gas supply locations, and a water-difference estimation process to estimate the location of water-difference occurrence based on the measurement results of the humidity measurement process at the multiple gas supply locations.

Figure 6: Patent Summary for the Image in Figure 4 (Correct Answer)

[original]
ガス供給箇所へ供給されるガスの流量を測定するガスメータの一次側へ連通接続する連通空間を有すると共に当該連通空間の湿度を測定する湿度測定部を有する水差推定治具を取り付ける取付工程と、湿度測定部にて連通空間の湿度を測定する湿度測定工程と、湿度測定工程にて測定される湿度に基づいて、ガス管に水差しが発生しているか否かを推定する水差推定工程とを実行する。

[translation]
The following processes are performed: an installation process in which a water-difference estimation jig, having a connecting space connected to the primary side of the gas meter that measures the flow rate of gas supplied to the gas supply point and a humidity measuring section that measures the humidity in the connecting space, is attached; a humidity measurement process in which the humidity in the connecting space is measured by the humidity measuring section; and a water-difference estimation process in which whether a water drop occurs in a gas pipe is estimated based on the humidity measured in the humidity measurement process.

Flowcharts with complex shapes, such as the one shown in Figure 7, tended to yield low-quality explanatory text. The dashed boxes in the figure were added by the authors for explanatory purposes. The explanatory text generated from the flowchart in Figure 7 is shown in Figure 8.

Figure 7: Example of a Flowchart with Conditional Branching

Figure 8: Explanatory Text Automatically Generated from the Image in Figure 7

[original]
潜伏モードへ移行し(s2110)、信頼度情報を取得し(s2110)、変倍率を決定し(s2110)、経過時間の計測を開始する(s2112)。そして、画像表示部が所定時間経過しているか否かを判定し(s2112)、所定時間が経過すると(s2112にてyes)、潜伏報知モードを終了する(s2112)。潜伏報知モードを終了すると(s2112にてyes)、潜伏報知モードを終了する。

[translation]
The system moves to the latent mode (s2110), obtains the reliability information (s2110), determines the variable magnification factor (s2110), and starts measuring the elapsed time (s2112). The image display then determines whether or not the predetermined time has elapsed (s2112). When the predetermined time elapses (yes at s2112), the latent report mode is terminated (s2112). When the latent report mode is terminated (yes at s2112), the latent report mode is terminated.

Looking at Figure 8, the step IDs such as s2110 generally do not correspond to the content of the explanatory text; this is because the coordinates of each character string in the figure are not yet taken into account. For the first conditional branch, the correct sentence, "When the predetermined time elapses (yes at s2112), the latent report mode is terminated," is generated except for the step ID (s2112) (see Figure 8). However, the dashed box in Figure 7 is not reflected in the explanatory text. Currently, the character strings output by Google Cloud Vision's character recognition are fed to T5 as they are; in the future, it will be necessary to perform preprocessing that takes the coordinate information of the character strings into account and reorders them appropriately.

5. Conclusions

In this study, 11,188 flowchart image-description pairs were obtained from patents, and these data were used to construct a system that automatically generates descriptions of flowchart images using T5. The experimental results showed that, for the detection of flowchart images, a precision of 0.9645 was achieved with a fine-tuned DenseNet121 model. For the generation of explanatory text from flowchart images, we found that high-quality explanatory text could be generated, although some issues remain for flowcharts with complex shapes. In future work, we will examine the generation of appropriate explanatory text for flowcharts with complex shapes, such as those containing multiple conditional branches, by considering the positional information of each character string in the image rather than using the character strings in the flowchart as they are.

Acknowledgment

This work was supported by JSPS KAKENHI Grant Numbers JP22K12154 and JP20H04210.

References

[1] J. Starlinger, B. Brancotte, S. Cohen-Boulakia, and U. Leser, Similarity Search for Scientific Workflows, Proceedings of the VLDB Endowment, Vol. 7, No. 12, pp. 1143-1154, 2014.
[2] F. Piroi, M. Lupu, and A. Hanbury, Overview of CLEF-IP 2013 Lab: Information Retrieval in the Patent Domain, Information Access Evaluation. Multilinguality, Multimodality, and Visualization. CLEF 2013. Lecture Notes in Computer Science, Vol. 8138, Springer, Berlin, Heidelberg, 2013.
[3] J. I. Herrera-Cámara, FLOW2CODE - From Hand-drawn Flowchart to Code Execution, Master's Thesis, Texas A&M University, 2017.
[4] A. Sethi, A. Sankaran, N. Panwar, S. Khare, and S. Mani, DLPaper2Code: Auto-generation of Code from Deep Learning Research Papers, Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018.
[5] J. Zhu, J. Ran, R. K. Lee, Z. Li, and K. Choo, AutoChart: A Dataset for Chart-to-Text Generation Task, Proceedings of the International Conference on Recent Advances in Natural Language Processing, pp. 1636-1644, 2021.
[6] J. Obeid and E. Hoque, Chart-to-Text: Generating Natural Language Descriptions for Charts by Adapting the Transformer Model, Proceedings of the 13th International Conference on Natural Language Generation, pp. 138-147, 2020.
[7] H. Tan, C. Tsai, Y. He, and M. Bansal, Scientific Chart Summarization: Datasets and Improved Text Modeling, Proceedings of the AAAI-22 Workshop on Scientific Document Understanding, 2022.
[8] S. Kantharaj, R. T. Leong, X. Lin, A. Masry, M. Thakkar, E. Hoque, and S. Joty, Chart-to-Text: A Large-Scale Benchmark for Chart Summarization, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, pp. 4005-4023, 2022.
[9] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Journal of Machine Learning Research, Vol. 21, No. 140, pp. 1-67, 2020.
[10] J. W. Ratcliff and D. Metzener, Pattern Matching: The Gestalt Approach, Dr. Dobb's Journal, p. 46, 1988.
[11] T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi, BERTScore: Evaluating Text Generation with BERT, arXiv:1904.09675 [cs.CL], 2019.
[12] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4171-4186, 2019.