Automatic Generation of Explanatory Text from Flowchart Images in Patents

Hidetsugu Nanba¹, Shohei Kubo¹ and Satoshi Fukuda¹
¹ Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan



Abstract
This paper addresses the automatic generation of explanatory text from flowchart images in patents. The construction of an explanatory text generator consists of four steps: (1) automatic recognition of flowchart images from patent images, (2) extraction of text strings from flowchart images, (3) creation of data for machine learning, and (4) construction of an explanatory text generator using T5. In this study, a benchmark consisting of 7,099 images was constructed to determine whether an image in a patent is a flowchart. Furthermore, an explanatory text generator was constructed using 11,188 flowchart image-explanatory text pairs. The experimental results showed that a recognition precision of 0.9645 was achieved for flowchart images. Although high-quality explanatory text could be generated from flowchart images, some issues remain for flowcharts with complex shapes.

Keywords
Flowchart, Image recognition, Text generation, Character recognition, Patent



                         1. Introduction
                                                                                                2. Related Work
                         A procedural text is a description of a set of procedures
                         to achieve a particular objective. Our goal is to
                         automatically extract knowledge about a series of                      2.1.      Flowchart Analysis
                         procedures in a wide range of fields from texts and
                         systematize them. Here, we describe the automatic                          Services that share flowcharts, such as
                         generation of explanatory text from flowchart images                   myExperiment and SHIWA, have started recently,
                         in patents.                                                            which has led to a demand for techniques to search for
                             In automatically generating explanatory text for                   similarities between one flowchart and another
                         the flowchart images, we focus on the abstract and                     flowchart [1]. A related research project in flowchart
                         selected figures of the patent. A selection figure                     image analysis is CLEF-IP, which refers to a task
                         enables us to grasp the outline of the invention quickly               targeting patents [2]. The Conference and Labs of the
                         and accurately. The applicant usually selects a diagram                Evaluation Forum (CLEF) is a workshop on
                         from among the diagrams in the patent that they                        information retrieval held mainly in Europe. CLEF-IP
                         consider necessary for understanding the abstract                      recognizes shapes, detects text, edges, and nodes that
                         contents. If a classifier that automatically determines                are elements of flowcharts, and recognizes flowcharts.
                         whether an image in a patent is a flowchart or not is                  Herrera-Cámara also worked to recognize flowchart
                         constructed and only those selected diagrams that are                  images [3]. In addition, Sethi et al. identified flowcharts
                         flowcharts are extracted, a large number of pairs of                   from diagram images in deep learning-related papers
                         flowcharts and their explanatory texts (i.e., patent                   and further analyzed the flowcharts to build a system
                         abstracts) can be generated automatically.                             that outputs the sources in Keras and Caffe [4]. This
                         Furthermore, using these pairs, we believe it is                       research differs from theirs in that we take a flowchart
                         possible to construct a system that automatically                      image as input and output its description as a natural
                         generates explanatory text from flowchart images                       language sentence. We considered the availability of
                         using machine learning.                                                resources such as the CLEF-IP for our work, but as it is
                             The contributions of this paper are as follows:                    too small to be used as training data for the generation
                                                                                                of explanatory texts, this study started with the
                         l     To determine whether an image in a patent is a
                                                                                                creation of training data.
                               configuration diagram, flowchart, or table, we
                               constructed a benchmark consisting of 7,099
                               images. We used this benchmark to achieve a                      2.2. Generating Text from Figures
                               classification accuracy of 0.9645.
                         l     We constructed 11,188 pairs of flowchart images
                                                                                                and Tables
                               and their descriptions automatically.
                                                                                                    Chart to text refers to the task of generating natural
                         l     Using these pairs, we constructed a system that
                                                                                                language sentences to describe the important
                               automatically generates explanatory text from
                                                                                                information derived from charts and tables. Zhu et al.
                               flowchart images through machine learning.
                                                                                                [5] addressed this problem by building a system,
AutoChart. A human and machine evaluation of the generated text and charts demonstrated that the generated text is informative, coherent, and relevant to the corresponding charts [6].
    Tan and colleagues [7] generated sentences from pie charts, bar graphs, and line graphs in scientific papers, while Kantharaj and colleagues [8] generated sentences from charts using generators such as T5 [9], BART, and GPT-2, based on bar and line graphs mainly describing economic, market, and social issues. Instead of graphs, this study uses flowchart images as inputs, and the goal is to automatically generate explanatory text from these flowcharts.

3. Automatic Generation of Explanatory Text from Flowchart Images

The construction of the generator of explanatory text consists of the following four steps: (Step 1) automatic recognition of flowchart images; (Step 2) extraction of character strings from the flowchart image; (Step 3) creation of data for machine learning; and (Step 4) construction of an explanatory text generator using T5. Each step is described below.

(Step 1) Automatic recognition of flowchart images
    Convolutional neural networks (CNNs) are used to recognize flowchart images in patents. Our method uses seven CNN models trained on a large image data set called "ImageNet" to construct a learning model by fine-tuning, and its effectiveness is verified through the experiments described in Section 4.
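As a concrete illustration of Step 1, the following is a minimal fine-tuning sketch with Keras, using DenseNet121 (the best model in Section 4) as an example. The input size, classifier head, and training settings are illustrative assumptions; the paper specifies only that seven ImageNet-pretrained models were fine-tuned.

    # A minimal fine-tuning sketch for Step 1 (illustrative settings, not the
    # authors' exact configuration).
    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import DenseNet121

    base = DenseNet121(weights="imagenet", include_top=False,
                       input_shape=(224, 224, 3))
    base.trainable = False  # freeze the ImageNet-pretrained convolutional base

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(1, activation="sigmoid"),  # flowchart vs. non-flowchart
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_images, train_labels, ...) on the benchmark of Section 4.1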
(Step 2) Extraction of character strings from the flowchart image
    An optical character recognition function in Google Cloud Vision (https://cloud.google.com/vision) is used to extract text strings from flowcharts. An example of a flowchart image and its character recognition result are shown in Figures 1 and 2, respectively. Here, "\n" indicates a line break.

Figure 1: Example of Flowchart Image Included in a Patent

[original]
半導体装置\nの製造方法\n半導体層の上にフォトレジストを塗布\n第1波長の紫外線を用いて\nフォトレジストを露光した後に\nフォトレジストを現像することによって、\n開口部の内側へと迫り出した\nネガ型レジストパターンを形成\n第1波長より短い第2波長の紫外線を\nネガ型レジストパターンに照射することによって、\nネガ型レジストパターンを硬化\n(照射工程)\nネガ型レジストパターンが形成された\n半導体層の上に金属膜を形成\nネガ型レジストパターンを半導体層から除去\n完成\nP110\nP120\nP130\nP140\nP150
[translation]
Semiconductor device \n Manufacturing method \n Photoresist is applied over the semiconductor layer \n Using ultraviolet light of the first wavelength \n After exposing the photoresist \n By developing the photoresist, \n Protruding into the inside of the opening \n A negative resist pattern is formed \n Ultraviolet light of a second wavelength shorter than the first wavelength \n By irradiating the negative resist pattern, \n The negative resist pattern is cured \n (Irradiation step) \n On which the negative resist pattern is formed \n A metal film is formed on the semiconductor layer \n The negative resist pattern is removed from the semiconductor layer \n Completion \n P110 \n P120 \n P130 \n P140 \n P150
Figure 2: Character Recognition Results for the Image in Figure 1
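The OCR call itself can be sketched as follows with the Google Cloud Vision Python client; the image file name is hypothetical, and full_text_annotation.text yields the line-break-separated strings shown in Figure 2.

    # A sketch of the Step 2 text extraction with Google Cloud Vision OCR.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open("flowchart.png", "rb") as f:  # hypothetical image file
        image = vision.Image(content=f.read())

    response = client.text_detection(image=image)
    # The recognized strings, joined by "\n" as in Figure 2
    print(response.full_text_annotation.text)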
(Step 3) Creation of data for machine learning
    We build an explanatory text generator by machine learning, using pairs of character recognition results and explanatory texts from a large number of flowchart images. In this process, we consider that pairs with large differences between the character recognition result and the manually written explanatory text (the patent abstract) are inappropriate as training data¹; therefore, we exclude such pairs. Specifically, we calculate the similarity between the character recognition result and the explanatory text of the flowchart image using Gestalt pattern matching [10] and use only the pairs whose similarity is above a threshold value for training.
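Gestalt pattern matching [10] is available in Python's standard library as difflib.SequenceMatcher, so the Step 3 filtering can be sketched as follows; the candidate pairs are hypothetical, and the threshold of 0.1 follows the experimental setting in Section 4.2.

    # A sketch of the Step 3 similarity filtering. difflib.SequenceMatcher
    # implements the Ratcliff/Obershelp Gestalt pattern matching of [10].
    from difflib import SequenceMatcher

    def gestalt_similarity(ocr_text: str, abstract: str) -> float:
        return SequenceMatcher(None, ocr_text, abstract).ratio()

    # Hypothetical (OCR result, patent abstract) candidate pairs
    candidate_pairs = [("apply photoresist ...", "A method of manufacturing ...")]

    # Keep only pairs whose similarity reaches the threshold
    training_pairs = [(ocr, abs_) for ocr, abs_ in candidate_pairs
                      if gestalt_similarity(ocr, abs_) >= 0.1]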

(Step 4) Construction of an explanatory text generator using T5
    We build an explanatory text generator with the language model T5. For the flowchart image in Figure 1, the input to T5 is the text in Figure 2 and the target output is the text in Figure 3.
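At inference time, Step 4 amounts to feeding the OCR result to the fine-tuned model, as in the following sketch with the Hugging Face transformers library. The checkpoint path is hypothetical, since the paper does not name the pretrained T5 used, and the length limits mirror the hyperparameters in Section 4.2.

    # A sketch of Step 4 inference: OCR text in, explanatory text out.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("path/to/fine-tuned-t5")  # hypothetical
    model = T5ForConditionalGeneration.from_pretrained("path/to/fine-tuned-t5")

    ocr_text = "..."  # a character recognition result such as Figure 2
    inputs = tokenizer(ocr_text, max_length=280, truncation=True,
                       return_tensors="pt")
    outputs = model.generate(**inputs, max_length=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))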



¹ Figures 2 and 3 show examples of a character recognition result and a manually written explanatory text (patent abstract). In this case, the similarity between them is so high that we use them as machine learning data.
[original]
半導体装置の製造方法は、半導体層の上にフォトレジストを塗布する工程と;第1波長の紫外線を用いてフォトレジストを露光した後にフォトレジストを現像することによって、開口部の内側へと迫り出したネガ型レジストパターンを、形成する工程と;第1波長より短い第2波長の紫外線をネガ型レジストパターンに照射することによって、ネガ型レジストパターンを硬化させる照射工程と;照射工程を行った後、ネガ型レジストパターンの開口部から露出する半導体層の上に、ニッケル(Ni)から主に成る金属膜を形成する工程と;ネガ型レジストパターンを半導体層から除去する工程とを備える。
[translation]
The method of manufacturing a semiconductor device comprises the steps of: applying a photoresist onto a semiconductor layer; forming a negative resist pattern, which protrudes into the inside of an opening, by developing the photoresist after exposing it using ultraviolet light of a first wavelength; an irradiation step of curing the negative resist pattern by irradiating it with ultraviolet light of a second wavelength shorter than the first wavelength; forming a metal film mainly composed of nickel (Ni) on the semiconductor layer exposed from the opening of the negative resist pattern after the irradiation step; and removing the negative resist pattern from the semiconductor layer.
Figure 3: Explanatory Text (Patent Abstract) Corresponding to the Image in Figure 1

4. Experiments

We performed experiments to confirm the effectiveness of our method.

4.1. Automatic Recognition of Flowchart Images

Data
    Using 7,099 randomly selected images from the 2018 edition of the Japanese Patent Public Gazette, we manually identified whether each image was a flowchart and obtained 1,120 flowcharts from the 7,099 cases.

Alternative methods
    As a baseline method, we used Keras, a deep learning library, to build a CNN model with three Conv2D layers and two MaxPooling2D layers. As comparison methods, we used seven CNN models trained on the large image data set ImageNet: VGG16, VGG19, ResNet50, InceptionV3, MobileNet, DenseNet169, and DenseNet121.
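The baseline architecture can be sketched as follows; the filter counts, kernel sizes, and input size are illustrative assumptions, as the text specifies only the number of Conv2D and MaxPooling2D layers.

    # A sketch of the baseline CNN: three Conv2D and two MaxPooling2D layers.
    from tensorflow.keras import layers, models

    baseline = models.Sequential([
        layers.Input(shape=(224, 224, 3)),        # assumed input size
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),    # flowchart vs. non-flowchart
    ])
    baseline.compile(optimizer="adam", loss="binary_crossentropy",
                     metrics=["accuracy"])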
Evaluation
    The seven methods and the baseline method were evaluated using Precision, Recall, and F-measure.

Results
    The experimental results are shown in Table 1. Among the compared methods, DenseNet121 detected flowcharts most accurately in terms of Precision. The results from DenseNet121 were therefore used in the subsequent experiments.

Table 1
Flowchart Recognition Results with Eight Models

               Precision   Recall   F-measure
 Baseline      0.8508      0.8902   0.8701
 VGG16         0.8750      0.9711   0.9205
 VGG19         0.9227      0.9653   0.9435
 ResNet50      0.8698      0.9653   0.9151
 InceptionV3   0.9422      0.9422   0.9422
 MobileNet     0.9326      0.9595   0.9459
 DenseNet169   0.9593      0.9538   0.9565
 DenseNet121   0.9645      0.9422   0.9532

4.2. Automatic Generation of Explanatory Text from Flowchart Images

Data
    Among the Japanese patents published from 2010 to 2019, 11,188 patents that included flowcharts and whose Gestalt pattern matching similarity was at least 0.1 were used in our experiments. Of these patents, 90% were used as training data and the remainder as validation and evaluation data.

Hyperparameters
    The following hyperparameters were used in the generation of explanatory texts by T5 (one way to express these settings in code is sketched after this list):
•   Max input length: 280
•   Max target length: 256
•   Train batch size: 8
•   Eval batch size: 8
•   Num train epochs: 6
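The sketch below expresses these settings, assuming the Hugging Face Trainer API; the paper does not state which training framework was used, and the output directory is hypothetical.

    # The Section 4.2 hyperparameters expressed as (hypothetical) training
    # arguments for fine-tuning T5.
    from transformers import Seq2SeqTrainingArguments

    training_args = Seq2SeqTrainingArguments(
        output_dir="t5-flowchart",        # hypothetical
        per_device_train_batch_size=8,    # Train batch size: 8
        per_device_eval_batch_size=8,     # Eval batch size: 8
        num_train_epochs=6,               # Num train epochs: 6
    )
    # The max input length (280) and max target length (256) are applied when
    # tokenizing OCR results and abstracts, as in the Step 4 sketch.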



Evaluation
    Our method was evaluated using the following measures (a usage sketch follows the list):
•   ROUGE-N: The most basic measure, which computes the degree of N-gram overlap between the generated and reference texts. Here, N = 1 and 2 were used for evaluation (https://github.com/pltrdy/rouge).
•   ROUGE-L: Evaluates the longest common subsequence between the generated summary and the manually written summary.
•   BERTScore [11]: An automatic evaluation metric based on the language model BERT [12], which calculates the similarity between texts using vector representations obtained from pretrained BERT.
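A usage sketch of these measures with the pltrdy/rouge package and the bert-score package follows; the texts are English placeholders, whereas the actual evaluation was performed on Japanese abstracts.

    # A sketch of scoring generated texts with ROUGE and BERTScore.
    from rouge import Rouge                      # https://github.com/pltrdy/rouge
    from bert_score import score as bert_score

    generated = ["explanatory text generated from a flowchart image"]
    references = ["the explanatory text written for the flowchart"]

    # ROUGE-1, ROUGE-2, and ROUGE-L (recall, precision, and F-measure each)
    rouge_scores = Rouge().get_scores(generated, references, avg=True)

    # BERTScore compares the texts with contextual BERT embeddings [11, 12]
    precision, recall, f1 = bert_score(generated, references, lang="en")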
Results
    Table 2 shows the Recall, Precision, and F-measure results for ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore.

Table 2
Evaluation Results for the Generation of Explanatory Texts

              Recall   Precision   F-measure
 ROUGE-1      0.47     0.72        0.55
 ROUGE-2      0.26     0.46        0.32
 ROUGE-L      0.41     0.64        0.49
 BERTScore    0.74     0.77        0.75

Discussion
    For simple geometries with no branches in the flowchart (see Figure 4), we obtained good analytical results. Figure 5 shows the explanatory text generated by our method, and Figure 6 shows the corresponding patent summary (correct answer).

Figure 4: Example of Target Image for Generation

[original]
測定治具を取り付ける取付工程と、複数のガス供給箇所にて取付工程及び湿度測定工程を実行する湿度測定工程と、複数のガス供給箇所での湿度測定工程での測定結果に基づいて、水差発生箇所を推定する水差推定工程と、を含む。
[translation]
The process includes a mounting step to install the measurement jig, which is a humidity measurement step to perform the mounting and humidity measurement steps at multiple gas supply locations, and a water-entry estimation step to estimate the location of water entry based on the measurement results of the humidity measurement step at the multiple gas supply locations.
Figure 5: Explanatory Text Automatically Generated from the Image in Figure 4

[original]
ガス供給箇所へ供給されるガスの流量を測定するガスメータの一次側へ連通接続する連通空間を有すると共に当該連通空間の湿度を測定する湿度測定部を有する水差推定治具を取り付ける取付工程と、湿度測定部にて連通空間の湿度を測定する湿度測定工程と、湿度測定工程にて測定される湿度に基づいて、ガス管に水差しが発生しているか否かを推定する水差推定工程とを実行する。
[translation]
The following steps are performed: a mounting step in which a water-entry estimation jig is attached, the jig having a connecting space that communicates with the primary side of a gas meter that measures the flow rate of gas supplied to a gas supply point, as well as a humidity measuring section that measures the humidity in the connecting space; a humidity measurement step in which the humidity in the connecting space is measured by the humidity measuring section; and a water-entry estimation step in which whether water entry has occurred in a gas pipe is estimated based on the humidity measured in the humidity measurement step.
Figure 6: Patent Summary for the Image in Figure 4 (Correct Answer)

    Flowcharts with complex shapes, such as the one shown in Figure 7, tended to yield low-quality explanatory text. The dashed-line boxes in the figure were added by the authors for the purpose of explanation. The description generated from the flowchart in Figure 7 is shown in Figure 8.

Figure 7: Example of a Flowchart with Conditional Branching
[original]
潜伏モードへ移行し(s2110)、信頼度情報を取得し(s2110)、変倍率を決定し(s2110)、経過時間の計測を開始する(s2112)。そして、画像表示部が所定時間経過しているか否かを判定し(s2112)、所定時間が経過すると(s2112にてyes)、潜伏報知モードを終了する(s2112)。潜伏報知モードを終了すると(s2112にてyes)、潜伏報知モードを終了する。
[translation]
The system moves to the latent mode (s2110), obtains the reliability information (s2110), determines the variable magnification factor (s2110), and starts measuring the elapsed time (s2112). The image display then determines whether or not the predetermined time has elapsed (s2112). When the predetermined time elapses (yes at s2112), the latent report mode is terminated (s2112). When the latent report mode is terminated (yes at s2112), the latent report mode is terminated.
Figure 8: Explanatory Text Automatically Generated from the Image in Figure 7

    Looking at Figure 8, the step IDs such as s2110 do not correspond to the explanatory text; this is because the coordinates of each character string in the figure are not considered at all. The first conditional branch is "When the predetermined time elapses (yes at s2112), the latent report mode is terminated." The correct sentence is generated except for the step ID (s2112) (see Figure 8). However, the content of the dashed boxes in Figure 7 is not included in the explanatory text. Currently, the character strings output by Google Cloud Vision's character recognition are used as input to T5 as they are; in the future, it will be necessary to perform preprocessing such as taking the coordinate information of the character strings into account and reordering them appropriately.
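As a sketch of such preprocessing, the individual strings returned by Cloud Vision carry bounding boxes, so they could be reordered top-to-bottom and left-to-right before being passed to T5. The sort key here is an illustrative assumption, not the authors' implemented method.

    # A sketch of coordinate-based reordering of OCR strings (assumed sort key).
    from google.cloud import vision

    def ordered_strings(response: vision.AnnotateImageResponse) -> list[str]:
        items = []
        # text_annotations[0] holds the full text; the rest are single strings
        for ann in response.text_annotations[1:]:
            v = ann.bounding_poly.vertices[0]  # top-left corner of the string
            items.append((v.y, v.x, ann.description))
        items.sort()  # top-to-bottom, then left-to-right
        return [text for _, _, text in items]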
5. Conclusions

    In this study, 11,188 flowchart image-description pairs were obtained from patents, and these data were used to construct a system that automatically generates descriptions of flowchart images using T5. The experimental results showed that, for the detection of flowchart images, a precision of 0.9645 was achieved with a fine-tuned model based on DenseNet121. In the generation of explanatory text from flowchart images, it was found that high-quality explanatory text could be generated, although some issues remain for flowcharts with complex shapes. In the future, we will examine the possibility of generating appropriate explanatory text for flowcharts with complex shapes, such as those containing multiple conditional branches, by considering the positional information of each character string in the image, rather than using the character strings in the flowchart as is.

Acknowledgment
    This work was supported by JSPS KAKENHI Grant Numbers JP22K12154 and JP20H04210.

References

[1] J. Starlinger, B. Brancotte, S. Cohen-Boulakia, and U. Leser, Similarity Search for Scientific Workflows, Proceedings of the VLDB Endowment, Vol. 7, No. 12, pp. 1143-1154, 2014.
[2] F. Piroi, M. Lupu, and A. Hanbury, Overview of CLEF-IP 2013 Lab: Information Retrieval in the Patent Domain, Information Access Evaluation. Multilinguality, Multimodality, and Visualization. CLEF 2013, Lecture Notes in Computer Science, Vol. 8138, Springer, Berlin, Heidelberg, 2013.
[3] J. I. Herrera-Cámara, FLOW2CODE - From Hand-drawn Flowchart to Code Execution, Master Thesis, Texas A&M University, 2017.
[4] A. Sethi, A. Sankaran, N. Panwar, S. Khare, and S. Mani, DLPaper2Code: Auto-generation of Code from Deep Learning Research Papers, Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018.
[5] J. Zhu, J. Ran, R. K. Lee, Z. Li, and K. Choo, AutoChart: A Dataset for Chart-to-Text Generation Task, Proceedings of the International Conference on Recent Advances in Natural Language Processing, pp. 1636-1644, 2021.
[6] J. Obeid and E. Hoque, Chart-to-Text: Generating Natural Language Descriptions for Charts by Adapting the Transformer Model, Proceedings of the 13th International Conference on Natural Language Generation, pp. 138-147, 2020.
[7] H. Tan, C. Tsai, Y. He, and M. Bansal, Scientific Chart Summarization: Datasets and Improved Text Modeling, Proceedings of the AAAI-22 Workshop on Scientific Document Understanding, 2022.
[8] S. Kantharaj, R. T. Leong, X. Lin, A. Masry, M. Thakkar, E. Hoque, and S. Joty, Chart-to-Text: A Large-Scale Benchmark for Chart Summarization, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, pp. 4005-4023, 2022.
[9] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Journal of Machine Learning Research, Vol. 21, No. 140, pp. 1-67, 2020.
[10] J. W. Ratcliff and D. Metzener, Pattern Matching: The Gestalt Approach, Dr. Dobb's Journal, p. 46, 1988.
[11] T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi, BERTScore: Evaluating Text Generation with BERT, arXiv:1904.09675 [cs.CL], 2019.
[12] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4171-4186, 2019.


