<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Digital interpretation of sensor-equipment diagrams</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Carlos Francisco Moreno-García</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The Robert Gordon University</institution>
          ,
          <addr-line>Garthdee Road, Aberdeen</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A sensor-equipment diagram is a type of engineering drawing used in industrial practice that depicts the interconnectivity between a group of sensors and a portion of an Oil &amp; Gas facility. The interpretation of these documents is not a straightforward task, even for human experts. Some of the most common limitations are the large size of the drawing, the lack of a standard for defining equipment symbols, and the complex and entangled representation of the connectors. This paper presents a system that, given a sensor-equipment diagram and a few impositions by the user, outputs a list with the reading of the content of the sensors and the equipment parts, plus their interconnectivity. This work has been developed using open source Python modules and code, and its main purpose is to provide a tool which can help in the collection of labelled samples for a more robust artificial intelligence based solution in the near future.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>A sensor-equipment diagram (SED) is a type of engineering drawing which is
commonly used in Oil &amp; Gas industrial practice to depict how a group
of sensors is connected to a certain section of an oil rig or a plant. These
drawings are composed of a central main grid with multiple pieces of equipment,
plus a series of circular shapes which represent sensors. An example is shown in
Figure 1. Notice that there are two types of sensors: Local sensors, which are connected
to the main grid through solid lines, and Panel Mounted sensors, which are connected
to the local sensors through dashed lines.</p>
      <p>In recent years, the Oil &amp; Gas industry has shown particular interest in
developing systems that can digitise and interpret large numbers of SEDs in
order to migrate printed drawings towards a paperless environment.
Nonetheless, experts have realised the difficulty of automating this task due to
several factors, most notably poor image quality, the large size of a SED, a lack
of clarity in the delimitation of equipment symbols, and the complexity of
understanding the connectivity. For instance, the SED shown in Figure 1 is an
image of 4460 × 2544 pixels (approx. 1.5 MB) which contains 164 sensors (not all
of them connected to the equipment parts) and four pieces of equipment which
can only be identified empirically, since there is no particular standard for
differentiating and delimiting equipment parts. Moreover, notice that connectors
may overlap or leave gaps between each other.</p>
      <p>
        Digitisation and interpretation of engineering drawings from the Oil &amp; Gas
industry is not a new problem. In fact, the first literature dates back to the
1980s [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for a type of drawing called the piping and instrumentation diagram
(P&amp;ID), which is a much harder type of drawing to digitise, in part because of
its larger size, complexity and number of symbols used. In the 1990s, Howie et
al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] developed a system capable of digitising simple P&amp;IDs in DXF
format by using a set of symbols previously loaded by the user.
      </p>
      <p>
        In 2016, Banerjee et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] presented a system for the automatic linking of
construction and manufacturing engineering drawings, in which a series of circular
symbols called callouts are used to establish a link between two pages. To that
end, they implemented a function based on Hough circle detection [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and a set of
rules to distinguish true callouts from any other circular shapes in the
drawing.
      </p>
      <p>
        More recently, Moreno-Garcia et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] presented a heuristics-based method to
detect and segregate a series of commonly found shapes in P&amp;IDs. Afterwards, a
state-of-the-art digitisation methodology called text/graphics separation [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] was
applied to the remaining drawing. This framework showed improved results for
text detection. Some of the shapes detected in the P&amp;IDs were continuity labels
(i.e. arrow-like shapes), polygons and circular sensors. This methodology
was later applied by Elyan et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to collect a dataset of P&amp;ID symbols and
perform classification experiments.
      </p>
      <p>
        The tool starts by importing the image as a grayscale bitmap. This
image is then binarised using a standard thresholding algorithm [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Afterwards, the
system requests the user to perform two types of selection, the area of interest
and the equipment part area(s), through the use of a Python-based annotation module
called Sloth (http://sloth.readthedocs.io/en/latest/). The purpose of the manual
imposition of the area of interest is twofold: firstly, to discard unwanted
elements such as the SED title, margin or other elements considered noise, and
secondly, to cover cases where not all of the SED is meant to be digitised.
Since there is currently no set of rules delimiting what exactly constitutes
each equipment part, the tool requests the user to make an approximate selection
of what he or she considers an equipment part. It is expected that, as this tool
is used progressively and more labelled examples are obtained, a machine learning
algorithm can eventually be trained to detect equipment parts automatically.
Figure 2 shows these selections on the example SED.
      </p>
      <p>
        Once the area of interest is segregated, a circle detection algorithm based on
OpenCV (https://opencv.org/) Hough circles, which was successfully implemented in
previous work for P&amp;IDs [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], is applied to the remaining image. All detected sensors are stored
as individual images, as shown in Figure 3. To differentiate between Local and
Panel Mounted sensors, a vertical scan is executed on each sensor image. If one
or more continuous horizontal lines are found, the sensor is flagged as Panel
Mounted.
      </p>
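<p>The Local versus Panel Mounted distinction described above can be sketched as a row scan over a binarised sensor crop. This is an illustrative reimplementation, not the paper's code; the 0.6 width fraction is an assumed threshold:</p>

```python
import numpy as np

def is_panel_mounted(sensor, min_run_frac=0.6):
    """Flag a binarised sensor crop (0 = ink, 255 = background) as Panel
    Mounted when some row contains a continuous horizontal run of ink
    covering most of the crop width (the internal divider line)."""
    height, width = sensor.shape
    for row in sensor:
        run = best = 0
        for px in row:
            run = run + 1 if px == 0 else 0
            best = max(best, run)
        if best >= min_run_frac * width:
            return True
    return False

# A crop with a full-width divider line is flagged as Panel Mounted;
# a blank (or circle-only) crop is treated as a Local sensor.
panel_crop = np.full((20, 20), 255, dtype=np.uint8)
panel_crop[10, :] = 0
local_crop = np.full((20, 20), 255, dtype=np.uint8)
```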
    </sec>
    <sec id="sec-2">
      <p>To segregate the text inside the sensors, the largest and outermost contours of
each sensor image are discarded, assuming these to be the sensor shape and any
noisy pixels surrounding the sensor. In addition, for Panel Mounted
sensors, an additional detection step for large and elongated contours is
implemented. This allows the identification and removal of the horizontal line
dividing the text within the sensor. Then, a connected component analysis (CCA)
is applied to the remaining image to obtain the height and width of each text
character. This will be useful at a later stage to detect the text naming each
equipment part. Finally, the sensor text is read using the Pytesseract OCR
module (https://anaconda.org/ijstokes/pytesseract).</p>
      <p>
        To automatically detect the text naming each equipment part, a CCA is
run on each equipment part area to detect all shapes which are approximately
the height and width of an average text character found in the sensors. Then, a
morphological brushing operation [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is executed, with the aim of filling the gaps
between text characters and creating contiguous strings of black pixels. The
widest stream of contiguous pixels reveals the location of the equipment name,
which is then read using Pytesseract OCR.
      </p>
      <p>
        Finally, the image containing the connectors and the main grid is analysed
using a line detection method similar to the one proposed in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This outputs a
list of the starting point, endpoint and length of each line. For each local sensor,
the system finds the closest line and iteratively checks the line list to find a
line segment which "follows the path" (i.e. has a start/endpoint close to the
start/endpoint of another line), until one of the following two conditions is
met: 1) the start/endpoint of a line reaches an area marked as an equipment part,
or 2) there are no further lines that follow the path and an equipment area has
not been reached. Once all local sensors are processed, the tool shows the user
the final image depicting the local (blue) and panel mounted (red) sensors, the
equipment part names (pink) and the connectors (green). Moreover, the system
outputs the list of sensors and the equipment parts to which they are connected.
Both of these outputs are shown in Figure 4.
      </p>
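<p>The path-following loop can be illustrated with a small self-contained sketch (hypothetical names and tolerances; real connector segments would also need matching in both directions):</p>

```python
def endpoints_close(p, q, tol=5.0):
    # Two endpoints "follow the path" when they are within tol pixels.
    dx, dy = p[0] - q[0], p[1] - q[1]
    return tol * tol >= dx * dx + dy * dy

def trace_to_equipment(start, segments, in_equipment):
    """Chain segments from start until an endpoint lands inside an
    equipment area (condition 1, returns that point) or no continuation
    exists (condition 2, returns None)."""
    seg, used = start, {id(start)}
    while True:
        if in_equipment(seg[1]):
            return seg[1]
        nxt = None
        for cand in segments:
            if id(cand) not in used and endpoints_close(seg[1], cand[0]):
                nxt = cand
                break
        if nxt is None:
            return None
        used.add(id(nxt))
        seg = nxt

# Toy example: two segments chain rightwards into an "equipment" box at x >= 20.
lines = [((0, 0), (10, 0)), ((10, 0), (20, 0))]
hit = trace_to_equipment(lines[0], lines, lambda p: p[0] >= 20)
```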
      <p>This paper presents a system which, given an engineering drawing known as
a SED, produces a list of the connectivity between the sensors and the
equipment parts contained on the main grid. By requesting the user to select the
area of interest and the equipment parts, the system automatically finds the
sensors, reads the content of the sensors and equipment parts, and deduces the
connectivity between these shapes. The method has been successfully showcased
to industrial partners of the Oil &amp; Gas sector and has been deployed on one of
their servers for testing in future projects (http://circuits-dev.azurewebsites.net/).</p>
    </sec>
    <sec id="sec-3">
      <p>
        It is important to note that this work opens a niche area of engineering
drawing analysis in the light of deep learning advancements. Literature
regarding the application of machine learning to the digitisation and interpretation
of engineering drawings is still scarce and, most importantly, there is an
insufficient amount of labelled data to aid in training automated systems to
perform these tasks. While some work based on neural networks has been
presented for the digitisation of other assets such as circuit diagrams [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and P&amp;IDs
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], these efforts are mostly dedicated to the identification of recurrent shapes,
and neither consider the presence of rare shapes such as equipment parts, nor the
contextualisation of the connectivity between symbols. It is expected that by
developing semi-automatic solutions such as the one presented in this paper, it
will be possible to generate a considerable amount of labelled engineering drawings
which can eventually serve as input for artificial intelligence based solutions.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>P.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Choudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Roy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B. B.</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          .
          <article-title>Automatic Hyperlinking of Engineering Drawing Documents</article-title>
          .
          <source>Proceedings of the 12th IAPR International Workshop on Document Analysis Systems (DAS 2016)</source>
          , pages
          <fpage>102</fpage>
          –
          <lpage>107</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>T.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Yun</surname>
          </string-name>
          .
          <article-title>A symbol recognition system</article-title>
          .
          <source>In Proceedings of the Second International Conference on Document Analysis and Recognition - ICDAR'93</source>
          , pages
          <fpage>918</fpage>
          –
          <lpage>921</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>R. O.</given-names>
            <surname>Duda</surname>
          </string-name>
          and
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Hart</surname>
          </string-name>
          .
          <article-title>Use of the Hough transformation to detect lines and curves in pictures</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>15</volume>
          (
          <issue>1</issue>
          ):
          <fpage>11</fpage>
          –
          <lpage>15</lpage>
          ,
          <year>1972</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>E.</given-names>
            <surname>Elyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. F.</given-names>
            <surname>Moreno-Garcia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Jayne</surname>
          </string-name>
          .
          <article-title>Symbols classification in engineering drawings</article-title>
          .
          <source>In International Joint Conference on Neural Networks (IJCNN)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>M.</given-names>
            <surname>Furuta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kase</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Emori</surname>
          </string-name>
          .
          <article-title>Segmentation and recognition of symbols for handwritten piping and instrument diagram</article-title>
          .
          pages
          <fpage>626</fpage>
          –
          <lpage>629</lpage>
          ,
          <year>1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Gellaboina</surname>
          </string-name>
          and
          <string-name>
            <given-names>V. G.</given-names>
            <surname>Venkoparao</surname>
          </string-name>
          .
          <article-title>Graphic symbol recognition using auto associative neural network model</article-title>
          .
          <source>In Proceedings of the 7th International Conference on Advances in Pattern Recognition, ICAPR</source>
          <year>2009</year>
          , pages
          <fpage>297</fpage>
          –
          <lpage>301</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>C.</given-names>
            <surname>Howie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kunz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Binford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. H.</given-names>
            <surname>Law</surname>
          </string-name>
          .
          <article-title>Computer interpretation of process and instrumentation drawings</article-title>
          .
          <source>Advances in Engineering Software</source>
          ,
          <volume>29</volume>
          (
          <issue>7-9</issue>
          ):
          <fpage>563</fpage>
          –
          <lpage>570</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>M.</given-names>
            <surname>Ishii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yamamoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Harada</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Iwasaki</surname>
          </string-name>
          .
          <article-title>An automatic recognition system for piping and instrument diagrams</article-title>
          .
          <source>Systems and computers in Japan</source>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          ):
          <fpage>32</fpage>
          –
          <lpage>46</lpage>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lu</surname>
          </string-name>
          .
          <article-title>Detection of text regions from digital engineering drawings</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>20</volume>
          (
          <issue>4</issue>
          ):
          <fpage>431</fpage>
          –
          <lpage>439</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>C. F.</given-names>
            <surname>Moreno-García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Elyan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Jayne</surname>
          </string-name>
          .
          <article-title>Heuristics-Based Detection to Improve Text / Graphics Segmentation in Complex Engineering Drawings</article-title>
          .
          <source>In Engineering Applications of Neural Networks, volume CCIS 744</source>
          , pages
          <fpage>87</fpage>
          –
          <lpage>98</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>N.</given-names>
            <surname>Otsu</surname>
          </string-name>
          .
          <article-title>A threshold selection method from gray-level histograms</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ):
          <volume>62</volume>
          –
          <lpage>66</lpage>
          ,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>