<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Textual summarisation of flowcharts in patent drawings for CLEF-IP 2012</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrew Thean</string-name>
          <email>andy.thean@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jean-Marc Deltorn</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patrice Lopez</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laurent Romary</string-name>
          <email>laurent.romary@inria.fr</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>General Terms</string-name>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Measurement, Performance, Experimentation</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2012</year>
      </pub-date>
      <abstract>
        <p>The CLEF-IP 2012 track included the Flowchart Recognition task, an image-based task where the goal was to process binary images of flowcharts taken from patent drawings to produce summaries containing information about their structure. The textual summaries include information about the flowchart title, the box-node shapes, the connecting edge types, text describing flowchart content and the structural relationships between nodes and edges. An algorithm designed for this task and characterised by the following method steps is presented: text-graphic segmentation based on connected-component clustering; line segment bridging with an adaptive, oriented filter; box shape classification using a stretch-invariant transform to extract features based on shape-specific symmetry; text object recognition using a noisy channel model to enhance the results of a commercial OCR package.</p>
      </abstract>
      <kwd-group>
        <kwd>H.3 [Information Storage and Retrieval]</kwd>
        <kwd>H.3.1 Content Analysis and Indexing</kwd>
        <kwd>H.3.3 Information Search and Retrieval</kwd>
        <kwd>H.3.4 Systems and Software</kwd>
        <kwd>H.3.7 Digital Libraries</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Certain inventions are most effectively summarised in pictorial form and patent professionals sometimes rely heavily on patent drawings for performing prior art searches. Searching patent drawings is currently a labour-intensive, error-prone task which would be facilitated by automatic indexing methods. Generic methods for automatically indexing patent drawings have been reported
        <xref ref-type="bibr" rid="ref2 ref20 ref7">(Hanbury et al., 2011; Vrochidis et al., 2010; Codina et al., 2009)</xref>
        but the heterogeneity of patent drawings means that it is difficult for a one-size-fits-all approach to reach high levels of performance on all image classes. Flowcharts represent a large and useful subclass of patent images for which specially-adapted methods will improve indexing performance. They have a special role in the patents domain since they offer a way to graphically represent a process (a process is one of four legally-recognised categories of patent claim alongside products, apparatus and use, e.g. Rule 43(2) of
        <xref ref-type="bibr" rid="ref3">EPC (2000)</xref>
        ) and often contain information-rich text that is prohibited from other types of patent drawing (they are usually considered an exception to rules that generally prohibit text from patent drawings, e.g. Rule 46(2)(j) of
        <xref ref-type="bibr" rid="ref3">EPC (2000)</xref>
        and Rule 6.2(b) of
        <xref ref-type="bibr" rid="ref15">PCT (1970)</xref>
        ). Flowcharts were one of the image categories targeted in the Patent Image Classification task of CLEF-IP 2011 (see
        <xref ref-type="bibr" rid="ref16">Piroi et al. (2011)</xref>
        and references therein), but the summarisation of flowcharts in patent drawings has not been discussed previously in the literature.
      </p>
      <p>
        Few studies have addressed the automatic analysis of flowcharts in general.
        <xref ref-type="bibr" rid="ref8">Ito (1982)</xref>
        used corner detection and edge-based block shape classification to derive FORTRAN code directly from a flowchart bitmap (see also US5386508). In Abe et al. (1986), a method for discriminating between the various components of a flowchart diagram (symbols, lines and characters) was discussed. Kasturi et al. (1990) proposed an automatic line drawing analysis system (including flowcharts) based on the reduction of the image into segments and closed loops. Yu et al. (1997) described a general framework for the analysis of engineering drawings characterised by alternating instances of symbols and connection lines. The particular case of flowchart summarisation was also discussed in
        <xref ref-type="bibr" rid="ref5">Futrelle and Nikolakis (1995)</xref>
        in the context of automatic document analysis. More recently,
        <xref ref-type="bibr" rid="ref18">Szwoch (2007)</xref>
        applied template matching to the recognition and description of hand-drawn flowchart components for the purpose of automatic diagram aestheticization. Vasudevan et al. (2008) described a prototype system of flowchart knowledge extraction from images based on chain code boundary descriptors and a neural network classifier. Finally, in
        <xref ref-type="bibr" rid="ref12">Lemaitre et al. (2010)</xref>
        the characteristic causal relationship between flowchart components (described as line strokes) was exploited through a set of syntactic constraints that helped address the problems of flowchart diagram recognition as well as text-graphics segmentation.
      </p>
      <p>The paper is set out as follows: in Section 2 the problem addressed in the CLEF-IP 2012 Flowchart Recognition task is defined in detail; in Section 3 a method of solving this problem is described; in Section 4 the performance of the method is reported and discussed.</p>
    </sec>
    <sec id="sec-2">
      <title>Problem Statement</title>
      <sec id="sec-2-1">
        <title>Input image data</title>
        <p>
          A training set of 50 images was released in March 2012, followed by a set of 100 test images three months later; results were submitted on July 10th 2012. The input image data consists of binary images of flowcharts taken from patent drawings. Each image contains a single figure and has been cropped from a patent drawing to exclude irrelevant information such as patent metadata and page numbers. As such the task does not include figure segmentation: in general a single patent drawing may contain multiple figures and a single flowchart may be spread over multiple drawings/figures. In principle, the appearance of any patent drawing is constrained by the rules applied at large patent offices (although not all patent drawings are compliant). Examples of the most important of these rules as applied by the EPO are as follows (Rule 46(2) of
          <xref ref-type="bibr" rid="ref3">EPC (2000)</xref>
          , Section A-X, 7.5.1 of
          <xref ref-type="bibr" rid="ref6">GL (2012)</xref>
          ):
        </p>
        <list list-type="bullet">
          <list-item><p>only black lines drawn with drafting instruments are allowed;</p></list-item>
          <list-item><p>the scale of the drawing should allow for two-thirds reduction in size;</p></list-item>
          <list-item><p>the character height should be larger than 0.32 mm;</p></list-item>
          <list-item><p>reference signs (no-box nodes) should be clear and without brackets;</p></list-item>
          <list-item><p>no text should be included in drawings unless it is indispensable (this rule is intended to minimise the need for language translation in drawings; however, text in flowcharts is usually considered indispensable);</p></list-item>
          <list-item><p>leading lines (wiggly edges) should be as short as possible and extend to the feature indicated.</p></list-item>
        </list>
      </sec>
      <sec id="sec-2-2">
        <title>Text output</title>
        <p>The aim of the Flowchart Recognition task was to produce textual summaries in a predetermined format containing information about the flowchart structure in the following three categories.</p>
        <p>Metadata describing the number of nodes and edges in the flowchart and, optionally, the 'title' of the flowchart (usually the figure number).</p>
        <p>Node data describing the type of each node and, optionally, the text appearing in the node. The following node types are possible: oval, rectangle, double-rectangle, parallelogram, diamond, circle, cylinder, no-box, unknown and point. All types of node correspond to boxes or image regions that may contain text, apart from point nodes, which are the junction points where two or more edges meet.</p>
        <p>
          Edge data describing node connectivity relationships including, optionally, any text attached to the edge. Edges are either undirected or directed (in the sense indicated by arrow symbols) and have the following possible types: plain, dotted, or wiggly. The type wiggly refers to elements referred to as leading lines in patent law: wiggly edges are not necessarily wiggly in shape, but serve to connect reference signs to nodes (e.g. see Section A-X, 7.5.1 of
          <xref ref-type="bibr" rid="ref6">GL (2012)</xref>
          ).
        </p>
        <p>Figure 1: Possible membership states of image pixels in a flowchart. Numerical labels refer to the section headings below which describe how the various membership values are assigned. Note that joint membership states are, in principle, possible but are not allowed in the approach chosen.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Method</title>
      <p>The input images may be scanned (and binarised) versions of hard copies filed with the patent application. Note that online filing of patent applications in electronic form now dominates submissions and leads to less noisy images, but scanned images dominate archived patent image databases. To remove some of the defects introduced by the digitisation process the following two-step method is applied. Firstly, a morphological opening with a kernel of 3 × 3 pixels removes isolated black dots. Secondly, a sequence of local median filters is applied in order to remove isolated line breaks and small 1-to-2-pixel gaps and to perform de-jagging of the lines. The choice of a median filter rather than a morphological closing is motivated by the aim of bridging line breaks without artificially connecting text characters with the neighbouring flowchart outline.</p>
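      <p>A minimal sketch of this preprocessing step (not the authors' reference code), assuming Python with OpenCV and the page held as an 8-bit binary NumPy array; the 3 × 3 kernel follows the text, while the number of median passes is an illustrative assumption:</p>
      <preformat><![CDATA[
import cv2
import numpy as np

def preprocess(binary_img: np.ndarray) -> np.ndarray:
    """Remove isolated dots and bridge 1-to-2-pixel gaps in a binary flowchart image."""
    ink = 255 - binary_img                      # work with ink as foreground

    # Step 1: 3x3 morphological opening removes isolated black dots.
    kernel = np.ones((3, 3), np.uint8)
    ink = cv2.morphologyEx(ink, cv2.MORPH_OPEN, kernel)

    # Step 2: repeated local median filtering bridges small line breaks and de-jags
    # lines without merging text characters with the neighbouring flowchart outline.
    for _ in range(2):                          # number of passes is an assumption
        ink = cv2.medianBlur(ink, 3)

    return 255 - ink
]]></preformat>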
      <sec id="sec-3-1">
        <title>Text-graphic segmentation</title>
        <p>For 70% of the training set it is possible to separate text from graphics (nodes and edges) by assuming that the graphics are the single largest connected component. Based on this observation, text-graphic segmentation was performed by analysing the spatial extent of all connected components in the image. A mixture of three two-dimensional Gaussians was fitted to the height-vs-width distribution of the connected components using the Expectation Maximisation (EM) algorithm. The cluster with the largest height and width was deemed to contain all graphics components.</p>
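        <p>A sketch of one way to implement this step (library choices and the rule for picking the 'largest' cluster are assumptions), using OpenCV connected components and a scikit-learn Gaussian mixture fitted by EM:</p>
        <preformat><![CDATA[
import cv2
import numpy as np
from sklearn.mixture import GaussianMixture

def split_text_graphics(binary_img: np.ndarray):
    """Label connected components as graphics or text using a three-component
    Gaussian mixture fitted to their (width, height) distribution."""
    ink = (binary_img < 128).astype(np.uint8)   # ink pixels as foreground
    n, labels, stats, _ = cv2.connectedComponentsWithStats(ink, connectivity=8)
    sizes = stats[1:, [cv2.CC_STAT_WIDTH, cv2.CC_STAT_HEIGHT]].astype(float)

    gmm = GaussianMixture(n_components=3, covariance_type='full', random_state=0)
    clusters = gmm.fit_predict(sizes)

    # The cluster with the largest mean width and height is deemed to hold the graphics.
    graphics_cluster = int(np.argmax(gmm.means_.sum(axis=1)))
    graphics_labels = np.where(clusters == graphics_cluster)[0] + 1   # labels start at 1
    graphics_mask = np.isin(labels, graphics_labels)
    text_mask = ink.astype(bool) & ~graphics_mask
    return graphics_mask, text_mask
]]></preformat>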
      </sec>
      <sec id="sec-3-2">
        <title>Finding implicit-graphics pixels</title>
        <p>Certain types of connectivity between black pixels in a flowchart are merely implied, i.e. the connectivity of dotted/dashed line elements, the connectivity of edges and nodes that do not physically touch, and the outlines of broken box nodes (for example diamonds that are broken to accommodate text). The chosen method defines implicit-graphics pixels as white pixels that should be treated as black pixels in order to physically establish the connectivity implied by the flowchart author.</p>
        <p>While some of the smaller line breaks are removed by the first preprocessing step (see Section 3.1), larger gaps can remain that cannot be resolved by simple morphological operations. Wedge-shaped masks, defined by an opening angle and a radius, are used to bridge such line breaks and connect dashed line segments. Firstly, a skeleton of the isolated graphic component is obtained and an estimate of the orientation in the neighbourhood of each endpoint is derived. A wedge-shaped mask is placed at each endpoint and aligned with the local skeleton orientation. All intersecting pairs of masks are identified and two endpoints are reconnected by a straight line if their respective wedge masks intersect. This method is similar in nature to the dashed line detection algorithm presented in Kasturi et al. (1990).</p>
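        <p>A simplified sketch of the wedge-based bridging idea, assuming scikit-image and SciPy; instead of rasterising and intersecting wedge masks, this approximation reconnects endpoint pairs whose oriented wedges mutually contain the other endpoint (radius and opening angle are illustrative values):</p>
        <preformat><![CDATA[
import numpy as np
from scipy.ndimage import convolve
from skimage.morphology import skeletonize

def bridge_candidates(graphics_mask: np.ndarray, radius=20, opening_deg=30):
    """Return pairs of skeleton endpoints that should be reconnected by a straight line."""
    skel = skeletonize(graphics_mask > 0)
    # Endpoints are skeleton pixels with exactly one skeleton neighbour.
    neighbours = convolve(skel.astype(int), np.ones((3, 3), int), mode='constant') - skel
    endpoints = np.argwhere(skel & (neighbours == 1))

    def direction(p):
        # Local orientation: from the centroid of nearby skeleton pixels towards the endpoint.
        y, x = p
        window = skel[max(0, y - 5):y + 6, max(0, x - 5):x + 6]
        pts = np.argwhere(window) + [max(0, y - 5), max(0, x - 5)]
        v = p - pts.mean(axis=0)
        return v / (np.linalg.norm(v) + 1e-9)

    half_angle = np.deg2rad(opening_deg / 2.0)
    links = []
    for i in range(len(endpoints)):
        for j in range(i + 1, len(endpoints)):
            d = endpoints[j] - endpoints[i]
            dist = np.linalg.norm(d)
            if dist == 0 or dist > radius:
                continue
            u = d / dist
            # Both wedges must point towards the other endpoint.
            if (np.arccos(np.clip(np.dot(direction(endpoints[i]), u), -1, 1)) < half_angle
                    and np.arccos(np.clip(np.dot(direction(endpoints[j]), -u), -1, 1)) < half_angle):
                links.append((tuple(endpoints[i]), tuple(endpoints[j])))
    return links
]]></preformat>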
      </sec>
      <sec id="sec-3-3">
        <title>Segmentation of closed regions</title>
        <p>A connector is defined as a line in a flowchart that does not define a box node. Usually a connector pixel belongs to an edge, but it may also belong to a point node. The approach chosen for node-connector segmentation relies on a first step of box node classification. Closed regions in the graphics may correspond to either box nodes or loops (a loop is defined as a background region enclosed by edge lines). Node-connector segmentation is performed as follows. Firstly, loops are identified using the box classification technique described below. Secondly, the interior region of any remaining box node is dilated and used as a mask to separate nodes from connectors.</p>
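        <p>The paper does not specify how the closed regions themselves are extracted; one plausible sketch, assuming OpenCV 4, recovers them from the contour hierarchy of the graphic component:</p>
        <preformat><![CDATA[
import cv2
import numpy as np

def closed_regions(graphics: np.ndarray):
    """Return filled masks of the background regions enclosed by the graphics
    (box-node interiors and loops); these can later be dilated and used to
    separate nodes from connectors."""
    ink = (graphics > 0).astype(np.uint8)
    # RETR_CCOMP yields a two-level hierarchy: outer contours and the holes inside them.
    contours, hierarchy = cv2.findContours(ink, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
    masks = []
    for idx, contour in enumerate(contours):
        if hierarchy[0][idx][3] != -1:          # has a parent, so this contour is a hole
            mask = np.zeros_like(ink)
            cv2.drawContours(mask, [contour], -1, 1, thickness=-1)
            masks.append(mask)
    return masks
]]></preformat>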
        <p>[Figure: overview of the processing pipeline — preprocessing and denoising; segmentation of the graphic component; graphic-component postprocessing; segmentation of closed regions; main-orientation check; region feature extraction and node classification; identification of connectors; identification of point nodes; association of nodes and edges; identification of no-box nodes; extraction of edge properties; extraction of text regions and OCR; generation of the output flowchart text description.]</p>
        <p>Estimating the global orientation of the flowchart was necessary to allow accurate box-node classification and OCR processing. The aspect ratio of the box node candidates identified after closed-region segmentation was measured and the entire image was rotated clockwise by 90 degrees if the fraction of box node candidates for which the height exceeded the width was greater than a fixed threshold.</p>
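        <p>Once the box-node candidates are available this check reduces to a few lines; a sketch in which the 0.5 threshold is an assumption:</p>
        <preformat><![CDATA[
import numpy as np

def fix_orientation(image: np.ndarray, box_regions, tall_fraction=0.5) -> np.ndarray:
    """Rotate the page 90 degrees clockwise when most box-node candidates are
    taller than they are wide."""
    if not box_regions:
        return image
    tall = sum(1 for region in box_regions if region.shape[0] > region.shape[1])
    if tall / len(box_regions) > tall_fraction:
        image = np.rot90(image, k=-1)           # k=-1 rotates clockwise
    return image
]]></preformat>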
        <p>Classifying box node candidates involved reducing each closed shape to a shape primitive. This step exploits the observation that during the flowchart authoring process individual box nodes may be squeezed or stretched in the horizontal and vertical directions according to the amount of text they contain. Since node classification should be invariant to such transformations, each candidate was subjected to a squeezing operation in the horizontal and vertical directions by application of the operators SH and SV, where SH = ∫ B(x, y) dy and SV = ∫ B(x, y) dx, discarding the segments for which SH or SV fall below a predefined threshold.</p>
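        <p>A sketch of one plausible reading of the squeeze operators, assuming B is a filled binary node-box array: the ink projection is computed along one axis and the rows or columns whose projection falls below the threshold are discarded (the axis conventions and the reading of 'segments' are assumptions):</p>
        <preformat><![CDATA[
import numpy as np

def squeeze_horizontal(box: np.ndarray, threshold: float) -> np.ndarray:
    """S_H: keep only the columns whose vertical ink projection reaches the threshold."""
    col_projection = box.sum(axis=0)            # integral of B(x, y) over y, per column x
    return box[:, col_projection >= threshold]

def squeeze_vertical(box: np.ndarray, threshold: float) -> np.ndarray:
    """S_V: keep only the rows whose horizontal ink projection reaches the threshold."""
    row_projection = box.sum(axis=1)            # integral of B(x, y) over x, per row y
    return box[row_projection >= threshold, :]

# Applying both operators reduces a stretched or squeezed node outline to a
# stretch-invariant primitive on which the reduced-shape features are computed.
]]></preformat>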
        <p>The result of the squeezing operations is a shape primitive which characterises the basic shape family of each node type (see Figure 3). This shape simplification step is particularly suitable for the present definition of shape categories. In the present task, the class oval does not only encompass elliptical outlines (ovals in the simplest sense) but also half-circles joined by two horizontal lines and even rounded rectangles (see Figure 3). Similarly, the class diamond includes, in addition to the standard four-sided parallelograms, hexagons and octagons (each having two parallel sides of the same size and aligned with the horizontal or the vertical). Reducing each of these shapes to a single primitive (an elliptical shape in the case of ovals, a quadrilateral shape for diamonds) enables each family to be characterised.</p>
        <p>Multiple shape features were extracted from the box nodes: a first set of features, f0, was derived from the original box node, while a second set, fR, was derived from the horizontally and/or vertically reduced shape. Features in the f0 set include area (A0), ellipticity (E0), solidity (S0: the ratio between the area within the bounding rectangle and the area within the box node perimeter) and the horizontal and vertical symmetries (SyH0 and SyV0). Reduced shape features include area (AR), ellipticity (ER), solidity (SR) as well as symmetry measures aimed at characterising specific node shapes: diamond, parallelogram and cylinder bodies (see Figure 4).</p>
        <p>DR measures the rectangularity (here estimated by the solidity, as defined above) of a reduced shape after decomposing it into four quadrants and reassembling them as in Figure 4(a). Diamonds are thus expected to be formed by complementary shapes and can be characterised by a value of DR close to 1 (indicative of a high degree of rectangularity of the recomposed shape). In the same way PR offers a measure of rectangularity of a recomposed shape obtained by cutting in two halves a horizontally-squeezed node box and repositioning its two portions (Figure 4(b)). Here again a high degree of rectangularity is expected from a horizontally-sheared rectangle (i.e. a parallelogram under the present task definitions). CB1R and CB2R measure the symmetries and complementarities expected from typical cylinder body shapes: CB1R measures the ellipticity of a composite shape obtained by adjoining the body half and the horizontally reflected top half of a vertically squeezed node box (Figure 4(c)), whereas CB2R measures the rectangularity of the two recombined halves. For an input node box region B(x, y) the set of features given in Tables 1 and 2 is generated.</p>
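        <p>A rough sketch of how the f0 features of Table 1 could be computed on a filled binary node-box region; the exact definitions of solidity, ellipticity and the symmetry measures below are interpretations of the text, not the authors' reference code:</p>
        <preformat><![CDATA[
import numpy as np

def f0_features(region: np.ndarray) -> dict:
    """One plausible reading of the Table 1 features for a filled binary node box."""
    ys, xs = np.nonzero(region)
    box = region[ys.min():ys.max() + 1, xs.min():xs.max() + 1].astype(bool)
    h, w = box.shape
    area = box.sum()

    solidity = area / float(h * w)              # how well the region fills its bounding rectangle
    yy, xx = np.mgrid[0:h, 0:w]
    ellipse = (((xx - (w - 1) / 2.0) / (w / 2.0)) ** 2
               + ((yy - (h - 1) / 2.0) / (h / 2.0)) ** 2) <= 1.0
    ellipticity = (box & ellipse).sum() / float(ellipse.sum())   # fill of the inscribed ellipse
    sym_h = (box & np.fliplr(box)).sum() / float(area)           # left-right mirror overlap
    sym_v = (box & np.flipud(box)).sum() / float(area)           # top-bottom mirror overlap

    return {"A0": int(area), "E0": ellipticity, "S0": solidity, "SyH0": sym_h, "SyV0": sym_v}
]]></preformat>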
        <p>Based on the features defined in Tables 1 and 2, the box nodes are first classified into the following set of shape families, as defined in the CLEF-IP task: circle, oval, parallelogram, rectangle, diamond. On top of these required categories, we introduced the following additional classes: loop (to identify closed areas defined by edges and box outlines) and small (to flag regions having an area below a fixed threshold). The two other node classes that are not defined by a closed outline shape (i.e. point node and no-box node) are discussed in Sections 3.6 and 3.8.</p>
        <p>A simple hierarchical classification scheme is used to discard the small regions and separate asymmetrical shapes (loops) from the other classes. The rectangles are then identified. The remaining shapes (oval, circle, parallelogram, diamond, cylinder-body) are then classified using heuristic thresholds on feature values. Statistical pattern recognition techniques have not been applied due to time constraints and the lack of image-level annotations in the training set. Obviously, the application of such techniques is expected to significantly improve classification performance.</p>
        <p>Composite shapes (double rectangles and cylinders) are identified in a second pass as a combination of primitive shapes (double rectangles being formed by a collection of three adjoining rectangles, and cylinders by an oval and a cylinder body).</p>
      </sec>
      <sec id="sec-3-4">
        <title>Point node detection</title>
        <p>Point nodes, defined as the junctions where edges meet, were identified as follows. Firstly, horizontal and vertical lines in the connector-only image were detected using a simplified version of the Hough transform by projecting pixel values onto the vertical and horizontal axes and identifying maxima. Secondly, point node candidates were identified as the image coordinates where horizontal and vertical lines crossed. Thirdly, a box centred on the point node candidates was used for validation: candidates were accepted if the box contained a branch-point and 3 or more lines crossed the box perimeter.</p>
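        <p>A sketch of the projection-and-validation idea on a 0/1 connector-only image; the line-strength threshold, the box size and the simplified branch-point test are assumptions:</p>
        <preformat><![CDATA[
import numpy as np

def point_nodes(connectors: np.ndarray, line_fraction=0.5, half_box=7):
    """Candidate junctions where strong horizontal and vertical lines cross,
    validated by counting the line segments entering a small box around the crossing."""
    h, w = connectors.shape
    rows = np.where(connectors.sum(axis=1) > line_fraction * w)[0]   # strong horizontal lines
    cols = np.where(connectors.sum(axis=0) > line_fraction * h)[0]   # strong vertical lines

    nodes = []
    for y in rows:
        for x in cols:
            y0, y1 = max(0, y - half_box), min(h, y + half_box + 1)
            x0, x1 = max(0, x - half_box), min(w, x + half_box + 1)
            patch = connectors[y0:y1, x0:x1]
            border = np.concatenate([patch[0], patch[-1], patch[:, 0], patch[:, -1]])
            crossings = np.count_nonzero(np.diff(border.astype(int)) == 1)
            # Accept the candidate if it lies on ink and three or more lines cross the box.
            if patch[y - y0, x - x0] and crossings >= 3:
                nodes.append((int(y), int(x)))
    return nodes
]]></preformat>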
      </sec>
      <sec id="sec-3-5">
        <title>Edge classification</title>
        <p>The main challenge in classifying edge types surrounds the classification of dashed and wiggly edges. Dashed edges are not common and were identified as those containing multiple implicit-graphics segments (see Section 3.3). Counterintuitively, wiggly edges ('leading lines') are often straight and indistinguishable from plain undirected edges in the absence of context. These edges are classified solely according to their relationship to no-box nodes. Since wigglies correspond to leading lines and reference signs are of type no-box, any edge connecting a box node to a no-box was assumed to be a wiggly. Preliminary trials suggested that truly curved wiggly edges could be detected using features describing the orientation of their endpoints, which are neither horizontal nor vertical; such features could be useful for improving no-box classification in future.</p>
        <p>Edge directivity was classified using a simple set of features. Firstly, the two endpoints of each edge were identified and a fixed region around each endpoint was analysed. Next, the number of erosions required to remove all black pixels from the endpoint region was measured and a binary classification tree with heuristic thresholds was applied. Again, statistical pattern recognition techniques were not applied due to time constraints and the lack of image-level annotations.</p>
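        <p>A sketch of the erosion-count directivity feature, assuming an 8-bit binary edge image and OpenCV; the window size and the 3 × 3 structuring element are assumptions:</p>
        <preformat><![CDATA[
import cv2
import numpy as np

def erosion_count(edge_img: np.ndarray, endpoint, half_win=15) -> int:
    """Arrowheads are locally thick, so the number of 3x3 erosions needed to empty
    the endpoint neighbourhood is larger for directed (arrowed) edge ends."""
    y, x = endpoint
    patch = edge_img[max(0, y - half_win):y + half_win + 1,
                     max(0, x - half_win):x + half_win + 1].copy()
    kernel = np.ones((3, 3), np.uint8)
    erosions = 0
    while patch.any() and erosions < half_win:  # cap guards against degenerate patches
        patch = cv2.erode(patch, kernel)
        erosions += 1
    return erosions  # compared against heuristic thresholds in a small decision tree
]]></preformat>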
        <sec id="sec-3-5-1">
          <title>Feature Operator</title>
          <p>A0
E0
S0
SyH0
SyV0</p>
          <p>Area
Ellipticity
Solidity
Horizontal symmetry
Vertical symmetry</p>
        </sec>
        <sec id="sec-3-5-2">
          <title>Feature</title>
        </sec>
        <sec id="sec-3-5-3">
          <title>Operator</title>
          <p>AR
ER
SR
DR
PR
CB1R
CB2R</p>
          <p>Area o SH o SV
Ellipticity o SH o SV
Solidity o SH o SV
Solidity o TDR o SH o SV
Solidity o TP R o SH
Ellipticity o TCB1R o SV
Solidity o TCB2R o SV
The text-graphic segmentation method (Section 3.2) results in a set of connected components
representing characters that must be further grouped into text objects and classi ed into
boxnode labels, edge labels, no-box nodes (reference signs) and metadata. Segmenting text objects
involved the hierarchical, agglomerative clustering of characters according to the proximity of their
centroids. Since characters are generally grouped horizontally, before clustering, vertical distances
were weighted with a positive scaling constant which favoured horizontal associations.</p>
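        <p>A sketch of the weighted agglomerative grouping of character centroids, assuming SciPy; the vertical weighting factor and the distance cut-off are illustrative values:</p>
        <preformat><![CDATA[
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def group_characters(centroids: np.ndarray, v_weight=3.0, cut=20.0) -> np.ndarray:
    """Group character centroids (rows of (y, x)) into text objects.  Vertical
    distances are inflated by v_weight so that horizontally adjacent characters
    merge first."""
    scaled = centroids * np.array([v_weight, 1.0])
    Z = linkage(scaled, method='single')             # hierarchical, agglomerative clustering
    return fcluster(Z, t=cut, criterion='distance')  # one text-object label per character
]]></preformat>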
        <p>Any text within the closed region of a box node was classified as a node label in a straightforward manner. To classify edge labels and no-box nodes each text object was dilated and checked for overlap with known edges. If the overlapping edge connects to two nodes (box nodes or point nodes) the text object is classified as an edge label, whereas if the overlapping edge connects to only one node the text object is classified as a no-box node and the edge is classified as a wiggly.</p>
        <p>The fonts used for metadata (figure titles) tend to be larger than the other types of text objects and titles usually appear at the image edges. After classification of text objects into box-node labels, edge labels and no-box nodes any remaining text is re-clustered according to centroid positions. The new text objects were scored proportionally to their total area and their distance from the image centre: the object with the highest score was classified as metadata.</p>
        <p>The recognition of the text in patent flowcharts is made difficult by several factors:</p>
        <list list-type="bullet">
          <list-item><p>The quality of image digitisation can be very low. This is mainly due to the low quality of the original scanning and the original paper support;</p></list-item>
          <list-item><p>The quality of patent flowchart draftsmanship is variable. Note that the rules and guidelines discussed in Section 2.1 do not exclude hand-drawn flowcharts;</p></list-item>
          <list-item><p>The segmentation of textual content is difficult (see Section 3.8);</p></list-item>
          <list-item><p>The text used in flowcharts is a particular form of technical language, different from the general domain language addressed by state-of-the-art OCR engines.</p></list-item>
        </list>
        <p>These different issues have been addressed using flowchart-specific text segmentation followed by state-of-the-art OCR software (ABBYY FineReader, version 10 and Tesseract 3.0) and customised post-correction techniques based on Shannon's noisy-channel model approach and language models.</p>
        <p>
          It has been shown that post-correction of recognition errors based on specialised models can provide a significant reduction of the word error rate produced by general OCR engines. The proposed technique is based on Shannon's noisy-channel model approach, more precisely a probabilistic modelling of the degradation of the original string by the OCR process. This approach is used by spell-checker systems and auto-correction/auto-completion systems for information retrieval. The model can then be applied to a noisy string for correcting recognition errors, as shown in
          <xref ref-type="bibr" rid="ref10">Kolak and Resnik (2002)</xref>
          .
        </p>
        <p>Character Level Model. Following a probabilistic framework, finding the correct string c given an original string o corresponds to finding argmaxc P(c | o) and, following Bayes' rule, argmaxc P(o | c) P(c).</p>
        <p>Following Shannon's noisy-channel model approach, P(c) is the source model estimating which string is likely to be sent, and P(o | c) is the channel model estimating how the signal is degraded. In our case, P(o | c) estimates the probability that the string o is recognised by the OCR when the original string present in the bitmap image is c. For the source and channel model, we first introduce an N-gram model of characters denoted CM, with N = 6:</p>
        <p>P(c) ≈ P(c | CM) = ∏m P(cm | cm−1, …, cm−5)</p>
        <p>For the channel/error model, we use a probabilistic framework for edit operations following a confusion model:</p>
        <p>P(o1..n | c1..m) = ∏ Pe(c1..m → o1..n)</p>
        <p>where Pe corresponds to the probability that the OCR recognised the string o1..n when interpreting the string c1..m, based on the four following edit operations:</p>
        <p>Pmatch(ci → oj | ci ∈ c1..m ∨ oj ∈ o1..n) = Pmatch(ci)</p>
        <p>Psubstitution(ci → oj | ci ∈ c1..m, oj ∈ o1..n) = Psubstitution(ci → oj)</p>
        <p>Pinsertion(ci → oj | ci ∈ c1..m, oj ∈ o1..n) = Pinsertion(ε → oj)</p>
        <p>Pdeletion(ci → oj | ci ∈ c1..m, oj ∈ o1..n) = Pdeletion(ci → ε)</p>
        <p>Having a probabilistic interpretation of the edit distance allows the probabilistic costs of the edit operations to be learned from a corpus of errors instead of setting costs manually or experimentally, as is often the case with existing spell checkers based on noisy channel models. The probabilities Psubstitution(ci → oj), Pinsertion(ε → oj) and Pdeletion(ci → ε) are learned from a corpus of OCR errors aligned with the expected correction following the Levenshtein distance:</p>
        <p>Pmatch(ci) = 1 − (count(ci → {cj}j≠i) + count(ci → ε)) / count(ci)</p>
        <p>Psubstitution(ci → oj) = count(ci → oj) / count(ci)</p>
        <p>Pinsertion(ε → oj) = count(ε → oj) / count(c)</p>
        <p>Pdeletion(ci → ε) = count(ci → ε) / count(ci)</p>
        <p>where c corresponds to any character of the alphabet of the source.</p>
        <p>A smoothing operation is then applied for estimating the edit operations unseen in the OCR-error training data. This was done by giving a very small additive probability to all character substitutions, insertions and deletions (Lidstone's additive smoothing).</p>
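        <p>A sketch of how these edit-operation probabilities could be estimated from a corpus of aligned (correct, OCR) string pairs; difflib is used here as a stand-in for the Levenshtein alignment described above, and the smoothing constant is illustrative:</p>
        <preformat><![CDATA[
from collections import Counter
from difflib import SequenceMatcher

def learn_channel_model(pairs, alphabet, alpha=1e-4):
    """Estimate substitution/insertion/deletion probabilities from (correct, ocr)
    string pairs, with Lidstone additive smoothing for unseen edit operations."""
    sub, ins, dele, seen = Counter(), Counter(), Counter(), Counter()
    for correct, ocr in pairs:
        seen.update(correct)
        for op, i1, i2, j1, j2 in SequenceMatcher(None, correct, ocr).get_opcodes():
            if op == 'replace':
                for ci, oj in zip(correct[i1:i2], ocr[j1:j2]):
                    sub[(ci, oj)] += 1                   # ci recognised as oj
                ins.update(ocr[j1 + (i2 - i1):j2])       # extra OCR chars: epsilon -> oj
                dele.update(correct[i1 + (j2 - j1):i2])  # dropped chars: ci -> epsilon
            elif op == 'insert':
                ins.update(ocr[j1:j2])
            elif op == 'delete':
                dele.update(correct[i1:i2])
    vocab = len(alphabet)
    total = sum(seen.values())
    p_sub = lambda ci, oj: (sub[(ci, oj)] + alpha) / (seen[ci] + alpha * vocab)
    p_ins = lambda oj: (ins[oj] + alpha) / (total + alpha * vocab)
    p_del = lambda ci: (dele[ci] + alpha) / (seen[ci] + alpha * vocab)
    return p_sub, p_ins, p_del
]]></preformat>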
        <p>Note that transposition is not used in our channel model because this sort of character error is relevant for spell-checker applications, but extremely rare in the case of OCR. The original Levenshtein distance was therefore preferred over the Damerau-Levenshtein distance generally used for spell-checking.</p>
        <p>Language Model. N-gram language models are used to capture constraints on word sequences. For estimating P(c), a word trigram language model LM was combined with the character model CM:</p>
        <p>P(c) ≈ P(c | CM) P(c | LM) = P(⟨cp⟩ | CM) ∏p P(cp | cp−1, cp−2)</p>
        <p>where ⟨cp⟩ denotes the complete concatenated character sequence with boundary characters corresponding to the p words cp.</p>
        <p>At this stage the main issue is the word segmentation of the corrected word sequence. The character model handles word splits and merges naturally by inserting, substituting or deleting character boundaries. Consequently, the application of the character model may result in different hypotheses having different numbers of words. The application of a language model on these different hypotheses will be biased by the fact that the probability of a longer word sequence is always penalised. To solve this problem, empty final words are introduced so that all hypotheses have the same word length (set to the maximum among the hypotheses). The transition cost to empty final words is computed dynamically as the average of the existing transitions.</p>
        <p>Given a noisy word sequence recognised by the OCR, the final aim is to find, for a sequence of corrections of words:</p>
        <p>argmaxc P(o | c) P(c) = argmaxc ∏0≤p≤pmax [ ∏m Pe(c1..m,p → on..q) ∏m P(cm,p | cm−1,p, …, cm−5,p) P(cp | cp−1, cp−2) ]</p>
        <p>where pmax is the maximum number of words in the segmentation introduced by the character model and n to q correspond to the character range of the corrupted character sequence corresponding to the word p in the correction.</p>
        <p>The decoding of the intended sequence of words given the observed sequence of words recognised by the OCR is implemented using the Viterbi approximation.</p>
        <p>Training. For the present experiments a set of approximately 2000 corrections of errors produced by ABBYY FineReader was used. A correction is simply the alignment of the erroneous string and the corrected one, which allows the probabilistic costs of the edit operations introduced in the previous section to be learned. Since the corpus of errors was only produced for ABBYY FineReader, the resulting model has limited applicability to Tesseract.</p>
        <p>For training the character and language models, a corpus of 10,000 patent descriptions and 2,000 scientific articles was used. In both cases, the resources were a mixture of English, German and French texts.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results and discussion</title>
      <sec id="sec-4-1">
        <title>Graph structure recognition</title>
        <p>At the time of writing no official performance results are available for the algorithm test run submitted. Instead, the performance of the algorithm has been measured by manually comparing algorithm output with known ground-truth values. This method of evaluation, although time-consuming, allows a detailed analysis of the modes of failure of the algorithm. For each class of objects in the flowchart, the precision, Pr, and the recall, Re, of the classification over the whole test set were determined as</p>
        <p>Pr = true positives / (true positives + false positives) and Re = true positives / (true positives + false negatives).</p>
        <sec id="sec-4-1-1">
          <title>In addition, the balanced F-score, F , is calculated as F = 2:</title>
        <p>[Figure 5: per-flowchart combined F-scores over the test set (vertical axis from 0 to 1); the numbers in parentheses identify the individual flowcharts referred to below.]</p>
          <p>Table 3 summarises the results of the evaluation for the various node and edge types. Note
the following.</p>
          <p>The relatively high values of Pr and Re for box nodes are achieved despite the use of a simplistic classification approach. This indicates that the box features defined in Section 3.5 are discriminative.</p>
          <p>The relatively low values of Pr for directed edges and Re for undirected edges are probably caused by an overly simplistic approach to extracting directivity features and classifying them. A large fraction of directed edges were misclassified as undirected edges, as shown by the relatively high values of Pr and Re when directivity is ignored.</p>
          <p>Since the identification of wiggly edges and no-box nodes occurs in conjunction, the performance scores for these two entities are almost identical: Prwiggly = 0.96 and Rewiggly = 0.76. A recall score of 83% was measured for identifying the position of the title (prior to OCR). The relatively low value of Prpoint is partly explained by touching text characters, which cause false positives.</p>
          <p>In order to examine the patterns of algorithm failure, a single combined F-score, FN, is calculated for all nodes of each flowchart (excluding no-box nodes since they are labels that do not define the flowchart logic). Similarly, a single combined F-score, FE, is calculated for all edges of each flowchart (ignoring directivity). F denotes the average of the node and edge F-scores for a given flowchart.</p>
          <p>The results of the combined F-score evaluation over the whole test set are summarised in Figure 5. Out of a total sample of 100 analysed flowcharts, 34 show an F-score of FN = FE = 1.00. 50% of the diagrams have an average F-score above 0.90, while 75% show an average score of 0.78 or better. Only 10% of the test sample have an average F-score below 0.50. For the purpose of identifying the most significant sources of errors, the flowcharts associated with F &lt; 0.8 were identified. Each of these flowcharts was assigned to one of three categories according to the main mode of failure: (i) initial text-graphics segmentation error, (ii) presence of text characters touching the flowchart graphics and (iii) discontinuities in the flowchart graphic structure (mainly due to broken lines).</p>
          <p>Looking in turn at each of these classes, it was observed that poor segmentation of the main flowchart component at the early stage of processing was the main cause of very low F-scores. Six flowcharts have F &lt; 0.5: EP00000821886 (13), EP00000926609 (23), EP000010469970029 (40), EP000010702880135 (45), EP0000110761 (50) and EP00001113655 (51). The mode of failure for this group of patents is associated with the text-graphic segmentation module (Section 3.2), in particular the assumption that the graphic component of the flowchart can be isolated from text by selecting the connected components with the largest heights and widths. A failure at this stage prohibits further node classification and edge identification.</p>
          <p>
            Characters touching graphics (mostly text intersecting node boundaries) are also an important source of errors. Six flowcharts fall into this category: EP000006215470107 (2), EP000006479080138 (3), EP000006879010063 (4), EP00001063844 (44), EP00001217549 (57) and EP00001526500 (91). Although this type of error is mostly due to a combination of design choices and scanning noise, it can also stem from the preprocessing steps designed to remove small gaps in the image components (Sections 3.1 and 3.3). This reflects the difficulty in bridging line breaks without creating unwanted pixel junctions between otherwise disconnected entities. It is noted that the problem of separating touching text from graphics remains a challenging issue for general OCR processing engines
            <xref ref-type="bibr" rid="ref14">(Nagy, 2000)</xref>
            . It has been studied in the document image analysis literature in relation to the analysis of maps
            <xref ref-type="bibr" rid="ref13 ref4">(Fletcher and Kasturi, 1988; Luo et al., 1995)</xref>
            , form processing
            <xref ref-type="bibr" rid="ref21">(Yoo et al., 1997)</xref>
            and technical drawing understanding
            <xref ref-type="bibr" rid="ref9">(Kasturi et al., 1990)</xref>
            . An improved text-graphics segmentation method would benefit from the inclusion of an additional stage of detection and separation of touching characters.
          </p>
          <p>The failure to correctly bridge broken lines (Section 3.3) is the leading source of error for values of F &gt; 0.5. A total of 10 flowcharts in the range 0.3 &lt; F &lt; 0.75 show significant errors that can be attributed to discontinuous graphical components. Line breaks can affect the connecting lines between nodes, creating false positives in the edge components, as is the case for: EP00000866609 (17), EP000000884919 (18), EP00000098456 (29) and EP000001513121 (89). Line breaks can also prevent a proper identification of the nodes, as for: EP000001104172 (49), EP000001197682 (55), EP000001201195 (56), EP000001223519 (59), EP0000012274360 (60) and EP000001521176 (90). Improving gap bridging schemes would improve performance.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Recognition of text objects</title>
        <p>In order to evaluate the performance of the text object recognition module (Section 3.9), text
labels were transcribed manually for approximately 20% of the test set (412 out of a total of 2023
text boxes). The sentence and word scores given below are based on this sample.</p>
        <p>The following scores are presented in Table 4.</p>
        <list list-type="bullet">
          <list-item><p>A coverage score, which is the percentage of cases where the OCR returns a result.</p></list-item>
          <list-item><p>An accuracy score, which is the percentage of correct results produced by the OCR given the total expected results. When the OCR does not produce a result, the result is considered incorrect. An accuracy evaluation is given at both word level and sentence level, i.e. whether the complete text associated with a graphical object is correctly recognised.</p></list-item>
          <list-item><p>A precision score, which is the percentage of correct results produced by the OCR given the total results produced by the OCR, similarly at word and sentence level.</p></list-item>
        </list>
        <sec id="sec-4-2-1">
          <title>Technique</title>
          <p>Coverage (sentence)
Coverage (word)
Accuracy (sentence)
Accuracy (word)
Precision (sentence)
Precision (word)</p>
          <p>ABBYY</p>
        <p>As a comparison, the highest accuracy of current state-of-the-art OCR engines is considered to be around 98-99% correctly recognised words in optimal conditions. It is clear from the coverage evaluation that Tesseract filters out a large number of results that it judges incorrect. Although the Tesseract confidence threshold affects its coverage, no exploration of this parameter was performed for lack of time. It can be observed that, although producing fewer results, the precision of Tesseract out-of-the-box remains significantly below that of ABBYY FineReader.</p>
        <p>
          As the post-correction technique has been trained only on ABBYY FineReader errors, it is less effective at improving Tesseract results. The accuracy of ABBYY FineReader is improved by 9.4%, meaning a reduction of the word error rate of 26.2%. This reduction of the error rate is lower than the 60% reduction reported by
          <xref ref-type="bibr" rid="ref11">Kolak et al. (2003)</xref>
          using a similar approach (but correcting an OCR engine with a significantly lower baseline performance). These results should be considered carefully because, due to time constraints, only a partial specialisation of the models was applied to the flowchart text problem:
        </p>
        <p>The texts considered for training the probabilistic costs of the edit operations and the character and language models were patent descriptions and scientific articles, without text from flowcharts. Ideally, the text present in the flowcharts of the training data (50 flowcharts) should have been combined with the other training resources.</p>
          <p>The errors used for training the costs of the edit operations were generated for ABBYY
FineReader only. A fairer comparison would use Tesseract-generated errors for correcting
Tesseract output.</p>
        <p>The ideal source for training the language model is the description of the patent in which the flowchart figure appears. As the patent publication number was given with the flowchart images, the usage of the description text would have been possible, for instance by retrieving the ST.36 document via the 'Open Patent Service' of the EPO.</p>
        <p>These issues suggest clear directions for future improvement. It should be possible to improve the post-correction model significantly by improving the way specialised training data is selected and exploited.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>K.</given-names>
            <surname>Abe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Azumatani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mukouda</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Suzuky</surname>
          </string-name>
          .
          <article-title>Discrimination of symbols, lines and characters in owchart recognition</article-title>
          .
          <source>In 8th International Conference on Pattern Recognition</source>
          , pages pages
          <fpage>1071</fpage>
          {
          <fpage>1074</fpage>
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Codina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pianta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vrochidis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Papadopoulos</surname>
          </string-name>
          .
          <article-title>Integration of semantic, metadata and image search engines with a text search engine for patent retrieval</article-title>
          .
          <year>2009</year>
          . URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.98.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>EPC. European Patent Convention. 14th edition</source>
          ,
          <year>2000</year>
          . URL http://www.epo.org/law-practice/legal-texts/epc.htm14l.
          <article-title>Note: the wording of Rule 46 EPC is almost identical to the wording used in PCT Rules 6.2 (b</article-title>
          ),
          <source>11.11 and 11.13 and US Manual of Patent Examining Procedure, Section</source>
          <volume>1</volume>
          .
          <fpage>84</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>L.A.</given-names>
            <surname>Fletcher</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Kasturi</surname>
          </string-name>
          .
          <article-title>A robust algorithm for text string separation from mixed text/graphics images</article-title>
          .
          <source>Pattern Analysis and Machine Intelligence</source>
          , IEEE Transactions on,
          <volume>10</volume>
          (
          <issue>6</issue>
          ):
          <volume>910</volume>
          {918, nov
          <year>1988</year>
          . ISSN 0162-
          <fpage>8828</fpage>
          . doi:
          <volume>10</volume>
          .1109/34.9112.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>R.P.</given-names>
            <surname>Futrelle</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Nikolakis</surname>
          </string-name>
          .
          <article-title>E cient analysis of complex diagrams using constraint-based parsing</article-title>
          .
          <source>In Document Analysis and Recognition</source>
          ,
          <year>1995</year>
          ., Proceedings of the Third International Conference on, volume
          <volume>2</volume>
          , pages
          <fpage>782</fpage>
          <lpage>{</lpage>
          790 vol.
          <volume>2</volume>
          , aug
          <year>1995</year>
          . doi:
          <volume>10</volume>
          .1109/ICDAR.
          <year>1995</year>
          .
          <volume>602019</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          GL.
          <article-title>Guidelines for Examination in the European Patent O ce</article-title>
          ,
          <year>2012</year>
          . http://www.epo.org/law-practice/legal-texts/guidelines.html.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bhatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          , and
          <string-name>
            <surname>R.</surname>
          </string-name>
          <article-title>Morzinger. Patent image retrieval: a survey</article-title>
          .
          <source>In Proceedings of the 4th workshop on Patent information retrieval</source>
          ,
          <source>PaIR '11</source>
          , pages
          <issue>3</issue>
          {
          <fpage>8</fpage>
          , New York, NY, USA,
          <year>2011</year>
          . ACM.
          <source>ISBN 978-1-4503-0955-4</source>
          . doi:
          <volume>10</volume>
          .1145/2064975.2064979. URL http://doi.acm.
          <source>org/10</source>
          .1145/2064975.2064979.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Ito</surname>
          </string-name>
          .
          <article-title>Automatic input of ow chart in document images</article-title>
          .
          <source>In 6th international conference on Software engineering</source>
          , pages
          <volume>319</volume>
          {
          <fpage>328</fpage>
          ,
          <year>1982</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Kasturi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.T.</given-names>
            <surname>Bow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>El-Masri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.R.</given-names>
            <surname>Gattiker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.B.</given-names>
            <surname>Mokate</surname>
          </string-name>
          .
          <article-title>A system for interpretation of line drawings</article-title>
          .
          <source>Pattern Analysis and Machine Intelligence</source>
          , IEEE Transactions on,
          <volume>12</volume>
          (
          <issue>10</issue>
          ):
          <volume>978</volume>
          {992, oct
          <year>1990</year>
          .
          <article-title>ISSN 0162-8828</article-title>
          . doi:
          <volume>10</volume>
          .1109/34.58870.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>O.</given-names>
            <surname>Kolak</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Resnik</surname>
          </string-name>
          .
          <article-title>OCR error correction using a noisy channel model</article-title>
          .
          <source>In Proceedings of the Human Language Technology Conference (HLT)</source>
          , San Diego, California, USA,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>O.</given-names>
            <surname>Kolak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Byrne</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Resnik</surname>
          </string-name>
          .
          <article-title>A generative probabilistic OCR model for NLP applications</article-title>
          .
          <source>In Proceedings of the Human Language Technology Conference (HLT-NAACL)</source>
          , Edmonton, Canada,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Lemaitre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mouchere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Camillerapp</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Coasnon</surname>
          </string-name>
          .
          <article-title>Interest of syntactic knowledge for on-line owchart recognition</article-title>
          .
          <source>In 9th IAPR International Workshop on Graphics RECognition (GREC</source>
          <year>2011</year>
          ), pages
          <fpage>85</fpage>
          {
          <fpage>88</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>H.</given-names>
            <surname>Luo</surname>
          </string-name>
          , G. Agam,
          <string-name>
            <surname>and I. Dinstein.</surname>
          </string-name>
          <article-title>Directional mathematical morphology approach for line thinning and extraction of character strings from maps and line drawings</article-title>
          .
          <source>In Document Analysis and Recognition</source>
          ,
          <year>1995</year>
          ., Proceedings of the Third International Conference on, volume
          <volume>1</volume>
          , pages
          <fpage>257</fpage>
          <lpage>{</lpage>
          260 vol.
          <volume>1</volume>
          , aug
          <year>1995</year>
          . doi:
          <volume>10</volume>
          .1109/ICDAR.
          <year>1995</year>
          .
          <volume>598989</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Nagy</surname>
          </string-name>
          .
          <article-title>Twenty years of document image analysis in pami</article-title>
          .
          <source>Pattern Analysis and Machine Intelligence</source>
          , IEEE Transactions on,
          <volume>22</volume>
          (
          <issue>1</issue>
          ):
          <volume>38</volume>
          {62, jan
          <year>2000</year>
          . ISSN 0162-
          <fpage>8828</fpage>
          . doi:
          <volume>10</volume>
          .1109/34.824820.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>PCT. Patent Cooperation Treaty</surname>
          </string-name>
          .
          <year>1970</year>
          . Done at Washington on June 19,
          <year>1970</year>
          , amended on September 28,
          <year>1979</year>
          , modi ed
          <source>on February 3</source>
          ,
          <year>1984</year>
          , and
          <issue>on October 3</issue>
          ,
          <year>2001</year>
          ,
          <article-title>(as in force from April 1,</article-title>
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          2011:
          <article-title>Retrieval in the Intellectual Property Domain</article-title>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          http://clef2011.org/resources/proceedings/Overview-CLEF-
          <article-title>IP Clef2011</article-title>
          .pdf.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>W.</given-names>
            <surname>Szwoch</surname>
          </string-name>
          .
          <article-title>Recognition, understanding and aestheticization of freehand drawing owcharts</article-title>
          .
          <source>In Document Analysis and Recognition</source>
          ,
          <year>2007</year>
          .
          <article-title>ICDAR 2007</article-title>
          . Ninth International Conference on, volume
          <volume>2</volume>
          , pages
          <fpage>1138</fpage>
          {1142, sept.
          <year>2007</year>
          . doi:
          <volume>10</volume>
          .1109/ICDAR.
          <year>2007</year>
          .
          <volume>4377093</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>B.G.</given-names>
            <surname>Vasudevan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dhanapanichkul</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Balakrishnan</surname>
          </string-name>
          .
          <article-title>Flowchart knowledge extraction on image processing</article-title>
          .
          <source>In Neural Networks</source>
          ,
          <year>2008</year>
          .
          <source>IJCNN</source>
          <year>2008</year>
          .
          <article-title>(IEEE World Congress on Computational Intelligence)</article-title>
          . IEEE International Joint Conference on, pages
          <volume>4075</volume>
          {4082, june 2008. doi:
          <volume>10</volume>
          .1109/IJCNN.
          <year>2008</year>
          .
          <volume>4634384</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Stefanos</given-names>
            <surname>Vrochidis</surname>
          </string-name>
          , Symeon Papadopoulos, Anastasia Moumtzidou, Panagiotis Sidiropoulos, Emanuelle Pianta, and
          <string-name>
            <given-names>Ioannis</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          .
          <article-title>Towards content-based patent image retrieval: A framework perspective</article-title>
          .
          <source>World Patent Information</source>
          ,
          <volume>32</volume>
          (
          <issue>2</issue>
          ):
          <volume>94</volume>
          {
          <fpage>106</fpage>
          ,
          <year>2010</year>
          . ISSN 0172-
          <fpage>2190</fpage>
          . doi:
          <volume>10</volume>
          .1016/j.wpi.
          <year>2009</year>
          .
          <volume>05</volume>
          .010. URL http://www.sciencedirect.com/science/article/pii/S0172219009000489.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>J-Y. Yoo</surname>
          </string-name>
          ,
          <string-name>
            <surname>M-Ki. Kim</surname>
            ,
            <given-names>S. Y.</given-names>
          </string-name>
          <string-name>
            <surname>Ban</surname>
          </string-name>
          , and
          <string-name>
            <surname>Y-B. Kwon</surname>
          </string-name>
          .
          <article-title>Line removal and restoration of handwritten characters on the form documents</article-title>
          .
          <source>In Document Analysis and Recognition</source>
          ,
          <year>1997</year>
          .,
          <source>Proceedings of the Fourth International Conference on</source>
          , volume
          <volume>1</volume>
          , pages
          <fpage>128</fpage>
          <lpage>{</lpage>
          131 vol.
          <volume>1</volume>
          , aug
          <year>1997</year>
          . doi:
          <volume>10</volume>
          .1109/ICDAR.
          <year>1997</year>
          .
          <volume>619827</volume>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>