<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Abstract Argumentation-based Strategy for Reading Order Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stefano Ferilli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Pazienza</string-name>
          <email>andrea.pazienzag@uniba.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centro Interdipartimentale per la Logica e sue Applicazioni</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dipartimento di Informatica</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universita di Bari</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Document Image Analysis is the branch of Document Processing in charge of extracting high-level information from the purely pictorial appearance of the document. In between single-column documents, in which no reading order ambiguity is present, and fully partitioned documents, in which each layout component is self-contained and hence can be read independently of all the others, there are tricky cases in which (some of) the layout components in a document page are related to each other, in that they contain different portions of a single discourse (e.g., newspapers). In these cases, automatic procedures for the acquisition of the document content might be ineffective, inapplicable, or lead to inconsistent results. This paper proposes an automatic strategy for identifying the correct reading order of a document page's components based on abstract argumentation. The technique is unsupervised, and works on any kind of document based only on general assumptions about how humans behave when reading documents. Experimental results show that it is very effective, also compared to previous solutions that have been proposed in the literature.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>While today most documents are generated, stored and exchanged in a
digital format, the typing conventions of classical paper documents are still in use.
Moreover, large amounts of legacy documents are digitized and stored in various
kinds of repositories. To be automatically handled and managed, the content of
these documents needs to be extracted and properly organized, and the huge
amount of material makes carrying out this activity manually unfeasible. Hence
the need for automatic pre-processing techniques that carry out this task. The overall
processing of a digital document typically includes three phases: Layout Analysis,
Document Image Understanding and Document Understanding. Layout
Analysis is the perceptual organization process that aims at identifying the
single blocks of a document and at detecting geometrical relations among them
(the Layout Structure); extracting semantic information from this structure is
the task of Document Image Understanding, which yields the Document Logical
Structure. Finally, Document Understanding is in charge of extracting useful
information from the document content. In particular, Document Image Analysis
(DIA) [15], which encompasses the first two phases, is the branch of Automatic
Document Processing that aims at extracting high-level information from the
low-level representation of a document.</p>
      <p>The textual content of the single logical components of a document can be
read after Document Image Understanding is carried out, and provided to
Document Understanding. Such an extraction often poses no problem,
because the aim is just displaying each component separately or because most
documents involve a linear flow of text; in other cases, however, the document layout is
quite complex, requiring suitable strategies to determine the correct reading
order of the components. Since the various articles are composed in the page
in several unpredictable combinations, and have different sizes and numbers of
components, a simple top-down, left-to-right reading order of the pages would
be ineffective, returning a text flow that interleaves components from different,
unrelated articles. Hence, subsequent Document Understanding steps would be
inapplicable, or would return nonsense results. So, Reading Order Detection in
a document is a hot problem, and new approaches are needed to tackle difficult
cases and provide general and flexible solutions.</p>
      <p>This paper proposes the use of an abstract argumentation framework to
solve this problem. Specifically, the proposed approach uses a representation
of the problem that is totally general and applicable to any kind of document.
Advantages of this solution include the fact that it does not need any (supervised
or unsupervised) learning, and hence is directly applicable to any document page
and complies with an incremental extension of the document base. The technique
has been implemented and embedded in the DIA step of DoMInUS, a system
for document processing and management that provides the Layout Analysis
pre-processing techniques required to obtain the representation to be fed to the
argumentation reasoner. The next section discusses related work on this topic.
Then, Section 3 recalls the architecture and the main components of DoMInUS
that carry out the Layout Analysis task, and Section 4 summarizes the main
concepts of a formal argumentation framework. Section 5 describes the solution
and experimental results showing its effectiveness. Finally, Section 6 concludes
the paper and outlines future work directions.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>The reading order detection problem consists in finding a sequence of the objects
in a document's page that reflects the human reading order. Multiple approaches
to this problem have been proposed in the literature. Some are based on layout
information only, others exploit the content of objects, with or without using a
priori knowledge about the particular document class.</p>
      <p>Many works are based on the well-known XY-cuts segmentation algorithm [16]
(see next section for details). Ishitani [13] proposes to exploit the hierarchy of
blocks generated by the XY-cuts algorithm (called the `XY-tree') to induce the
reading order among the final blocks. However, exploiting XY-cuts to determine
reading order often causes problems. Indeed, XY-cuts cannot deal with blocks
organized in L-like shapes, and the algorithm needs to know the minimum required width
of the horizontal/vertical stripes for the cutting strategy. Overall, XY-cuts
approaches are simple and require only visual information. However, due to their
cut strategies, they perform reasonably well only on documents with simple
layouts (e.g., Manhattan layouts). This is a serious limitation of these approaches.</p>
      <p>Knowledge-based approaches use rules to identify a reading order in the
pages. Typically the rules provided to the system encode general criteria
concerning reading order, and are manually written by experts. In [5] geometric
and linguistic information is used to determine a reading order. The key idea
is to compare all possible pairs of text lines, introducing whenever appropriate
suitable constraints on their reading order, derived from their geometric
arrangement or from their linguistic content. In [18] tab-stop detection is used to deduce
the columns of the page, then this information is exploited to detect blocks and
determine the reading order. Three types of blocks are used to formulate five
ordering rules that determine the reading order of the page.</p>
      <p>Differently from previous approaches, Aiello et al. [3] use Natural Language
Processing (NLP) techniques to improve reading order detection. For each page,
rectangular components (document-objects) are extracted (for non-rectangular
shapes the bounding box is considered) and described with geometrical and
content-based features. To detect the reading order, they define a partial order
relation, called BeforeInReading, on pairs of document-objects.</p>
      <p>A more sophisticated approach that uses only visual information to detect the
reading order is provided in [14]. They formulate the reading order problem as a learning
problem where the goal is to find a First-Order Logic (FOL) theory that is
complete and consistent with respect to all training examples. Initially, they
use a knowledge-based document image processing system (WISDOM++) to
extract the logical structure of the page and identify the membership class of
the document. Then, they use an Inductive Logic Programming system (ATRE)
to learn concepts describing the reading order chains of a single document page.</p>
      <p>It is worth noting that a common assumption, made by many approaches,
concerns the uniqueness of the reading order for each page. This assumption
is clearly wrong for the pages of newspapers or magazines, where many mutually
independent articles are present in each page and can be read in any order. To
the best of our knowledge, only [14] and [12] explicitly address the problem with
the identification of multiple reading chains.</p>
    </sec>
    <sec id="sec-3">
      <title>DoMInUS</title>
      <p>Extending previous research presented in [9], DoMInUS (DOcument Management
INtelligent Universal System) [8, 10] is a document processing and management
system characterized by the intensive exploitation of intelligent techniques in
each step of document processing, from acquisition to indexing, from
categorization to storing and retrieval. Since it is general and flexible, it can be embedded
as a document management engine into many different Digital Library systems.
Based on the ODA/ODIF standard, any document can be progressively
partitioned into a hierarchy of abstract representations, called its layout structure.
Here we describe an approach implemented for discovering a full layout hierarchy
in digital documents based primarily on layout information. DoMInUS embeds
several techniques that allow it to extract the high-level geometrical structure of
a document, both for born-digital documents and for digitized documents, both
for Manhattan and for non-Manhattan cases:
- XY-cuts, used for digitized documents, but modifiable to work on
born-digital documents as well, useful for Manhattan layout;
- Run-Length Smoothing Algorithm (RLSA), used for digitized documents
only, useful for Manhattan layout;
- Run-Length Smoothing with OR (RLSO) [11], in its two versions for digitized
and born-digital documents, useful for non-Manhattan layout;
- Background Structure Analysis, used for born-digital documents only, useful
for Manhattan layout, but adaptable for non-Manhattan layout.
For digitized documents, the input to such techniques is the raster image of
each page after pre-processing aimed at noise removal, dewarping and
deskewing. For born-digital documents, in PostScript (PS) [1] or PDF [2] formats, the
input to such techniques is a vectorial description of each document page in
terms of blocks, each of which may be (a fragment of) a text line, a graphical
(horizontal/vertical) line, a closed area filled with one color, or a raster image.
In both cases, the output is a set of frames, defined as collections of basic blocks
or pixels completely surrounded by white space, which should correspond to
logical components that may be associated with a well-defined role in the document.
The horizontal/vertical size and position in the page, and type of content, of
the frames are reported, and used to infer various kinds of higher-level spatial
relationships among frames (expressing horizontal/vertical adjacency and
horizontal/vertical alignment) [17, 7]. Figure 1 shows a digital document and the
corresponding output of the proposed algorithm, with the discovered frames
highlighted.</p>
    </sec>
    <sec id="sec-4">
      <title>Formal Argumentation Basics</title>
      <p>Every day, people must make decisions based on conflicting information. For
example, when a judge has to decide whether a person is guilty or not, he must base his
decision on contradictory information: the claims of the involved persons
are typically contradictory. Making decisions in these situations is not trivial.
The usual strategy consists in proving the inconsistency of some assertions and
then finding a consistent way to link the remaining information. This is the task
of argumentation, an inferential strategy that provides a general approach to
model defeasible and non-monotonic reasoning. Over the last years, it has
become an influential subfield of Artificial Intelligence, with applications ranging
from legal reasoning, to dialogues and persuasion, to medicine, to eGovernment.</p>
      <p>A foundational work for abstract argumentation theory was proposed by
Dung [6]. According to his setting, an abstract argument system or
Argumentation Framework (AF for short) is a pair ⟨A, R⟩ consisting of a set A, whose
elements are called arguments, and a binary relation R ⊆ A × A, called the attack
relation. Given two arguments α, β ∈ A, the relation α R β represents an attack
from α against β. In general, arguments α and β are in conflict if argument α
refutes argument β, or if α attacks premises supporting β. This setting
can be represented as a directed graph, with each node representing an
argument and each edge representing an attack. Given such a graph, the objective is
determining which subset(s) of its nodes can be justified.</p>
      <p>A basic requirement in this setting is the concept of admissibility. A set S ⊆ A
of arguments is admissible if it is conflict-free (i.e., ∄ α, β ∈ S s.t. α R β) and
acceptable (i.e., ∀ α ∈ S, ∀ β ∈ A : β R α ⇒ ∃ γ ∈ S s.t. γ R β). For example, if
{A, B, C, D, E} is a set of arguments and the attacks are defined as shown in Figure 2,
then the subset S = {B, D} (double circles in the figure) is admissible because
it defends all its arguments (i.e., it is acceptable) and there exists no attack
between its arguments (i.e., it is conflict-free). As observed in [4], in an AF one
is usually interested in the justification state of the arguments. The justification
state in an AF can be determined according to suitable semantics, which specify
how to derive from an AF a set of extensions, each intuitively representing a set of
arguments that are collectively justified.</p>
      <p>Definition 1 (semantics). Let AF = ⟨A, R⟩ be an Argumentation Framework,
S ⊆ A be a conflict-free set of arguments, and F : 2^A → 2^A be defined as
F(S) = {a | a is defended by S}. Then:
- S is a Complete Extension iff S = F(S);
- S is a Grounded Extension iff S is the minimal Complete Extension (w.r.t. set inclusion);
- S is a Preferred Extension iff S is the maximal Complete Extension (w.r.t. set inclusion);
- S is a Stable Extension iff S is a Complete Extension that attacks every argument in A \ S.</p>
      <p>Each semantics involves a different degree of skepticism in evaluating the
arguments, so a partial order relation can be established among semantics. Among
the semantics in Definition 1, the most skeptical is the Grounded Extension, while
the most credulous is the Preferred Extension. Preferred and Grounded Extensions
are also Complete. Considering again the example in Figure 2, we have that
{∅, {A}, {B, D}} are Complete Extensions, {{A}, {B, D}} are Preferred Extensions,
{B, D} is a Stable Extension, and ∅ is the only Grounded Extension.</p>
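      <p>These semantics can be checked mechanically on small frameworks by brute-force enumeration of subsets. The following Python sketch (not part of the paper) illustrates Definition 1; since the exact attack graph of Figure 2 is not reproduced in the text, the attack relation below is an assumed reconstruction, chosen only so that it yields exactly the extensions listed above.</p>

```python
from itertools import combinations

# Assumed attack graph over {A, B, C, D, E}: NOT Figure 2 itself, just one
# relation consistent with the extensions cited in the text.
args = {"A", "B", "C", "D", "E"}
attacks = {("A", "B"), ("B", "A"), ("B", "C"),
           ("E", "C"), ("C", "D"), ("D", "E")}

def conflict_free(S):
    return not any((a, b) in attacks for a in S for b in S)

def defended(a, S):
    # every attacker of a is counter-attacked by some member of S
    return all(any((c, b) in attacks for c in S)
               for (b, x) in attacks if x == a)

def F(S):
    # the characteristic function F(S) = {a | a is defended by S}
    return frozenset(a for a in args if defended(a, S))

subsets = [frozenset(c) for r in range(len(args) + 1)
           for c in combinations(sorted(args), r)]
complete = [S for S in subsets if conflict_free(S) and F(S) == S]
grounded = min(complete, key=len)                    # minimal complete
preferred = [S for S in complete                     # maximal complete
             if not any(S < T for T in complete)]
stable = [S for S in complete                        # attacks all of A \ S
          if all(any((a, b) in attacks for a in S) for b in args - S)]
```

      <p>Running this enumeration on the assumed graph reproduces the example: three Complete Extensions, two Preferred, one Stable, and the empty Grounded Extension.</p>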
    </sec>
    <sec id="sec-5">
      <title>Argumentation-based Reading Order Detection</title>
      <p>Summing up, we aim at designing a procedure for reading order detection in
document pages that fulfills several requirements ensuring maximum generality
and widest applicability. First of all, the procedure must not take into account
the textual content of the document blocks, but just their layout organization.
Indeed, using the textual content would require the exploitation of NLP
techniques, and it is well-known that these techniques are language-dependent and
are not available for all languages, or at least not with the same quality.
Conversely, our proposed approach is applicable to any kind of document because
it relies basically on the position of components resulting from the layout
structure of the page(s).</p>
      <p>This is consistent with the human approach, in which the reading order
is determined without actually reading the document (it may be necessary to
leverage the textual content to solve inconsistencies or ambiguities in the text
flow across subsequent components in the determined reading order; e.g., in
Figure 3 one would expect the horizontally-oriented reading order (0,1,2,3,4,5),
while the correct one is vertically-oriented (0,3,1,4,2,5)). Also, our technique is
specifically interested in non-Manhattan layouts, where techniques based on a
partition of the page according to its background are not applicable; for instance,
newspapers are a good representative of this kind of document. Third, we want
the technique to be based on very simple layout information that can be easily
and reliably extracted by an automatic procedure and is independent of the
specific kind of document. Having such a low-level input clearly places most of
the burden on the reading order detection technique. While this complexity is often
tackled using knowledge-based approaches, we want to avoid this setting, because
hand-written rules are costly, error-prone and typically depend on the kind of
document. We also want to avoid the Machine Learning-based solution to this
problem, by which such rules are automatically obtained by the system starting
from examples of manually labeled documents. Indeed, a manual intervention
is required anyway, and the learned knowledge is in any case dependent on the
kind of documents, and on the specific documents, used for training the system,
so it is valid only up to proof of the contrary. Finally, it is often the case that
the reading order in a document page does not determine a total order: the
components can be partitioned into independent subgroups, each characterized
by its own reading order, leaving the reader free to decide an order among groups.</p>
      <p>Among the various inferential strategies available in Artificial Intelligence,
Abstract Argumentation seemed to provide the proper tools to deal with all of
the above requirements. Indeed, one might just express possible (even trivial)
partial reading orderings and identify pairs of these partial solutions that are
mutually inconsistent. Given a document page image, we run DoMInUS to
obtain the layout structure, consisting of layout blocks labeled with their type of
content. For reading order detection purposes, image blocks are simply
ignored. Moreover, horizontal or vertical lines are considered as natural separators
when their projection spans more than one content (i.e., image or text) block.
The presence of these separators allows us to partition the page into independent
portions in which the reading order can be determined separately, which slightly
simplifies the problem. Frames in a page can be considered as made up of 4
different lines to which this perspective can be applied. Given the text blocks
in the (portion of the) page under processing, we consider only the following very
basic and document-independent reading rules for providing the input to our
technique:
- horizontally or vertically adjacent components are candidates to be read
consecutively;
- a component at the bottom of the (portion of the) page might be followed by a
component at the top of an adjacent column; and
- a rightmost (resp., leftmost) component might be followed by a leftmost
(resp., rightmost) component in an adjacent row.</p>
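      <p>The first of these rules can be sketched in a few lines of Python. This is a hypothetical illustration, not the paper's implementation: the bounding-box format (x1, y1, x2, y2) and the gap tolerance are assumptions, and only the adjacency rule is covered (the column-wrap and row-wrap rules would add further candidate pairs).</p>

```python
# Hypothetical sketch of the adjacency rule: two blocks are candidate
# consecutive reads if one sits directly below or directly to the right
# of the other. Boxes are (x1, y1, x2, y2); tol is an assumed gap tolerance.
def adjacent(b1, b2, tol=5):
    x1a, y1a, x2a, y2a = b1
    x1b, y1b, x2b, y2b = b2
    h_overlap = min(x2a, x2b) - max(x1a, x1b) > 0  # share some x-range
    v_overlap = min(y2a, y2b) - max(y1a, y1b) > 0  # share some y-range
    vert = h_overlap and abs(y1b - y2a) <= tol     # b2 right below b1
    horiz = v_overlap and abs(x1b - x2a) <= tol    # b2 right of b1
    return vert or horiz

def candidate_next(blocks):
    # blocks: {block_id: bounding box}; returns directed candidate pairs
    return {(i, j) for i in blocks for j in blocks
            if i != j and adjacent(blocks[i], blocks[j])}
```

      <p>For instance, with a block stacked above another and a third one to its right, only the two "below" and "right of" pairs survive as candidates.</p>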
      <p>As required, these rules are so trivial and general that a computer may easily
identify, given the document layout, which pairs of components fulfill them. Each
pair of blocks (A, B) in the considered (portion of the) document that fulfills any of
these requirements is translated into an argument representing the claim
"components A and B are to be read one after the other in the document". Formally,
we express this in FOL using the predicate next/2, as next(A,B). Note that
this predicate does not imply any direction in the relationship between A and
B, and hence it applies both to languages in which the reading order proceeds
left-to-right and to those in which it proceeds right-to-left.</p>
      <p>Arguments of this kind are so simple and intuitive that it is also immediate
to automatically infer the possible attacks to be provided to the argumentation
engine. Indeed, given three components A, B and C, one knows that arguments
of the type next(A,B) and next(A,C) mutually attack each other, because
if B is to be read after A, then C cannot be read after A; conversely, if C is to
be read after A, then B cannot be read after A. This can be expressed by the
following attacks in our Argumentation Framework:</p>
      <p>attacks(next(A,B),next(A,C)), attacks(next(A,C),next(A,B)).
The same holds for two arguments of the type next(A,C) and next(B,C):
if A is to be read immediately before C, then B cannot be read immediately
before C as well, and vice-versa:</p>
      <p>attacks(next(A,C),next(B,C)), attacks(next(B,C),next(A,C)).
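      <p>The two attack patterns above can be derived mechanically: any two distinct next/2 arguments that share either their predecessor or their successor attack each other. The following Python sketch (our illustration, not the paper's code) makes this explicit, representing each argument next(A,B) as the pair (A, B).</p>

```python
# Derive mutual attacks among next/2 arguments: two distinct arguments
# conflict when they share the first component (same predecessor) or the
# second component (same successor), and the attack is symmetric.
def derive_attacks(next_args):
    attacks = set()
    for (a1, b1) in next_args:
        for (a2, b2) in next_args:
            if (a1, b1) == (a2, b2):
                continue
            if a1 == a2 or b1 == b2:
                attacks.add(((a1, b1), (a2, b2)))
                attacks.add(((a2, b2), (a1, b1)))
    return attacks
```

      <p>Applied to the twelve next/2 arguments of the worked example below, this rule yields exactly the twenty attacks listed there.</p>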
For instance, the document page in Figure 1 yields the following formal
description, corresponding to the segments in Figure 4a:</p>
      <p>(0,3), (1,2), (3,1), (4,5), (4,6), (4,7), (5,6), (5,9), (6,7), (8,0), (8,4), (8,9)
for which the following attacks are automatically derived:
(4,5)-(4,6), (4,6)-(4,5), (4,6)-(4,7), (4,7)-(4,6), (4,5)-(4,7), (4,7)-(4,5),
(5,6)-(5,9), (5,9)-(5,6), (8,0)-(8,4), (8,4)-(8,0), (8,0)-(8,9), (8,9)-(8,0),
(8,4)-(8,9), (8,9)-(8,4), (4,6)-(5,6), (5,6)-(4,6), (4,7)-(6,7), (6,7)-(4,7),
(5,9)-(8,9), (8,9)-(5,9).
The correct reading order is the following (graphically shown in Figure 4b):
(3,1), (1,2), (2,4), (4,5), (5,6), (6,7)</p>
      <p>Our technique returned exactly this reading order, except for relation (2,4).
This is due to the presence of a ruling line between the paper heading and its
body. Actually, this may be considered acceptable, since the heading and the
body may indeed be read independently.</p>
      <p>The proposed technique was tested on a dataset including 103 document
pages of different layout complexity, taken from newspapers, magazines and
(scientific) papers. Trivial cases of single-column documents were not included in the
dataset, while tricky cases (such as the one in Figure 3) are present. Statistics
about the dataset are reported in the top and middle rows of Table 1, both for
single classes and for the whole dataset (in the last column). Table cells report
average values, with minimum-maximum range in parentheses. The last column
reports both the averages computed on single documents, and the averages over
class averages (in parentheses). The dataset involved on average 20.73 blocks,
22.98 arguments and 63.17 attacks per document. It is evident that, as expected,
the simplest class is `Magazine', while the most complex one is `Newspaper',
both for the number of involved components and for the reading order-related
relationships. Figure 5 shows the structure of the most complex document in
the dataset, a newspaper page involving 106 blocks, 137 arguments and 1020
attacks. The last two rows of Table 1 report the figures of the experimental
results, using the same representation as for the dataset statistics. The justified set
of arguments was determined using Preferred Extensions. Interestingly, for this
parameter the `Magazine' and `Paper' classes become very close to each other.
While the maximum number of Preferred Extensions is still higher for papers
than for magazines, on average `Magazine' turns out to have slightly more
Preferred Extensions. However, the gap between these classes and `Newspaper' is
huge (the number of Preferred Extensions in the latter is 3 orders of magnitude
larger than in the former). This is due to the fact that pages with a complex
layout arrangement are often partitioned so that reading order is relevant for blocks
within the same element of the partition, but the reading order of partition elements
is independent. Newspaper pages in the dataset have a significant impact on the
overall complexity of the dataset, as can be noted by looking at the averages in
the last column.</p>
      <p>For each extension, the recall was evaluated as the ratio of correct next/2
items retrieved over the next/2 items in the correct order sequence. Since a page
may admit several Preferred Extensions (i.e., alternative correct reading orders),
the recall of a given page was determined as the average recall over all Preferred
Extensions for that page. The proposed technique reached 77.94% average recall
on the entire dataset (79.73% considering the compound average over classes).
Consistently with the class complexity, the worst recall occurred on newspapers,
but is still quite satisfactory (70.74%) for such a difficult class. Interestingly,
the best recall was reached on papers (91.32%), even if they have a more complex
structure than magazines. Conversely, the performance on magazines is very
close to that for newspapers (71.75%), despite their much less complex structure.
Since the competitor systems, and/or the datasets on which they were run, are
not available, we have run our own experiments and compared our performance
with those reported in the other papers. It should be noted that we included in
our dataset very complex cases having non-Manhattan layout. Our performance
is comparable to that of previous systems, which can be considered a success,
since we take a trivial input that does not require high-level interpretation, and
we can deal with very complex non-Manhattan layouts. As regards techniques
that handle many possible reading orders, we perform better than [14]. We perform
worse than [12], but [12] uses linguistic information, which is not always available,
while we can deal with any kind of document independently of the language in
which it is written.</p>
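      <p>The recall measure just described can be sketched as follows (our illustration; the data structures are assumptions): each Preferred Extension is treated as a set of next/2 pairs, its recall is the fraction of gold-standard pairs it retrieves, and the page recall averages over all Preferred Extensions.</p>

```python
# Sketch of the page-level recall described above. Each extension and the
# gold reading order are collections of next/2 pairs (predecessor, successor).
def page_recall(preferred_extensions, gold_pairs):
    gold = set(gold_pairs)
    recalls = [len(set(ext) & gold) / len(gold)
               for ext in preferred_extensions]
    return sum(recalls) / len(recalls)
```

      <p>For the worked example above, an extension missing only relation (2,4) out of the six gold pairs scores a recall of 5/6.</p>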
      <p>We also carried out a qualitative analysis to spot problems and get hints for
improvement. Most problems are due to the fact that sometimes there are fancy
reading orders that deviate from the general `top-down, left-to-right' rule. These
are the cases where linguistic information may help. This is an intrinsic
limitation of our approach, which we accepted when we ruled out any language-based
processing. Other minor problems concerned the separation of the portions of the
pages. On papers, the technique fails when header and footer are not separated
by lines; on magazines, when multi-line titles span different backgrounds;
on newspapers, when columns of the same article are separated by lines. These
problems might be tackled by refining the rules to consider additional layout
information, such as spacing and font size.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>DIA aims at automatically extracting high-level information from the purely
pictorial appearance of the document. A task of DIA is determining the reading
order among text components in a document page. This is a required step to
ensure applicability and e ectiveness of automatic procedures for the acquisition
of the document content. While this may be trivial in single-column documents
or in documents in which each layout component is self-contained, there are
tricky cases in which (some of) the layout components in a document page are
related to each other, in that they contain different portions of a single discourse.</p>
      <p>This paper proposed an automatic strategy for identifying the correct
reading order of a document page's components based on abstract argumentation.
The technique is unsupervised, and works on any kind of document based only
on general assumptions about how humans behave when reading documents.
Experimental results show that it is very effective, also compared to previous
solutions that have been proposed in the literature. Qualitative analysis of the
results suggested possible directions for further improvement of the approach.</p>
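<p>To make the argumentation machinery concrete, the following is a minimal sketch of computing the grounded extension of a Dung-style abstract argumentation framework [6], the kind of semantics our strategy builds on. The data representation (sets of arguments and attack pairs) is an illustrative assumption, not our system's code.</p>

```python
# Minimal sketch of Dung's grounded semantics [6]: iterate the
# characteristic function F(S) = {a | S defends a} from the empty
# set until a fixpoint is reached.

def grounded_extension(arguments, attacks):
    """arguments: set of labels; attacks: set of (attacker, target) pairs.
    Returns the (unique) grounded extension."""
    attackers = {a: {b for (b, c) in attacks if c == a} for a in arguments}
    extension = set()
    while True:
        # a is defended by `extension` if every attacker of a is
        # itself attacked by some member of `extension`
        defended = {
            a for a in arguments
            if all(any((d, b) in attacks for d in extension)
                   for b in attackers[a])
        }
        if defended == extension:
            return extension
        extension = defended

# Example: a attacks b, b attacks c; a is unattacked and defends c
print(sorted(grounded_extension({"a", "b", "c"},
                                {("a", "b"), ("b", "c")})))
```

<p>In the reading-order setting, arguments correspond to candidate ordering decisions and attacks to conflicts among them, so the accepted extension yields a consistent ordering.</p>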
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was partially funded by the Italian PON 2007-2013 project
PON02 00563 3489339 `Puglia@Service'.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Adobe Systems Inc.</surname>
          </string-name>
          <source>PostScript language reference manual, 2nd ed.</source>
          Addison Wesley,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Adobe Systems Inc.</surname>
          </string-name>
          <source>PDF Reference, version 1.3, 2nd ed.</source>
          Addison Wesley,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Aiello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Monz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Todoran</surname>
          </string-name>
          .
          <article-title>Document understanding for a broad class of documents</article-title>
          .
          <source>IJDAR</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          &#8211;
          <lpage>16</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Baroni</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Giacomin</surname>
          </string-name>
          .
          <article-title>Semantics of abstract argument systems</article-title>
          .
          In Guillermo Simari and Iyad Rahwan, editors,
          <source>Argumentation in Artificial Intelligence</source>
          , pages
          <fpage>25</fpage>
          &#8211;
          <lpage>44</lpage>
          . Springer US
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Thomas M.</given-names>
            <surname>Breuel</surname>
          </string-name>
          .
          <article-title>High performance document layout analysis</article-title>
          .
          <source>In Proc. Symp. Document Image Understanding Technology</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.M.</given-names>
            <surname>Dung</surname>
          </string-name>
          .
          <article-title>On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>77</volume>
          (
          <issue>2</issue>
          ):
          <fpage>321</fpage>
          &#8211;
          <lpage>358</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Egenhofer</surname>
          </string-name>
          .
          <article-title>Reasoning about binary topological relations</article-title>
          .
          In O. Gunther and H.-J. Schek, editors,
          <source>Second Symposium on Large Spatial Databases</source>
          , volume
          <volume>525</volume>
          of LNCS, pages
          <fpage>143</fpage>
          &#8211;
          <lpage>160</lpage>
          . Springer,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Esposito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.M.A.</given-names>
            <surname>Basile</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N. Di</given-names>
            <surname>Mauro</surname>
          </string-name>
          .
          <article-title>Machine learning for digital document processing: From layout analysis to metadata extraction</article-title>
          . In S. Marinai and H. Fujisawa, editors,
          <source>Machine Learning in Document Analysis and Recognition</source>
          , volume
          <volume>90</volume>
          of Studies in Computational Intelligence, pages
          <fpage>79</fpage>
          &#8211;
          <lpage>112</lpage>
          . Springer,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Esposito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Malerba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Semeraro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Altamura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.M.A.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Berardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ceci</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N. Di</given-names>
            <surname>Mauro</surname>
          </string-name>
          .
          <article-title>Machine learning methods for automatically processing historical documents: From paper acquisition to XML transformation</article-title>
          .
          <source>In DIAL 2004</source>
          , pages
          <fpage>328</fpage>
          &#8211;
          <lpage>335</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferilli</surname>
          </string-name>
          .
          <source>Automatic Digital Document Processing and Management: Problems, Algorithms and Techniques</source>
          . Advances in Pattern Recognition series. Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Biba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Esposito</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.M.A.</given-names>
            <surname>Basile</surname>
          </string-name>
          .
          <article-title>A distance-based technique for non-Manhattan layout analysis</article-title>
          .
          <source>In ICDAR-2009</source>
          , volume I, pages
          <fpage>231</fpage>
          &#8211;
          <lpage>235</lpage>
          . IEEE Computer Society,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>A graph-based method of newspaper article reconstruction</article-title>
          .
          <source>In ICPR</source>
          , pages
          <fpage>1566</fpage>
          &#8211;
          <lpage>1569</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ishitani</surname>
          </string-name>
          .
          <article-title>Document transformation system from papers to XML data based on pivot XML document method</article-title>
          .
          <source>In Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR 2003)</source>
          , pages
          <fpage>250</fpage>
          &#8211;
          <lpage>255</lpage>
          , vol. 1, Aug.
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Malerba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ceci</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Berardi</surname>
          </string-name>
          .
          <article-title>Machine learning for reading order detection in document image understanding</article-title>
          . In S. Marinai and H. Fujisawa, editors,
          <source>Machine Learning in Document Analysis and Recognition</source>
          , volume
          <volume>90</volume>
          of Studies in Computational Intelligence, pages
          <fpage>45</fpage>
          &#8211;
          <lpage>69</lpage>
          . Springer,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>G.</given-names>
            <surname>Nagy</surname>
          </string-name>
          .
          <article-title>Twenty years of document image analysis in PAMI</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>22</volume>
          (
          <issue>1</issue>
          ):
          <fpage>38</fpage>
          &#8211;
          <lpage>62</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G.</given-names>
            <surname>Nagy</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.C.</given-names>
            <surname>Seth</surname>
          </string-name>
          .
          <article-title>Hierarchical representation of optically scanned documents</article-title>
          .
          <source>In ICPR</source>
          , pages
          <fpage>347</fpage>
          &#8211;
          <lpage>349</lpage>
          . IEEE,
          <year>1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>Papadias</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Theodoridis</surname>
          </string-name>
          .
          <article-title>Spatial relations, minimum bounding rectangles, and spatial data structures</article-title>
          .
          <source>International Journal of Geographical Information Science</source>
          ,
          <volume>11</volume>
          (
          <issue>2</issue>
          ):
          <fpage>111</fpage>
          &#8211;
          <lpage>138</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>R.</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <article-title>Hybrid page layout analysis via tab-stop detection</article-title>
          .
          <source>In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR 2009)</source>
          , pages
          <fpage>241</fpage>
          &#8211;
          <lpage>245</lpage>
          . IEEE,
          <year>July 2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>