<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Rule-based location extraction from Italian unstructured text</article-title>
      </title-group>
      <contrib-group>
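        <contrib contrib-type="author">
          <name><surname>Caruso</surname><given-names>Daniele</given-names></name>
          <xref ref-type="aff" rid="aff0" />
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Giunta</surname><given-names>Rosario</given-names></name>
          <xref ref-type="aff" rid="aff0" />
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Messina</surname><given-names>Dario</given-names></name>
          <xref ref-type="aff" rid="aff0" />
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Pappalardo</surname><given-names>Giuseppe</given-names></name>
          <xref ref-type="aff" rid="aff0" />
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Tramontana</surname><given-names>Emiliano</given-names></name>
          <xref ref-type="aff" rid="aff0" />
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics and Computer Science, University of Catania</institution>
          ,
          <country country="IT">Italy</country>
        </aff>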
      </contrib-group>
      <fpage>17</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>Named entity recognition is a wide research topic concerned with the extraction of information from unlabelled texts. Existing approaches mainly deal with the English language; in this paper we present the results of a novel approach specifically tailored to the Italian language. The approach recognises location names in unstructured texts by means of several agents based on rules devised for the Italian grammar. Preliminary results show an F1 score up to 0.67.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>Keywords: Information Extraction, Named Entity Recognition,
Free Text, Natural Language Processing, Italian Language.</p>
      <p>
        Huge amounts of text data are easily available on the World
Wide Web. Unfortunately, the great majority of such texts are
unstructured or semi-structured, which makes it difficult for both
human beings and machines to make good use of their content.
Information Extraction is concerned with the process of structuring
existing texts (both semi-structured and free) so as to single out some
parts of the text and make them directly accessible to existing
postprocessors [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        A comprehensive survey of existing approaches [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] shows
how the Information Extraction community has evolved from
the seminal approaches of the early ’90s, e.g. automatic
learning of rules to extract entities [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], maximum entropy
models [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], Conditional Random Fields [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], etc. Many, if not
all, of these approaches are tested, or developed, on the English
language. Moreover, specific analysers have been developed
to embed security checks on software programs [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], discover
structural properties [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], and
perform automatic transformation of programs [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>We are especially concerned with the problem of named
entity extraction from free texts in the Italian language;
in particular, we are interested in the extraction of location
names, i.e. proper nouns of places. Free texts can be of any
kind, ranging from dialogues in a movie to fiction prose,
and thus impose different constraints. In general, however, a
location name is assumed to be written with a capital letter,
so common nouns can also be considered location names,
especially in casual speech, e.g. in Vediamoci in Dipartimento
(Let’s meet at the Department)<fn><label>1</label><p>We have decided to use both the original Italian and the translated version
of any processed text we show, in order to allow a better appreciation of the
proposed approach.</p></fn>, where the said Department
is shared knowledge between the speakers.</p>
      <p>Unlike machine learning approaches, both unsupervised and
supervised, we propose a rule-based approach built from
simple grammar rules of the Italian language complemented by a
dictionary. Location names are extracted by several specialised
agents, each performing one processing step and connected in the
pipe and filter style (see Figure 1): applying a rule discards
the bulk of the words and leaves a set of candidates, which later
have to pass a further screening based, essentially, on a variant
of a dictionary comparison.</p>
      <p>The text is pre-filtered to remove punctuation symbols and
then split into sentences. Each sentence is analysed by up to
three rules (see Section III) so as to identify candidate words,
whose lists are finally combined and filtered to remove false
positives. The devised rules are typical Italian language patterns
that identify general contexts where a location can be found; thus
the rules are not a simple filtering of words from an existing
dictionary.</p>
      <p>Preliminary results of the algorithm are encouraging:
precision goes up to 0.82 and recall up to 0.92, while the
comprehensive F1 score goes up to 0.67.</p>
    </sec>
    <sec id="sec-2">
      <title>II. PHASE 1: PRELIMINARIES</title>
      <p>The approach and the corresponding tool we have developed
work on plain text files; a web page, for instance, can be
pre-processed beforehand by one of the many converters available to
remove HTML tags.</p>
      <p>For the devised rules, we make use of a specially
compiled Italian lexicon, containing the following classes of
words:
• Articles. A list of definite articles, e.g. il (the).
• Prepositions. Both kinds (semplice (simple) and
articolata (composite)), but excluding con (with), as it is not
used when naming places.
• Verbs. A subset of verbs related to places, such as
andare (to go), mandare (to send), partire (to leave),
passeggiare (to (take a) walk).
• Descriptors. A list of adverbs frequently related to a
place, such as dentro (inside), vicino (near).
• Non-places. Words of various kinds (verbs, adverbs,
nouns, etc.) not related to places, but that can appear
in grammar structures (defined by the rules we set) as if
they were places, e.g. acido (sour), dormire (to sleep). As
different words may be incorrectly identified as places,
the approach assists users in the customisation of the
set, by incorporating additional words, so as to exclude
(refine) future results.</p>
      <p>[Figure 1: pipe and filter architecture of the approach. Labels recovered from the figure: sentence splitting, Rule 1, Phase 1, Phase 2, Phase 3.]</p>
      <table-wrap>
        <label>Table I</label>
        <table>
          <thead>
            <tr><th>Class</th><th>Sample words</th></tr>
          </thead>
          <tbody>
            <tr><td>Descriptors</td><td>avanti (in front of), dietro (rear), fianco (side), dentro (inside), fuori (outside), vicino (near), direzione (direction), esterno (outer), interno (inner), lontano (away), sinistra (left), destra (right), adiacente (adjacent), vicinanza (proximity), ingresso (entrance), uscita (exit), dirimpetto (opposite), attiguo (adjacent)</td></tr>
            <tr><td>Non-places</td><td>altrimenti (else), decimo (tenth), allora (then), filosofo (philosopher), molto (much), scrivere (to write), camminare (to walk), distrarre (to distract), florido (prosperous), bere (to drink), visitare (to visit), ognuno (everyone), cremisi (crimson), nostro (ours), riempire (to fill), durante (while), esso (it), piatto (flat), lucente (shining), spostare (to move), capace (capable)</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Table I shows the sample lists of words used in these
categories.</p>
      <p>In the following sections, such sets will be named after their
initial, e.g. we will talk of V as the set of verbs.</p>
      <p>Given a text T (read from an input file), the first step
is to separate sentences, based on standard Italian
grammar rules. T is split at each occurrence of one of the
symbols in the set of sentence-end punctuation marks, i.e.
{full-stop, ellipsis, exclamation-mark, question-mark}. All the
other punctuation marks are removed before the text is handed
to the next agents, and the result is a list of sentences. Any other
non-letter symbol is ignored, e.g. dollar sign, percent sign,
etc.</p>
      <p>Each sentence in the input text is further segmented in order
to find words. This is accomplished by using the space
character as word separator, and it applies to any rule we describe.
The words found within a sentence are then compared with
the entries in the lexicons, according to the different rules
described in the next sections.</p>
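      <p>The splitting and tokenisation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of the tool: the function name split_sentences is hypothetical, as are the choice of replacing every non-letter symbol with a space and the handling of apostrophes (an elided article such as l' is kept as a separate token).</p>
      <preformat>
```python
import re

# Sentence-end marks named in the text: full stop, ellipsis,
# exclamation mark and question mark.
SENTENCE_END = re.compile(r"\.{3}|[.!?\u2026]")


def split_sentences(text):
    """Split a text into sentences, each returned as a list of words."""
    sentences = []
    for chunk in SENTENCE_END.split(text):
        # Assumption: separate elided forms such as l'Italia into l' + Italia.
        chunk = chunk.replace("\u2019", "'").replace("'", "' ")
        # Every other punctuation mark or non-letter symbol becomes a space.
        cleaned = "".join(
            ch if ch.isalpha() or ch in " '" else " " for ch in chunk
        )
        words = cleaned.split()  # the space character separates words
        if words:
            sentences.append(words)
    return sentences
```
      </preformat>
      <p>With this sketch, the text "Vediamoci in Dipartimento. Viaggiare per l'Italia!" yields two sentences, the second one being the word list Viaggiare, per, l', Italia.</p>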
      <p>III. PHASE 2</p>
      <p>We defined three finite state automata, one for each of three
grammar cases in which reaching the accepting state suggests that
a place is being mentioned. Each rule identifies a
different sentence pattern. The rules are applied at the sentence
level, i.e. on a list of words terminated by a punctuation
symbol, obtained in phase 1. The tokens (words) are fed to
the automaton and, if an accepting state is reached, the current
token is marked as a location candidate. If no accepting state
is reached, no candidate is produced. When a candidate is
found and the sentence still contains more words,
the automaton restarts from its initial state with the token
after the candidate, and proceeds until the sentence ends.</p>
      <p>The devised rules are independent from one another, so they
can be parallelised by running as different agents e.g. on a
multicore machine or in different machines coordinated in a
Cloud fashion.</p>
      <p>The result of each rule application is a list of candidate
words, which are used as input for the next phase (see
Section IV) for the definitive labelling. Different rules may
yield different candidates. A way to use all the said rules
is therefore to combine them: the candidate words passing the rule
filter(s) are the union of the candidate words determined
by each applied rule (see Section V).</p>
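      <p>The combination step is a plain set union. The following sketch shows one possible way to compute it; the function name combine is hypothetical, and keeping the candidates in order of first appearance is a convenience choice of this example, not something stated in the text.</p>
      <preformat>
```python
def combine(*candidate_lists):
    """Union of the candidates yielded by several rules, without duplicates."""
    seen = set()
    combined = []
    for candidates in candidate_lists:
        for word in candidates:
            if word not in seen:  # keep only the first occurrence
                seen.add(word)
                combined.append(word)
    return combined
```
      </preformat>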
      <sec id="sec-2-1">
        <title>A. Rule 1: Da Roma</title>
        <p>The first rule, whose name translates to “from Rome”, is used
to identify words that are possible candidates for a location. Like
the other rules, it is named after a typical example of a (part of a)
sentence in which a place can be identified.</p>
        <p>The automaton (see Figure 2) scans the words (tokens) of a
given sentence and remains in state 0 until a preposition (P)
or an article (A) is found. This condition makes the automaton
change its current state from 0 to 1, where it remains
as long as the next token is another article or preposition.
A state change is instead triggered by any other kind of word:
the final state is reached, and that word is a candidate for a
place. Many candidates will, however, be discarded afterwards,
as described in Section IV.</p>
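        <p>A minimal sketch of the Rule 1 automaton is given below. It only illustrates the transitions described above; the tiny ARTICLES and PREPOSITIONS sets and the function name rule1 are assumptions made for this example and do not reproduce the lexicon used by the tool.</p>
        <preformat>
```python
# Tiny illustrative lexicons (assumptions; the real ones are larger).
ARTICLES = {"il", "lo", "la", "i", "gli", "le", "l'"}
PREPOSITIONS = {"di", "a", "da", "in", "su", "per", "tra", "fra",
                "del", "della", "al", "alla", "dal", "dalla", "nel", "nella"}


def rule1(sentence):
    """Return the candidate words found by Rule 1 in a list of tokens."""
    candidates = []
    state = 0
    for token in sentence:
        word = token.lower()
        if word in ARTICLES or word in PREPOSITIONS:
            state = 1                 # 0 -> 1, or stay in 1 on further A/P
        elif state == 1:
            candidates.append(token)  # accepting state: candidate found
            state = 0                 # restart after the candidate
        # in state 0, any other word leaves the state unchanged
    return candidates
```
        </preformat>
        <p>Applied to the tokens Il, treno, parte, da, Roma, the sketch returns both treno and Roma, which shows why this rule alone produces many false positives.</p>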
        <p>As a single rule, this yields the highest number of false
positives, as the use of an article or a preposition is very
common in the Italian language.</p>
      </sec>
      <sec id="sec-2-2">
        <title>B. Rule 2: Vicino a Roma</title>
        <p>The second rule accommodates the mention of a place
name in sentences like the one the rule is named after,
“Near Rome”: the presence of a Descriptor (see Section II)
is a strong indication that a place will be mentioned in the
following text in the same sentence.</p>
        <p>[Figures 2 to 4: state diagrams of the automata for Rules 1 to 3. Labels recovered from the figures: start, states 0 and 1, transitions D, V and else.]</p>
        <p>Figure 3 shows the Finite State Machine (FSM) used to find a
candidate place. The automaton reads the words
one by one and does not leave its initial state (0) until a
descriptor is found, at which point it moves to state 1. From
state 1 a transition leads to state 2 when an article
or a preposition is found, or directly to the accepting state 3
in any other case. From state 2 it is possible to return to state
1, if another descriptor is found, or to stay in the same state, if
more articles or prepositions are found. Finally, the accepting
state is reached by reading any other kind of word.</p>
        <p>The accepting state identifies a candidate word as a place,
as Roma in the rule name.</p>
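        <p>The transitions of this automaton can be sketched as follows. The word sets and the function name rule2 are hypothetical. One point is an explicit assumption of this example: a descriptor read while in state 1 keeps the automaton in state 1, whereas the description above only mentions the return from state 2 to state 1.</p>
        <preformat>
```python
# Tiny illustrative lexicons (assumptions; the real ones are larger).
ARTICLES_PREPOSITIONS = {"il", "la", "l'", "a", "da", "di", "in", "per",
                         "al", "alla", "della"}
DESCRIPTORS = {"dentro", "fuori", "vicino", "lontano", "interno"}


def rule2(sentence, triggers=DESCRIPTORS):
    """Return the candidate words found by Rule 2 in a list of tokens."""
    candidates = []
    state = 0
    for token in sentence:
        word = token.lower()
        if word in triggers:
            state = 1                 # 0 -> 1, 2 -> 1 (1 -> 1 is assumed)
        elif state == 0:
            continue                  # nothing seen yet: stay in state 0
        elif word in ARTICLES_PREPOSITIONS:
            state = 2                 # 1 -> 2, or stay in 2
        else:
            candidates.append(token)  # accepting state 3
            state = 0                 # restart after the candidate
    return candidates
```
        </preformat>
        <p>For the tokens Abito, vicino, a, Roma the sketch returns Roma, while a sentence without any descriptor yields no candidate.</p>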
      </sec>
      <sec id="sec-2-3">
        <title>C. Rule 3: Andando a Roma</title>
        <p>While the previous rule uses descriptors as a way to identify
a possible place name, this rule is concerned with verbs, such
as in the rule name, “Going to Rome”.</p>
        <p>
          The FSM implementing the rule is shown in Figure 4. The
behaviour of the automaton is the same as that of Rule 2, except
that a (possibly conjugated) verb is used instead of a descriptor.
Only verbs related to movement, and thus usually related to
places, are included, i.e. verbs that express staying in a place or
moving to and from a place. Since a verb can be found in
a conjugated form, the check is performed using the Italian
version of the stemming algorithm Snowball [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ].
        </p>
        <p>The automaton scans the tokens and remains in state 0
until a verb is found, which changes the current state to state
1. The automaton may change from state 1 to state 2 by reading
a preposition or an article, and from state 2 back to state 1 by
reading a verb. Any other word makes the automaton move
into the accepting state 3, i.e. it marks the current word as
a possible candidate for a place name.</p>
        <p>IV. PHASE 3: NON-PLACE WORDS REMOVAL</p>
        <p>
          The candidate words yielded by the application of a rule
are further filtered before being labelled as place names. To be
considered a place name, and thus reported as a positive result,
a candidate has to pass the following filters.
• Filter1: The candidate word is checked against the
Non-places lexicon (N). If it exists in N, then it is regarded as
a False Positive (FP) and hence discarded. E.g. in the
sentence Andare alla capitale (To go to the capital city),
Rule 3 will suggest capitale as a possible place; however,
it is a common noun and thus it will be discarded.
• Filter2: After passing the previous filter (Filter1), all the
remaining candidate words are filtered to avoid
identifying (conjugated) verbs as places. Once again, the check
is performed using a stemming algorithm [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. To check
whether a candidate word is a verb, it is stemmed and then
concatenated with the three possible suffixes of
Italian verbs (-are, -ere and -ire) so as to obtain candidate
infinitive forms, which are then searched for in the
Non-places lexicon. If one of them appears in the
lexicon, the word is a verb and is discarded, as it is not a place
name. E.g. in the
sentence Ella esce camminando (She gets out walking),
Rule 3 stays in state 0 for Ella, then goes into state 1 on
reading esce, a form of the verb uscire; the next token is neither
an article nor a preposition, thus camminando is proposed
as a place candidate. In this filtering step, however, the
candidate is recognised as a conjugation of the
verb camminare and finally discarded. The remaining
candidate words are promoted as results.
        </p>
        <p>Any word passing the said filters is labelled as a place
name; such a result, however, may be a True Positive (TP) or
a False Positive (FP).</p>
        <p>Unstructured text may not follow orthographic conventions
reliably, as a text could be a professionally proof-read
book or an informal automatic transcription; thus it may or
may not use capital letters for location names. The approach
described so far makes no assumption about such an orthographic
convention. Experimentally, however,
we found better results when the convention is satisfied,
thus we also provide a further filter:
• Filter0: If the candidate begins with a lower case
character, it is not deemed a location name, while it is passed
on if it starts with a capital letter.</p>
        <p>As the name suggests, this filter, when used, is applied before
Filter1 and Filter2; whether to use it is left to the user, based on
the text to be processed.</p>
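        <p>The three filters can be chained as in the sketch below. It is only an illustration under stated assumptions: the NON_PLACES set is a tiny sample, all function names are hypothetical, and toy_stem is a deliberately crude stand-in that strips a few verb endings; it is not the Snowball stemmer used by the approach.</p>
        <preformat>
```python
# Tiny sample of the Non-places lexicon N (assumption).
NON_PLACES = {"capitale", "acido", "dormire", "camminare", "scrivere"}
VERB_ENDINGS = ("ando", "endo", "are", "ere", "ire", "ato", "uto", "ito")


def toy_stem(word):
    """Crude stand-in for a real stemmer: strip one known verb ending."""
    for ending in VERB_ENDINGS:
        if word.endswith(ending) and len(word) > len(ending) + 2:
            return word[: -len(ending)]
    return word


def filter0(candidates):
    """Keep only the candidates that start with a capital letter."""
    return [w for w in candidates if w[:1].isupper()]


def filter1(candidates, non_places=NON_PLACES):
    """Discard the candidates listed in the Non-places lexicon."""
    return [w for w in candidates if w.lower() not in non_places]


def filter2(candidates, non_places=NON_PLACES, stem=toy_stem):
    """Discard conjugated verbs whose infinitive is in the lexicon."""
    kept = []
    for w in candidates:
        root = stem(w.lower())
        infinitives = {root + suffix for suffix in ("are", "ere", "ire")}
        if not infinitives.intersection(non_places):
            kept.append(w)
    return kept


def label_places(candidates, capitalised_text=False):
    """Apply Filter0 (optional), then Filter1 and Filter2."""
    if capitalised_text:
        candidates = filter0(candidates)
    return filter2(filter1(candidates))
```
        </preformat>
        <p>With these sample lexicons, the candidates capitale, camminando and Roma are reduced to Roma: the first is discarded by Filter1 and the second by Filter2.</p>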
      </sec>
    </sec>
    <sec id="sec-3">
      <title>V. DISCUSSION</title>
      <p>In the next subsections we review the rules and how they
relate to the actual texts we examined, and then
show the results of the labelling experiments made on different
texts.</p>
      <sec id="sec-3-1">
        <title>A. Rules Assessment</title>
        <p>Real case examples. Table II shows several fragments of
sentences recognised by our approach, for both TP and FP
results, as specified in column 3. The words in italics are the
tokens consumed by the automaton for the current rule, while
the bold word is the one identified as a place.</p>
        <p>Lexicons. As the rules are based on several lexicons, the
completeness of such lexicons is essential for a good
recognition. In the sentence “dentro la stanza” (inside the room),
Rule 2 will propose “stanza” as a candidate, although it should not
be reported as a result, as it is not a proper location name. It is
the responsibility of the Non-places filter (Filter1, see Section IV)
to recognise that the word is not a location; however, if
“stanza” is not in the N lexicon, it will be selected and thus
reported as a place name, i.e. counted as an FP.</p>
        <p>Repetitions. A simple observation of the FSMs shown
in Figures 2 to 4 reveals that some traversals of the states
accept sentences that are ungrammatical in Italian. E.g.
“Andando camminare per per Roma” (Going to walk to to
Rome), in which the application of Rule 3 would propose
“Roma” as a place name. Our preliminary studies show that
ungrammatical sentences, such as the previous example, are
not frequent unless we factor in informal language, such
as instant messaging or poetic prose/verses.</p>
        <p>However, the same rules are capable of recognising
ungrammatical sentences appearing in both formal and informal
speech. A phrase such as “Andando a... a... a Roma” (Going
to... to... to Rome) would make the FSM in Figure 4 reach
its accepting state on “Roma”, even if the sentence is not
grammatically correct. Such sentences are typical of
speech, e.g. when one speaks while recalling something, and an
automatic transcription may report them; thus we
left the loops in the rules.</p>
        <p>Sentence patterns. The rules we are proposing can be
considered arbitrary, even if intuitively correct. Thus, before
making the actual labelling experiments, we studied the
result of the application of the rules alone on a set of
unstructured texts so as to check whether such grammar structures
actually occur in real texts. That is, we are interested in
the possible paths any automaton may take, given real written
texts and not just simple cases (such as the ones in the titles
of the subsections in Section III, which are correct but also very
basic).</p>
        <p>The rules have been tested on different kinds of text files, both
prose and dialogue transcriptions, for a total of 1.2 million
characters. The results are shown in Table III. In each
line, the first column is the rule, the second is the sentence
pattern found by the rule, and the third is the number of
instances of the pattern found in the test corpus. The sentence
pattern is identified by the transitions in the automaton, e.g.
VPA (Verb, Preposition, Article) identifies a sentence such as
Viaggiare per l’Italia (To travel in Italy), which is decomposed
as Viaggiare[V] per[P] l’[A] Italia, where the words before Italia
are catalogued respectively as [V]erb, [P]reposition and
[A]rticle.</p>
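        <p>The way a pattern label such as VPA can be derived and counted is sketched below for Rule 3. The word classes are tiny hypothetical samples, verbs are matched in their infinitive form only (no stemming), and the function names are assumptions of this example.</p>
        <preformat>
```python
from collections import Counter

# Tiny illustrative word classes (assumptions).
CLASSES = {
    "V": {"viaggiare", "andare", "partire"},
    "P": {"per", "a", "da", "in"},
    "A": {"il", "la", "l'"},
}


def word_class(token):
    """Return V, P or A for a known word, None for any other word."""
    for label, words in CLASSES.items():
        if token.lower() in words:
            return label
    return None


def rule3_patterns(sentence):
    """Return the pattern of classes consumed before each candidate."""
    patterns = []
    consumed = ""
    for token in sentence:
        label = word_class(token)
        if label == "V" or (label and consumed):
            consumed += label          # a verb starts or extends a pattern
        elif consumed:
            patterns.append(consumed)  # any other word is the candidate
            consumed = ""
    return patterns


def count_patterns(corpus):
    """Count the patterns over a list of tokenised sentences."""
    return Counter(p for sentence in corpus for p in rule3_patterns(sentence))
```
        </preformat>
        <p>For the tokens Viaggiare, per, l', Italia the sketch records the single pattern VPA, with Italia as the candidate.</p>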
        <table-wrap>
          <label>Table II</label>
          <table>
            <thead>
              <tr><th>Rule</th><th>Sentence</th></tr>
            </thead>
            <tbody>
              <tr><td>1</td><td>Il processo che si svolge a Milano (The trial taking place in Milano)</td></tr>
              <tr><td>1</td><td>I treni a lunga percorrenza per la Sicilia (Long distance trains to Sicily)</td></tr>
              <tr><td>1</td><td>Il vertice che si terrà oggi a Bruxelles (The meeting taking place in Bruxelles)</td></tr>
              <tr><td>1</td><td>Se il Ministro in indirizzo non intenda intervenire (If the addressed Minister does not mean to intervene)</td></tr>
              <tr><td>2</td><td>Scappa verso il Canale (Runs away towards the Channel)</td></tr>
              <tr><td>2</td><td>All’interno della Basilica Palladiana (Inside the Basilica Palladiana)</td></tr>
              <tr><td>2</td><td>Mi fanno sedere accanto a Carlo (They let me sit beside Carlo)</td></tr>
              <tr><td>2</td><td>Sono operative presso le DIGOS di tutto lo Stato (They are operational in the DIGOS (offices) of the State)</td></tr>
              <tr><td>3</td><td>Passando per Piazza Del Popolo (Proceed through Piazza Del Popolo)</td></tr>
              <tr><td>3</td><td>La prima volta che vedo Palermo (The first time I see Palermo)</td></tr>
              <tr><td>3</td><td>Non ho più visto Carlo (I have not seen Carlo)</td></tr>
              <tr><td>3</td><td>La soglia richiesta per entrare in Parlamento (The threshold required to get into the Parliament)</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Given a rule, a Sentence Pattern such as A is more general
than any pattern having A as a suffix, e.g. PA. Thus, all the
occurrences of PA form a subset of the occurrences of A. For
the experiments (Section V-B) the automata are set to find
the longest match.</p>
        <p>The preliminary study reported in Table III shows just the
number of occurrences of each sentence pattern. It does not
show the percentage of TPs or FPs, as it is only a way to
check the different transitions in the proposed automata.</p>
      </sec>
      <sec id="sec-3-2">
        <title>B. Experiments</title>
        <p>The rules detailed in Section III have been implemented in a
tool and tested on different kinds of unstructured
texts: (i) theatrical dialogue transcriptions (texts T2, T3, T4),
(ii) official stenographic transcriptions of political debates
(T5) and (iii) news articles (T1). In the latter two cases, the
texts are properly capitalised, and thus Filter0 (see
Section IV) has been used in the experiments, while the other
texts were all in lower case and thus only Filter1 and Filter2
have been used in phase 3 (see Section IV).</p>
        <p>All the texts used for the experiments have been manually
labelled for the location names. In the experiments, all the
combinations of the rules have been tested, as shown in the
second column. E.g. Rule “2&amp;3” means taking the
mathematical union of the set of candidates gathered by Rule 2
and the set of candidates gathered by Rule 3, and passing such a
union to the filtering agents in phase 3.</p>
        <p>The precision metric is computed as a correctness measure,
<inline-formula><tex-math>\frac{TP}{TP + FP}</tex-math></inline-formula>,
while the recall, which also uses the number of False Negatives (FN),
is computed as a completeness metric,
<inline-formula><tex-math>\frac{TP}{TP + FN}</tex-math></inline-formula>.
The F1 score gives the harmonic mean of precision
and recall.</p>
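        <p>The three metrics can be computed from the counts of TP, FP and FN as in the following sketch. The function name is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this example, not one stated in the text.</p>
        <preformat>
```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from TP, FP and FN counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```
        </preformat>
        <p>As a made-up example, 8 TP, 2 FP and 8 FN give a precision of 0.8, a recall of 0.5 and an F1 score of about 0.62.</p>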
        <p>There are cases where a rule fails to identify any TP;
however, this is expected. When an input text references
place names only through, e.g., motion verbs, then only Rule 3 is
able to recognise such places, while Rule 2, concerned with
the usage of descriptors, will never be applied.</p>
        <p>The results show an F1 score going up to 0.67,
with an average of 0.38. The precision metric goes up to 0.82
in the best case, with a minimum value of 0.27 and an average
of 0.45. The recall has a maximum
value of 0.92 and an average of 0.51.</p>
        <p>While there are cases where very few location names
are identified, we deem such preliminary experiments worth
expanding, as one of the limitations is the small number of
labelled texts we have dealt with.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>VI. RELATED WORK</title>
      <p>
        Information extraction has become a hot research
topic, especially since huge amounts of
data have become publicly available. An excellent survey on Information
Extraction is [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], where the author reviews all the significant
existing approaches in great detail. While
many different approaches have been proposed, to
the best of our knowledge little to no effort has been put
towards the Italian language.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] named entities are extracted and related to classified
newspaper advertisements (in French), using different
techniques. The authors make use of a lexicon to store already known
entities, so that a word found both in an advertisement and
in the lexicon can be automatically tagged as the lexicon
suggests. They also use regular expressions for entities such
as telephone numbers. Finally, a word spotting algorithm is
used to compute a score for unrecognised words, based on the
context (i.e. other specialised lexicons). While we also make
use of a lexicon, we use it to exclude a candidate after a rule
has yielded one. Recognising location names with a lexicon of all
existing location names would be a trivial, brute force approach
(homonymy aside); instead, the rules we
propose allow the discovery of names not already inserted in
a lexicon.
      </p>
      <p>
        The approach presented in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] shows some similarities with
ours. The authors start with sample patterns containing named
entities, then identify actual instances of named entities; the
names found are then searched for to automatically identify new
patterns, and the process is reiterated.
      </p>
      <p>
        A different approach has been proposed in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which tries to
identify named entities as short sequences of words by analysing
n-gram statistics obtained from Internet documents. The Lex
method is a semi-supervised learning algorithm based on the
assumption that a sequence of capitalised words forms
a single name when such an n-gram is statistically
more frequent than simple chance would suggest.
      </p>
      <p>
        A data mining approach is presented in [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], specifically
crafted for geographical names. The algorithm searches for
manually constructed keywords and patterns
related to geographical names, such as island of or archipelago.
The results are used to train a classifier with respect to the
found instances of a pattern.
      </p>
    </sec>
    <sec id="sec-5">
      <title>VII. CONCLUSIONS</title>
      <p>We have presented an algorithm devised specifically for the
Italian language, based on rules built upon its grammar. The
rules represent grammar patterns typically used in both written
and spoken language and are implemented by finite state
machines. Several agents are coordinated in a pipe and filter style,
so that an unstructured input text is filtered by the rules to
obtain candidate places. Preliminary results are promising, as the
F1 score reaches a maximum of 0.67, whereas the highest
precision and recall are 0.82 and 0.92, respectively.</p>
      <p>
        As possible future work, we aim to connect with our
previous research in which we have proposed to improve the
modularity of a software system by letting classes assume
roles on some design patterns [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]–[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The work presented here
can foster an approach whereby the automatic processing of
the Italian language used for program comments can assist in
the selection of roles for classes. Moreover, semantic analysis
of text can take advantage of neural networks [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and as a
further work a possible approach would aim to recognise text
fragments using a soft computing approach [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGEMENT</title>
      <p>This work has been supported by project PRIME funded
within POR FESR Sicilia 2007-2013 framework.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Agichtein</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Gravano</surname>
          </string-name>
          .
          <article-title>Snowball: Extracting relations from large plain-text collections</article-title>
          .
          <source>In Proceedings of ACM Conference on Digital Libraries (DL)</source>
          , pages
          <fpage>85</fpage>
          -
          <lpage>94</lpage>
          , New York, NY, USA,
          <year>2000</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bannò</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Marletta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pappalardo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          .
          <article-title>Tackling consistency issues for runtime updating distributed systems</article-title>
          .
          <source>In Proceedings of International Symposium on Parallel &amp; Distributed Processing, Workshops and Phd Forum (IPDPSW)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . IEEE,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Calvagna</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          .
          <article-title>Delivering dependable reusable components by expressing and enforcing design decisions</article-title>
          .
          <source>In Proceedings of Computer Software and Applications Conference (COMPSAC) Workshop QUORS</source>
          , pages
          <fpage>493</fpage>
          -
          <lpage>498</lpage>
          . IEEE,
          <year>July 2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.-H.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Girgis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. F.</given-names>
            <surname>Shaalan</surname>
          </string-name>
          .
          <article-title>A survey of web information extraction systems</article-title>
          .
          <source>IEEE Trans. on Knowl. and Data Eng</source>
          .,
          <volume>18</volume>
          (
          <issue>10</issue>
          ):
          <fpage>1411</fpage>
          -
          <lpage>1428</lpage>
          , Oct.
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Downey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Broadhead</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          .
          <article-title>Locating complex named entities in web text</article-title>
          .
          <source>In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI)</source>
          , pages
          <fpage>2733</fpage>
          -
          <lpage>2739</lpage>
          . Morgan Kaufmann Publishers Inc.,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Giunta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pappalardo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          .
          <article-title>Using Aspects and Annotations to Separate Application Code from Design Patterns</article-title>
          .
          <source>In Proceedings of Symposium on Applied Computing (SAC)</source>
          . ACM,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Giunta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pappalardo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          .
          <article-title>Aspects and annotations for controlling the roles application classes play for design patterns</article-title>
          .
          <source>In Proceedings of Asia Pacific Software Engineering Conference (APSEC)</source>
          . IEEE,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Giunta</surname>
          </string-name>
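          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pappalardo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          .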
          <article-title>AODP: refactoring code to provide advanced aspect-oriented modularization of design patterns</article-title>
          .
          <source>In Proceedings of Symposium on Applied Computing (SAC)</source>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Giunta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pappalardo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          .
          <article-title>Superimposing roles for design patterns into application classes by means of aspects</article-title>
          .
          <source>In Proceedings of Symposium on Applied Computing (SAC)</source>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Giunta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pappalardo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          .
          <article-title>A redundancy-based attack detection technique for Java Card bytecode</article-title>
          .
          <source>In Proceedings of International WETICE Conference</source>
          , pages
          <fpage>384</fpage>
          -
          <lpage>389</lpage>
          . IEEE,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lafferty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McCallum</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F. C.</given-names>
            <surname>Pereira</surname>
          </string-name>
          .
          <article-title>Conditional random fields: Probabilistic models for segmenting and labeling sequence data</article-title>
          .
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mongiovi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Giannone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fornaia</surname>
          </string-name>
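          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pappalardo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          .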
          <article-title>Combining static and dynamic data flow analysis: a hybrid approach for detecting data leaks in Java applications</article-title>
          .
          <source>In Proceedings of Symposium on Applied Computing (SAC)</source>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pappalardo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          .
          <article-title>A hybrid neuro-wavelet predictor for QoS control and stability</article-title>
          .
          <source>In Proceedings of AIxIA</source>
          , volume
          <volume>8249</volume>
          of
          <source>LNCS</source>
          , pages
          <fpage>527</fpage>
          -
          <lpage>538</lpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pappalardo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          .
          <article-title>Using modularity metrics to assist move method refactoring of large systems</article-title>
          .
          <source>In Proceedings of Complex, Intelligent and Software Intensive Systems (CISIS)</source>
          . IEEE,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
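          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pappalardo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          .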
          <article-title>An agent-driven semantical identifier using radial basis neural networks and reinforcement learning</article-title>
          .
          <source>In Proceedings of XV Workshop “Dagli Oggetti agli Agenti”</source>
          , volume
          <volume>1260</volume>
          . CEUR-WS,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pappalardo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          .
          <article-title>Improving files availability for BitTorrent using a diffusion model</article-title>
          .
          <source>In Proceedings of International WETICE Conference</source>
          , pages
          <fpage>191</fpage>
          -
          <lpage>196</lpage>
          . IEEE,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K.</given-names>
            <surname>Nigam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lafferty</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>McCallum</surname>
          </string-name>
          .
          <article-title>Using maximum entropy for text classification</article-title>
          .
          <source>In IJCAI workshop on machine learning for information filtering</source>
          , volume
          <volume>1</volume>
          , pages
          <fpage>61</fpage>
          -
          <lpage>67</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>G.</given-names>
            <surname>Pappalardo</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          .
          <article-title>Automatically discovering design patterns and assessing concern separations for applications</article-title>
          .
          <source>In Proceedings of Symposium on Applied Computing (SAC)</source>
          . ACM,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>G.</given-names>
            <surname>Pappalardo</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          .
          <article-title>Suggesting extract class refactoring opportunities by measuring strength of method interactions</article-title>
          .
          <source>In Proceedings of Asia Pacific Software Engineering Conference (APSEC)</source>
          , pages
          <fpage>105</fpage>
          -
          <lpage>110</lpage>
          . IEEE, December
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Peleato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Chappelier</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Rajman</surname>
          </string-name>
          .
          <article-title>Automated information extraction out of classified advertisements</article-title>
          .
          <source>In Natural Language Processing and Information Systems</source>
          , pages
          <fpage>203</fpage>
          -
          <lpage>214</lpage>
          . Springer,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Porter</surname>
          </string-name>
          .
          <article-title>Snowball: A language for stemming algorithms</article-title>
          ,
          <year>2001</year>
          . URL http://snowball.tartarus.org/texts/introduction.html,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sarawagi</surname>
          </string-name>
          .
          <article-title>Information extraction</article-title>
          .
          <source>Found. Trends Databases</source>
          ,
          <volume>1</volume>
          (
          <issue>3</issue>
          ):
          <fpage>261</fpage>
          -
          <lpage>377</lpage>
          , Mar.
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          .
          <article-title>Automatically characterising components with concerns and reducing tangling</article-title>
          .
          <source>In Proceedings of Computer Software and Applications Conference (COMPSAC) workshop QUORS</source>
          . IEEE,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          .
          <article-title>Detecting extra relationships for design patterns roles</article-title>
          .
          <source>In Proceedings of AsianPlop</source>
          . March
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>O.</given-names>
            <surname>Uryupina</surname>
          </string-name>
          .
          <article-title>Semi-supervised learning of geographical gazetteers from the internet</article-title>
          .
          <source>In Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geographic References - Volume</source>
          <volume>1</volume>
          , pages
          <fpage>18</fpage>
          -
          <lpage>25</lpage>
          . Association for Computational Linguistics,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>