<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Shoo the Spectre of Ignorance with QA2SPR</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>An Open Domain Question Answering Architecture with Semantic Prioritisation of Roles</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Parco Scientifico e Tecnologico</institution>
          ,
          <addr-line>Verona</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Real T R&amp;TD Department</institution>
          ,
          <addr-line>Verona</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Open Domain Question Answering (ODQA) aims at automatically understanding and answering general questions posed in natural language. Nowadays, the ability of an ODQA system strictly depends on how effectively valuable information is discovered and extracted from the huge amount of documents on the net, be it structured (e.g., online datasets) or unstructured (e.g., free text of generic web pages). This, in turn, relies on a proper (i) identification of question keywords to isolate candidate answer passages from documents, and (ii) ranking of the candidate answers to decide which passage contains the correct answer. In this paper we introduce a Question Answering Architecture with Semantic Prioritisation of Roles (QA2SPR), where a novel technique of prioritised semantic role labelling (PSRL) is used to optimise these phases. We also share the experimental results collected from a working prototype of QA2SPR for the Italian language.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Question Answering Systems (QASs) are a particular type of Information Retrieval (IR) system that processes user queries (questions) posed in natural language and retrieves the closest or correct amount of information required by the query (answer). Since the development of the first restricted-domain QASs, the scientific community has shown widespread interest in QA-related topics. In the last two decades alone, and in particular in the period 1999–2007, the body of literature in the field has grown so large and diverse that it is extremely difficult to survey all the research areas that stem from this discipline (IR, Information Extraction, Natural Language Processing, and many others). An exploratory analysis has shown that the number of surveys, reviews, and conference papers on the subject increased by a factor of fifteen from the period 1960–1999 to 2000–2017 [<xref ref-type="bibr" rid="ref1 ref12">1,12</xref>]. The motivation behind this phenomenon is twofold. On the one hand, the QA tracks of annual conferences such as TREC, CLEF and NTCIR have helped maintain a stimulating and challenging research environment over the years [<xref ref-type="bibr" rid="ref1">1</xref>]. On the other hand, the exponential growth of digital data (for instance, the number of web pages on the Internet grew from 200 billion in 2006 to over 1 trillion in 2008 [14]) gave access to a massive pool of information and made it possible to model highly sophisticated QASs that answer more and more complex user questions (e.g., definitional questions, list questions, or why-type questions). This factor plays a key role in building deep interrelations between QA research and Knowledge Discovery (KD), a major aspect of which is the extraction of valuable knowledge and information from web data.
      </p>
      <p>Nevertheless, handling such an amount of data also means tackling several hidden pitfalls that may threaten the performance of QA sub-tasks. Specifically, allowing more complex user questions makes it more difficult for the system to determine the expected answer type, i.e., to classify the answer based on the category of the subject required by the question (question classification phase). As an example, we expect the answer to “Who invented the light bulb?” to be a person. Moreover, both the document processing phase (i.e., the keyword-based retrieval of web documents that are as pertinent as possible to the answer topic) and the answer processing phase (i.e., returning a ranked list of candidate answer passages extracted from such documents) are extremely prone to errors in scenarios characterised by high volumes of available information.</p>
      <p>
        Question classification, document processing, and answer processing are clearly crucial to extracting correct and precise answers. We cite, among others, the error analysis of an ODQAS by [<xref ref-type="bibr" rid="ref10">10</xref>], which shows that more than 30% of wrong answers are due to incorrect question classification. However, while question classification research has already produced very satisfying, quite definitive results (e.g., classification accuracy of up to 90% [14], or even better [<xref ref-type="bibr" rid="ref9">9</xref>]), the vast plethora of plausible heuristic metrics and algorithms that can be used to address the other two phases leaves research still wide open to new proposals and improvements in this direction.
      </p>
      <p>In the present paper we introduce QA2SPR (Question Answering Architecture
with Semantic Prioritisation of Roles), an ODQA system architecture that exploits
a novel technique of question analysis in natural language – called prioritised
semantic role labelling (PSRL) – aimed at optimising question keyword extraction,
document processing, and answer processing phases. Moreover, we present an
embodiment of QA2SPR through a working prototype for the Italian language.</p>
      <p>The paper is organised as follows. Section 2 briefly reviews the QAS topics that drove the design of QA2SPR, and highlights the novelty of our work w.r.t. existing semantic role labelling methodologies. In Section 3 we delineate the operating principles of PSRL by means of examples. Section 4 gives a brief description of QA2SPR and the most important building blocks we adopted for its realisation. Section 5 reports the experimental results obtained with a working prototype of QA2SPR for the Italian language, while Section 6 outlines some open issues left for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        The theoretical work behind QA2SPR architecture design and the prototype
realisation are the result of a meticulous and critical study of answer extraction
techniques in factoid question answering [19], semantic approaches for question
classification [14], and ODQA based on syntactic and semantic question similarity
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], with special consideration of existing QA system implementations [13, 20].
      </p>
      <p>
        Although QA2SPR strictly follows the dictates of current state-of-the-art methodologies in the literature, the main novelty of this work lies in the strategy we devised for user question analysis, that is, PSRL. Our technique is mainly inspired by frame semantics theory and semantic frame representations [<xref ref-type="bibr" rid="ref7">7</xref>]. The semantic frame approach identifies the meaning of words through schematic representations of the situations that characterise human experience, each constituted by a group of participants in the situation, or frame elements (FEs), and describes the possible syntactic realisations of the FEs for every word. Usually, the information necessary to identify semantic frames is gathered by annotating (labelling) corpus sentences in a specific language with FEs (semantic roles) and syntactic information. Several (semi-)automatic techniques for frame extraction from real-world corpora (e.g., the British National Corpus) gave rise to popular online resources such as FrameNet, PropBank, and WordNet. It is nowadays widely acknowledged that linguistically annotated corpora have a crucial theoretical as well as applicative role in NLP, and QA is often cited as an obvious beneficiary of semantic role labelling. WordNet has already been extensively employed in QA-related tasks ranging from query expansion to axiom-based reasoning, passage scoring, and answer filtering, while syntactic structure matching has been applied to candidate passage retrieval and answer extraction (see [17] for a complete list of references).
      </p>
      <p>
        The ever-growing popularity of semantic role labelling applied to question analysis persuaded us to choose a similar approach in our system as well. Nevertheless, although QA2SPR has been designed as a modular architecture configurable for an arbitrary language, our first case study envisaged a QA tool for Italian,<sup>3</sup> which still lacks stable or well-documented semantically annotated resources [<xref ref-type="bibr" rid="ref11 ref8">8, 11, 18</xref>]. We therefore opted to model PSRL as a novel semantic frame-like approach for Italian logical complement analysis based on Schank verb theory [16]. In the same way as frame semantics gives a heuristic model to isolate relevant frame elements based on corpus annotations and schematisations of real-world situations, we argue that Schank verb analysis may give a heuristic model to isolate logical complements from questions based on semantic verb content. Elevating the importance of verbs to grasp the meaning of the whole sentence is in line with [<xref ref-type="bibr" rid="ref8">8</xref>], where the authors state that a more rigorous and clearly defined methodology for the study of verb semantic distribution is mandatory when coping with complex languages like Italian. By contrast, annotated resources such as FrameNet for English do not take into account the general distributional behaviour of a verb, nor is it represented within the standard FrameNet format for Lexical Units (LUs). In Section 3 we briefly delineate the theory underlying PSRL. For space reasons, what follows is the description of a simplified version of the actual implemented procedure, yet complete enough for the reader to grasp all the basic working principles.
      </p>
    </sec>
    <sec id="sec-3">
      <title>PSRL: Theory and Examples</title>
      <p>Schank verb analysis [16] maps natural language utterances into conceptual structures that are unambiguous representations of their meaning, independently of the language used. (Footnote 3: In the following, all references and examples in the Italian language are reported in italics, with the corresponding English translation in regular type.) A conceptual dependency framework (or conceptualisation) is devised, which models two basic constructions: (i) actor-action-object, e.g., “John<sub>actor</sub> hurts<sub>action</sub> Mary<sub>object</sub>” (“John<sub>actor</sub> ha offeso<sub>action</sub> Mary<sub>object</sub>”), and (ii) object-state, e.g., “Mary<sub>object</sub> is hurt<sub>state</sub>” (“Mary<sub>object</sub> si è offesa<sub>state</sub>”). The combination of the two makes it possible to paraphrase an arbitrary sentence so as to make the actual conceptual relationships explicit. For instance, the construction “John<sub>actor</sub> hurts<sub>action</sub> Mary<sub>object</sub>” violates the rule that conceptual actions must correspond to real-world actions: the verb “hurt” does not refer to any action that actually occurred, but rather to the result of the action that actually occurred (which is unknown). Thus, the sentence should be rephrased as “John<sub>actor</sub> does<sub>action</sub> something that causes Mary<sub>object</sub> to be hurt<sub>state</sub>”. A graphical representation of the utterance is reported in Figure 1(a), where distinct arrow symbols denote the mutual dependency between actor and action, the causal relationship (i.e., Mary was hurt because John did something to her), and the object-state complex. [Figure 1(a): conceptual dependency graph linking “John”, “do” (“fare”), “Mary”, and “hurt” (“offesa”).] Schank theory postulates that only fourteen action types, falling into four distinct categories (namely Instrumental, Physical, Mental, and Global), suffice to conceptualise arbitrarily complex statements in natural language. Consider the sentence “John threw the ball (that) Fred loaned him to Mary” (“John lanciò a Mary la palla che Fred gli aveva prestato”), depicted in Figure 1(b). Schank analysis unravels the utterance as follows: the actor “John” performs an action that constitutes a change in physical location (Global action type PTRANS, inferred from the verb “to throw to”, “lanciare a”) of the object “ball”, “palla” (O-labelled arrow); the direction of physical change (D-labelled arrow) is from the donor “John” to the recipient “Mary”. The instrument used to cause PTRANS (I-labelled arrow) is acting by propelling (Physical action type PROPEL, inferred again from the verb “to throw to”) the object “ball”, usually through the medium “air”, “aria”. Moreover, there has been a change in the abstract relationship (Global action type ATRANS, inferred from the verb “to loan”, “prestare a”) “possession” involving the object “ball”; the relation change (R-labelled arrow) happens between the donor “Fred” and the recipient “John”. As in the above example, most of the time Schank analysis</p>
      <sec id="sec-3-1">
        <title>1. Syntactic utterance analysis</title>
      </sec>
      <sec id="sec-3-2">
        <title>2. Schank verb analysis</title>
      </sec>
      <sec id="sec-3-3">
        <title>3. Explicit and implicit feature extraction</title>
        <p>[Figure 2, phases 2 and 3: Schank representation as in Figure 1(b); the feature values extracted for the PTRANS, PROPEL, and ATRANS action types are “John”, “palla”, “Mary”, “aria”, and “Fred”.]</p>
      </sec>
      <sec id="sec-3-4">
        <title>Object</title>
      </sec>
      <sec id="sec-3-5">
        <title>Possessor</title>
        <p>[Figure 2, phase 4 panel: Italian logical analysis rules retrieval. Each feature-to-complement rule is followed by the syntactic construction matched for that complement.]</p>
        <p>(PTRANS, Actor) → SOGGETTO: NOUN/NPR
(PTRANS, ObjectOfAction) → C_OGGETTO: (“il, lo, la, i, gli, le” +) NOUN/NPR
(PTRANS, (DirectionOfPhysicalChange, Donor)) → C_MOTO_DA_LUOGO: “da” + NOUN/NPR
(PTRANS, (DirectionOfPhysicalChange, Recipient)) → C_TERMINE: “a” + NOUN/NPR</p>
        <p>(PROPEL, Actor) → SOGGETTO: NOUN/NPR
(PROPEL, ObjectOfAction) → C_OGGETTO: (“il, lo, la, i, gli, le” +) NOUN/NPR
(PROPEL, (DirectionOfPhysicalChange, Donor)) → C_MOTO_DA_LUOGO: “da” + NOUN/NPR
(PROPEL, (DirectionOfPhysicalChange, Recipient)) → C_TERMINE: “a” + NOUN/NPR
(PROPEL, Medium) → C_MEZZO: “per, attraverso” (+ ART) + NOUN/NPR</p>
        <p>(ATRANS, Actor) → SOGGETTO: NOUN/NPR
(ATRANS, (RelationOfAction, (Possession, Object))) → C_OGGETTO: (“il, lo, la, i, gli, le” +) NOUN/NPR
(ATRANS, (RelationOfAction, (Possession, Possessor))) → C_SPECIFICAZIONE: “di, dei, delle, degli” + NOUN/NPR
(ATRANS, (RelationOfChange, Donor)) → C_MOTO_DA_LUOGO: “da” + NOUN/NPR
(ATRANS, (RelationOfChange, Recipient)) → C_TERMINE: “a” + NOUN/NPR</p>
      </sec>
      <sec id="sec-3-6">
        <title>5. Complement checking, matching and priority</title>
        <p>[Figure 2, phase 5 panel: the tokens of the example sentence with their part-of-speech tags and candidate complements. “John” (NPR): SOGGETTO, C_MOTO_DA_LUOGO, C_TERMINE; “a Mary” (PRE + NPR): C_TERMINE; “la palla” (ART + NOUN): C_OGGETTO; “Fred” (NPR): C_MOTO_DA_LUOGO, C_SPECIFICAZIONE; “aria”: C_MEZZO.]</p>
        <sec id="sec-3-6-3">
          <p>Fig. 2: PSRL phases: from sentence to logical analysis element extraction.</p>
          <p>discloses contextual hidden information involving the actions performed. In fact, while we expect the PTRANS conceptualisation always to come with explicit features such as a donor (John), a recipient (Mary), and an object involved in the action (a ball), we can also argue that the ball has probably been thrown through the air (a medium), and that the ball is Fred’s, since it was loaned by Fred to John (a possession). Such implicit features allow the same sentence chunk to assume different semantic facets at the same time (e.g., Fred is both the explicit donor and the implicit possessor of the ball), and may give more significance to its semantic content. Techniques of semantic role labelling for Italian such as logical complement analysis could productively take advantage of this peculiar behaviour: there, for instance, an object (complemento oggetto) or a complement regarding possession (complemento di specificazione) typically carries more semantic meaning than a complement involving times (complemento di tempo) or places (complemento di moto a luogo, moto da luogo, and so on).</p>
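          <p>As a minimal sketch of the example above, the three conceptualisations of “John lanciò a Mary la palla che Fred gli aveva prestato” can be written down as plain data, keeping explicit and implicit features apart. The names Conceptualisation, explicit, implicit, and roles_of are hypothetical and chosen for illustration only, and the feature hierarchy is flattened for brevity; the analysis is written by hand here, not computed by a parser.</p>
          <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class Conceptualisation:
    """One Schank action type with the features populated for it."""
    action_type: str
    explicit: dict = field(default_factory=dict)  # stated in the sentence
    implicit: dict = field(default_factory=dict)  # inferred from the verb


# Hand-written analysis of the running example (illustrative only).
EXAMPLE = [
    Conceptualisation(
        "PTRANS",
        explicit={"Actor": "John", "ObjectOfAction": "palla",
                  "Donor": "John", "Recipient": "Mary"},
    ),
    Conceptualisation(
        "PROPEL",
        explicit={"Actor": "John", "ObjectOfAction": "palla",
                  "Donor": "John", "Recipient": "Mary"},
        implicit={"Medium": "aria"},
    ),
    Conceptualisation(
        "ATRANS",
        explicit={"Object": "palla", "Donor": "Fred", "Recipient": "John"},
        implicit={"Possessor": "Fred"},
    ),
]


def roles_of(chunk, conceptualisations):
    """Collect every (action type, feature, kind) that a sentence chunk fills."""
    found = []
    for c in conceptualisations:
        for kind, features in (("explicit", c.explicit), ("implicit", c.implicit)):
            for name, value in features.items():
                if value == chunk:
                    found.append((c.action_type, name, kind))
    return found
```
          </preformat>
          <p>Calling roles_of("Fred", EXAMPLE) returns both the explicit Donor role and the implicit Possessor role of the ATRANS conceptualisation, which is the double semantic facet discussed in the text.</p>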
          <p>Driven by these considerations, PSRL consists of the following phases (applied to the above example in Figure 2):
1. Syntactic utterance analysis, where the sentence is split into syntactic tokens with Italian NLP tools (e.g., the OpenNLP tokeniser) and the syntactic information of each chunk is gathered from suitable dictionaries (e.g., MorphIt) or by means of Named Entity Recognition techniques (e.g., OpenNLP NER);
2. Schank verb analysis, where all verbs in the sentence are isolated and the complete Schank representation of the utterance is computed;
3. Explicit and implicit feature extraction, where the system retrieves and populates all the features attached to each Schank action type involved (e.g., Actor, ObjectOfAction, DirectionOfPhysicalChange.Donor and DirectionOfPhysicalChange.Recipient as explicit features, and Medium as an implicit feature for the PROPEL action type in Figure 2);
4. Italian logical analysis rules retrieval, where each extracted feature is mapped to a logical complement of the Italian language. The mapping is obtained by querying a dedicated repository containing two types of rules:
– feature-to-complement rules of the type (SATN, FNT) → ICN, where (i) SATN is a Schank action type name, (ii) FNT is a recursively defined tree of feature names, either FN (a singleton feature name) or (FN, FNT) (required where populated features are sub-features of other implicit or explicit features), and (iii) ICN is an Italian logical complement name. The intended meaning of a rule such as (ATRANS, (RelationOfAction, (Possession, Object))) → C_OGGETTO is that “the Object component of the Possession sub-feature of the explicit RelationOfAction feature for an ATRANS action type is mapped to a complemento oggetto” (an object in the logical sense).
– complement-to-syntactic construction rules of the type ICN → {SC}, where ICN is an Italian logical complement name and {SC} is a set (possibly a singleton) of syntactic constructions to be matched in the question in order to recognise the complement content. As an example, the rule C_OGGETTO → (il, lo, la, i, gli, le +) NOUN/NPR means that “a noun or a named entity (possibly) preceded by one of the particles il, lo, la, i, gli, le is tagged as a complemento oggetto”. (Footnote 4: In Figure 2 all complement-to-syntactic construction rules have a singleton set on their right-hand side, so as not to overload the picture with too much information.)
5. Complement checking, matching and priority, where complements are extracted from the question. All information classified as an explicit feature in Phase 3 is double-checked and formatted w.r.t. the set of SCs retrieved in Phase 4, and then tagged with all applicable ICNs. On the other hand, all information derived from implicit feature extraction is added to the set of complements without additional checking, since such information is not available in the original question (e.g., “aria” as complemento di mezzo in Figure 2). Since sentence chunks may be tagged with several ICNs (e.g., the “John” NPR token), a unique ICN for each sentence chunk is chosen according to an Italian complement ranking list. For our first case study, the QA2SPR architecture applies the following precedence order among complements:
(a) Subject (SOGGETTO);
(b) Object (C_OGGETTO);
(c) Complement regarding possession (C_SPECIFICAZIONE);
(d) Complements regarding places (e.g., C_MOTO_A_LUOGO);
(e) Complements regarding times (e.g., C_TEMPO);
(f) Other complements.</p>
          <p>According to the list above, the “John” token is regarded as a SOGGETTO, and the “Fred” token as a C_SPECIFICAZIONE.</p>
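          <p>Phases 4 and 5 can be sketched as two lookups: one from (action type, feature tree) pairs to complement names, and one that picks a single complement per chunk using the precedence order (a) to (f). This is only an illustrative sketch: the dictionary layout, the class labels PLACE, TIME, and OTHER, and the function names are assumptions, and only a few of the rules of Figure 2 are included.</p>
          <preformat>
```python
# Feature-to-complement rules: (SATN, FNT) maps to ICN, with the
# feature-name tree FNT written as a nested tuple.
FEATURE_TO_COMPLEMENT = {
    ("PTRANS", "Actor"): "SOGGETTO",
    ("PTRANS", ("DirectionOfPhysicalChange", "Donor")): "C_MOTO_DA_LUOGO",
    ("PTRANS", ("DirectionOfPhysicalChange", "Recipient")): "C_TERMINE",
    ("ATRANS", ("RelationOfAction", ("Possession", "Possessor"))): "C_SPECIFICAZIONE",
    ("ATRANS", ("RelationOfChange", "Donor")): "C_MOTO_DA_LUOGO",
    ("ATRANS", ("RelationOfChange", "Recipient")): "C_TERMINE",
}

# Precedence order (a)-(f) of the first case study; a lower index wins.
PRIORITY = ["SOGGETTO", "C_OGGETTO", "C_SPECIFICAZIONE", "PLACE", "TIME", "OTHER"]

# Assumed grouping of concrete complement names into classes (d) and (e).
COMPLEMENT_CLASS = {
    "C_MOTO_A_LUOGO": "PLACE",
    "C_MOTO_DA_LUOGO": "PLACE",
    "C_TEMPO": "TIME",
}


def rank(icn):
    """Position of a complement name in the precedence list."""
    cls = COMPLEMENT_CLASS.get(icn, icn)
    if cls not in PRIORITY:
        cls = "OTHER"  # class (f): every complement not listed explicitly
    return PRIORITY.index(cls)


def choose_complement(candidates):
    """Pick the unique ICN for a chunk tagged with several candidates."""
    return min(candidates, key=rank)
```
          </preformat>
          <p>With the candidate tags of Figure 2, choose_complement(["C_TERMINE", "C_MOTO_DA_LUOGO", "SOGGETTO"]) selects SOGGETTO for “John”, and choose_complement(["C_MOTO_DA_LUOGO", "C_SPECIFICAZIONE"]) selects C_SPECIFICAZIONE for “Fred”.</p>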
          <p>The ICN choice based on a complement ranking list represents the first of three prioritisation levels allowed by PSRL. As already pointed out, the keyword retrieval phase (Subsection 3.1) and candidate answer passage ranking (Subsection 3.2) also benefit from the inner ranking mechanism of PSRL.</p>
          <p><bold>PSRL for keyword retrieval</bold></p>
          <p>Complements extracted by the PSRL phases are all good candidate keywords for the subsequent document extraction phase. The simplest heuristic for keyword retrieval (the one used in our first prototype) is to choose all extracted complements as equally important keywords for document search, without a precise order, but more complex combinations may be devised: a strict subset of (un)ordered complements (e.g., only the (un)ordered subject and object), or even a keyword multi-search based on combinations of the most important complements.</p>
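          <p>The three keyword heuristics mentioned above can be sketched as follows. The function names and the representation of complements as (chunk, ICN) pairs are assumptions made for illustration, and the combination size used for the multi-search is an arbitrary choice.</p>
          <preformat>
```python
from itertools import combinations


def all_keywords(complements):
    """Prototype heuristic: every complement is an equally important keyword."""
    return [chunk for chunk, _icn in complements]


def top_keywords(complements, wanted=("SOGGETTO", "C_OGGETTO")):
    """Keep only chunks whose complement is in `wanted`, in that order."""
    return [chunk for icn in wanted for chunk, tag in complements if tag == icn]


def multi_search_queries(complements, size=2):
    """Build one search query per combination of `size` keywords."""
    return [" ".join(group) for group in combinations(all_keywords(complements), size)]


# Complements of the running example after Phase 5 (illustrative).
EXAMPLE = [
    ("John", "SOGGETTO"),
    ("palla", "C_OGGETTO"),
    ("Mary", "C_TERMINE"),
    ("Fred", "C_SPECIFICAZIONE"),
]
```
          </preformat>
          <p>For the running example, top_keywords(EXAMPLE) keeps only “John” and “palla”, while multi_search_queries(EXAMPLE) produces one query for each of the six keyword pairs.</p>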
          <p><bold>PSRL for candidate answer passages ranking</bold></p>
          <p>The third and last prioritisation level of our technique is applied to candidate answer passage ranking. The preference order of the Italian complement list may also induce a preference order among the candidate answer passages extracted during the document processing phase. Consider the question “Che animale è Pippo, l’amico di Topolino?” (“What animal is Goofy, Mickey Mouse’s best friend?”). Suppose that the question classification phase correctly infers the expected answer type as animal. The PSRL phases applied to this utterance extract the set of complements shown in Figure 3(a). If all complements are used as keywords, the first two documents extracted during the document processing phase (e.g., through Google IT Custom Search) are a Wikipedia page (https://it.wikipedia.org/wiki/Pippo) and a blog (https://www.orgoglionerd.it/articles/2014/06/che-razza-di-animali-sono-personaggi-disney).</p>
          <p>Thanks to PSRL analysis, the system is able to order the paragraphs containing each single keyword based on the complement ranking induced by the list. Figure 3(b)–(c) shows the first two paragraphs QA2SPR would actually retrieve for each keyword, and how such paragraphs would be ranked according to the complement priority list reported in Phase 5 of PSRL. During the subsequent answer processing phase, all substrings whose semantic content is compatible with the expected answer type animal are isolated from each paragraph (marked in Figure 3(b)–(c) with blue ellipses) with the aid of semantic resources such as MultiWordNet. In this case, the following answers are retrieved (in Italian alphabetical order): “anatre” (“ducks”), “cane” (“dog”), “oche” (“geese”), “pantegana” (“sewer rat”), “papero” (“gander”), “Rattus Rattus”, “topo” (“mouse”), “uccelli” (“birds”). QA2SPR is instructed to apply another layer of preference among answers (and retrieved documents in general) according to a ranked list of web page types: as an example, Wikipedia pages are regarded as containing more reliable information than blog or forum pages. As such, extracted answers are first ordered by web page type (first the Wikipedia page, then the blog page) and then by paragraph ranking, obtaining the following answer order:<sup>5</sup> “cane”, {“papero”, “uccelli”, “anatre”, “oche”, “Rattus Rattus”}, “cane”, {“topo”, “pantegana”}. In this example, the correct answer to the question is also the top ranked by PSRL.</p>
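          <p>The two-level ordering described above (web page type first, complement priority second) amounts to a sort on a composite key. The sketch below is illustrative only: the numeric ranks, the dictionary fields, and the complement tags attached to the sample candidates are assumptions, not the tags actually shown in Figure 3.</p>
          <preformat>
```python
# Assumed reliability order of web page types (lower is more reliable).
PAGE_TYPE_RANK = {"wikipedia": 0, "blog": 1}

# Complement precedence restricted to the tags used in this sketch.
COMPLEMENT_RANK = {"SOGGETTO": 0, "C_OGGETTO": 1, "C_SPECIFICAZIONE": 2}


def order_answers(candidates):
    """Sort by page type first, then by the complement of the matched keyword.

    Python's sort is stable, so candidates with equal keys keep their
    retrieval order, which mirrors the equally prioritised groups.
    """
    return sorted(
        candidates,
        key=lambda c: (PAGE_TYPE_RANK[c["page"]], COMPLEMENT_RANK[c["complement"]]),
    )


# Hypothetical candidates; the complement tags are for illustration only.
CANDIDATES = [
    {"answer": "topo", "page": "blog", "complement": "C_SPECIFICAZIONE"},
    {"answer": "cane", "page": "blog", "complement": "C_OGGETTO"},
    {"answer": "cane", "page": "wikipedia", "complement": "SOGGETTO"},
]
```
          </preformat>
          <p>Sorting CANDIDATES this way places the Wikipedia answer “cane” first, followed by the blog answers in complement order.</p>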
          <p><bold>QA2SPR architecture</bold></p>
          <p>A complete diagram of the QA2SPR architecture customised for the Italian language, not reported here for space reasons, is freely available for download at http://semantica.realt.it:81/QAASPR/KDWEB2017/Architecture.pdf.</p>
          <p>The conceptual design of QA2SPR is slightly different from that of standard QASs, which usually consist of three basic modules: (i) a question processing module, whose main purpose is question classification; (ii) a document processing module, responsible for information retrieval; and (iii) an answer processing module, dedicated to answer extraction. In addition, two separate modules have been designed to manage documents coming from different web sources. The Knowledge Base Module (KBM) uses keywords to extract web data coming from knowledge bases and annotated repositories (FreeBase, WikiBrain, DBPedia), which makes it highly specialised for factoid answering, while the Full Open Domain Module (FODM) spans the entire web to extract both structured (e.g., Wikipedia pages) and unstructured (e.g., free web text) information. Clearly, the FODM is the module that depends the most on PSRL analysis, given that paragraph extraction and ranking phases are usually not required when information is extracted from knowledge bases.</p>
          <p>Footnote 5: Answers from (i) the same paragraph, (ii) different paragraphs but for the same complement, and (iii) different paragraphs and different complements but of the same complement type are equally prioritised and grouped between brackets.</p>
          <p>[Figure 3: (a) complements extracted by PSRL from the example question, with the tokens “animale” (NOUN), “Pippo” (NPR), “l’amico” (ART + NOUN), and “Topolino” (NPR) tagged with SOGGETTO, C_OGGETTO, and C_SPECIFICAZIONE; (b) paragraphs retrieved for each keyword from the Wikipedia page, including “È un cane antropomorfo, alto, dinoccolato e vestito da contadino” (“He is an anthropomorphic dog, tall, lanky and dressed as a farmer”); (c) paragraphs retrieved for each keyword from the blog, including “Pippo, il vacuo e tonto amico di Topolino, è ovviamente un cane” (“Goofy, Mickey Mouse’s vacuous and dim-witted friend, is obviously a dog”) and “Topolino: topo o pantegana?” (“Mickey Mouse: mouse or sewer rat?”). The panels also flag the keywords “animale” and “amico” for which no (further) text was found.]</p>
          <p><bold>Experimental results: a question a day ... for a year</bold></p>
          <p>A working prototype of the QA2SPR architecture for the Italian language was developed in Java and hosted on a CentOS 6.8 Linux web server with four 64-bit cores running at 2.40 GHz and 4 GB of RAM.</p>
          <p>The system has further been asked to answer a set of 365 general knowledge questions randomly chosen from online repositories.<sup>7</sup> We intend in the near future to trial the prototype with standard question sets such as the ones proposed in annual QAS tracks, e.g., the Text REtrieval Conference (TREC), the CLEF workshops for European languages, and the EVALITA tracks for the evaluation of NLP and speech tools for Italian. However, it has been noted that standard QAS track evaluation has remained somewhat controversial, since it is hard to classify the reliability of the answers to some question types (e.g., the TREC and CLEF assessment of answers as correct, unsupported, inexact, or incorrect) [19]. Departing from the CLEF and TREC scheme, each candidate answer has been classified as (i) correct if at least the information required by the question is given, in which case it has been further classified as accurate if it contains neither more nor less than the information required, and inaccurate otherwise; (ii) wrong if it does not provide the required information; or (iii) unavailable if the system was not able to give a response (e.g., because no relevant document was retrieved with the supplied keywords). In the remainder, an available answer denotes either a wrong or a correct answer. We report a summary of overall results and performance in Figure 4(a)–(b), and individual scores for KBM and FODM in Figure 5(a)–(f).</p>
          <p>Footnote 6: The interested reader may contact the authors to access the online prototype.</p>
          <p>[Figure 4: (a) overall answer distribution: 210 correct (58%), 121 wrong (33%), 34 unavailable (9%); (b) average execution time in ms, with plotted values including 18,723 and 32,746.]</p>
          <p>The ratio of 58% of correct answers, which increases to 64% if we ignore unavailable answers, is in our opinion an encouraging push to follow the current research path, and suggests that even more satisfactory results might be achieved with future enhancements of the system. In this respect, Figures 5(c) and 5(e) clearly show where to focus our next efforts: answer extraction by KBM already exhibits excellent outcomes (80% of correct answers with a remarkably low average execution time), whilst more accurate PSRL heuristics for FODM are required (a correct answer a little over half the time). We remark, however, that parallel tests have shown the correct answer to be present in at least one of the paragraphs extracted by FODM using PSRL 86% of the time.</p>
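          <p>The ratios quoted above can be recomputed from the answer counts that appear in Figure 4(a). The short sketch below does so under the assumption that those counts (210 correct, 121 wrong, 34 unavailable) are read correctly from the figure; the function name and the dictionary keys are illustrative.</p>
          <preformat>
```python
# Answer counts as they appear in Figure 4(a).
COUNTS = {"correct": 210, "wrong": 121, "unavailable": 34}


def correct_ratios(counts):
    """Share of correct answers over all questions and over available answers."""
    total = sum(counts.values())
    available = total - counts["unavailable"]  # wrong + correct answers
    return {
        "total": total,
        "available": available,
        "correct_over_total": counts["correct"] / total,
        "correct_over_available": counts["correct"] / available,
    }


RATIOS = correct_ratios(COUNTS)
```
          </preformat>
          <p>With these counts the share of correct answers is about 57.5% of the 365 questions and about 63.4% of the 331 available answers, close to the 58% and 64% reported in the text; the small gap on the second figure presumably stems from rounding.</p>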
          <p>Furthermore, the system shows a good work load division between answer
retrieval by FODM and KB (59%-41% as revealed in Figure 5(a)), which confirms
a proper choice of the test sample. The high variance between average execution
times is clearly due to the di↵erent complexity carried by the two modules (e.g.,
7 The file http://semantica.realt.it:81/QAASPR/KDWEB2017/Tests.pdf with all
the questions, answers, and execution times is freely available for download.
20,040
wrong,
195 (59 %)</p>
          <p>136 (41 %)
27 (20 %)</p>
          <p>
            109 (80 %)
94 (48 %)
101 (52 %)
number of sub-modules used, structured data from dataset vs. likely unstructured
data from web document pages).
We have already delineated in Section 5 some future investigation paths stemming
from the present work. In addition, we are currently considering feasible
QA2SPR applications in Ambient Semantic Computing (ASC). The main aim is to
combine the semantic technologies offered by the QA2SPR architecture, such as NLP
and ontology-related research, with Ambient Intelligence (AI) and Ubiquitous
Pervasive Computing (UPC) capabilities. In this regard, an exploratory study has
been performed on the interactions between QA2SPR and MyElettra, a system
for real-time energy management and saving [
            <xref ref-type="bibr" rid="ref2">2, 15</xref>
            ]. The interaction among those
systems already shows promising results, also thanks to an advanced methodology
of ambient intelligence scheduling [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] and an improved mechanism to extract
energy consumption best practices based on a default logic approach [
            <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
            ].
          </p>
          <p>13. Md. Arafat Rahman, Md-Mizanur Rahoman: sJanta: An Open Domain Question Answering System. In: Kando, N., Joho, H., Kishida, K. (eds.) Proc. of the 11th Conf. on Eval. of Inf. Access Technol. (NTCIR), Natl. Cent. of Sci., Tokyo, Japan. Natl. Inst. of Inf. (NII) (2014)</p>
          <p>14. Ray, S.K., Singh, S., Joshi, B.P.: A Semantic Approach for Question Classification using WordNet and Wikipedia. Pattern Recognit. Lett. 31(13) (2010)</p>
          <p>15. Scannapieco, S., Tomazzoli, C.: Ubiquitous and Pervasive Computing for Real-Time Energy Management and Saving – A System Architecture. In: Barolli, L., Enokido, T. (eds.) Innov. Mob. and Internet Serv. in Ubiquitous Comput. (IMIS). Adv. in Intell. Sys. and Comput., vol. 612. Springer Int. Publ. AG (2017)</p>
          <p>16. Schank, R.C.: The Fourteen Primitive Actions and Their Inferences. Tech. rep., Stanford Univ., Stanford, CA, USA (1973)</p>
          <p>17. Shen, D., Lapata, M.: Using Semantic Roles to Improve Question Answering. In: Eisner, J. (ed.) Proc. of the Joint Conf. on Empirical Methods in NLP and Comput. NLL (EMNLP-CoNLL), Prague, Czech Republic. ACL (2007)</p>
          <p>18. Tonelli, S., Pighin, D., Giuliano, C., Pianta, E.: Semi-Automatic Development of FrameNet for Italian (2009)</p>
          <p>19. Wang, M.: A Survey of Answer Extraction Techniques in Factoid Question Answering. Tech. rep., Dep. of Comp. Sci., Univ. of Stanford (2006)</p>
          <p>20. Ye, Z., Jia, Z., Yang, Y., Huang, J., Yin, H.: Research on Open Domain Question Answering System. In: Li, J., Ji, H., Zhao, D., Feng, Y. (eds.) Nat. Lang. Process. and Chin. Comput. (NLPCC), Nanchang, China. LNCS, vol. 9362. Springer (2015)</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Allam</surname>
            ,
            <given-names>A.M.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haggag</surname>
            ,
            <given-names>M.H.</given-names>
          </string-name>
          :
          <article-title>The Question Answering Systems: a Survey</article-title>
          .
          <source>Int. J. of Res. and Rev. in Inf. Sci. (IJRRIS) 2</source>
          (
          <issue>3</issue>
          ) (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cristani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karafili</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tomazzoli</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Energy Saving by Ambient Intelligence Techniques</article-title>
          . In: Barolli,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Xhafa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Takizawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Enokido</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Castiglione</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Santis</surname>
          </string-name>
          , A.D. (eds.) 17th
          <source>Int. Conf. on Network-Based Inform. Sys</source>
          .,
          <source>NBiS</source>
          <year>2014</year>
          , Salerno, Italy,
          September 10-12
          ,
          <year>2014</year>
          . pp.
          <fpage>157</fpage>
          -
          <lpage>164</lpage>
          . IEEE Computer Society (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cristani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karafili</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tomazzoli</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Improving Energy Saving Techniques by Ambient Intelligence Scheduling</article-title>
          . In: Barolli,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Takizawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Xhafa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Enokido</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.H</surname>
          </string-name>
          . (eds.)
          <source>29th IEEE Int. Conf. on Advanced Inform. Netw. and Appl</source>
          .,
          <source>AINA</source>
          <year>2015</year>
          , Gwangju, South Korea,
          March 24-27,
          <year>2015</year>
          . pp.
          <fpage>324</fpage>
          -
          <lpage>331</lpage>
          . IEEE Computer Society (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cristani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olivieri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tomazzoli</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Automatic Synthesis of Best Practices for Energy Consumptions</article-title>
          .
          In:
          <source>10th Int. Conf. on Innovative Mobile and Internet Services in Ubiquitous Computing, IMIS</source>
          <year>2016</year>
          , Fukuoka, Japan, July 6-8
          ,
          <year>2016</year>
          . pp.
          <fpage>154</fpage>
          -
          <lpage>161</lpage>
          . IEEE Computer Society (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cristani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tomazzoli</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karafili</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olivieri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Defeasible Reasoning about Electric Consumptions</article-title>
          . In: Barolli,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Takizawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Enokido</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Jara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.J.</given-names>
            ,
            <surname>Bocchi</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y</surname>
          </string-name>
          . (eds.)
          <source>30th IEEE Int. Conf. on Adv. Inform. Netw. and Appl</source>
          .,
          <source>AINA</source>
          <year>2016</year>
          , Crans-Montana, Switzerland, 23-25 March,
          <year>2016</year>
          . pp.
          <fpage>885</fpage>
          -
          <lpage>892</lpage>
          . IEEE Computer Society (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Datla</surname>
            ,
            <given-names>V.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benajiba</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qadir</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prakash</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farri</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Open Domain Real-Time Question Answering Based on Semantic and Syntactic Question Similarity</article-title>
          . In: Voorhees,
          <string-name>
            <given-names>E.M.</given-names>
            ,
            <surname>Ellis</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . (eds.)
          <source>Proc. of the 25th Text REtrieval Conf. (TREC)</source>
          , Gaithersburg, Maryland, USA. NIST (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Fillmore</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          :
          <source>Frame Semantics</source>
          . Hanshin Publ. Co., Seoul, South Korea (
          <year>1982</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lenci</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lapesa</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Building an Italian FrameNet through Semiautomatic Corpus Analysis</article-title>
          . In: Calzolari,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Choukri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Maegaard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Mariani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Odijk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Piperidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Rosner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Tapias</surname>
          </string-name>
          ,
          <string-name>
            <surname>D</surname>
          </string-name>
          . (eds.)
          <source>Proc. of Int. Conf. on Lang. Resour. and Eval</source>
          .
          <source>(LREC)</source>
          , Valletta,
          Malta. Eur. Lang. Resour. Assoc.
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Loni</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A Survey of State-of-the-Art Methods on Question Classification</article-title>
          . Literature Survey, Published on TU Delft Repository (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Moldovan</surname>
            ,
            <given-names>D.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasca</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harabagiu</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Surdeanu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Performance Issues and Error Analysis in an Open-Domain Question Answering System</article-title>
          .
          <source>ACM Trans. Inf. Syst</source>
          .
          <volume>21</volume>
          (
          <issue>2</issue>
          ) (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Montemagni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barsotti</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Battista</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calzolari</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corazzari</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenci</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zampolli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fanciulli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Massetani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Raffaelli, R.,
          <string-name>
            <surname>Basili</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pazienza</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saracino</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zanzotto</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mana</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pianesi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delmonte</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <source>Building the Italian Syntactic-Semantic Treebank</source>
          . Springer Netherlands, Dordrecht (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Pundge</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>S.A.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Mahender</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.N.</surname>
          </string-name>
          :
          <article-title>Question Answering System, Approaches and Techniques: a Review</article-title>
          .
          <source>Int. J. of Comp. Appl</source>
          .
          <volume>141</volume>
          (
          <issue>3</issue>
          ) (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>