Text Frame Detector: Slot Filling Based On Domain Knowledge Bases

               Martina Miliani, Lucia C. Passaro and Alessandro Lenci
 CoLing Lab, Dipartimento di Filologia, Letteratura e Linguistica (FiLeLi), Università di Pisa
                      martina.miliani@fileli.unipi.it
                       lucia.passaro@fileli.unipi.it
                          alessandro.lenci@unipi.it


Abstract

English. In this paper we present a system called Text Frame Detector (TFD), which aims at populating a frame-based ontology in a graph-based structure. Our system organizes textual information into frames, according to a predefined set of semantically informed patterns linking pre-coded information such as named entities and simple and complex terms. Since such information can be expanded semi-automatically with word embeddings, the system can be easily adapted to new domains.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1    Introduction

Textual data are still the most widespread content around the Web (Smirnova and Cudré-Mauroux, 2018). Information Extraction (IE) is a key task to structure textual information and make it machine understandable. IE can be modelled as the process of filling semantic frames specified within a domain ontology and consisting of a collection of slots typed with their possible values (Minsky, 1974; Jurafsky and Martin, 2018). Therefore, each frame can be seen as a set of relations whose participants are the values of the slots. Following Jean-Louis et al. (2011), we refer to such relations as complex relations, namely any n-ary relation among typed entities.

Relation extraction techniques have been widely applied to populate semantic frames (Surdeanu, 2013; Zhenjun et al., 2017). However, both supervised and unsupervised methods have shown their limits. On the one hand, supervised approaches (Zelenko et al., 2003; Mooney and Bunescu, 2005; Nguyen and Grishman, 2015; Zhang et al., 2017) model frame filling as a classification task, hence they require labelled data, with the consequent high cost of long annotation time. On the other hand, unsupervised approaches do not need any training data, but mapping extraction results onto predefined relations or ontologies is often quite challenging with this kind of method (Fader et al., 2011).

Moreover, semi-supervised methods exploit bootstrap learning, so that any new relation requires a small set of labelled data to be extracted (Agichtein and Gravano, 2000; Chen et al., 2006; Weld et al., 2008).

Finally, another kind of approach has been proposed, which relies on knowledge bases (KBs) to produce training data. Introduced by Mintz et al. (2009), distant supervision detects relations on semantically annotated texts where entities which co-occur in the same sentence match entity pairs contained in the KB. Then a classifier is trained using features extracted from the annotated relations (Smirnova and Cudré-Mauroux, 2018). Although this approach has been proven to be effective, the supervised step could suffer from a scarce amount of data, especially if the relations occur with low frequency in small corpora.

In this paper, we present a system to populate a frame-based ontology, whose values are stored in a graph-based structure. Our method exploits some aspects of distant supervision, leveraging a domain-specific KB to infer the relations, and populates the frames with specific information (i.e., the participants) as well as the portions of text (i.e., the snippets) which contain them. Thus, the output of the system for a single frame is a set of snippets, one for each of its slots. Each snippet is also associated with a weight encoding how likely it is to contain the information about a certain relation. Such a weight is calculated with a scoring function based on similarity measures and textual distance information. The system has been tested on the administrative domain, with the goal of gathering information related to
taxes and agenda events. Indeed, since the KB can be semi-automatically enriched with Named Entities (NEs) and vocabularies of simple and complex terms, our approach can be easily adapted to different domains. Furthermore, system recall can be increased by expanding the frame and attribute vocabulary by exploiting word embeddings (Mikolov et al., 2013).

Our approach differs from existing systems like PIKES (Corcoglioniti et al., 2016), Framester (Gangemi et al., 2016), FRED (Gangemi et al., 2017), and Framebase (Rouces et al., 2015) primarily for the notion of semantic frame we have adopted. The works above are mainly based on Fillmore’s (1976) definition of frame as encoded in FrameNet: frames and associated roles describe situations evoked by lexical expressions (i.e. Lexical Units). In our system a frame represents a domain entity (e.g. “tax”) by means of attributes and relations associated to that domain. Unlike FrameNet frames, these attributes and relations are activated by a set of distributed lexico-syntactic cues.

This paper is structured as follows: in section 2 we describe the general methodology of the system, we define terminology and notation and we describe the main features of the proposed approach. The system implementation is illustrated in section 3, which shows the extraction algorithm as well as the indexing methods in the knowledge graph. Evaluation and results are reported in section 4.

2   Methodology

Following Riedel et al. (2010), we assume that “if two entities ⟨e1, e2⟩ participate in a relation ⟨r⟩, then there is at least one sentence ⟨s⟩ in the text expressing such relation”. We adopt this hypothesis for both simple and complex relations (cf. infra), by considering the sentence ⟨s⟩ itself and the [⟨s − k⟩, ..., ⟨s + k⟩] adjacent ones, where k is a system parameter.

In order to identify sentences where one or more relations are expressed, we developed a system called Text Frame Detector (TFD).

Given a KB where domain terms are associated to a given set of frames, TFD populates them by making explicit the semantic relation between terms and named entities (NEs). In particular, TFD exploits linguistic analysis and IE algorithms: texts are processed up to part of speech tagging, then NEs (Passaro et al., 2017) and multiword terms are identified (Passaro and Lenci, 2016). Co-occurrence analysis (Asim et al., 2018) is then performed to identify the participants of each relation by considering terms and NEs co-occurring in the same sentence or in adjacent ones. The relations are filtered and ranked by applying a scoring process (cf. Section 3.2) to the snippets containing them. The number of slots for each frame is not fixed, therefore we decided to store frame data in the graph-based database (GBD) Neo4j (http://neo4j.com/). Compared to relational databases, GBDs do not require a pre-defined set of relations, allowing for a more flexible object-oriented data storage. Moreover, GBDs can be updated in real time and show a better performance in terms of query execution time.

In order to increase the system recall of relevant information, we also used the semantic neighbors of the terms defining the frames. For example, if a text contains the word “versamento” (‘deposit’) but the KB only contains the word “pagamento” (‘payment’), the term “versamento” may be extracted because it is a semantic neighbor of the latter (see Table 1).

    Neighbor                   Cosine Similarity
    rimborso (‘refund’)        0.89
    versamento (‘deposit’)     0.86
    versare (‘to deposit’)     0.78

Table 1: Semantic neighbors of “pagamento” (‘payment’) and their cosine similarity score.

We trained fastText word embeddings (Bojanowski et al., 2017) on a combination of La Repubblica corpus (Baroni et al., 2004) and PAWAC (Passaro and Lenci, 2016) for administrative domain specific knowledge.

Currently, KB terms are expanded with their 10 nearest semantic neighbors in terms of cosine similarity, which can be filtered through a parametric threshold.

2.1     Definitions and terminology

Frame: Terms and entities contained in the KB are organized in frames. Frames make it possible to structure the implicit knowledge contained in texts around concepts that define the relevant semantic categories in a domain. For instance, the frame EVENT corresponds to entities like concerts, shows, etc. Each frame is defined by its frame triggers and attributes.

Frame trigger: It corresponds to an instance of the semantic class described by the frame (e.g., in the administrative domain, the frame TAX is expressed by its instances: “TARI” (‘Garbage tax’), “IMU” (‘Municipal tax’)). Frame triggers suggest the presence of frame attributes in the text.

Attribute: A frame is composed of a set of slots, which must be filled by specific instances or data (Minsky, 1974). Each slot value is a participant in a relation with the frame trigger. This relation is referred to as an “attribute”, and describes an aspect of the concept represented by the frame. For instance, the EVENT frame requires the following attributes: when, to be filled with time and date; where, which corresponds to a location; and cost, such as the ticket price. Depending on the way they are expressed in texts, we distinguish between simple attributes and complex attributes.

Simple attribute: Their values correspond to simple and complex terms, NEs or Temporal Expressions (TEs) identified during the IE step. The EVENT frame attributes are considered simple because they usually appear right near the frame trigger (cf. Figure 1).

Complex attribute: The values of these attributes do not correspond to a single entity, but are expressed by whole text segments. Concerning the TAX frame, the deadline attribute cannot be filled by simply extracting the due dates from the text, because the reported information would be incomplete if taken out of context (cf. Figure 2). Therefore, it is necessary to return the entire text snippet, which includes the attribute triggers that make it possible to identify the complex attribute.

Attribute trigger: They represent the linguistic cues of an attribute instance. They are manually selected by domain experts and stored in the KB with a standard form t and a small number of orthographic and morphosyntactic variants v. Attribute triggers can be: (i) single and multiword terms, like “bollettino postale” (‘postal order’), “saldo” (‘balance’), NEs, such as “Firenze” (‘Florence’) or TEs, like “18 giugno” (‘18th June’); (ii) complex patterns, such as “non inferiore a” (‘not lower than’).

3     Implementation

In order to fill the frame slots, textual data are analyzed by TFD in various steps. After linguistic annotation, NER, and term extraction, TFD looks for frame triggers and for their attribute triggers, in the same sentence or in the sentences around it. More specifically, given a snippet, a frame instance F is expressed by a frame trigger F_t and a set of attributes A, containing both simple (A_s) and complex (A_c) attributes, so that F = {F_t, A} where a_i ∈ A_s ∪ A_c.

3.1    Frame and attribute retrieval

Since both simple and complex attributes of a frame are expressed by means of the set T of their attribute triggers, we can say that F is instantiated in a text by the joint occurrence of a frame trigger F_t and a set of attribute triggers T related to one or more of its attributes, namely F = {F_t, T} where T = {t_1, ..., t_n}.

In order to retrieve a frame F in a portion of text, first of all we look for its frame triggers. Once an F_t has been detected, we search for its potential attributes. Given such F, its potential instances in the text consist of the co-occurrence of F_t and a subset of T. To guarantee a certain degree of flexibility, we decided to provide each of the elements in T with a binary feature that can be set to 1 if the attribute trigger t_i is mandatory to extract F, and to 0 if the attribute trigger is optional. A further implementation could convert these features into continuous weights. In this way the TFD would be able to consider some triggers as more relevant than others to populate the frame.

Moreover, the attribute triggers of F belonging to T are selected within terms and entities used to express its attribute instances. Such triggers are then exploited by the attribute retrieval system of the TFD. Concerning the retrieval of simple attributes, see the extraction of the EVENT frame from the sentence in Figure 1.

The trigger for the EVENT frame (“spettacolo di Roger Waters”) in Figure 1 is a clue for the presence of its attributes, which populate the frame instance shown in Table 2.

Moreover, the TFD stores the raw text in Figure 1 as the relevant snippet for both the attributes
when and where.

    Lo [spettacolo di Roger Waters]nome_evento si terrà il [26 giugno]data allo [stadio di Firenze]luogo.

Figure 1: Example of a snippet (‘Roger Waters’ show will take place on 26th June at the Florence Stadium’) containing simple attributes.

    EVENT     spettacolo di Roger Waters
    when      26 giugno
    where     Stadio di Firenze
    cost      -

Table 2: An instance of the EVENT frame.

    Il [versamento]pagamento dell’[IMU]tassa deve essere effettuato con [bonifico bancario]mod_pagamento o [bollettino postale]mod_pagamento in due [rate]somma: l’[acconto]somma entro il [18 giugno]data e il [saldo]somma entro il [17 dicembre]data.

Figure 2: Example of a snippet (‘The Municipality tax disbursement must be made through wire transfer or postal order in two installments: down payment by June 18th and balance by December 17th’) containing complex attributes.

Examples of complex attributes can be found in the TAX frame, namely deadline, indicating the due date of the tax payment, and methods of payment, indicating how it is possible to pay it. For example, the triggers detected for the attribute deadline in Figure 2 are “somma” (‘sum’), “pagamento” (‘payment’) and two TEs, namely “18 giugno” (‘June 18th’) and “17 dicembre” (‘December 17th’). The snippet also contains the attribute methods of payment, which is expressed by the triggers “pagamento” (‘payment’) and “mod_pagamento” (‘methods_payment’), the latter expressed by “bonifico bancario” (‘wire transfer’) and “bollettino postale” (‘postal order’). Table 3 shows the TAX frame instantiated with the extracted attributes. Also in this case, the full snippet (the raw text in Figure 2) is stored for both the attributes deadline and methods of payment.

    TAX                   IMU
    deadline              18 giugno, 17 dicembre
    methods of payment    bonifico bancario, bollettino postale

Table 3: An instance of the TAX frame.

3.2    Snippet selection and ranking

The binary features associated to each attribute trigger in a frame instance also guide the snippet selection and ranking system. Given a potential instance of a frame, its attribute triggers are associated with a binary feature indicating their compulsory presence in order to associate the attribute with a certain snippet. On the basis of how many features are set to 1, the TFD will be more or less strict in the selection phase. For example, given the following sentences, where the frame triggers appear in bold and attribute triggers are underlined (the standard form for “pagata” is “pagamento” and “17 giugno” is marked as “data”), Table 4 shows which snippets are extracted according to the binary values associated to each attribute trigger.

    A  “L’IMU va pagata entro il 17 giugno” (‘The Municipality tax must be paid before June 17th’)
    B  “La scadenza dell’IMU è fissata al 17 giugno” (‘The deadline for the Municipality tax payment is on June 17th’)

    Line ID   pagamento      scadenza       data       snippet
              (‘payment’)    (‘deadline’)   (‘date’)   extracted
    1         0              0              0          A, B
    2         0              0              1          A, B
    3         0              1              0          B
    4         0              1              1          B
    5         1              0              0          A
    6         1              0              1          A
    7         1              1              0          -
    8         1              1              1          -

Table 4: Mandatoriness of attribute triggers.

Each line of the table represents a potential combination of attribute triggers, with the respective mandatoriness. According to these features, the absence of mandatory attribute triggers (line 1) allows the retrieval of both the snippets A and B. Otherwise, if the system is expected to find all the attribute triggers (line 8), neither of the two snippets is extracted, because “pagamento” and “scadenza” never appear in the same sentence. This system is useful in order to balance the extraction flexibility based on the domain. For example, in administrative documents, where the language is bound to stereotyped phrases (Brunato, 2015), a stricter approach is preferable, whereas in general domain ones it might be better to work with a higher number of optional triggers.
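The selection behaviour summarised in Table 4 can be illustrated with a minimal Python sketch. It assumes that each sentence has already been reduced to the set of standard-form attribute triggers it contains; the names SNIPPETS, TRIGGERS, select_snippets and mandatoriness_table are hypothetical and are not part of TFD. The rule implemented (a snippet is kept only if it contains every trigger whose binary feature is 1) is one reading of the table, not the authors' implementation.

```python
from itertools import product

# Hypothetical annotations: the standard-form attribute triggers found in
# sentences A and B ("pagata" -> "pagamento", "17 giugno" -> "data").
SNIPPETS = {
    "A": {"pagamento", "data"},
    "B": {"scadenza", "data"},
}
TRIGGERS = ("pagamento", "scadenza", "data")


def select_snippets(snippets, mandatory):
    """Return the ids of snippets that contain every mandatory trigger.

    `mandatory` maps a trigger to its binary feature (1 = mandatory,
    0 = optional). Optional triggers never block the extraction.
    """
    required = {trigger for trigger, flag in mandatory.items() if flag == 1}
    return sorted(sid for sid, found in snippets.items() if required <= found)


def mandatoriness_table(snippets, triggers):
    """Enumerate all 0/1 combinations, in the row order of Table 4."""
    rows = []
    for flags in product((0, 1), repeat=len(triggers)):
        mandatory = dict(zip(triggers, flags))
        rows.append((flags, select_snippets(snippets, mandatory)))
    return rows


if __name__ == "__main__":
    for line_id, (flags, extracted) in enumerate(
            mandatoriness_table(SNIPPETS, TRIGGERS), start=1):
        print(line_id, flags, ",".join(extracted) or "-")
```

Iterating over mandatoriness_table(SNIPPETS, TRIGGERS) enumerates the eight flag combinations in the same order as the lines of Table 4, so the two can be compared row by row.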
Moreover, a second objective of the TFD is to rank the extracted snippets according to their relevance with respect to a given attribute. Such relevance is calculated through a co-occurrence analysis, which employs measures based on semantic and distance features. One of these measures is the Sentence score, defined as:

    SS = |t| \times |v|                                  (1)

where t is the number of attribute triggers (standard forms) and v is the total of their variants.

This formula takes into account the ratio between the number of attribute triggers and their variants. In particular, the TFD favours the snippets containing the highest number of distinct attribute triggers, namely their standard forms. In the case of simple attributes, t represents the number of entity types and v the number of NEs.

Furthermore, although different frame triggers may be found all over a given document, they may refer to the same domain entity, hence to the same frame instance. For example, we observed that Italian municipality web pages dedicate entire articles to a single tax, which can be mentioned in different ways, such as by its full name or by its acronym (e.g., the Italian tax “Imposta Municipale Propria” (‘Municipality tax’) can also be mentioned with the acronym “IMU”). In order to prevent attributes belonging to the same frame from being associated to different ones and affecting the scoring process, our system can be set to apply a “fuzzy normalization” strategy that is able to associate all the triggers of a document to a frame referring to the same entity. For example, the snippets extracted from a municipality web page and associated to the deadline attribute of the TAX frame can be ranked together, regardless of the frame triggers they contain, such as “Imposta Municipale Propria” (‘Municipality tax’) or its acronym, “IMU”.

At a document level, the snippet selected is simply the one with the highest Sentence score, but we provide an additional level of analysis, which is applied when the snippet has to be chosen within a group of documents, instead of a single one. In that case, TFD selects the snippet with the highest Document score (DS), which encodes how likely the document is to contain relevant information about a certain attribute. The Document score is calculated as follows:

    DS = \frac{\sum_{i=1}^{n} TS_i}{l}                   (2)

where l is the sentence length in terms of tokens, and each TS_i is the Trigger score of a given variant v. TS is defined as:

    TS = \frac{1}{d} \times \cos                         (3)

where d is the distance between the attribute trigger (or NEs) and the frame trigger, and cos is the cosine similarity between the trigger variant contained in the KB and the neighbor found in the text (the cosine is equal to 1 for the KB terms).

3.3   Storage

Extracted frame instances are stored in a Neo4j GBD. The Knowledge Graph (KG) contains several root nodes, one for each of the frames detected in the document or in the collection of documents (Figure 3).

Figure 3: Information levels in the Knowledge Graph.

For instance, there are two root nodes corresponding to the EVENT and TAX frames. If we consider the frame TAX (the node “Frame” in Figure 3), the nodes “Frame Trigger” can be populated with instances like “Imposta Municipale Propria” (‘Municipality tax’) or its acronym, “IMU”. Each frame trigger node is linked to the corresponding frame attributes (“Attribute” node in Figure 3), which can be populated with information like “scadenza” (‘deadline’) and “modalità di pagamento” (‘methods of payment’). Document nodes (“Document” node in Figure 3), labelled by document names, are placed between attribute nodes and attribute-trigger nodes in order to facilitate the retrieval phase. Each document node
is associated with the snippet having the highest Sentence score for the connected attribute node (e.g., ‘deadline’), along with its Document score. In the retrieval phase, unless the information is extracted from a single document, the snippet with the highest Document score is selected and returned (see Section 3.2). The other levels of the graph contain information extracted from each document. Every attribute-trigger node (“Attribute Trigger” node in Figure 3) is labelled by the standard form of the attribute trigger extracted from the connected document node (e.g., ‘sum’). Then, each attribute-trigger node is connected to one or more nodes representing the trigger variants (“Attribute Variant” node in Figure 3). Continuing with this example, attribute variants can consist of ‘installments’, ‘balance’ and ‘down payment’. Finally, the last node of the graph is the snippet node (“Doc. snippet” node in Figure 3), storing the snippet containing the information extracted. For example, the node can be populated with a snippet like the one reported in Figure 2: “Il versamento dell’IMU deve essere effettuato con bonifico bancario o bollettino postale in due rate: l’acconto entro il 18 giugno e il saldo entro il 17 dicembre” (‘The Municipality tax disbursement must be made through wire transfer or postal order in two installments: down payment by June 18th and balance by December 17th’).

4   Evaluation and Results

The extraction of attributes related to the TAX and EVENT frames was evaluated on Italian language texts by an administrative domain expert. We decided to evaluate these frames because the first one is very specific to the administrative domain, whereas the second one can be seen as a general purpose one. The gold standard includes both administrative documents and social media texts and news published on the municipalities’ websites. Both frames were evaluated on 50 texts, including information about taxes (municipality online guidelines), events (administrative acts, press releases, Facebook statuses and tweets) and other topics (municipality web pages). For municipality guidelines web pages, the “fuzzy normalization” strategy has been applied (see Section 3.2). The results of the TFD are shown in Table 5.

    Frame     Precision   Recall   F1
    TAX       0.771       0.519    0.621
    EVENT     0.808       0.955    0.875
    Total     0.799       0.793    0.796

Table 5: TFD evaluation results.

Since simple attribute values consist mostly of NEs, these results are strictly dependent on the generalization capability of the models used to extract those entities. In other cases, a wrong snippet is selected as relevant for an attribute, although triggers and NEs are correctly annotated and extracted. Moreover, additional errors depend on the absence of attribute trigger variants in the Knowledge Graph.

More specifically, errors are mainly related to a wrong NE annotation (35%). In 22.8% of cases, a wrong sentence is selected as relevant for a certain attribute, although triggers and NEs are correctly annotated and extracted. False negative errors are caused by relevant information spread over several sentences (8.8%), whereas each extracted snippet consists of a single sentence, by unknown triggers describing an attribute (7.5%), by partial information contained in the extracted sentence (5%), by wrong lemmatization (1.75%) or by the overlapping of named entities and events (1.75%) (e.g., ‘Roger Waters’ show’ is not annotated as an event, however ‘Roger Waters’ is extracted as a named entity). In other cases (3.5%), attribute triggers are too distant from their frame trigger to be extracted. Although this span is customizable, an excessive distance between frame and attribute triggers could produce noise in the retrieval phase. Finally, the application of the “fuzzy normalization” strategy (see Section 3.2) led to errors in the ranking phase (14.3%). One of the municipality web pages in which the strategy has been applied contained information on more than one tax, but only one frame instance has been returned. This kind of error can be limited by automatically checking the frame triggers cited in the text, and deciding whether or not to apply the normalization according to external lexical resources, such as gazetteers or dictionaries.

5   Conclusions

In this paper we presented a domain independent system for slot filling that exploits a graph to populate a frame-based ontology. The Text Frame Detector extracts a relevant snippet for each frame attribute from textual information with good results in terms of F1 score (0.796). Nonetheless, the
evaluation showed that there is room for improvement in some of the TFD modules. For example, the annotation of the semantic neighborhood of single and multiword terms, which are particularly relevant in technical domains, should lead to further improvements in recall for complex attributes.

Moreover, although we did not adopt Fillmore’s semantic frames in the present work, we would like to explore the possibility of integrating our domain frames with FrameNet ones, which might help make the system more flexible.

Finally, in the near future, we plan to fine-tune parameters and to implement additional features, such as associating multiple snippets with the same attribute. Furthermore, we intend to convert the binary features used in the snippet selection system into continuous weights. These weights, along with the collected data about frame population, would also be employed to train a supervised model for slot filling, in order to test TFD across new domains.

Acknowledgments

This research has been funded by the Project “SEM il Chattadino” (SEM), funded by Regione Toscana (POR CreO Fesr 2014-2020). The project brings together the CoLing Lab and the companies ETI3 s.r.l. (coordinator), BNova s.r.l. and Rigel Engineering s.r.l.

References

Eugene Agichtein and Luis Gravano. 2000. Snowball: Extracting relations from large plain-text collections. In Proceedings ACM 2000, the fifth conference of the Association for Computing Machinery on Digital libraries, pages 85–94, New York, NY, USA.

Muhammad Nabeel Asim, Muhammad Wasim, Muhammad Usman Ghani Khan, Waqar Mahmood, and Hafiza Mahnoor Abbasi. 2018. A survey of ontology learning techniques and applications. Database: the journal of biological databases and curation, 2018.

Marco Baroni, Silvia Bernardini, Federica Comastri, Lorenzo Piccioni, Alessandra Volpi, Guy Aston, and Marco Mazzoleni. 2004. Introducing the la repubblica corpus: A large, annotated, TEI(XML)-compliant corpus of newspaper Italian. In Proceedings LREC’04, the fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal. European Language Resources Association (ELRA).

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5(Dec):135–146.

Dominique Brunato. 2015. A Study on Linguistic Complexity from a Computational Linguistics Perspective. A Corpus-based Investigation of Italian Bureaucratic Texts. Ph.D. thesis, Università di Siena.

Jinxiu Chen, Donghong Ji, Chew Lim Tan, and Zhengyu Niu. 2006. Relation extraction using label propagation based semi-supervised learning. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 129–136, Sydney, Australia. Association for Computational Linguistics.

Francesco Corcoglioniti, Marco Rospocher, and Alessio Palmero Aprosio. 2016. Frame-based ontology population with pikes. IEEE Transactions on Knowledge and Data Engineering, 8(12):3261–3275.

Anthony Fader, Oren Etzioni, and Stephen Soderland. 2011. Identifying relations for open information extraction. In Proceedings of EMNLP 2011, the Conference on Empirical Methods in Natural Language Processing, pages 1535–1545, Edinburgh, Scotland, UK.

Charles J. Fillmore. 1976. Frame semantics and the nature of language. Annals of the New York Academy of Sciences: Conference on the origin and development of language and speech, 280(1).

Aldo Gangemi, Mehwish Alam, Luigi Asprino, Valentina Presutti, and Diego Reforgiato Recupero. 2016. Framester: a wide coverage linguistic linked data hub. In Proceedings European Knowledge Acquisition Workshop, Cham. Springer.

Aldo Gangemi, Valentina Presutti, Diego Reforgiato Recupero, Andrea Giovanni Nuzzolese, Francesco Draicchio, and Misael Mongiovì. 2017. Semantic web machine reading with fred. Semantic Web, 8(6):873–893.

Ludovic Jean-Louis, Romaric Besançon, and Olivier Ferret. 2011. Text segmentation and graph-based method for template filling in information extraction. In Proceedings of IJCNLP 2011, the fifth International Joint Conference on Natural Language Processing, pages 723–731, Chiang Mai, Thailand.

Dan Jurafsky and James H. Martin. 2018. Speech and language processing. Third edition draft on webpage: https://web.stanford.edu/~jurafsky/slp3/. Accessed: 3 July 2019.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS 2013, 26th Conference on Advances in Neural Information Processing Systems, pages 171–178, Lake Tahoe, Nevada, USA.

Marvin Minsky. 1974. A framework for representing knowledge. Massachusetts Institute of Technology, Cambridge, MA.

Mike Mintz, Steven Bills, Rion Snow, and Daniel Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 1003–1011, Suntec, Singapore. Association for Computational Linguistics.

Raymond J. Mooney and Razvan C. Bunescu. 2005. Subsequence kernels for relation extraction. In Proceedings of NIPS 2005, 18th Conference on Advances in Neural Information Processing Systems, pages 171–178, Vancouver, British Columbia, Canada.

Thien Huu Nguyen and Ralph Grishman. 2015. Relation extraction: Perspective from convolutional neural networks. In Proceedings of VS@NAACL-HLT 2015, the 1st Workshop on Vector Space Modeling for Natural Language Processing, pages 39–48, Denver, Colorado.

Lucia C. Passaro and Alessandro Lenci. 2016. Extracting terms with Extra. Computerised and Corpus-based Approaches to Phraseology: Monolingual and Multilingual Perspectives, pages 188–196.

Lucia C. Passaro, Alessandro Lenci, and Anna Gabbolini. 2017. Informed pa: A ner for the italian public administration domain. In Proceedings of Clic-It 2017, the fourth Italian Conference on Computational Linguistics, pages 246–252, Rome, Italy.

Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Proceedings of ECML PKDD 2010, the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pages 148–163, Barcelona, Catalonia, Spain. Springer.

Jacobo Rouces, Gerard De Melo, and Katja Hose. 2015. Framebase: Enabling integration of heterogeneous knowledge. In Proceedings European Semantic Web Conference, Cham. Springer.

Alisa Smirnova and Philippe Cudré-Mauroux. 2018. Relation extraction using distant supervision: A survey. ACM Computing Surveys, 51(5):1–35.

Mihai Surdeanu. 2013. Overview of the tac2013 knowledge base population evaluation: English slot filling and temporal slot filling. In Proceedings of TAC 2013, the Sixth Text Analysis Conference, Gaithersburg, Maryland, USA.

Daniel S. Weld, Raphael Hoffmann, and Fei Wu. 2008. Using wikipedia to bootstrap open information extraction. SIGMOD record, 37(4):62–68.

Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. 2003. Kernel methods for relation extraction. Journal of machine learning research, 3(Feb):1083–1106.

Meishan Zhang, Yue Zhang, and Guohong Fu. 2017. End-to-end neural relation extraction with global optimization. In Proceedings EMNLP 2017, conference on Empirical Methods in Natural Language Processing, pages 1730–1740, Copenhagen, Denmark.

Ming Zhenjun, Yan Yan, Guoxin Wang, Janet K. Allen, Joseph Dal Santo, and Farrokh Mistree. 2017. An ontology for reusable and executable decision templates. Journal of Computing and Information Science in Engineering, 17(3):031008.