=Paper= {{Paper |id=Vol-2693/paper1 |storemode=property |title=Explainable OpenIE Classifier with Morpho-syntactic Rules |pdfUrl=https://ceur-ws.org/Vol-2693/paper1.pdf |volume=Vol-2693 |authors=Bruno Cabral,Marlo Souza,Daniela Barreiro Claro |dblpUrl=https://dblp.org/rec/conf/ecai/CabralSC20 }} ==Explainable OpenIE Classifier with Morpho-syntactic Rules== https://ceur-ws.org/Vol-2693/paper1.pdf
                Proceedings of the Workshop on Hybrid Intelligence for Natural Language Processing Tasks HI4NLP (co-located at ECAI-2020)
                                          Santiago de Compostela, August 29, 2020, published at http://ceur-ws.org




      Explainable OpenIE Classifier with Morpho-syntactic
                            Rules
                                  Bruno Cabral and Marlo Souza and Daniela Barreiro Claro1


1 Federal University of Bahia, FORMAS Research Group, Computer Science Department, Salvador - Bahia - Brazil, email: dclaro@ufba.br

Abstract. Open information extraction (OpenIE) is the task of extracting structured information from unstructured texts independently of the domain. Recent advances have applied Deep Learning to Natural Language tasks, improving the state of the art, even though those methods usually require a large and high-quality corpus. The construction of an OpenIE dataset is a tedious and error-prone task, and one technique employed is to generate extractions with rule-based techniques and then manually validate the resulting triples. As low-resource languages usually lack the datasets needed to apply high-performance Deep Learning techniques, our intuition is that a low-resource model based on multilingual information can learn generalizations across languages and benefit from cross-lingual data. Moreover, we would like to interpret the set of generalized information gathered from multilingual learning to improve the Open IE classification task. In this paper, we introduce TabOIEC, a multilingual classifier based on generic morpho-syntactic features. Our classifier follows a glass-box method which can provide interpretations of some of the classifier's decisions. We evaluate our approach on a small corpus of Open IE extractions for the English, Spanish, and Portuguese languages. Our results indicate that our approach improves F1 for all languages, particularly in the monolingual setting. Experiments on zero-shot learning provide evidence that TabOIEC generalizes to languages other than those it was trained on, although the transfer between them is modest. Experiments on multilinguality do reduce the cost of training; however, in our experiments it was difficult to obtain appropriate generalizations.

1     Introduction

Every day we have a greater volume of data, and we need tools that help us to extract relevant information from this growing set. Much of this information is composed of texts created in an unstructured way, such as books, news and conversations. Open Information Extraction (OpenIE), as introduced by Banko et al. [2], is a useful tool in this context, because it is capable of extracting knowledge from large collections of textual documents independently of the domain [5]. By extracting information, we mean that these systems generate structured representations of the information in the original documents, usually in the form of relational tuples, such as (arg1, rel, arg2), where arg1 and arg2 are the arguments of the relation, usually described by noun phrases, and rel is a relation descriptor that describes the semantic relation between arg1 and arg2 [24]. For example, consider the sentence:

    “I could only see the ball came in the goal, because it fell next to where I was.”

An Open IE system can generate valid extractions, such as:

    (the ball, came in, the goal).

Or the following invalid tuple:

    (the ball, came in, it)

Since 2007, with TEXTRUNNER [2], multiple OpenIE systems have been designed and proposed for many different languages. These systems have taken different approaches, from rule-based systems to deep neural networks. A continued stream of innovations in Deep Learning has pushed multiple Natural Language Processing (NLP) tasks to better performance, thanks in part to large-scale annotated datasets. Recently, neural networks have been used for supervised learning in Open IE [57, 16, 58, 61], achieving state-of-the-art results for English.

As noted by Glauber and Claro [28], major advances in Open IE have mainly focused on the English language. Although the focus on the English language may be due to its origin and to how widely the language is used around the world, it has been recognized by the scientific community that the focus on the English language with its particular characteristics may introduce some bias to the area [7, 6].

While a constant stream of innovations in NLP research enables models to achieve impressive performance, such developments are not available to all languages, since only a handful of them have the labelled data necessary for training deep neural nets [12]. In fact, for Open IE, the availability of such datasets [56, 37] has led to the development of methods [57, 16, 58, 61] achieving state-of-the-art results.

We believe one reason for this focus on the English language is the lack of available resources for the area in other languages. Unfortunately, manual creation of annotated corpora for Open IE is a difficult task, as noted by [30, 37], due to the vague notion of semantic relation advocated in the area [60, 37] and the multiplicity of possible interpretations for the same sentence.

As Brants and Plaehn [8] observe, the use of automatic tools for assisting annotation of a corpus facilitates rapid semi-automatic corpus annotation in an interactive process. As noisy candidate extractions can be easily generated from a corpus based on simple morphosyntactic patterns [3, 21, 59] and parsing technology [26, 19, 29], an important bottleneck in an Open IE annotation process is deciding whether a given candidate extraction corresponds to a valid relation in the corpus. Hence, in this work we aim to construct a tool for assessing the quality/correctness of Open IE extractions, to assist in the semi-automatic construction of corpora for the area for different languages.
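To make the annotation setting concrete, the candidate extractions from the example sentence can be held as labelled records. The following is only a minimal sketch: the `Extraction` class, its field names and the 0/1 label encoding are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    """A binary relational tuple (arg1, rel, arg2); hypothetical container."""
    arg1: str
    rel: str
    arg2: str


sentence = "I could only see the ball came in the goal, because it fell next to where I was."

# Candidate extractions paired with a manually assigned validity label
# (1 = valid, 0 = invalid), following the example in the text.
candidates = [
    (Extraction("the ball", "came in", "the goal"), 1),
    (Extraction("the ball", "came in", "it"), 0),
]

# Keep only the extractions an annotator accepted.
valid = [extraction for extraction, label in candidates if label == 1]
```

A quality classifier of the kind proposed here would predict the label column, leaving the annotator to confirm or reject each prediction.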




           Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).



While similar classifiers have been proposed before as post-processing tools in second-generation Open IE systems, e.g. [21, 48, 18, 22], these classifiers are usually constructed in a language-dependent manner, for which the generalization to other languages has not been investigated, and/or generate models which are not easily interpretable [4, 10].

An important characteristic of our method is that we explore the use of machine learning methods which generate interpretable models. In manual annotation for Open IE, as observed by [30], agreement among annotators can be very low and annotations have to be discussed. Our focus on interpretable models allows for the generation of explanations for the predictions, which can be exploited in this process, as well as to uncover underlying non-documented rules/hypotheses in the annotation process, as explored by [8].

Interpretable or explainable models are decision models for which predictions can be traced back to explicit relationships in the data. Recently, the application of neural methods in natural language processing has led to profound advances in the area. These advances, however, are hard to understand and evaluate, due to the opaqueness of the new models developed in the area. Indeed, several recent studies [33, 40, 42] show that the predictions made by the systems in the area may be based on spurious or unclear reasons, and are thus subject to adversarial attacks, and that their reported performance may be explained by unrelated artifacts and regularities in the datasets used, not by the inherent quality of the model. In fact, adversarial examples seem to be an unavoidable characteristic of such methods, arising from their foundational geometric principles [32].

In this work, we propose a classification method to assess the quality of Open IE system extractions, aiming to assist in the semi-automatic annotation of data. This method is based on the use of tabular learning methods, i.e. methods designed to deal with tabular data and which generate interpretable models. By the use of generic features and multilingual pre-processing tools, our method can be directly trained on data from different languages without the need to engineer any pre-processing tools. To conduct our experiments, we investigate the application of several different explainable learning architectures on data from three different languages. This tool enables the classification of extractions generated by any previously developed OpenIE tool, independently of the language or type of implementation. In Portuguese, this model can trade recall performance for up to 65% improvement in F1 score.

This article is organized as follows: Section 2 presents related work. Section 3 describes our approach and our methodology. Section 4 shows our experiments, results and discussions. Finally, Section 5 concludes our paper.

2   Related Work

Recently, new machine learning-based approaches for Open IE [57, 16, 58, 61] have been proposed, leading to a new generation of Open IE systems. While these systems represent the state of the art in the area, their focus on the English language and need for annotated data make it hard to generalize their results to other languages. For the Portuguese language, new data-based methods have been proposed as a cross-lingual approach due to the lack of resources for this task [10]. Early methods use linguistically-inspired patterns for extraction, such as ArgOE [25], or adaptations of methods for the English language, such as SGS [18], SGC 2017 [55] and RePort [48]. Recently, new pattern-based methods have risen as the new state of the art for the language [14], such as InferPORToie [54], PragmaticOIE [53] and DptOIE [29].

Classification-based tools to assess the quality of extractions have been employed by different systems [48, 22], mainly following the success of ReVerb [21]. These works are based on the manual construction of language-specific features to assess the quality of extractions, based on morphosyntactic patterns and grammatical rules for each language, which seldom generalize to other, typologically unrelated languages.

Language-independent classification methods have been proposed before [4, 10, 13]. The work of Barbosa and Claro [4] is the closest to ours, proposing a set of features which the authors claim to be language-independent for the task of Open IE extraction quality assessment. The authors empirically evaluate their proposed feature set on multilingual data, but their method is based on Support Vector Machine classifiers, which are not easily interpreted. The work of Cabral et al. [11], on the other hand, proposes the use of multilingual language models, such as M-BERT [20] and XLM [36], to perform quality assessment and classification of Open IE extractions. The authors evaluate their method on multilingual data, but due to the use of opaque language models and classification techniques, their predictions are not explainable and, thus, cannot be easily integrated within a semi-automatic annotation process.

3     TabOIEC

In this work, our goal is to have an explainable OpenIE triple classifier capable of supporting multiple languages by changing the training dataset. In this section, we briefly revisit the formulation of OpenIE and the components used in our model.

3.1    Problem Definition

Let $X = \langle x_1, x_2, \dots, x_n \rangle$ be a sentence composed of tokens $x_i$. An Open IE extractor is a function that maps $X$ into a set $Y = \langle y_1, y_2, \dots, y_j \rangle$ of tuples $y_i = \langle rel_i, arg_{1,i}, arg_{2,i}, \dots, arg_{n,i} \rangle$, which describe the information expressed in sentence $X$. In this work, we consider that the tuples are always in the format $y = (arg_1, rel, arg_2)$, where $arg_1$ and $arg_2$ are noun phrases, not necessarily formed from tokens present in $X$, and $rel$ is a descriptor of a relation holding between $arg_1$ and $arg_2$. We do not consider n-ary extractions.

Given a sentence $X$ as above, we are interested in determining, for every extraction $y_i \in Y$, whether $y_i$ is a valid extraction from $X$, the factors on which the classifier based its decision, as well as the confidence score for such classification. An OpenIE extraction classifier can be expressed as a decision function that, for every sentence $X$ and set of extractions $Y$, returns a pair $(Z, P) \in \{0, 1\}^{|Y|} \times [0, 1]^{|Y|}$, where $Z = \langle z_1, z_2, \dots, z_j \rangle$ is a binary vector s.t. $z_i = 1$ denotes that $y_i$ is a valid extraction, and $P = \langle p_1, p_2, \dots, p_j \rangle$ is a probability vector s.t. $p_i$ is the probability associated with extraction $y_i$ being classified as $z_i$, given the input sentence $X$.

3.2    Fine-tuned Multilingual Contextual Embedding

In this work, our plan is to create an explainable language-agnostic classifier, and for that, we use Multilingual Contextual Embeddings. Multilingual means that those models represent words of multiple languages in a shared semantic representation space. As such, these models are able to represent semantic similarities between words in different languages. Contextual Embeddings means




that the meaning of the word is represented taking its context into consideration.

One such Multilingual Contextual Embedding is M-BERT [20], a 12-layer transformer trained on Wikipedia text in 104 languages with a shared word piece vocabulary. According to tests conducted by Pires et al. [49], M-BERT is able to transfer knowledge between languages with no lexical overlap, an indication that it captures multilingual representations. It is capable of generalizing across languages because common word pieces such as numbers are mapped to a shared space, spreading the effect to other word pieces, until similar words in different languages are close in the vector space [49].

The problem with using M-BERT directly is that it does not fulfill our requirement of an explainable classifier, because it represents tokens as multidimensional vectors of values. One alternative is the use of the UDify model, a multilingual multi-task model capable of predicting universal part-of-speech, morphological features, lemmas, and dependency trees across 75 languages [34]. This model uses M-BERT and fine-tunes it on the Universal Dependencies (UD) dataset, as it provides syntactic annotations consistent across a large collection of languages [43]. UDify is able to transfer syntactic knowledge across multiple languages, predicting lemmas (LEMMAS), treebank-specific part-of-speech tags (XPOS), universal part-of-speech tags (UPOS), morphological features (UFEATS), and dependency edges and labels (DEPS) for each sentence [34].

Finally, for training our classifier, we use the final output of UDify to extract features of the input sentences and extractions. Those features are then tabulated in a specific format so that they can be used in classification algorithms that create rules on a set of predefined attributes. One example of such an algorithm is a Decision Tree [9]. This type of classifier has the characteristic of creating highly interpretable models.

3.3     Architecture

Our general architecture and classifier are illustrated in Figure 1. It consists of three main steps. Firstly, we pre-process the input, then we generate the feature set, and finally we feed the computed features to a classifier. Each step is detailed in the subsections below.

3.3.1    Pre-processing

In the pre-processing step our objective is to convert the textual output of the OpenIE extractors to a structured format to be processed in the later steps. Relational triple data is textual and its contents cannot be used directly in the classification algorithms implemented in TabOIEC. This step is illustrated in Figure 2.

It first receives a sentence X and a list of extractions Y, each in the form yi = ⟨arg1, rel, arg2⟩. The first step is to split the sentence into tokens. For the tokenization step we utilize the spaCy [31] xx_ent_wiki_sm tokenizer, a multilingual CNN trained on the Wikipedia corpus of Nothman et al. [51]. Afterwards we perform contraction expansion. For example, the English contraction I’m could be tokenized as the two words I am, and we’ve could become we have. This is needed because different parts of a token could appear in different parts of an extraction.

3.3.2    Feature Extraction

As explainability is a requirement for our classifier, we chose to use classifiers that work on a fixed set of features. For that, we need to convert the sentence and the extractions to a set of features. This process consists of running the feature function and saving the value obtained to a tabular structure. The process is depicted in Figure 1.

The process is the following: feed the sentence X and the list of extractions Y to the multilingual word embedding model (in our case, the UDify model) to compute the set of features of each token in the sentence. Afterwards, the indexing step identifies the start and end positions, within the sentence, of each part of the triple (the relation and its arguments). The sentence X and the list of extractions Y are the input to Algorithm 1.

    Input: Original sentence S, arg1, rel, arg2
    Output: Feat_arg1, Feat_rel, Feat_arg2
    Feat_sen ← GenerateUdifyFeatures(S)
    for part in [arg1, rel, arg2] do
        // Check if the string is a substring of the original sentence
        if substring(part, S) then
            // Extract the features of this part from the features
            // already generated for the whole sentence
            Feat_part ← GetSubsetFeatures(part, Feat_sen)
        else
            // The part is not a substring of the original sentence,
            // thus generate new features for it in isolation
            Feat_part ← GenerateUdifyFeatures(part)
        end
    end

    Algorithm 1: Finding features from a sentence

This algorithm first generates the features using the original sentence and then tries to match the constituent parts of each extracted triple to the original sentence, as shown visually in Figure 1. This is necessary due to the way that contextual embeddings work: a word will have a different set of features depending on the full sentence, and we want the representation to be the same as in the original sentence.

In some cases, the constituents are not a subsequence of the original sentence, such as in implicit extractions. For example, in the sentence “The covid-19 virus is very dangerous”, the triple (Covid-19, is a, virus) is valid; however, the tokens “is a” are not present directly in the original sentence. This makes it impossible to determine the start and end of the relation in the original sentence.

In this case, we generate a new embedding as if the individual part were a sentence. The output of the algorithm is Feat_arg1, Feat_rel, Feat_arg2, each an array of features for one constituent of an extracted triple. Each array of features is then transformed into a fixed-length vector of manually defined features, as can be seen in Table 1. All features are based on the Universal Dependencies (UD) version 2.3 tagset and each one is described below:

• 1-3 – Relative distance between parts
  Those features represent the relative distance between each constituent part of the relation. The objective of this feature is to capture improbable distances. Analyzing the rules learned by the classifiers, we identified that this feature represents the location of the constituents, which together with the features below is a good indicator of whether those relationships happen in the correct order.
• 4-6 – UPOS features
  These tags mark the core part-of-speech (POS) categories. In this version of UD, there are 17 universal




[Figure 1. Architecture overview. Diagram labels: Original Sentence with Extractions (example: "I could only see the ball came in the goal"); mBERT fine tuned on UD; Layer Attention; UPOS, UFeats, Dependency Tree; Tabulated Features (Arg1 UPOS, Rel UPOS, Arg2 UPOS, Arg1 Dependency Tree, ...); Classifier.]
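The indexing and tabulation stages shown in Figure 1 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`find_span`, `part_features`, `upos_count_row`), the column naming scheme and the hand-written tags are hypothetical, and a stub stands in for the UDify tagger. It covers only the UPOS count columns (features 4-6 of Table 1).

```python
from collections import Counter

# The 17 universal POS tags defined by Universal Dependencies v2.
UPOS_TAGS = ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
             "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"]


def find_span(part_tokens, sentence_tokens):
    """Return (start, end) of the first contiguous match of the part, or None."""
    n = len(part_tokens)
    if n == 0:
        return None
    for start in range(len(sentence_tokens) - n + 1):
        if sentence_tokens[start:start + n] == part_tokens:
            return start, start + n
    return None


def part_features(part_tokens, sentence_tokens, sentence_tags, tag_fn):
    """Reuse sentence-level tags when the part occurs in the sentence;
    otherwise tag the part in isolation (the fallback branch of Algorithm 1)."""
    span = find_span(part_tokens, sentence_tokens)
    if span is not None:
        return sentence_tags[span[0]:span[1]]
    return tag_fn(part_tokens)


def upos_count_row(triple, sentence_tokens, sentence_tags, tag_fn):
    """Build the 3 x 17 = 51 UPOS count columns for one (arg1, rel, arg2) triple."""
    row = {}
    for name, part in zip(("arg1", "rel", "arg2"), triple):
        counts = Counter(part_features(part, sentence_tokens, sentence_tags, tag_fn))
        for tag in UPOS_TAGS:
            row[f"{name}_upos_{tag}"] = counts[tag]
    return row


# Hypothetical, hand-written tags standing in for the tagger output.
sentence = ["the", "ball", "came", "in", "the", "goal"]
tags = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]


def fallback(tokens):
    """Placeholder for re-tagging a part that is not found in the sentence."""
    return ["X"] * len(tokens)


triple = (["the", "ball"], ["came", "in"], ["the", "goal"])
row = upos_count_row(triple, sentence, tags, fallback)
```

Every triple yields a row with the same 51 columns regardless of its length, which is what lets rule-based and tree-based learners operate on the data. The UFeat and dependency columns of Table 1 could be counted in the same way from the corresponding tagger outputs.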



  categories that generalize well across language boundaries. The objective of this feature is to identify valid or invalid relationships between different POS in a sentence. For example, the presence of many verbs in the relation increases the probability that the triple is invalid. Because our classifier algorithm requires a fixed set of features, we create a total of 51 features based on those rules: for each constituent of the triple (arg1, rel, arg2) we have 17 possible features, one for every UPOS category.
• 7-9 – UFeat features
  In Universal Dependencies (UD), those features distinguish additional lexical and grammatical properties of words not covered by the POS tags. In UD version 2.3, 50 different features are available, such as animacy, noun type, evidentiality and type of named entity. This feature could help to identify, for example, that a part of a relation has a named entity, which could be an indicator of a valid extraction. A list of features is created composed of all combinations between the existing UFeats and each constituent, totaling 150 (50 * 3) possible features.
• 10-21 – Dependency tree - Tags and Head location
  This set of features is the count of each of the 37 universal syntactic relations for each constituent, together with where the head of the relation is located (inside one of the constituents, or OUT if the head is a token not located in any constituent). For example, possible categories are nsubj (nominal subject) and advmod (adverbial modifier). This creates 444 possible features (37 * 12 combinations). This rule is inspired by the work of Oliveira et al. [29], who identify a set of hand-crafted rules for Portuguese to identify valid extractions based on the dependency tree. For example, they identify a rule that a valid extraction might be composed of a subject (arg1), a verbal phrase (rel) (SV) and one or more arguments (arg2), where arg1 has an nsubj in the dependency tree.

[Figure 2. TabOIE Pre-processing overview. Diagram labels: Data input (Sentences and labeled extractions); Pre-processing (Indexing, Feature construction); Pre-processed data; Classifier; Metric Calculation; Evaluation.]

Table 1. Multilingual feature set.

   N     Feature
   1     Relative distance between arg1 and rel
   2     Relative distance between rel and arg2
   3     Relative distance between arg1 and arg2
   4     arg1 count of each UPOS feature
   5     rel count of each UPOS feature
   6     arg2 count of each UPOS feature
   7     arg1 count of each UFeat feature
   8     rel count of each UFeat feature
   9     arg2 count of each UFeat feature
  10     arg1 count of each Dependency Tag with Head pointing to arg1
  11     arg1 count of each Dependency Tag with Head pointing to rel
  12     arg1 count of each Dependency Tag with Head pointing to arg2
  13     arg1 count of each Dependency Tag with Head pointing to OUT
  14     rel count of each Dependency Tag with Head pointing to arg1
  15     rel count of each Dependency Tag with Head pointing to rel
  16     rel count of each Dependency Tag with Head pointing to arg2
  17     rel count of each Dependency Tag with Head pointing to OUT
  18     arg2 count of each Dependency Tag with Head pointing to arg1
  19     arg2 count of each Dependency Tag with Head pointing to rel
  20     arg2 count of each Dependency Tag with Head pointing to arg2
  21     arg2 count of each Dependency Tag with Head pointing to OUT

3.3.3   Classification

In this work, we compare the performance of different interpretable models in the classification task of predicting the quality of Open IE extractions. We compare the performance of the following methods: CatBoost [50], a gradient boosting method for decision trees; SKLearn, the scikit-learn [47] implementation of the Histogram-based Gradient Boosting Classification Tree [41]; Explainable Boosting Machine [44], an Interpretable Gradient Boosting




Classifier; SKOPE-Rules, which generates predictive rules over an ensemble of decision trees [23]; and TabNet [1], an explainable neural network for tabular data.

4     Experiments
In this section, we describe the empirical validation of our proposed method to classify Open IE extractions based on language-independent features and interpretable models.

4.1    Dataset
For comparability, our experiments employ the same data used by Cabral et al. [11] for their multilingual Open IE classifier. This dataset is composed of relations extracted by five different Open IE systems, namely ClausIE, OLLIE, ReVerb, WOE, and TextRunner, from texts in Portuguese, English, and Spanish, and labeled as valid or invalid (z_i) by human judges. A valid extraction (z_i = 1) corresponds to a triple that is coherent with the sentence. These linguistic resources were obtained through the studies of [19] and [25]. The statistics of the dataset are summarized in Table 2.

                        Table 2.   Dataset statistics

               Language       # Sentences      # Extractions
               Portuguese     200              1856
               English        500              7093
               Spanish        159              375

4.2    Experimental Setup
Our work uses the AllenNLP [27] library built with the PyTorch [45] framework. The UD features are extracted with UDify [35], using the available fine-tuned BERT weights².
   We implemented our Open IE classifier architecture directly on top of AllenNLP. We also test the following classifiers:

• Scikit-learn (Sklearn) [46] version 0.23 - a Gradient Boosting Classifier
• Catboost [50] version 0.22 - a Gradient Boosting Classifier
• Skope³ - a decision rule Classifier
• Explainable Boosting Machine (EBM) - implementation in Interpret [44] version 0.1.22 - an Interpretable Gradient Boosting Classifier
• TabNet - Attentive Interpretable Tabular Learning [1] Classifier, version 1.0.6⁴

   Among these classifiers, Skope and the Explainable Boosting Machine are considered glass-box classifiers, since they output highly interpretable rules. For the other classifiers, black-box explainers such as the SHAP Tree Explainer [38] are able to explain their outputs.
   For all classifiers we use the default hyper-parameters with no additional tuning, except that the number of epochs was set to 300. For each single-language test, we split our corpus into training and testing sets using a 5-fold cross-validation strategy. For the zero-shot test, however, we train the classifier with the whole corpus, excluding the language to be tested (e.g., the zero-shot test for Portuguese is trained using the whole English and Spanish corpus and evaluated on the whole Portuguese corpus).
   Each split in our k-fold strategy is carried out at the sentence level. As a consequence, each split has the same number of sentences, but the splits may differ in the number of extractions. Our results are an average over the test folds, weighted by the number of extracted facts in each fold, of the Precision (P), Recall (R), F1-measure and the Matthews correlation coefficient (MCC) [39]. MCC is employed in machine learning as a quality measure of the classifier. To compute Precision-Recall curves, we select the n extractions with the highest confidence score and compute the classifier's precision. The confidence values considered were: [0.6, 0.7, 0.8, 0.85, 0.9, 0.93, 0.95, 0.98, 0.99, 0.995, 0.999]. The code of our experiments is available at https://github.com/FORMAS/HybridOIEClassifier

2 https://github.com/hyperparticle/udify
3 https://github.com/scikit-learn-contrib/skope-rules
4 https://github.com/dreamquark-ai/tabnet

4.3    Results
We consider three evaluation settings. For monolingual learning, Table 3 reports the precision (Prec.), recall (Recall), F1-measure (F1), accuracy (Acc.) and the Matthews metric [39] (MCC, a confidence coefficient among the extractions) for each language: Portuguese, Spanish and English. Note that recall for an OpenIE task is computed here relative to the total number of triples extracted by all systems, which we treat as 100% recall. Some researchers refer to this restricted notion as a yield measure [17].
   For Portuguese, the EBM model achieves a recall of 80.8%, compared with the 100% of the Original model. However, the best precision was achieved by the Sklearn model, at 58.0%. For Spanish, the best results were obtained with Sklearn, while the EBM model produced no notable result. The Spanish F1 scores surpassed those of all the Portuguese models. For English, the best F1 results were obtained by the Catboost model. All confidence coefficients were above 87%.

                 Table 3.     Metrics scores for languages classifiers

 Model (C: confidence - classifier)                   Prec.    Recall   F1       Acc.     MCC
 Portuguese
 Original                                             0.181    1.000    0.307    0.181    0.000
 Best Precision (C:0.6 - Sklearn)                     0.580    0.383    0.459    0.836    0.976
 Best F1 (C:0.8 - Catboost)                           0.452    0.581    0.508    0.796    0.986
 Best Acc. / Recall > 0.8 (C:0.9 - EBM)               0.290    0.808    0.427    0.605    0.990
 Spanish
 Original                                             0.730    1.000    0.844    0.730    0.000
 Best Precision, F1 and Accuracy (C:0.6 - Sklearn)    0.833    0.948    0.886    0.824    0.966
 English
 Original                                             0.454    1.000    0.624    0.454    0.000
 Best Precision (C:0.6 - Sklearn)                     0.624    0.643    0.633    0.661    0.897
 Best F1 (C:0.7 - Catboost)                           0.567    0.773    0.654    0.628    0.888
 Best Acc. / Recall > 0.8 (C:0.9 - Sklearn)           0.536    0.811    0.645    0.594    0.875

   To evaluate whether our models were able to exploit cross-lingual information, i.e. to apply information learned from a set of different languages to a new language, we also performed zero-shot and one-shot classification.
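The evaluation protocol of Section 4.2 can be illustrated with a small sketch. It is not the code from the authors' repository: the function names (`confusion`, `metrics`, `weighted_average`) are hypothetical, and it assumes that an extraction is kept when its confidence score is at least the chosen cut-off, which is one possible reading of the setup. The metrics follow the standard confusion-matrix definitions, with MCC set to 0 by convention when its denominator vanishes.

```python
import math

# Confidence cut-offs listed in the experimental setup.
THRESHOLDS = [0.6, 0.7, 0.8, 0.85, 0.9, 0.93, 0.95, 0.98, 0.99, 0.995, 0.999]


def confusion(scores, labels, threshold):
    """Count (tp, fp, fn, tn) when extractions scoring >= threshold are kept.

    Assumption: 'kept' means the classifier accepts the extraction as valid.
    """
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        kept = score >= threshold
        if kept and label == 1:
            tp += 1
        elif kept:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Precision, recall, F1, accuracy and MCC from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # convention: 0 if undefined
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "mcc": mcc}


def weighted_average(fold_metrics, fold_sizes):
    """Average per-fold metrics, weighting each fold by its number of extractions."""
    total = sum(fold_sizes)
    return {key: sum(m[key] * n for m, n in zip(fold_metrics, fold_sizes)) / total
            for key in fold_metrics[0]}


if __name__ == "__main__":
    # Toy scores and gold labels, invented for illustration only.
    scores = [0.95, 0.70, 0.65, 0.40]
    labels = [1, 0, 1, 0]
    for c in THRESHOLDS:
        print(c, metrics(*confusion(scores, labels, c)))
```

When every extraction is accepted (no false or true negatives), the sketch returns a recall of 1 and an MCC of 0, and precision equals accuracy, which is the pattern of the "Original" rows in Table 3.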




Figure 3.   Language-specific performance: (a) English, (b) Portuguese, (c) Spanish



   Zero-shot classification is a task in which the classifier is evaluated on a language not seen during training. For Portuguese, we observe an average decrease in F1 performance between 3% (for SKOPE) and 16% (for Sklearn and TabNet) across all models, with a maximum decrease of 19% for CatBoost at confidence 0.8. Similar behaviour was observed in zero-shot classification for Spanish (between 2% for SKOPE and 20% for Sklearn, with a maximum decrease of 52% for CatBoost at 0.6) and for English (between 1% for SKOPE and 15% for Sklearn, with a maximum decrease of 24% for Sklearn at 0.7).
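The zero-shot setting amounts to a leave-one-language-out split. The following is a minimal sketch under assumed data structures: each extraction is represented as a dictionary with a hypothetical `lang` key, and `zero_shot_split` is an illustrative helper, not a function from the authors' code.

```python
def zero_shot_split(extractions, target_language):
    """Train on every language except the target; test on the whole target corpus."""
    train = [e for e in extractions if e["lang"] != target_language]
    test = [e for e in extractions if e["lang"] == target_language]
    return train, test


if __name__ == "__main__":
    # Toy corpus invented for illustration; label 1 marks a valid extraction.
    corpus = [
        {"lang": "pt", "label": 1},
        {"lang": "en", "label": 0},
        {"lang": "en", "label": 1},
        {"lang": "es", "label": 1},
    ]
    train, test = zero_shot_split(corpus, "pt")
    print(len(train), "training extractions;", len(test), "test extractions")
```

With this split, the Portuguese test set never contributes to training, matching the example given in the experimental setup (training on English and Spanish, evaluating on Portuguese).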
   One-shot classification is a task in which the classifier is trained with the data from the other languages plus part of the data in the target language, and tested on the remaining (unseen) data for the target language. For Portuguese, we observe an average decrease in F1 performance between 3% (for SKOPE) and 10% (for Sklearn) across all models, with a maximum decrease of 15% for Sklearn at confidence 0.99. Similar behaviour was observed in one-shot classification for Spanish (between 0% for SKOPE and 10% for TabNet, with a maximum decrease of 13% for TabNet at 0.7) and for English (between 0% for SKOPE and 1% for EBM, with a maximum decrease of 1% for EBM at 0.8).
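The one-shot setting can be sketched in the same way. The share of target-language data that is seen during training is not stated in the text, so the `seen_fraction` parameter below is purely an assumption, as are the `lang` and `sentence_id` keys and the helper name `one_shot_split`. Holding out whole sentences, rather than individual extractions, mirrors the sentence-level splitting used for the k-fold experiments.

```python
import random


def one_shot_split(extractions, target_language, seen_fraction=0.5, seed=0):
    """Train on the other languages plus part of the target language.

    Assumptions: `seen_fraction` is hypothetical (not given in the paper),
    and the target language is divided by sentence, not by extraction.
    """
    others = [e for e in extractions if e["lang"] != target_language]
    target = [e for e in extractions if e["lang"] == target_language]

    # Shuffle target-language sentence ids reproducibly and keep a share for training.
    sentence_ids = sorted({e["sentence_id"] for e in target})
    random.Random(seed).shuffle(sentence_ids)
    seen = set(sentence_ids[: int(len(sentence_ids) * seen_fraction)])

    train = others + [e for e in target if e["sentence_id"] in seen]
    test = [e for e in target if e["sentence_id"] not in seen]
    return train, test


if __name__ == "__main__":
    # Toy corpus invented for illustration: 4 Portuguese sentences, 2 English ones.
    corpus = [{"lang": "pt", "sentence_id": i // 2, "label": i % 2} for i in range(8)]
    corpus += [{"lang": "en", "sentence_id": 100 + i, "label": 1} for i in range(2)]
    train, test = one_shot_split(corpus, "pt")
    print(len(train), "training extractions;", len(test), "test extractions")
```

Because the split is made on sentence identifiers, no target-language sentence contributes extractions to both the training and the test set.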
Figure 4.   Comparison of the performances for Monolingual, Zero-shot and One-shot
                                                                             4.4    Discussion
In the single-language experiments, the results of the classifier are more robust, in the sense that the decline in Precision is much more gradual for almost all representations in the three languages; in the zero-shot experiments, however, this decline is much more




pronounced. These results indicate that there may be a discrepancy between the datasets for each language regarding the relations extracted. This discrepancy may arise from the fact that the datasets were (i) created using different Open IE systems for each language, (ii) annotated by different teams at different times, and (iii) built from texts of different linguistic styles (encyclopedic, journalistic and user-generated Web pages for English; encyclopedic texts for Spanish and Portuguese) and domains (multiple domains for English and Spanish, domain-specific for Portuguese). It may also be the case that linguistic parameters of each language, such as syntactic structure and the stylistic choices of each language community, play an important role in structuring information through language and, as such, in how this information is extracted.
   It is also worth noticing that the English dataset is considerably larger than the datasets for Spanish and Portuguese. Thus, in zero-shot learning, it may dominate the training process and overfit the classifier to the specific characteristics of the English dataset. As such, experiments with a higher number of languages, providing the classifier with a more diverse set of examples, are recommended.
   Considering multilinguality, we observe that our monolingual model is slightly better than the model trained on three languages, except for English. Our results corroborate the findings of [52], which mention the curse of multilinguality [15]: adding more languages to a model can degrade performance when the capacity of the model remains the same. For English, there is no significant difference between training with a monolingual and a multilingual (i.e. three-language) approach.
   Observing our results on zero-shot learning, it is important to notice that all three languages show a slight gain over the original performance, indicating a limited but possible exploitation of cross-language information. For dissimilar languages, such as when training on extractions from Spanish and Portuguese sentences and testing on extractions from English sentences, the results are less conclusive, probably due to their dissimilar linguistic characteristics. Our intuition is that if the models are presented with examples of varied linguistic characteristics, the classifier can be applied to a wide range of low-resource languages, facilitating the development of computational linguistic resources in these languages.

5   Conclusion and Future Work

In this work, we presented TabOIEC, a language-independent explainable relation extraction binary classifier. The evaluation results demonstrated that a single model could improve the output of multiple state-of-the-art systems across three languages: Portuguese, English, and Spanish. Our results give evidence that simple and explainable models for extraction quality assessment could be a useful resource for the construction of Open IE datasets and systems for different languages.
   In the future, we plan to evaluate the use of features hand-crafted by expert linguists. Further improvements would be to test the solution on larger datasets and to apply techniques that improve the classifier, such as fine-tuning it on the Open IE tuples.
   Once mature, we intend to employ the trained models in an annotation tool, allowing the creation of Open IE and Relation Extraction datasets for different languages. With such a tool, we aim to encourage the development of Relation Extraction techniques and technology for different languages, given the importance of Information Extraction technology for the development of advanced intelligent systems and interfaces.

ACKNOWLEDGEMENTS

We would like to thank CNPQ and CAPES for their financial support.

REFERENCES

 [1] Sercan O. Arik and Tomas Pfister. Tabnet: Attentive interpretable tabular learning, 2019.
 [2] Michele Banko, Michael J Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni, ‘Open information extraction for the web’, in IJCAI, volume 7, pp. 2670–2676, (2007).
 [3] Michele Banko, Oren Etzioni, and Turing Center, ‘The tradeoffs between open and traditional relation extraction.’, in ACL, volume 8, pp. 28–36, (2008).
 [4] George Caique Gouveia Barbosa and Daniela Barreiro Claro, ‘Utilizando features linguísticas genéricas para classificação de triplas relacionais em português’, in Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology, pp. 132–141, (2017).
 [5] David Soares Batista, David Forte, Rui Silva, Bruno Martins, and Mário Silva, ‘Extracçao de relaçoes semânticas de textos em português explorando a dbpédia e a wikipédia’, Linguamatica, 5(1), 41–57, (2013).
 [6] Emily Bender, ‘English isn’t generic for language, despite what nlp papers might lead you to believe’, in Symposium and Data Science and Statistics, (2019). [Online; accessed 15-may-2020].
 [7] Emily M. Bender, ‘Linguistically naïve != language independent: Why NLP needs linguistic typology’, in Proceedings of the EACL 2009 Workshop on the Interaction between Linguistics and Computational Linguistics: Virtuous, Vicious or Vacuous?, pp. 26–32, Athens, Greece, (March 2009). Association for Computational Linguistics.
 [8] Thorsten Brants and Oliver Plaehn, ‘Interactive corpus annotation’, in Second International Conference on Language Resources and Evaluation LREC-200, (2000).
 [9] Leo Breiman, Jerome Friedman, Charles J Stone, and Richard A Olshen, Classification and regression trees, CRC press, 1984.
[10] Cabral B.S., Glauber R., Souza M., and Claro D.B., ‘Crossoie: Cross-lingual classifier for open information extraction’, in Computational Processing of the Portuguese Language (PROPOR 2020), ed., Aluísio S. Moniz H. Batista F. Gonçalves T. Quaresma P., Vieira R., volume 12037 of Lecture Notes in Computer Science, 201–213, Springer, Cham, (February 2020).
[11] Bruno Souza Cabral, Rafael Glauber, Marlo Souza, and Daniela Barreiro Claro, ‘Crossoie: Cross-lingual classifier for open information extraction’, in International Conference on Computational Processing of the Portuguese Language, pp. 368–378. Springer, (2020).
[12] Xilun Chen, Ahmed Hassan Awadallah, Hany Hassan, Wei Wang, and Claire Cardie, ‘Zero-resource multilingual model transfer: Learning what to share’, arXiv preprint arXiv:1810.03552, (2018).
[13] D.B. Claro, M. Souza, C. Castellã Xavier, and L. Oliveira, ‘Multilingual open information extraction: Challenges and opportunities’, Information, 10(7), 228, (2019).
[14] Sandra Collovini, Joaquim Santos, Bernardo Consoli, Juliano Terra, Renata Vieira, Paulo Quaresma, Marlo Souza, Daniela Barreiro Claro, and Rafael Glauber, ‘Iberlef 2019 portuguese named entity recognition and relation extraction tasks’, in Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), volume 2421, pp. 390–410. CEUR-WS.org, (2019).
[15] Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. Unsupervised cross-lingual representation learning at scale, 2019.
[16] Lei Cui, Furu Wei, and Ming Zhou, ‘Neural open information extraction’, arXiv preprint arXiv:1805.04270, (2018).
[17] Leandro Souza de Oliveira, Rafael Glauber, and Daniela Barreiro Claro, ‘Dependentie: An open information extraction system on portuguese by a dependence analysis’, Encontro Nacional de Inteligência Artificial e Computacional, (2017).
[18] Erick Nilsen Pereira de Souza, Daniela Barreiro Claro, and Rafael Glauber, ‘A similarity grammatical structures based method for improving open information systems’, J. UCS, 24, 43–69, (2018).
[19] Luciano Del Corro and Rainer Gemulla, ‘Clausie: clause-based open information extraction’, in Proceedings of the 22nd international conference on World Wide Web, pp. 355–366. ACM, (2013).




[20] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova,                     (BBA)-Protein Structure, 405(2), 442–451, (1975).
     ‘Bert: Pre-training of deep bidirectional transformers for language un-          [40] Tom McCoy, Ellie Pavlick, and Tal Linzen, ‘Right for the wrong rea-
     derstanding’, arXiv preprint arXiv:1810.04805, (2018).                                sons: Diagnosing syntactic heuristics in natural language inference’,
[21] Anthony Fader, Stephen Soderland, and Oren Etzioni, ‘Identifying                      in Proceedings of the 57th Annual Meeting of the Association for
     relations for open information extraction’, in Proceedings of the                     Computational Linguistics, pp. 3428–3448, Florence, Italy, (July 2019).
     Conference on Empirical Methods in Natural Language Processing, pp.                   Association for Computational Linguistics.
     1535–1545. Association for Computational Linguistics, (2011).                    [41] Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming
[22] Tobias Falke, Gabriel Stanovsky, Iryna Gurevych, and Ido Dagan,                       Ma, and Tie-Yan Liu, ‘A communication-efficient parallel algorithm for
     ‘Porting an open information extraction system from english to ger-                   decision tree’, in Advances in Neural Information Processing Systems,
     man’, in Proceedings of the 2016 Conference on Empirical Methods in                   pp. 1279–1287, (2016).
     Natural Language Processing, pp. 892–898, (2016).                                [42] Timothy Niven and Hung-Yu Kao, ‘Probing neural network com-
[23] Jerome H Friedman, Bogdan E Popescu, et al., ‘Predictive learning                     prehension of natural language arguments’, CoRR, abs/1907.07355,
     via rule ensembles’, The Annals of Applied Statistics, 2(3), 916–954,                 (2019).
     (2008).                                                                          [43] Joakim Nivre, Željko Agić, Lars Ahrenberg, Lene Antonsen, Maria Je-
[24] Pablo Gamallo, ‘An Overview of Open Information Extraction (In-                       sus Aranzabe, Masayuki Asahara, Luma Ateyah, Mohammed Attia,
     vited talk)’, in 3rd Symposium on Languages, Applications and                         Aitziber Atutxa, Liesbeth Augustinus, et al. Universal dependencies
     Technologies, eds., Maria João Varanda Pereira, José Paulo Leal,                    2.1, 2017. LINDAT/CLARIAH-CZ digital library at the Institute of
     and Alberto Simões, volume 38 of OpenAccess Series in Informatics                    Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and
     (OASIcs), pp. 13–16, Dagstuhl, Germany, (2014). Schloss Dagstuhl–                     Physics, Charles University.
     Leibniz-Zentrum fuer Informatik.                                                 [44] Harsha Nori, Samuel Jenkins, Paul Koch, and Rich Caruana, ‘Inter-
[25] Pablo Gamallo and Marcos Garcia, ‘Multilingual open information ex-                   pretml: A unified framework for machine learning interpretability’,
     traction’, in Portuguese Conference on Artificial Intelligence, pp. 711–              arXiv preprint arXiv:1909.09223, (2019).
     722. Springer, (2015).                                                           [45] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James
[26] Pablo Gamallo, Marcos Garcia, and Santiago Fernández-Lanza,                          Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia
     ‘Dependency-based open information extraction’, in Proceedings of the                 Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward
     joint workshop on unsupervised and semi-supervised learning in NLP,                   Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chil-
     pp. 10–18. Association for Computational Linguistics, (2012).                         amkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chin-
[27] Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep                        tala, ‘Pytorch: An imperative style, high-performance deep learning li-
     Dasigi, Nelson F. Liu, Matthew Peters, Michael Schmitz, and Luke S.                   brary’, in Advances in Neural Information Processing Systems 32, eds.,
     Zettlemoyer, ‘Allennlp: A deep semantic natural language processing                   H. Wallach, H. Larochelle, A. Beygelzimer, F. d Alché-Buc, E. Fox, and
     platform’, (2017).                                                                    R. Garnett, 8024–8035, Curran Associates, Inc., (2019).
[28] Rafael Glauber and Daniela Barreiro Claro, ‘A systematic mapping study
     on open information extraction’, Expert Systems with Applications, 112,
     372–387, (2018).
[29] Rafael Glauber, Daniela Barreiro Claro, and Leandro Souza de Oliveira,
     ‘Dependency parser on open information extraction for portuguese
     texts - dptoie and dependentie on iberlef’, in Proceedings of the Iberian
     Languages Evaluation Forum (IberLEF 2019), volume 2421, pp. 442–448.
     CEUR-WS.org, (2019).
[30] Rafael Glauber, Leandro Souza de Oliveira, Cleiton Fernando Lima
     Sena, Daniela Barreiro Claro, and Marlo Souza, ‘Challenges of
     an annotation task for open information extraction in portuguese’,
     in International Conference on Computational Processing of the
     Portuguese Language, pp. 66–76. Springer, (2018).
[31] Matthew Honnibal and Ines Montani, ‘spaCy 2: Natural language
     understanding with Bloom embeddings, convolutional neural networks
     and incremental parsing’. To appear, 2017.
[32] Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom,
     Brandon Tran, and Aleksander Madry, ‘Adversarial examples are not
     bugs, they are features’, in Advances in Neural Information Processing
     Systems, pp. 125–136, (2019).
[33] Robin Jia and Percy Liang, ‘Adversarial examples for evaluating
     reading comprehension systems’, in Proceedings of the 2017 Conference on
     Empirical Methods in Natural Language Processing, pp. 2021–2031,
     Copenhagen, Denmark, (September 2017). Association for Computational
     Linguistics.
[34] Dan Kondratyuk and Milan Straka, ‘75 languages, 1 model: Parsing
     universal dependencies universally’, in Proceedings of the 2019
     Conference on Empirical Methods in Natural Language Processing and
     the 9th International Joint Conference on Natural Language Processing
     (EMNLP-IJCNLP), pp. 2779–2795, Hong Kong, China, (2019).
     Association for Computational Linguistics.
[35] Daniel Kondratyuk, ‘75 languages, 1 model: Parsing universal
     dependencies universally’, arXiv preprint arXiv:1904.02099, (2019).
[36] Guillaume Lample and Alexis Conneau, ‘Cross-lingual language model
     pretraining’, arXiv preprint arXiv:1901.07291, (2019).
[37] William Léchelle, Fabrizio Gotti, and Philippe Langlais, ‘Wire57:
     A fine-grained benchmark for open information extraction’, arXiv
     preprint arXiv:1809.08962, (2018).
[38] Scott M Lundberg, Gabriel G Erion, and Su-In Lee, ‘Consistent
     individualized feature attribution for tree ensembles’, arXiv preprint
     arXiv:1802.03888, (2018).
[39] Brian W Matthews, ‘Comparison of the predicted and observed
     secondary structure of t4 phage lysozyme’, Biochimica et Biophysica Acta
[46] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion,
     O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg,
     J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and
     E. Duchesnay, ‘Scikit-learn: Machine learning in Python’, Journal of
     Machine Learning Research, 12, 2825–2830, (2011).
[47] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent
     Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter
     Prettenhofer, Ron Weiss, Vincent Dubourg, et al., ‘Scikit-learn: Machine
     learning in Python’, Journal of Machine Learning Research, 12,
     2825–2830, (2011).
[48] Victor Pereira and Vládia Pinheiro, ‘Report-um sistema de extração
     de informações aberta para língua portuguesa’, in Proceedings of
     Symposium in Information and Human Language Technology, pp.
     191–200. Sociedade Brasileira de Computação, (2015).
[49] Telmo Pires, Eva Schlinger, and Dan Garrette, ‘How multilingual is
     multilingual bert?’, arXiv preprint arXiv:1906.01502, (2019).
[50] Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev,
     Anna Veronika Dorogush, and Andrey Gulin, ‘Catboost: Unbiased
     boosting with categorical features’, in Proceedings of the 32nd
     International Conference on Neural Information Processing Systems,
     NIPS’18, pp. 6639–6649, Red Hook, NY, USA, (2018). Curran
     Associates Inc.
[51] William Radford, Joel Nothman, Matthew Honnibal, James R Curran,
     and Ben Hachey, ‘Document-level entity linking: Cmcrc at tac 2010.’,
     in TAC, (2010).
[52] Nils Reimers and Iryna Gurevych, ‘Making monolingual sentence
     embeddings multilingual using knowledge distillation’, (2020).
[53] Cleiton F. L. Sena and D. B. Claro, ‘Pragmaticoie: a pragmatic
     open information extraction for portuguese language’,
     Knowledge and Information Systems, 201–213, (February 2020).
     https://doi.org/10.1007/s10115-020-01442-7.
[54] Cleiton Fernando Lima Sena and Daniela Barreiro Claro, ‘Inferportoie:
     A portuguese open information extraction system with inferences’,
     Natural Language Engineering, 25(2), 287–306, (2019).
[55] Cleiton Fernando Lima Sena, Rafael Glauber, and Daniela Barreiro
     Claro, ‘Inference approach to enhance a portuguese open information
     extraction’, in Proceedings of the 19th International Conference
     on Enterprise Information Systems - Volume 1: ICEIS, pp. 442–451,
     Porto, Portugal, (2017). INSTICC, ScitePress.
[56] Gabriel Stanovsky and Ido Dagan, ‘Creating a large benchmark for
     open information extraction’, in Proceedings of the 2016 Conference
     on Empirical Methods in Natural Language Processing, pp. 2300–2305,
     (2016).




[57] Gabriel Stanovsky, Julian Michael, Luke Zettlemoyer, and Ido Dagan,
     ‘Supervised open information extraction’, in Proceedings of the
     2018 Conference of the North American Chapter of the Association for
     Computational Linguistics: Human Language Technologies, Volume 1
     (Long Papers), pp. 885–895, (2018).
[58] Mingming Sun, Xu Li, Xin Wang, Miao Fan, Yue Feng, and Ping Li,
     ‘Logician: a unified end-to-end neural approach for open-domain
     information extraction’, in Proceedings of the Eleventh ACM International
     Conference on Web Search and Data Mining, pp. 556–564. ACM,
     (2018).
[59] Clarissa Castellã Xavier, Vera Lúcia Strube de Lima, and Marlo Souza,
     ‘Open information extraction based on lexical-syntactic patterns’, in
     Intelligent Systems (BRACIS), 2013 Brazilian Conference on, pp. 189–
     194. IEEE, (2013).
[60] Clarissa Castellã Xavier, Vera Lúcia Strube de Lima, and Marlo Souza,
     ‘Open information extraction based on lexical semantics’, Journal of
     the Brazilian Computer Society, 21(1), 1–14, (2015).
[61] Sheng Zhang, Kevin Duh, and Benjamin Van Durme, ‘Mt/ie: Cross-
     lingual open information extraction with neural sequence-to-sequence
     models’, in Proceedings of the 15th Conference of the European
     Chapter of the Association for Computational Linguistics: Volume 2,
     Short Papers, pp. 64–70, (2017).



