=Paper=
{{Paper
|id=Vol-2693/paper1
|storemode=property
|title=Explainable OpenIE Classifier with Morpho-syntactic Rules
|pdfUrl=https://ceur-ws.org/Vol-2693/paper1.pdf
|volume=Vol-2693
|authors=Bruno Cabral,Marlo Souza,Daniela Barreiro Claro
|dblpUrl=https://dblp.org/rec/conf/ecai/CabralSC20
}}
==Explainable OpenIE Classifier with Morpho-syntactic Rules==
Proceedings of the Workshop on Hybrid Intelligence for Natural Language Processing Tasks HI4NLP (co-located at ECAI-2020), Santiago de Compostela, August 29, 2020, published at http://ceur-ws.org

Bruno Cabral, Marlo Souza and Daniela Barreiro Claro<sup>1</sup>

===Abstract===
Open information extraction (OpenIE) is the task of extracting structured information from unstructured texts independently of the domain. Recent advances have applied Deep Learning to Natural Language tasks and improved the state of the art, even though those methods usually require a large, high-quality corpus. Building an OpenIE dataset is a tedious and error-prone task. One technique employed consists of generating extractions with rule-based techniques and then manually validating the extracted triples. Low-resource languages usually lack the datasets needed to apply high-performance Deep Learning techniques. Our intuition is therefore that a low-resource model based on multilingual information can learn generalizations across languages and benefit from cross-lingual data. Moreover, we would like to interpret the generalized information gathered through multilingual learning in order to improve the Open IE classification task.

In this paper, we introduce TabOIEC, a multilingual classifier based on generic morpho-syntactic features. Our classifier follows a glass-box approach that can provide interpretations for some of the classifier's decisions. We evaluate our approach on a small corpus of Open IE extractions for English, Spanish, and Portuguese. Our results indicate that our approach improves F1 for all languages, particularly in the monolingual setting. Experiments on zero-shot learning provide evidence that TabOIEC generalizes to languages other than the one it was trained on, although the transfer between languages is modest. Multilingual training does reduce the cost of training; however, in our experiments it was difficult to obtain appropriate generalizations.

===1 Introduction===
Every day we have a greater volume of data, and we need tools that help us extract relevant information from this growing set. Much of this information consists of texts created in an unstructured way, such as books, news and conversations. Open Information Extraction (OpenIE), as introduced by Banko et al. [2], is a useful tool in this context because it can extract knowledge from large collections of textual documents independently of the domain [5]. By extracting information, we mean that these systems generate a structured representation of the information in the original documents. This usually takes the form of relational tuples such as (arg1, rel, arg2), where arg1 and arg2 are the arguments of the relation, usually described by noun phrases, and rel is a relation descriptor that describes the semantic relation between arg1 and arg2 [24]. For example, consider the sentence:

"I could only see the ball came in the goal, because it fell next to where I was."

An Open IE system can generate valid extractions, such as:

(the ball, came in, the goal)

or the following invalid tuple:

(the ball, came in, it)

Since TEXTRUNNER [2] in 2007, multiple OpenIE systems have been designed and proposed for many different languages. These systems have followed different approaches, from rule-based systems to deep neural networks. A continued stream of innovations in Deep Learning has pushed multiple Natural Language Processing (NLP) tasks to better performance, thanks in part to large-scale annotated datasets. Recently, neural networks have been used for supervised learning in Open IE [57, 16, 58, 61], achieving state-of-the-art results for English.

As noted by Glauber and Claro [28], major advances in Open IE have mainly focused on the English language. This focus may be due to the origin of the area and the widespread use of English around the world. Nevertheless, the scientific community has recognized that concentrating on English, with its particular characteristics, may introduce some bias into the area [7, 6]. A constant stream of innovations in NLP research enables models to achieve impressive performance. Such developments are not available to all languages, however, since only a handful of them have the labelled data necessary for training deep neural nets [12]. In fact, for Open IE, the availability of such datasets [56, 37] has led to the development of methods [57, 16, 58, 61] that achieve the state-of-the-art results.

We believe one reason for this focus on English is the lack of available resources for the area in other languages. Unfortunately, manually creating annotated corpora for Open IE is a difficult task, as noted by [30, 37]. This difficulty stems from the vague notion of semantic relation advocated in the area [60, 37] and from the multiplicity of possible interpretations of the same sentence. As Brants and Plaehn [8] observe, using automatic tools to assist corpus annotation enables rapid semi-automatic annotation in an interactive process. Noisy candidate extractions can easily be generated from a corpus using simple morphosyntactic patterns [3, 21, 59] and parsing technology [26, 19, 29]. An important bottleneck in the Open IE annotation process is therefore deciding whether a given candidate extraction corresponds to a valid relation in the corpus.
Hence, in this work we aim to construct a tool for assessing the quality/correctness of Open IE extractions, aiming to 1 Federal University of Bahia, FORMAS Research Group, Computer Science assist on semi-automatic construction of corpora for the area for dif- Department, Salvador - Bahia - Brazil, email: dclaro@ufba.br ferent languages. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). 7 While similar classifiers have been proposed before as post- cOIE [53] and DptOIE [29]. processing tools in Second generation Open IE systems, e.g [21, Classification-based tools to asses quality of extractions has been 48, 18, 22], these classifiers are usually constructed in language- employed by different systems [48, 22], mainly following the success dependent manner, for which the generalization to other languages of the ReVerb [21]. These works are based on the manual construc- has not been investigated, and/or generate models which are not eas- tion of language-specific features to assess the quality of extractions, ily interpretable [4, 10]. based on morphosyntactic patterns and grammatical rules for each An important characteristic of our method relies on the fact that language, which seldom generalize to other non-typologically related we explore the use of machine learning methods which generate in- languages. terpretable models. Since in Open IE manual annotation, as observed Language-independent classification methods have been proposed by [30], agreement among annotators can be very low and anno- before [4, 10, 13]. The work of Barbosa and Claro [4] is the clos- tations have to be discussed. 
Our focus on interpretable models al- est to ours, proposing a set of feature which the authors claim to low for the generation of explanations for the predictions, which can be language-independent for the task of open IE extraction quality be exploited in this process, as well as to generate underlying non- assessment. The authors’ empirical evaluation of their proposed fea- documented rules/hypothesis in the annotation process - as explored ture set on multilingual data and their proposed method is based on by [8]. Support Vector Machine classifiers which are not easily interpreted. Interpretable or explainable models are decision models for which The work of Cabral et al. [11], on the other hand, proposes the use predictions can be traced back to explicit relationships in the data. of multilingual language models, as M-BERT [20] and XLM [36] Recently, the application of neural methods in natural language pro- to perform quality assessment and classification of Open IE extrac- cessing has led to a profound advances in the area. These advances, tions. The authors evaluate their method on multilingual data, but due however, are hard to understand and evaluate, due to opaqueness to the use of opaque language models and classification techniques, of the new models developed in the area. Indeed, several recent re- their predictions are not explainable and, thus, cannot be easily inte- searches [33, 40, 42] show that the predictions made by the systems grated within a semi-automatic annotation process. in the area may be based on spurious or unclear reasons, thus subject to adversarial attacks, and that their reported performance may be explained by unrelated artifacts and regularities on the used datasets, 3 TabOIEC not on the inherent quality of the model. 
In fact, adversarial examples In this work, our goal is to have an explainable OpenIE triple classi- seem to be an unavoidable characteristic of such methods, a rising fier capable of supporting multiple languages, by changing the train- from their foundation geometric principles [32]. ing dataset. In this Section, we briefly revisit the formulation of Ope- In this work, we propose a classification method to asses the qual- nIE, and the components used in our model. ity of Open IE system extractions aiming to assist on the semi- automatic annotation of data. This method is based on the use of tabular learning methods, i.e. methods specific to deal with tabular 3.1 Problem Definition data and which generate interpretable models. By the use of generic Let X = hx1 , x2 , · · · , xn i be a sentence composed of to- features and multilingual pre-processing tools, our method can be kens xi , an Open IE extractor is a function that maps X directly trained on data from different languages without the need of into a set Y = hy1 , y2 , · · · , yj i as a set of tuples y i = engineering any pre-processing tools. To conduct our experiments, hreli , arg1i , arg2i , · · · , argni i, which describe the information ex- we investigate the application of several different explainable learn- pressed in sentence X. In this work, we consider that the tuples are ing architectures on data from three different languages. This tool always in the format of y = (arg1 , rel, arg2 ), where arg1 and arg2 enables the classification of generated extractions of any previously are noun phrases, not necessarily formed from tokens present in X, developed OpenIE tool, independently of the language or type of im- and rel is a descriptor of a relation holding between arg1 and arg2 . plementation. In Portuguese, this model can trade recall performance We do not consider extractions formed by n-nary extractions. for up to 65% improvement in F1 score. 
Given a sentence X as above, we are interested in determining for This article is organized as follows: Section 2 presents some re- every extraction yi ∈ Y whether yi is a valid extraction from X , the lated work. Section 3 describes our approach and our methodology. factors that the classifier made their decision well as the confidence Section 4 shows our experiments, results and discussions. Finally, score for such classification . An OpenIE extraction classifier can be Section 5 concludes our paper. expressed as a decision function that for every single sentence X and extractions Y , returns a pair (Z, P ) ∈ {0, 1}|Y | × [0, 1]|Y | , where Z = hz1 , z2 , · · · , zn i is a binary vector s.t. zi = 1 denotes that yi is 2 Related Work a valid extraction, and P = hp1 , p2 , · · · , pn i is a probability vector, Recently, new machine learning-based approaches for Open IE s.t. pi denotes that extraction yi has an associated probability pi of [57, 16, 58, 61] have been proposed, leading to a new generation of being classified as zi , given the input sentence X. Open IE systems. While these systems represent the state-of-the-art in the area, their focus on the English language and need of annotated 3.2 Fine-tuned Multilingual Contextual data make it hard to generalize their results to other languages. For Embedding the Portuguese language, new data-based methods have been pro- posed as a cross-lingual approach due to the lack of resources for In this work, our plan is to create an explainable language-agnostic this task [10]. Early methods use linguistically-inspired patterns for classifier, and for that, we use a Multilingual Contextual Embed- extraction, such as ArgOE [25], or adaptation of methods for the En- dings. Multilingual means that those models represent words of mul- glish language, such as SGS[18], SGC 2017 [55] and RePort [48]. tiples languages into a shared semantic representation space. 
As Recently, new pattern-based methods have risen as the new state-of- such, these models are able to represent semantic similarities be- the-art for the language [14] such as InferPORToie [54], Pragmati- tween words in different languages. Contextual Embeddings means 8 that the meaning of the word is represented taking its context into process consists of running the feature function and saving the value consideration. obtained to a tabular structure. The process is depicted on Figure 1. One such Multilingual Contextual Embedding is M-BERT [20], a The process is the following: feed the sentence X and the list of 12-layer transformer trained on 104 languages from a Wikipedia with extractions Y to the multilingual words embedding model (in our a shared word piece vocabulary. According to tests conducted by case, the UDify model) to compute the set of features of each token Pires et al. [49], M-BERT is able to transfer knowledge between lan- in the sentence. Afterwards, the indexing step goal is performed to guages with no lexical overlap, an indication that it captures multilin- identify the start and end positions of each relation inside the triple gual representations. It is capable of generating across languages be- arguments through the rest of the sentence. The sentence X and the cause common word pieces such as numbers are mapped to a shared list of extractions Y are inputted to the Algorithm 1. space, spreading the effect to other word pieces, until similar words in different languages are close in the vector space [49]. Input: Original sentence S , arg1 , rel, arg2 The problem with using M-BERT directly is that it does not ful- Output: F eat arg1 , F eatr el, F eat/arg2 fill our requirement of an explainable classifier, due to its ability to F eat sen ← GenerateU dif yF eatures(S) represent tokens in a multidimensional vector of values. 
One alterna- for part in [ arg1 , rel, arg2 ] do tive is the use of UDify model, a multilingual multi-task model ca- // Check if the string is a substring pable of predicting universal part-of-speech, morphological features, of the original sentence lemmas, and dependency trees across 75 languages [34]. This model if substring(part, S) then uses M-BERT and fine-tunes it on the Universal Dependencies (UD) F eat part ← dataset, as it provides syntactic annotations consistent across a large GetSubsetF eatures(part, F eat sen); collection of languages [43]. UDify is able to represent of syntac- // Extract the features of this part tic knowledge transfer across multiple languages including lemmas from the already generated (LEMMAS), treebank-specific part-of-speech tags (XPOS), univer- features from the whole sentence sal part-of-speech tags (UPOS), morphological features (UFEATS), else and dependency edges and labels (DEPS) for each sentence [34]. // The relation is not a substring Finally, for training our classifier, we use the final output of UDify of the original sentence, thus to extract features of sentences’ inputs and extractions. Those fea- generate new features isolated tures are than tabulated in a specific format so that they can be used F eat part ← GenerateU dif yF eatures(part) in classification algorithms that create rules on a set of predefined at- end tributes. One example of such algorithm is a Decision Tree [9]. This end type of classifier has the characteristic of creating high-interpretable Algorithm 1: Finding Features from a sentence models. This algorithm first generates the features using the original sen- 3.3 Architecture tence and then tries to match the constituent parts of each extracted triple to the original sentence, as shown visually in Figure 1. This is Our general architecture and classifier are illustrated in Figure 1. It necessary due to the way that contextual embeddings work: a word consists of three main steps. 
Firstly, we pre-process the input, then we will have a different set of features, depending on the full sentence, generate the feature set, and finally we feed the computed features to and we want the representation to be the same as the original sen- a Classifier. Each step is detailed in the subsections below. tence. In some cases, the constituents are not a sub sequence of the origi- 3.3.1 Pre-processing nal sentence, such as in implicit extractions. For example, in the sen- tence “The covid-19 virus is very dangerous”, the triple (Covid-19, is In the pre-processing step our objective is to convert the textual out- a, virus) is valid, however the tokens “is a” are not present directly put of the OpenIE Extractors to a structured format to be processed in the original sentence. This makes it impossible to determine the in the later steps. Relational triple data is textual and its contents can- start and end of the relation extraction in the original sentence. not be used directly in the classification algorithms implemented in In this case, we generate a new embedding as if the individual part TabOIEC. This step is illustrated on Figure 2. is a sentence. The output of the algorithm is F eat arg1 , F eat rel, It first receives a sentence X and a list of extractions Y , each in F eat arg2 , each is an array of features for each constituent of an ex- the form yi = harg1 , rel, arg2 i. The first step is to split the sen- tracted triple. Each array of features is then transformed into a fixed- tence into tokens. For the tokenization step we utilize the Spacy [31] length vector of a manually defined feature as can be seen in Table 1. xx ent wiki sm tokenizer, a Multi-lingual CNN trained on Nothman All features are based on the Universal Dependencies (UD) version et al.[51] Wikipedia corpus. Afterwards we perform the contraction 2.3 tagset and each one is described below: expansion. 
For example, the English contraction I’m could be tok- enized as the two words I am, and we’ve could become we have. • 1-3 – Relative distance between parts This is needed because in an extraction, different parts of a token Those features represent the relative distance between each con- could appear on different parts of an extraction. stituent part of the relation. The objective of this feature is to cap- ture improbable distances. Analyzing the rules learned by the clas- sifiers, we identified that this feature represents the location of the 3.3.2 Feature Extraction constituents, which together with the features below is a good in- As explainability is a requirement in our classifier, we chose to use dicator if those relationships happen in the correct order. classifiers that work on a fixed set of features. For that, we need to • 4-6 – UPOS features These tags mark the core part-of-speech convert the Sentence and the extractions to a set of features. This (POS) categories. There are in this version of UD, 17 Universal 9 I could only see the ball came in the goalDependency Tree Arg1 UPOS Rel UPOS Arg2 UPOS UPOS Arg1 Dependency Classifier Tree ..... mBERT fine tuned UFeats on UD Original Sentence Tabulated Layer Attention with Extractions Features Figure 1. Architecture overview categories that generalize well across language boundaries. The objective of this feature is to identify valid or invalid relationships Figure 2. TabOIE Pre-processing overview. between different POS in a sentence. For example, the presence of many verbs in the relation increases the probability that the triple is invalid. Because our classifier algorithm requires a fixed set of features, we create a total of 51 features based on those rules. For Indexing each relation we have 17 possible features, one for every single Sentences and Feature construction Pre-processed labeled extractions data UPOS category. 
* Features 7-9 (UFeat features): In Universal Dependencies (UD), these features distinguish additional lexical and grammatical properties of words that are not covered by the POS tags. In UD version 2.3, 50 different features are available, such as animacy, noun type, evidentiality and type of named entity. This feature could help identify, for example, that a part of a triple contains a named entity, which could indicate a valid extraction. The feature list is composed of all combinations of the existing UFeats and each part of the triple, totaling 150 (50 × 3) possible features.
* Features 10-21 (dependency tree tags and head location): This set of features counts each of the 37 universal syntactic relations for each part of the triple, according to where the head of the relation is located. The head can lie inside one of the parts, or OUT if the head is a token not located in any part. Examples of such categories are nsubj (nominal subject) and advmod (adverbial modifier). This yields 444 possible features (37 × 12 combinations, i.e. 3 parts × 4 head locations). This rule is inspired by the work of Oliveira et al. [29], who identify a set of hand-crafted rules for Portuguese to recognize valid extractions based on the dependency tree. For example, one of their rules states that a valid extraction might be composed of a subject (arg1), a verbal phrase (rel) (SV) and one or more arguments (arg2), where arg1 has an nsubj relation in the dependency tree.

{| class="wikitable"
|+ Table 1. Multilingual feature set.
! N !! Feature
|-
| 1 || Relative distance between arg1 and rel
|-
| 2 || Relative distance between rel and arg2
|-
| 3 || Relative distance between arg1 and arg2
|-
| 4 || arg1 count of each UPOS feature
|-
| 5 || rel count of each UPOS feature
|-
| 6 || arg2 count of each UPOS feature
|-
| 7 || arg1 count of each UFeat feature
|-
| 8 || rel count of each UFeat feature
|-
| 9 || arg2 count of each UFeat feature
|-
| 10 || arg1 count of each Dependency Tag with Head pointing to arg1
|-
| 11 || arg1 count of each Dependency Tag with Head pointing to rel
|-
| 12 || arg1 count of each Dependency Tag with Head pointing to arg2
|-
| 13 || arg1 count of each Dependency Tag with Head pointing to OUT
|-
| 14 || rel count of each Dependency Tag with Head pointing to arg1
|-
| 15 || rel count of each Dependency Tag with Head pointing to rel
|-
| 16 || rel count of each Dependency Tag with Head pointing to arg2
|-
| 17 || rel count of each Dependency Tag with Head pointing to OUT
|-
| 18 || arg2 count of each Dependency Tag with Head pointing to arg1
|-
| 19 || arg2 count of each Dependency Tag with Head pointing to rel
|-
| 20 || arg2 count of each Dependency Tag with Head pointing to arg2
|-
| 21 || arg2 count of each Dependency Tag with Head pointing to OUT
|}

=====3.3.3 Classification=====
In this work, we compared the performance of different interpretable models in the task of predicting the quality of Open IE extractions. We compare the following methods:
* CatBoost [50], a gradient boosting method for decision trees;
* SKLearn, the scikit-learn [47] implementation of the Histogram-based Gradient Boosting Classification Tree [41];
* Explainable Boosting Machine [44], an interpretable gradient boosting classifier;
* SKOPE-Rules, which uses predictive rule generation over an ensemble of decision trees [23];
* TabNet [1], a tabular-data-based explainable neural network.

===4 Experiments===
In this section, we describe the empirical validation of our proposed method for classifying Open IE extractions based on language-independent features and interpretable models.

====4.1 Dataset====
For comparability, in our experiments we employ the same data used by Cabral et al. [11] for their multilingual Open IE classifier. This dataset is composed of relations extracted by five different Open IE systems, namely ClausIE, OLLIE, ReVerb, WOE, and TextRunner, from texts in Portuguese, English, and Spanish. Human judges labeled each relation as valid or invalid (<math>z_i</math>). A valid extraction (<math>z_i = 1</math>) corresponds to a triple that is coherent with the sentence. These linguistic resources were obtained from the studies of [19] and [25]. The statistics of the dataset are summarized in Table 2.

{| class="wikitable"
|+ Table 2. Dataset statistics
! Language !! # Sentences !! # Extractions
|-
| Portuguese || 200 || 1856
|-
| English || 500 || 7093
|-
| Spanish || 159 || 375
|}

====4.2 Experimental Setup====
Our work uses the AllenNLP [27] library, built on the PyTorch [45] framework. The fine-tuned model that extracts the UD features is UDify [35], with the fine-tuned BERT weights available at https://github.com/hyperparticle/udify. We implemented our Open IE classifier architecture directly on top of AllenNLP. We tested the following classifiers:
* Scikit-learn (Sklearn) [46] version 0.23: a Gradient Boosting Classifier
* CatBoost [50] version 0.22: a Gradient Boosting Classifier
* Skope (https://github.com/scikit-learn-contrib/skope-rules): a decision rule classifier
* Explainable Boosting Machine (EBM), as implemented in Interpret [44] version 0.1.22: an interpretable Gradient Boosting Classifier
* TabNet, Attentive Interpretable Tabular Learning [1] classifier, version 1.0.6 (https://github.com/dreamquark-ai/tabnet)

Among these classifiers, Skope and the Explainable Boosting Machine are considered glass-box classifiers, as they output highly interpretable rules. The other classifiers can be explained with black-box explainers such as the SHAP Tree Explainer [38].

For all classifiers we use the default hyper-parameters with no additional tuning; only the number of epochs was changed to 300. For each single-language test, we split our corpus into training and test sets using a 5-fold cross-validation strategy. The zero-shot test works differently: we train the classifier on the whole corpus except the language being tested. For example, the zero-shot test for Portuguese is trained on the whole English and Spanish corpora and evaluated on the whole Portuguese corpus.

Each split in our k-fold strategy is made at the sentence level. As a consequence, each split has the same number of sentences, but the number of extractions may differ. Our results are averaged over the test folds, weighted by the number of extracted facts in each fold. We report Precision (P), Recall (R), F1-measure and the Matthews correlation coefficient (MCC) [39]. MCC is employed in machine learning as a measure of classifier quality. To compute Precision-Recall curves, we select the n extractions with the highest confidence score and compute the classifier's precision. The confidence values considered were: [0.6, 0.7, 0.8, 0.85, 0.9, 0.93, 0.95, 0.98, 0.99, 0.995, 0.999]. The code of our experiments is available at https://github.com/FORMAS/HybridOIEClassifier

====4.3 Results====
We consider three evaluation settings. For monolingual learning, Table 3 reports the following for each language (Portuguese, Spanish and English): precision (Prec.), recall (Recall), F1-measure (F1), accuracy (Acc.) and the Matthews metric [39] (confidence coefficient among the extractions). Note that Recall for an OpenIE task is computed relative to the total number of triples extracted by all systems, which we take as 100% recall. Some researchers in the community call this restricted notion a yield measure [17].

For Portuguese, the EBM model achieves a recall of 80.8%, compared with 100% for the Original model. However, the best precision was achieved by the Sklearn model, with over 58%. For Spanish, the best results were obtained with Sklearn, and EBM gave no notable results. The Spanish F1 surpassed that of all the Portuguese models. For English, the best F1 results were obtained by the CatBoost model. All confidence coefficients were above 87% agreement.

{| class="wikitable"
|+ Table 3. Metrics scores for the language classifiers (C: confidence threshold, followed by the classifier)
! Language !! Setting !! Prec. !! Recall !! F1 !! Acc. !! MCC
|-
| Portuguese || Original || 0.181 || 1.000 || 0.307 || 0.181 || 0.000
|-
| Portuguese || Best Precision (C:0.6 - Sklearn) || 0.580 || 0.383 || 0.459 || 0.836 || 0.976
|-
| Portuguese || Best F1 (C:0.8 - Catboost) || 0.452 || 0.581 || 0.508 || 0.796 || 0.986
|-
| Portuguese || Best Acc. / Recall > 0.8 (C:0.9 - EBM) || 0.290 || 0.808 || 0.427 || 0.605 || 0.990
|-
| Spanish || Original || 0.730 || 1.000 || 0.844 || 0.730 || 0.000
|-
| Spanish || Best Precision, F1 and Accuracy (C:0.6 - Sklearn) || 0.833 || 0.948 || 0.886 || 0.824 || 0.966
|-
| English || Original || 0.454 || 1.000 || 0.624 || 0.454 || 0.000
|-
| English || Best Precision (C:0.6 - Sklearn) || 0.624 || 0.643 || 0.633 || 0.661 || 0.897
|-
| English || Best F1 (C:0.7 - Catboost) || 0.567 || 0.773 || 0.654 || 0.628 || 0.888
|-
| English || Best Acc. / Recall > 0.8 (C:0.9 - Sklearn) || 0.536 || 0.811 || 0.645 || 0.594 || 0.875
|}

To evaluate whether our models were able to exploit cross-lingual information, i.e. to apply information learned from a set of languages to a new language, we also performed zero-shot and one-shot classification.

Figure 3. Language-specific performance: (a) English, (b) Portuguese, (c) Spanish.

Zero-shot classification is a task where the classifier is evaluated on a language not seen during training. For Portuguese, we observe an average decrease in F1 between 3% (SKOPE) and 16% (Sklearn and TabNet) across all models, with a maximum decrease of 19% for CatBoost at confidence 0.8. Zero-shot classification behaved similarly for the other two languages:
* Spanish: between 2% (SKOPE) and 20% (Sklearn), with a maximum decrease of 52% for CatBoost at 0.6.
* English: between 1% (SKOPE) and 15% (Sklearn), with a maximum decrease of 24% for Sklearn at 0.7.

One-shot classification is a task where the classifier is trained on data from the other languages plus part of the data of the target language. It is then tested on the remaining (unseen) data of the target language.
For Portuguese, we observe an average decrease in F1 between 3% (SKOPE) and 10% (Sklearn) across all models, with a maximum decrease of 15% for Sklearn at confidence 0.99. One-shot classification behaved similarly for the other two languages:
* Spanish: between 0% (SKOPE) and 10% (TabNet), with a maximum decrease of 13% for TabNet at 0.7.
* English: between 0% (SKOPE) and 1% (EBM), with a maximum decrease of 1% for EBM at 0.8.

Figure 4. Comparison of the performances for Monolingual, Zero-shot and One-shot.

====4.4 Discussion====
In the single-language experiments, the classifier's results are more robust: the decline in precision is much more gradual for almost all representations in the three languages. In the zero-shot experiments, however, this decline is much more pronounced. These results indicate that there may be a discrepancy between the datasets of each language regarding the relations extracted. This discrepancy may arise because the datasets were:
* (i) created using different Open IE systems for each language;
* (ii) annotated by different teams at different times;
* (iii) built from texts of different linguistic styles (encyclopedic, journalistic and user-generated Web pages for English; encyclopedic texts for Spanish and Portuguese) and different domains (multiple domains for English and Spanish, domain-specific for Portuguese).
It may also be the case that linguistic parameters of each language play an important role in how information is structured through language and, as such, in how this information is extracted. Such parameters include syntactic structure and the stylistic choices of each language community.

It is also worth noting that the English dataset is considerably larger than both the Spanish and the Portuguese datasets. Thus, in zero-shot learning, it may dominate the training process and overfit the classifier to English dataset-specific characteristics. We therefore recommend experiments with a larger number of languages, which would provide the classifier with a more diverse set of examples.

Regarding multilinguality, we observe that our monolingual models are slightly better than the model trained on three languages, except for English. Our results corroborate the findings of [52], which mention the curse of multilinguality from [15]: adding more languages to a model can degrade performance when the capacity of the model remains the same. For English, there is no significant difference between the monolingual and the multilingual (i.e. three-language) approaches.

In our zero-shot results, all three languages show a slight learning effect and improve on the original performance. This indicates a limited but possible exploitation of cross-language information. For dissimilar languages the results are less conclusive, probably due to their dissimilar linguistic characteristics. One example is training on extractions from Spanish and Portuguese sentences and testing on extractions from English sentences. Our intuition is that, if the models are presented with examples of varied linguistic characteristics, the classifier can be applied to a wide range of low-resource languages. This would facilitate the development of computational linguistic resources in these languages.

===Acknowledgements===
We would like to thank CNPQ and CAPES for their financial support.

===References===
[1] Sercan O. Arik and Tomas Pfister, 'TabNet: Attentive interpretable tabular learning', (2019).

[2] Michele Banko, Michael J Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni, 'Open information extraction for the web', in IJCAI, volume 7, pp. 2670–2676, (2007).

[3] Michele Banko, Oren Etzioni, and Turing Center, 'The tradeoffs between open and traditional relation extraction', in ACL, volume 8, pp. 28–36, (2008).

[4] George Caique Gouveia Barbosa and Daniela Barreiro Claro, 'Utilizando features linguísticas genéricas para classificação de triplas relacionais em português', in Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology, pp. 132–141, (2017).

[5] David Soares Batista, David Forte, Rui Silva, Bruno Martins, and Mário Silva, 'Extracção de relações semânticas de textos em português explorando a DBpédia e a Wikipédia', Linguamática, 5(1), 41–57, (2013).

[6] Emily Bender, 'English isn't generic for language, despite what NLP papers might lead you to believe', in Symposium on Data Science and Statistics, (2019). [Online; accessed 15-May-2020].

[7] Emily M. Bender, 'Linguistically naïve != language independent: Why NLP needs linguistic typology', in Proceedings of the EACL 2009 Workshop on the Interaction between Linguistics and Computational Linguistics: Virtuous, Vicious or Vacuous?, pp. 26–32, Athens, Greece, (March 2009). Association for Computational Linguistics.

[8] Thorsten Brants and Oliver Plaehn, 'Interactive corpus annotation', in Second International Conference on Language Resources and Evaluation LREC-2000, (2000).

[9] Leo Breiman, Jerome Friedman, Charles J Stone, and Richard A Olshen, Classification and Regression Trees, CRC Press, (1984).

[10] Cabral B.S., Glauber R., Souza M., and Claro D.B., 'CrossOIE: Cross-lingual classifier for open information extraction', in Computational Processing of the Portuguese Language (PROPOR 2020), eds. Quaresma P., Vieira R., Aluísio S., Moniz H., Batista F., Gonçalves T., volume 12037 of Lecture Notes in Computer Science, pp. 201–213, Springer, Cham, (February 2020).

[11] Bruno Souza Cabral, Rafael Glauber, Marlo Souza, and Daniela Barreiro Claro, 'CrossOIE: Cross-lingual classifier for open information extraction', in International Conference on Computational Processing of the Portuguese Language, pp. 368–378, Springer, (2020).
[12] Xilun Chen, Ahmed Hassan Awadallah, Hany Hassan, Wei Wang, and Claire Cardie, ‘Zero-resource multilingual model transfer: Learning 5 Conclusion and Future Work what to share’, arXiv preprint arXiv:1810.03552, (2018). [13] D.B. Claro, M. Souza, C. Castellã Xavier, and L. Oliveira, ‘Multi- lingual open information extraction: Challenges and opportunities’, In this work, we presented the TabOIEC, a language-independent ex- Information, 10(7), 228, (2019). plainable relation extraction binary classifier. The evaluation results [14] Sandra Collovini, Joaquim Santos, Bernardo Consoli, Juliano Terra, Renata Vieira, Paulo Quaresma, Marlo Souza, Daniela Barreiro Claro, demonstrated that a single model could improve the output of multi- and Rafael Glauber, ‘Iberlef 2019 portuguese named entity recognition ple state-of-art systems across three languages: Portuguese, English, and relation extraction tasks’, in Proceedings of the Iberian Languages and Spanish. Our results give evidence that simple and explainable Evaluation Forum (IberLEF 2019), volume 2421, pp. 390–410. CEUR- models for extraction quality assessment could be a useful resource WS.org, (2019). for the construction of Open IE datasets systems for different lan- [15] Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaud- hary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle guages. Ott, Luke Zettlemoyer, and Veselin Stoyanov. Unsupervised cross- In the future, we plan to evaluate the use of hand-crafted features lingual representation learning at scale, 2019. by linguist experts. Another point of improvement would test the so- [16] Lei Cui, Furu Wei, and Ming Zhou, ‘Neural open information extrac- lution in larger datasets, and utilize some techniques to improve the tion’, arXiv preprint arXiv:1805.04270, (2018). [17] Leandro Souza de Oliveira, Rafael Glauber, and Daniela Barreiro Claro, classifier such as Fine-tuning the classifier on the Open IE tuples. 
‘Dependentie: An open information extraction system on portuguese by Once mature, we intend to employ the trained models in an an- a dependence analysis’, Encontro Nacional de Inteligência Artificial e notation tool, allowing the creation of Open IE and Relation Extrac- Computacional, (2017). tion datasets for different languages. With such a tool, we aim to en- [18] Erick Nilsen Pereira de Souza, Daniela Barreiro Claro, and Rafael courage the development of Relation Extraction techniques and tech- Glauber, ‘A similarity grammatical structures based method for improv- ing open information systems’, J. UCS, 24, 43–69, (2018). nology for different languages, given the importance of Information [19] Luciano Del Corro and Rainer Gemulla, ‘Clausie: clause-based open Extraction technology for the development of advanced intelligent information extraction’, in Proceedings of the 22nd international systems and interfaces. conference on World Wide Web, pp. 355–366. ACM, (2013). 13 [20] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, (BBA)-Protein Structure, 405(2), 442–451, (1975). ‘Bert: Pre-training of deep bidirectional transformers for language un- [40] Tom McCoy, Ellie Pavlick, and Tal Linzen, ‘Right for the wrong rea- derstanding’, arXiv preprint arXiv:1810.04805, (2018). sons: Diagnosing syntactic heuristics in natural language inference’, [21] Anthony Fader, Stephen Soderland, and Oren Etzioni, ‘Identifying in Proceedings of the 57th Annual Meeting of the Association for relations for open information extraction’, in Proceedings of the Computational Linguistics, pp. 3428–3448, Florence, Italy, (July 2019). Conference on Empirical Methods in Natural Language Processing, pp. Association for Computational Linguistics. 1535–1545. Association for Computational Linguistics, (2011). 
[41] Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming [22] Tobias Falke, Gabriel Stanovsky, Iryna Gurevych, and Ido Dagan, Ma, and Tie-Yan Liu, ‘A communication-efficient parallel algorithm for ‘Porting an open information extraction system from english to ger- decision tree’, in Advances in Neural Information Processing Systems, man’, in Proceedings of the 2016 Conference on Empirical Methods in pp. 1279–1287, (2016). Natural Language Processing, pp. 892–898, (2016). [42] Timothy Niven and Hung-Yu Kao, ‘Probing neural network com- [23] Jerome H Friedman, Bogdan E Popescu, et al., ‘Predictive learning prehension of natural language arguments’, CoRR, abs/1907.07355, via rule ensembles’, The Annals of Applied Statistics, 2(3), 916–954, (2019). (2008). [43] Joakim Nivre, Željko Agić, Lars Ahrenberg, Lene Antonsen, Maria Je- [24] Pablo Gamallo, ‘An Overview of Open Information Extraction (In- sus Aranzabe, Masayuki Asahara, Luma Ateyah, Mohammed Attia, vited talk)’, in 3rd Symposium on Languages, Applications and Aitziber Atutxa, Liesbeth Augustinus, et al. Universal dependencies Technologies, eds., Maria João Varanda Pereira, José Paulo Leal, 2.1, 2017. LINDAT/CLARIAH-CZ digital library at the Institute of and Alberto Simões, volume 38 of OpenAccess Series in Informatics Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and (OASIcs), pp. 13–16, Dagstuhl, Germany, (2014). Schloss Dagstuhl– Physics, Charles University. Leibniz-Zentrum fuer Informatik. [44] Harsha Nori, Samuel Jenkins, Paul Koch, and Rich Caruana, ‘Inter- [25] Pablo Gamallo and Marcos Garcia, ‘Multilingual open information ex- pretml: A unified framework for machine learning interpretability’, traction’, in Portuguese Conference on Artificial Intelligence, pp. 711– arXiv preprint arXiv:1909.09223, (2019). 722. Springer, (2015). 
[45] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James [26] Pablo Gamallo, Marcos Garcia, and Santiago Fernández-Lanza, Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia ‘Dependency-based open information extraction’, in Proceedings of the Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward joint workshop on unsupervised and semi-supervised learning in NLP, Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chil- pp. 10–18. Association for Computational Linguistics, (2012). amkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chin- [27] Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep tala, ‘Pytorch: An imperative style, high-performance deep learning li- Dasigi, Nelson F. Liu, Matthew Peters, Michael Schmitz, and Luke S. brary’, in Advances in Neural Information Processing Systems 32, eds., Zettlemoyer, ‘Allennlp: A deep semantic natural language processing H. Wallach, H. Larochelle, A. Beygelzimer, F. d Alché-Buc, E. Fox, and platform’, (2017). R. Garnett, 8024–8035, Curran Associates, Inc., (2019). [28] Rafael Glauber and Daniela Barreiro Claro, ‘A systematic map- [46] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, ping study on open information extraction’, Expert Systems with O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vander- Applications, 112, 372–387, (2018). plas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duch- [29] Rafael Glauber, Daniela Barreiro Claro, and Leandro Souza de Oliveira, esnay, ‘Scikit-learn: Machine learning in Python’, Journal of Machine ‘Dependency parser on open information extraction for portuguese Learning Research, 12, 2825–2830, (2011). texts - dptoie and dependentie on iberlef’, in Proceedings of the Iberian [47] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Languages Evaluation Forum (IberLEF 2019), volume 2421, pp. 442– Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Pret- 448. 
CEUR-WS.org, (2019). tenhofer, Ron Weiss, Vincent Dubourg, et al., ‘Scikit-learn: Machine [30] Rafael Glauber, Leandro Souza de Oliveira, Cleiton Fernando Lima learning in python’, the Journal of machine Learning research, 12, Sena, Daniela Barreiro Claro, and Marlo Souza, ‘Challenges of 2825–2830, (2011). an annotation task for open information extraction in portuguese’, [48] Victor Pereira and Vládia Pinheiro, ‘Report-um sistema de extração in International Conference on Computational Processing of the de informações aberta para lı́ngua portuguesa’, in Proceedings of Portuguese Language, pp. 66–76. Springer, (2018). Symposium in Information and Human Language Technology, pp. [31] Matthew Honnibal and Ines Montani, ‘spaCy 2: Natural language un- 191–200. Sociedade Brasileira de Computação, (2015). derstanding with Bloom embeddings, convolutional neural networks [49] Telmo Pires, Eva Schlinger, and Dan Garrette, ‘How multilingual is and incremental parsing’. To appear, 2017. multilingual bert?’, arXiv preprint arXiv:1906.01502, (2019). [32] Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, [50] Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Brandon Tran, and Aleksander Madry, ‘Adversarial examples are not Anna Veronika Dorogush, and Andrey Gulin, ‘Catboost: Unbi- bugs, they are features’, in Advances in Neural Information Processing ased boosting with categorical features’, in Proceedings of the 32nd Systems, pp. 125–136, (2019). International Conference on Neural Information Processing Systems, [33] Robin Jia and Percy Liang, ‘Adversarial examples for evaluating read- NIPS’18, p. 6639–6649, Red Hook, NY, USA, (2018). Curran ing comprehension systems’, in Proceedings of the 2017 Conference on Associates Inc. Empirical Methods in Natural Language Processing, pp. 2021–2031, [51] William Radford, Joel Nothman, Matthew Honnibal, James R Curran, Copenhagen, Denmark, (September 2017). 
Association for Computa- and Ben Hachey, ‘Document-level entity linking: Cmcrc at tac 2010.’, tional Linguistics. in TAC, (2010). [34] Dan Kondratyuk and Milan Straka, ‘75 languages, 1 model: Pars- [52] Nils Reimers and Iryna Gurevych. Making monolingual sentence em- ing universal dependencies universally’, in Proceedings of the 2019 beddings multilingual using knowledge distillation, 2020. Conference on Empirical Methods in Natural Language Processing and [53] Cleiton F. L. Sena and D. B. Claro, ‘Pragmaticoie: a prag- the 9th International Joint Conference on Natural Language Processing matic open information extraction for portuguese language’, (EMNLP-IJCNLP), pp. 2779–2795, Hong Kong, China, (2019). Asso- Knowledge and Information Systems, 201–213, (February 2020). ciation for Computational Linguistics. https://doi.org/10.1007/s10115-020-01442-7. [35] Daniel Kondratyuk, ‘75 languages, 1 model: Parsing universal depen- [54] Cleiton Fernando Lima Sena and Daniela Barreiro Claro, ‘Inferpor- dencies universally’, arXiv preprint arXiv:1904.02099, (2019). toie: A portuguese open information extraction system with inferences’, [36] Guillaume Lample and Alexis Conneau, ‘Cross-lingual language model Natural Language Engineering, 25(2), 287–306, (2019). pretraining’, arXiv preprint arXiv:1901.07291, (2019). [55] Cleiton Fernando Lima Sena, Rafael Glauber, and Daniela Barreiro [37] William Léchelle, Fabrizio Gotti, and Philippe Langlais, ‘Wire57: Claro, ‘Inference approach to enhance a portuguese open informa- A fine-grained benchmark for open information extraction’, arXiv tion extraction’, in Proceedings of the 19th International Conference preprint arXiv:1809.08962, (2018). on Enterprise Information Systems - Volume 1: ICEIS,, pp. 442–451, [38] Scott M Lundberg, Gabriel G Erion, and Su-In Lee, ‘Consistent in- Porto, Portugal, (2017). INSTICC, ScitePress. 
dividualized feature attribution for tree ensembles’, arXiv preprint [56] Gabriel Stanovsky and Ido Dagan, ‘Creating a large benchmark for arXiv:1802.03888, (2018). open information extraction’, in Proceedings of the 2016 Conference [39] Brian W Matthews, ‘Comparison of the predicted and observed sec- on Empirical Methods in Natural Language Processing, pp. 2300–2305, ondary structure of t4 phage lysozyme’, Biochimica et Biophysica Acta (2016). 14 [57] Gabriel Stanovsky, Julian Michael, Luke Zettlemoyer, and Ido Da- gan, ‘Supervised open information extraction’, in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 885–895, (2018). [58] Mingming Sun, Xu Li, Xin Wang, Miao Fan, Yue Feng, and Ping Li, ‘Logician: a unified end-to-end neural approach for open-domain infor- mation extraction’, in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 556–564. ACM, (2018). [59] Clarissa Castellã Xavier, Vera Lúcia Strube de Lima, and Marlo Souza, ‘Open information extraction based on lexical-syntactic patterns’, in Intelligent Systems (BRACIS), 2013 Brazilian Conference on, pp. 189– 194. IEEE, (2013). [60] Clarissa Castellã Xavier, Vera Lúcia Strube de Lima, and Marlo Souza, ‘Open information extraction based on lexical semantics’, Journal of the Brazilian Computer Society, 21(1), 1–14, (2015). [61] Sheng Zhang, Kevin Duh, and Benjamin Van Durme, ‘Mt/ie: Cross- lingual open information extraction with neural sequence-to-sequence models’, in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pp. 64–70, (2017). 15