Combining Semantic Parsing Frameworks for Automated Knowledge Base Construction

Martin Verrev
Tallinn University of Technology, Akadeemia Tee 15a, Tallinn, 12618, Estonia
martin.verrev@taltech.ee (M. Verrev), ORCID 0000-0003-4890-9283 (M. Verrev)

6th Workshop on Advances In Argumentation In Artificial Intelligence (AI 2022), November 28 – December 2, 2022, University of Udine, Udine, Italy
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Abstract
One of the crucial tasks in constructing a knowledge base for commonsense question answering is automating the extraction of background knowledge from unstructured sources. While structured data sources like ConceptNet, Quasimodo, and ATOMIC, and ontologies like WordNet, provide facts and simple rules, they contain many unparsed English phrases and lack most of the common everyday knowledge that everyone is expected to know. Both for enriching these knowledge bases and for asking questions in natural language, we need to perform semantic parsing of natural language phrases and sentences in a way that is compatible with the structured data sources used. The contribution of this paper is to combine the AMR and UD notations for this task: extracting the meaning from unstructured text and representing it as knowledge graphs, which are then transformed into first-order logic formulae that can be used for answering questions about the provided passage. The paper provides an example of such an experimental system, intended to be used with the Graph Knowledge (GK) logic engine.

Keywords
knowledge extraction, natural language understanding, commonsense reasoning, meaning representations

1. Introduction

Question-answering systems that rely purely on vector-based approaches struggle with answering questions based on commonsense knowledge, the most apparent shortcoming being a lack of transparency and interpretability while performing inference. The results obtained may be caused either by actual correlations or by superficial cues that have been demonstrated to exist in commonsense reasoning benchmarks. Humans, on the other hand, are capable of solving tasks that require reasoning on often incomplete information. That is possible due to having prior commonsense knowledge - facts about the everyday world everyone is expected to know. Natural language is the medium for capturing such knowledge, so an automated reasoning system needs representations that capture the meaning of a sentence as understood by a native speaker. The paper describes one system for constructing a knowledge base from such representations. In addition, the paper describes experiments conducted to measure the suitability of such a system.

1.1. The Task of Semantic Parsing

Semantic parsing is a subfield of natural language understanding that maps natural-language utterances to detailed representations of their meaning. Such representations reflect the meaning of a sentence as understood by a native speaker to the point where it can be used for question answering or automated reasoning. One such reasoning system is GK [1], which uses JSON-LD-LOGIC, a notation that combines first-order logic with JSON and is compatible with both the JSON-LD standard and the TPTP format [2]. This enables the programmatic management of logical problems and provides a specification for the output format of the knowledge base so created. The format thus allows knowledge from different sources to be encoded in a unified way.
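To make the idea of a JSON-friendly logical form concrete, the following minimal sketch encodes a first-order formula as a nested JSON list and renders it as a TPTP-style string. The nested-list layout, the connective set, and the helpers `to_tptp` and `as_fof` are illustrative assumptions made for this sketch; they are not the JSON-LD-LOGIC syntax itself, which is specified in [2].

```python
import json

# Assumed infix connectives for this sketch (conjunction, disjunction,
# implication, equivalence).
CONNECTIVES = ("&", "|", "=>", "<=>")


def to_tptp(formula):
    """Render a nested-list formula as a TPTP-style first-order string."""
    if isinstance(formula, str):  # constant or variable name
        return formula
    head = formula[0]
    if head in ("forall", "exists"):  # ["forall", ["X"], body]
        quantifier = "!" if head == "forall" else "?"
        variables, body = formula[1], formula[2]
        return f"{quantifier}[{','.join(variables)}]: {to_tptp(body)}"
    if head == "not":  # ["not", body]
        return f"~{to_tptp(formula[1])}"
    if len(formula) == 3 and isinstance(formula[1], str) and formula[1] in CONNECTIVES:
        left, op, right = formula  # [left, "=>", right]
        return f"({to_tptp(left)} {op} {to_tptp(right)})"
    predicate, *arguments = formula  # ["man", "X"]
    if not arguments:
        return predicate
    return f"{predicate}({','.join(to_tptp(a) for a in arguments)})"


def as_fof(name, formula):
    """Wrap a formula as a named TPTP axiom."""
    return f"fof({name}, axiom, {to_tptp(formula)})."


# "Every man is mortal" and "John is a man" as plain JSON-serializable data.
rule = ["forall", ["X"], [["man", "X"], "=>", ["mortal", "X"]]]
fact = ["man", "john"]

serialized = json.dumps(rule)  # the same structure survives a JSON round trip
print(as_fof("rule1", json.loads(serialized)))
print(as_fof("fact1", fact))
```

Keeping formulae as plain lists of strings means they can be stored, merged, and transformed with ordinary JSON tooling before being handed to a reasoner.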
[Figure 1 diagram with stages: Commonsense Knowledge, Semantic Parsing, Meaning Representations, Enrichment and Transformation, Logical Forms, Automated Reasoning]
Figure 1: A broad overview of constructing a knowledge base in the context of automated reasoning.

Given a passage, it is transformed into a graph representation via the task of semantic parsing. As current semantic parsers generally use the sentence as the unit of information, a sentence segmentation task is performed first. To overcome that limitation, a context is created during the enrichment and transformation phase. The intermediate semantic parse result can be enriched with external information, e.g. from Quasimodo¹, DBpedia², Wikidata³ or any other structured data source. In addition, translation rules are applied during this stage to provide uniform representations across multiple data sources. The representations are converted to first-order logic formulae that can be saved and restored to be run on an automated reasoner for question answering.

¹ https://quasimodo.mpi-inf.mpg.de/
² https://github.com/dbpedia/
³ https://www.wikidata.org/

2. Methodology

For constructing the experimental system, the following methodology was used. See our previous paper [3] for the details of steps 1-3.

1. During the preliminary phase, a review of existing semantic parsing frameworks was conducted to identify the key features and attributes of each framework.
2. Parsers were identified and chosen for each framework. Exclusion criteria were developed and applied to select a set of parsers for further evaluation.
3. For evaluating the parsers, a corpus was defined and measures were specified. Experiments were conducted to evaluate the parsers.
4. As a result, the parsers and models were chosen: AMR for semantic representation and UD for syntactic and dependency parse trees. As both AMR and UD use the sentence as the unit of meaning, a context was constructed that refers to the whole discourse.
5. For constructing the context, three classes of sentences were identified: conceptual, which do not describe specific situations but general concepts; factual, which describe named entities; and situational, which describe concrete situations. A dictionary was constructed to map PropBank roles to predicates based on the sentence type.
6. Logical forms in JSON-LD-LOGIC were constructed based on the passage provided. An experimental system was deployed.

3. Identification of Technologies

During the initial phase, a survey and a set of experiments were conducted by the author to identify the features of common semantic parsing frameworks; see [3] for additional details. The findings indicated that a hybrid system should be constructed that uses a set of parsers: optionally, a UDS parser to pre-process the input passage and simplify the sentence structure; AMR for unanchored representations to generalize the syntactic layer; and UD to constrain and specify the generalized graphs for the passage provided.

3.1. Identification of Frameworks

The following frameworks were identified during the preliminary phase of the study:

Abstract Meaning Representation (AMR) is a notation based on propositional logic and neo-Davidsonian semantics. AMR encodes the semantics of a sentence as a directed graph. Nodes in the graph represent semantic terms in the sentence, and the edges identify the semantic relations between nodes. Concepts can be English verbs, PropBank framesets, or specific keywords. AMR includes frame arguments based on PropBank conventions; general semantic relations; and relations for quantities, date-entities, and lists. It lacks universal quantifiers and support for inflectional morphology. Annotations do not link concepts to their original spans in the text, so the input has to be aligned first [4].
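As a small illustration of the graph view described above, the sketch below stores an AMR-style graph as (source, role, target) triples and separates concept nodes from relation edges. The graph is hand-written for the sentence "Brutus stabs Caesar with a knife." and is an assumption for illustration, not the output of any parser; the helper names `concepts`, `relations`, and `frame_arguments` are likewise hypothetical.

```python
from collections import defaultdict

# Hand-written AMR-style triples (illustrative assumption, not parser output).
# ":instance" links a variable to its concept; other roles are graph edges.
TRIPLES = [
    ("s", ":instance", "stab-01"),
    ("s", ":ARG0", "p"),
    ("p", ":instance", "person"),
    ("p", ":name", "n"),
    ("n", ":instance", "name"),
    ("n", ":op1", '"Brutus"'),
    ("s", ":ARG1", "p2"),
    ("p2", ":instance", "person"),
    ("p2", ":name", "n2"),
    ("n2", ":instance", "name"),
    ("n2", ":op1", '"Caesar"'),
    ("s", ":instrument", "k"),
    ("k", ":instance", "knife"),
]


def concepts(triples):
    """Map each node variable to the concept it instantiates."""
    return {src: tgt for src, role, tgt in triples if role == ":instance"}


def relations(triples):
    """Group the outgoing non-instance edges of every node."""
    edges = defaultdict(list)
    for src, role, tgt in triples:
        if role != ":instance":
            edges[src].append((role, tgt))
    return dict(edges)


def frame_arguments(triples, variable):
    """Return the PropBank-style :ARGn roles of a node with their concepts."""
    names = concepts(triples)
    return {
        role: names.get(tgt, tgt)
        for src, role, tgt in triples
        if src == variable and role.startswith(":ARG")
    }


print(concepts(TRIPLES)["s"])         # concept of the root node
print(frame_arguments(TRIPLES, "s"))  # numbered frame arguments of the root
print(relations(TRIPLES)["s"])        # all edges leaving the root
```

Note that nothing in the triples points back to token positions in the sentence, which is why an alignment step is needed before combining AMR with token-level annotations.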
Universal Conceptual Cognitive Annotation (UCCA) is a language-agnostic annotation scheme based on Basic Linguistic Theory. Natural language utterances are converted to a graph containing purely semantic categories and structure, where the base layer is a scene describing a movement, action, or event [5]. The focus of UCCA has been ease of annotation, and the foundational layer can be extended with additional domain- or language-specific layers.

Universal Dependencies (UD) is a framework for consistent annotation of grammar. It solves the problem of different corpora having different tagsets and annotation schemes [6] by providing a universal scheme and category set that is suitable for parser development, cross-lingual support, and language parsing, while allowing language-specific extensions if necessary [7]. It combines Stanford Dependencies, Google universal part-of-speech tags, and the Interset interlingua⁴ for morphosyntactic tagsets, e.g. for encoding additional morphological information in the syntactic form [8]. UD corpora consist of over 200 treebanks in over 100 languages.

⁴ https://ufal.mff.cuni.cz/interset

Elementary Dependency Structures (EDS) present an approach to Minimal Recursion Semantics (MRS) banking. In this approach, each input item in a corpus is paired with elementary predicates, each being a single relation with its associated arguments, followed by manual disambiguation of quantifiers [9]. The semantic form is based on the notion of semantic discriminants: local dependencies extracted from a full-fledged semantic representation [10].

Prague Tectogrammatical Graphs (PTG) provide annotations in English and Czech. The English sentences are from a complete English Web Text Treebank⁵, and a parallel Czech corpus based on the same sentences is morphologically annotated and parsed into surface-syntax dependency trees in the Prague Dependency Treebank (PDT) 2.0 annotation style. Noteworthy is that the annotations have multiple layers: an analytical (surface-syntax) layer consisting of dependency structures, semantic labels, argument structure, and ellipsis resolution; and a manually constructed deep-syntax tectogrammatical layer on top of that [11].

Discourse Representation Structures (DRS) is a semantic formalism based on Discourse Representation Theory. In contrast to ordinary treebanks, the units of annotation in the corpus are texts rather than isolated sentences [12]. Basic DRSs consist of discourse referents like 𝑥, representing entities, and discourse conditions like 𝑚𝑎𝑛(𝑥), representing information about discourse referents [13]. The corpus is based on the Groningen Meaning Bank, which annotates English texts with formal meaning representations rooted in Combinatory Categorial Grammar [14].

The Universal Decompositional Semantics (UDS) framework differs from the other formalisms because it decodes the meaning in a feature-based scheme, using continuous scales rather than categorical labels. The meaning is captured as node- and edge-level attributes in a single semantic graph whose structure is deterministically extracted from Universal Dependencies. UDS treats parsing as a sequence-to-graph problem: the graph nodes created are based on the input sequence, and edges are dynamically added during generation [15].

3.1.1. Choosing the Frameworks

We follow the classification of Kollar [16] for generated dependency graphs, which is based on the relation of graph elements to surface tokens. For bi-lexical dependency graphs (type 0), the graph nodes correspond to surface lexical units. Anchored semantic graphs (type 1) relax the correspondence between nodes and tokens while still explicitly annotating the correspondence between nodes and parts of the sentence. For unanchored dependency graphs (type 2), the correspondence between the nodes and tokens is not explicitly annotated.
Table 1
Summary of Semantic Representation Frameworks (chosen frameworks marked with *)

Name   Unit of Annotation  Flavor      Format                      Primary Languages
AMR*   sentence            unanchored  Penman                      English (+Chinese)
UCCA*  sentence            anchored    XML                         English (+4 others)
UD     sentence            bi-lexical  CoNLL-U                     multilingual
EDS    sentence            anchored    DAG                         English
PTG    sentence            anchored    DAG                         English, Czech
DRS*   passage             anchored    nested boxes                English
UDS*   sentence            anchored    Predicates + UD (CoNLL-U)   English

⁵ https://catalog.ldc.upenn.edu/LDC2015T13

A summary of semantic parsing frameworks is presented in Table 1, with the chosen frameworks marked with an asterisk. AMR was chosen as an unanchored representation, providing the highest level of abstraction from surface tokens. From the anchored frameworks, which provide a level of abstraction from the surface form but still retain portions of it, two were chosen: UCCA, which uses the single sentence as the unit of information, and DRS, which supports a passage consisting of multiple sentences. In addition, UDS was added to the test battery because it deterministically includes UD bi-lexical annotations in addition to extracting predicates from the input sentence.

3.2. Identification of Parsers

A representative sample of parsers was identified for conducting the experiments. They are summarized in Table 2; the parsers carried forward to the evaluation in Table 3 are marked with an asterisk. The following inclusion criteria were applied: availability - only publicly available parsers were included; development activity - how active and up to date the development of the parser is; and accuracy - the parsing accuracy of the tool as reported in the literature. During the initial review, 16 parsers were identified. For conducting the experiments, the following exclusion criteria were applied: interoperability - as the parsers will be integrated into an existing workflow, those not using Python were discarded; freshness - parsers whose last commit was made before January 2020 were discarded; lightness - given a choice between otherwise matching and equally performant parsers, the more lightweight one was chosen.

Table 2
Summary of Semantic Parsers

Notation  Parser                   Platform      Stars  Forks  Commits  Last Commit
AMR       JAMR⁶                    scala         192    50     825      March 2019
AMR       Transition AMR parser⁷   python        144    35     1838     November 2022
AMR       amrlib⁸*                 python        146    22     166      March 2022
UCCA      UCCA parser⁹             python        18     7      6        June 2019
UCCA      TUPA¹⁰*                  python        73     22     2135     December 2020
UD        UDepLambda¹¹             java          85     22     225      July 2018
UD        uuparser¹²               python        77     26     125      October 2020
UD        stanza¹³                 python        6400   830    3146     September 2022
EDS       Pydelphin¹⁴              python        68     24     1043     October 2022
EDS       HRG Parser¹⁵             python, java  9      0      4        October 2018
PTG       Perin¹⁶                  python        41     5      36       October 2021
DRS       TreeDRSparsing¹⁷*        python        5      2      59       March 2020
DRS       EncDecDRSParsing¹⁸       python        36     11     15       August 2019
UDS       Predpatt¹⁹*              python        110    23     59       February 2021
UDS       MISO²⁰                   python        7      1      1019     September 2021

⁶ https://github.com/jflanigan/jamr
⁷ https://github.com/IBM/transition-amr-parser

3.3. Evaluation of Parsing Frameworks

Initial experiments were conducted to measure the performance of the parsers. For this, a test corpus was constructed to evaluate the robustness of the chosen parsers. The sources for sentences were CommonsenseQA²¹ (312 sentences), Geoquery Data²² (5 sentences), and synthetic examples capturing the essential linguistic features for translating text to logical form (32 sentences). These features are: handling simple facts; extraction of predicates from traditional set theory; extraction of universal and existential quantifiers; handling of negation; handling of logical connectives (conjunction, disjunction, and implication); handling of equality; handling of multiple variables; and identification and extraction of questions. After the initial experiments, the baseline test corpus was pruned, leaving a minimal corpus of 58 sentences (594 tokens). The corpus and output data can be found at https://cs.taltech.ee/research/commonsense/.

Due to the variety of annotation schemes and the absence of 'correct' gold-label annotations, two evaluation measures were defined: granularity and robustness. Granularity was defined as the ratio of the number of tokens in the input sentence to the number of semantic attributes captured, averaged over the whole corpus. To evaluate robustness, human evaluation was conducted by the author. Each result was manually graded on a scale from 0 to 1. If the information captured was deemed complete and accurate, it was graded 1. If a portion of the information was missing, it was graded 0.5. If arbitrary or non-relevant information was added, the score was lowered by 0.3 points, as such errors are hard to detect in the knowledge base. If essential information was not present or the parse failed, the grade was 0. For each framework, the grades were averaged over the whole corpus.

The results of the experiment are documented in Table 3. The robustness metric suggests that all the frameworks chosen performed similarly well, though none achieved flawless results: each produced some incomplete parses. The granularity metric reflects the additional information generated by the parsing process: a lower value indicates that more information was added, such as temporal variables in the case of UDS and named entities in the case of AMR. The usefulness of such information depends on the context in which the results are used.
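The two measures can be sketched in a few lines of Python. The sketch reads granularity as tokens divided by captured attributes, clamps a penalized grade at 0, and leaves failed parses (no attributes) out of the granularity average; these three choices, the `Judgement` record, and the sample numbers are assumptions made for illustration and are not stated in the text.

```python
from dataclasses import dataclass
from statistics import mean

SPURIOUS_PENALTY = 0.3            # deduction for arbitrary or irrelevant output
ALLOWED_GRADES = (0.0, 0.5, 1.0)  # failed, partially missing, complete


@dataclass
class Judgement:
    """One manually graded parse of one sentence (hypothetical record)."""
    tokens: int              # tokens in the input sentence
    attributes: int          # semantic attributes captured by the parse
    grade: float             # base grade before any penalty
    spurious: bool = False   # True if irrelevant information was added


def robustness(judgements):
    """Average the penalized grades over the corpus."""
    scores = []
    for j in judgements:
        if j.grade not in ALLOWED_GRADES:
            raise ValueError(f"unexpected grade: {j.grade}")
        penalty = SPURIOUS_PENALTY if j.spurious else 0.0
        scores.append(max(0.0, j.grade - penalty))  # assumption: clamp at 0
    return mean(scores)


def granularity(judgements):
    """Average the tokens-to-attributes ratio over sentences that parsed."""
    ratios = [j.tokens / j.attributes for j in judgements if j.attributes > 0]
    return mean(ratios) if ratios else 0.0


# Made-up sample data, not taken from the experiments.
sample = [
    Judgement(tokens=10, attributes=50, grade=1.0),
    Judgement(tokens=8, attributes=40, grade=0.5, spurious=True),
    Judgement(tokens=6, attributes=0, grade=0.0),
]
print(round(robustness(sample), 2), round(granularity(sample), 2))
```

Under this reading, a parser that emits many attributes per token obtains a low granularity value, which matches the interpretation that lower values indicate more added information.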
⁸ https://github.com/bjascob/amrlib
⁹ https://github.com/SUDA-LA/ucca-parser
¹⁰ https://github.com/danielhers/tupa
¹¹ https://github.com/sivareddyg/UDepLambda
¹² https://github.com/UppsalaNLP/uuparser
¹³ https://github.com/stanfordnlp/stanza
¹⁴ https://github.com/delph-in/pydelphin
¹⁵ https://github.com/draplater/hrg-parser
¹⁶ https://github.com/ufal/perin
¹⁷ https://github.com/LeonCrashCode/TreeDRSparsing
¹⁸ https://github.com/EdinburghNLP/EncDecDRSparsing
¹⁹ https://github.com/hltcoe/PredPatt
²⁰ https://github.com/esteng/miso_uds
²¹ https://huggingface.co/datasets/commonsense_qa
²² https://www.cs.utexas.edu/users/ml/nldata/geoquery.html

Table 3
Comparative summary of robustness and granularity values for semantic parsing frameworks

Framework  Parser         Model              Robustness  Granularity
AMR        AMRLib         Parse T5 v0.2.0    0.94        0.14
UCCA       TUPA           ucca-bilstm-1.3.1  0.96        0.19
UDS        PredPatt       UDS 1.0            0.92        0.08
DRS        TreeDrsParser  built-in           0.92        0.02

4. Combining the Parsers

Due to the ambiguous nature of natural language, no parser alone was ideally suited to the task. However, each of the parsers chosen performed well in a different aspect of capturing the meaning. UDS is robust enough to be used for pre-processing the input: simplifying the structure of the sentence and splitting it into key components. That said, the splitting boundary for some input sentences seemed arbitrary. AMR is suitable for explicit negation extraction and question detection. Additionally, its Penman output format is suitable for further post-processing due to its rigid yet flexible structure. Adding further hand annotations is not a viable option, however, so for the current task we must rely on publicly available annotations. UCCA performed best on the robustness scale but does not explicitly state negation and does not perform entity recognition. At the same time, thanks to its annotation tooling, it is possible to implement the required layers if deemed necessary.
To evaluate using the parsers in parallel, a hybrid system was constructed for representing knowledge in first-order logic using said parsers in an ensemble. AMR was chosen for unanchored representations to generalize the syntactic layer, and UD to constrain and specify the generalized graphs. Amrlib was chosen with the Parse T5 v0.2.0 model²³. Stanza, a Python NLP toolkit, was used for text processing tasks: text tokenization, named entity recognition, part-of-speech tagging, constituency parsing, dependency parsing, and lemmatization.

²³ https://github.com/bjascob/amrlib-models/releases/download/model_parse_t5-v0_2_0/model_parse_t5-v0_2_0.tar.gz

4.1. Conversion to Logical Form

We follow the method outlined by Hatzilygeroudis [17] to construct the logical form from the sentence: find predicates and specify their arguments; construct the corresponding atoms; divide atoms on the same level into groups; specify connectives between the atoms of each group and construct the corresponding formulas; divide formulas and/or any of the remaining atoms of the same level into groups; specify connectives between the elements of each group; specify quantifiers for the variables; and finally construct the final FOL formula. The pipeline is the following:

1. Given an input passage, tokenize it into sentences.
2. For each sentence, extract its semantic representation via AMR and its syntactic structure via UD.
3. Perform sentence classification. If no sentence type can be determined, ignore the sentence and store it for further analysis.
4. Create a local context for the sentence.
5. Combine the contexts of the individual sentences.
6. Use the AMR representation with the global context to construct the logical clauses.
7. (Not implemented) Given a question, parse the question and answer it using the generated clauses. If given, the questions are assumed to be provided at the end of the passage. The naive approach assumes that a question ends with a question mark or starts with one of the seven question words in English.

4.2. Limitations of the Pure AMR Based Approach

AMR parse graphs are represented in Penman notation with well-formed results that can easily be parsed to first-order logic [18], as demonstrated by the amr2fol²⁴ project. Still, AMR has several constraints when parsing natural language and can fail unexpectedly. This is due less to the limitations of AMR itself than to the ambiguous nature of natural language. In the general domain, it is not possible to craft rules or train a model that covers the full scope of the language. Two types of inconsistencies were recognized when conducting the preliminary experiments: informational, where incorrect information or information of the wrong type is inferred, which generally happened with named entity recognition; and structural, where the structure of the parse tree does not reflect its true meaning, e.g. 'black and white' interpreted as a single property in a listing of colors. In addition, it was recognized that AMR does not support scenes (in contrast to UCCA and DRS). Thus, a manual context creation step for coreference resolution, similar to the approach described for the DocAMR representation [19], was implemented.

²⁴ https://github.com/papagandalf/amr2fol

4.3. Passage Context

To overcome the lack of scene support, a context was created for a sequence of sentences, similar to [20]. For describing situations, it was assumed that the sequence of sentences follows the sequence of events. Additionally, the context keeps track of concepts and named entities: if "John" occurs in a passage and "he" occurs later, we can assume that "he" refers to "John" when answering questions. A sample context object is presented in Figure 2.

Figure 2: A sample context object for the sentence "Brutus stabs Caesar with a knife."

    {
        'amr_root': {'lemma': 'stab', 'upos': 'VERB'},
        'entities': [
            {'text': 'Brutus', 'type': 'PERSON'},
            {'text': 'Caesar', 'type': 'PERSON'}],
        'IDX': 0,
        'question': False,
        'type': 'sit',
        'ud_root': {'lemma': 'stab', 'upos': 'VERB'}}

4.4. Translation Rules

A minimal ontology was constructed to provide interoperability across several contexts. Based on the sentence class, mappings were generated for the AMR core attributes. The following rules were applied:

1. For verbs, the lemmatized version was used as a predicate: 'walked' → 'walk-01' → 'walk'.
2. For custom AMR predicates, custom mappings were created, e.g. 'have-org-role-91' → 'role' or ':poss' → 'belongsTo'.
3. Sentence class-based semantic role mappings were applied, e.g. 'ARG0' typically denotes the agent role but denotes the instrument in other cases. For this, the sentence classification performed during the context creation step was used.

4.5. Sentence Classification

Based on the knowledge represented therein, we label each sentence with one of three classes:

(a) Conceptual statements do not describe a specific situation and are not dependent on uncommon circumstances. Typically, they describe concepts or relations between concepts. Example: John is a man.
(b) Fact statements describe named entities and are also not dependent on uncommon circumstances. Example: Tallinn is the capital of Estonia.
(c) Situational statements describe a concrete situation and the events happening within this situation. Example: Brutus stabbed Caesar with a knife.

To evaluate sentence classification accuracy and tune the heuristic parameters, an experiment was conducted. A dataset was chosen for each sentence type. Heuristics based on UD were constructed to classify the sentences. A classifier was built, and the F1 score was calculated to fine-tune the parameters of the heuristics and verify the accuracy of the classifier. The following sources were chosen for test data: OpenbookQA²⁵ for conceptual sentences, DBPedia²⁶ for factual sentences, and the Social IQa dataset²⁷ for situational sentences.

Table 4
Sentence classification evaluation results

Sentence Type  Dataset      Snt. count  F1 Score
Conceptual     OpenbookQA   100         0.91
Fact           DBPedia 1.4  100         0.18
Situational    SocialIQa    110         0.95

²⁵ https://ai2-public-datasets.s3.amazonaws.com/open-book-qa/OpenBookQA-V1-Sep2018.zip
²⁶ https://huggingface.co/datasets/p
²⁷ https://leaderboard.allenai.org/socialiqa/submissions/get-started

The F1 score for the classification of factual sentences is much lower than for the other classes because many factual sentences were classified as situational.

5. Conclusions

A preliminary study was conducted to identify the most common semantic parsing frameworks. A set of parsers was chosen, and a corpus was created to evaluate the suitability of said frameworks for automated knowledge base construction. Based on the results of these experiments, the technologies were chosen, and an experimental system was constructed to extract logical representations and perform text-to-logic conversion by combining AMR unanchored parse trees with bi-lexical UD annotations. The experimental system can be found at https://cs.taltech.ee/research/commonsense

The system currently has several limitations:

• Compound sentence segmentation. Compound sentences are not split before parsing, resulting in increased complexity of the generated logical forms.
• Context order. It is assumed that the events in the passage occur in the order of the sentences provided.
• Question detection and scope. Question parsing is not implemented in the current version of the system.

Nonetheless, combining AMR with UD has several benefits. It allows us to (a) improve, prune, and specify the generated parse trees, and (b) generate and specify the context: identifying and storing the references to named entities and classifying the sentence type.

References

[1] T. Tammet, D. Draheim, P.
Järv, GK: Implementing Full First Order Default Logic for Commonsense Reasoning (System Description), in: J. Blanchette, L. Kovács, D. Pattinson (Eds.), Automated Reasoning, Springer International Publishing, Cham, 2022, pp. 300–309.
[2] T. Tammet, G. Sutcliffe, Combining JSON-LD with First Order Logic, in: 2021 IEEE 15th International Conference on Semantic Computing (ICSC), IEEE, 2021, pp. 256–261.
[3] M. Verrev, Evaluation of Semantic Parsing Frameworks for Automated Knowledge Base Construction, to appear in: ISDA2022, Lecture Notes in Networks and Systems, Springer, 2022.
[4] W.-T. Chen, M. Palmer, Unsupervised AMR-Dependency Parse Alignment, in: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, Association for Computational Linguistics, 2017, pp. 558–567.
[5] O. Abend, A. Rappoport, UCCA: A Semantics-based Grammatical Annotation Scheme, in: IWCS, volume 13, 2013, pp. 1–12.
[6] D. Zeman, Reusable Tagset Conversion Using Tagset Drivers, in: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), European Language Resources Association (ELRA), 2008.
[7] K. Haverinen, J. Nyblom, T. Viljanen, V. Laippala, S. Kohonen, A. Missilä, S. Ojala, T. Salakoski, F. Ginter, Building the essential resources for Finnish: the Turku Dependency Treebank, Language Resources and Evaluation (2014).
[8] J. Nivre, M.-C. de Marneffe, F. Ginter, J. Hajič, C. D. Manning, S. Pyysalo, S. Schuster, F. Tyers, D. Zeman, Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection, in: Proceedings of the 12th Language Resources and Evaluation Conference, European Language Resources Association, 2020, pp. 4034–4043. URL: https://www.aclweb.org/anthology/2020.lrec-1.497.
[9] A. Copestake, D. Flickinger, C. Pollard, I. A. Sag, Minimal Recursion Semantics: An Introduction, Research on Language and Computation 3 (2005) 281–332.
[10] S. Oepen, J. T. Lønning, Discriminant-based MRS banking, in: LREC, 2006, pp. 1250–1255.
[11] J. Hajic, E. Hajicová, J. Panevová, P. Sgall, O. Bojar, S. Cinková, E. Fucíková, M. Mikulová, P. Pajas, J. Popelka, et al., Announcing Prague Czech-English Dependency Treebank 2.0, in: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), 2012, pp. 3153–3160.
[12] V. Basile, J. Bos, K. Evang, N. Venhuizen, Developing A Large Semantically Annotated Corpus, in: LREC 2012, Eighth International Conference on Language Resources and Evaluation, 2012.
[13] Y. Liu, W. Che, B. Zheng, B. Qin, T. Liu, An AMR Aligner Tuned by Transition-based Parser, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 2422–2430.
[14] J. Bos, V. Basile, K. Evang, N. J. Venhuizen, J. Bjerva, The Groningen Meaning Bank, in: Handbook Of Linguistic Annotation, Springer, 2017, pp. 463–496.
[15] A. S. White, D. Reisinger, K. Sakaguchi, T. Vieira, S. Zhang, R. Rudinger, K. Rawlins, B. Van Durme, Universal Decompositional Semantics On Universal Dependencies, in: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, pp. 1713–1723.
[16] T. Kollar, D. Berry, L. Stuart, K. Owczarzak, T. Chung, L. Mathias, M. Kayser, B. Snow, S. Matsoukas, The Alexa Meaning Representation Language, in: NAACL-HLT (3), 2018, pp. 177–184.
[17] I. Hatzilygeroudis, Teaching NL to FOL and FOL to CF Conversions, in: FLAIRS Conference, 2007, pp. 309–314.
[18] J. Bos, Squib: Expressive Power of Abstract Meaning Representations, Computational Linguistics 42 (2016) 527–535. doi:10.1162/COLI_a_00257.
[19] T. Naseem, A. Blodgett, S. Kumaravel, T. O'Gorman, Y.-S. Lee, J. Flanigan, R. F. Astudillo, R. Florian, S. Roukos, N. Schneider, DocAMR: Multi-Sentence AMR Representation and Evaluation, in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022.
[20] J. Bos, Separating Argument Structure From Logical Structure In AMR, arXiv preprint arXiv:1908.01355 (2019).