Common-Knowledge Concept Recognition for SEVA

Jitin Krishnan1, Patrick Coronado2, Hemant Purohit3, Huzefa Rangwala4
1,4 Department of Computer Science, George Mason University
2 Instrument Development Center, NASA Goddard Space Flight Center
3 Information Sciences & Technology Department, George Mason University
jkrishn2@gmu.edu, patrick.l.coronado@nasa.gov, hpurohit@gmu.edu, rangwala@gmu.edu

Abstract

We build a common-knowledge concept recognition system for a Systems Engineer's Virtual Assistant (SEVA) which can be used for downstream tasks such as relation extraction, knowledge graph construction, and question-answering. The problem is formulated as a token classification task similar to named entity extraction. With the help of a domain expert and text processing methods, we construct a dataset annotated at the word level by carefully defining a labelling scheme to train a sequence model to recognize systems engineering concepts. We use a pre-trained language model and fine-tune it with the labeled dataset of concepts. In addition, we also create some essential datasets for information such as abbreviations and definitions from the systems engineering domain. Finally, we construct a simple knowledge graph using these extracted concepts along with some hyponym relations.

Keywords: Natural Language Processing, Named Entity Recognition, Concept Recognition, Relation Extraction, Systems Engineering.

Figure 1: Common-knowledge concept recognition and simple relation extraction

INTRODUCTION

The Systems Engineer's Virtual Assistant (SEVA) (Krishnan, Coronado, and Reed 2019) was introduced with the goal of assisting systems engineers (SEs) in their problem-solving abilities by keeping track of large amounts of information about a NASA-specific project and using that information to answer queries from the user. In this work, we address one system element by constructing a common-knowledge concept recognition system to improve the performance of SEVA, using the static knowledge collected from the Systems Engineering Handbook (NASA 2017), which is widely used in projects across the organization, as domain-specific commonsense knowledge. At NASA, although there exist knowledge engines and ontologies for the SE domain such as MBSE (Hart 2015), IMCE (JPL 2016), and OpenCaesar (Elaasar 2019), generic commonsense acquisition is rarely discussed; we aim to address this challenge.

SE commonsense comes from years of experience and learning, which involves background knowledge that goes beyond any handbook. Although constructing an assistant like the SEVA system is the overarching objective, a key problem to address first is extracting elementary common-knowledge concepts using the SE handbook and domain experts. We use the term 'common-knowledge' to mean the 'commonsense' knowledge of a specific domain. This knowledge can be seen as a pivot that can later be used to collect 'commonsense' knowledge for the SE domain. We propose a preliminary research study that can pave a path towards comprehensive commonsense knowledge acquisition for an effective Artificial Intelligence (AI) application in the SE domain. The overall structure of this work is summarized in Figure 1. The implementation, with a demo and the datasets, is available at: https://github.com/jitinkrishnan/NASA-SE

Copyright © 2020 held by the author(s). In A. Martin, K. Hinkelmann, H.-G. Fill, A. Gerber, D. Lenat, R. Stolle, F. van Harmelen (Eds.), Proceedings of the AAAI 2020 Spring Symposium on Combining Machine Learning and Knowledge Engineering in Practice (AAAI-MAKE 2020). Stanford University, Palo Alto, California, USA, March 23-25, 2020. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

BACKGROUND AND MOTIVATION

Creating commonsense AI remains an important and challenging task in AI research today. Some of the inspiring works are the CYC project (Panton et al. 2006), which tries to serve as foundational knowledge for all systems with millions of everyday-life commonsense assertions; Mosaic Commonsense Knowledge Graphs and Reasoning (Zellers et al. 2018), which addresses aspects like social situations, mental states, and causal relationships; and the Aristo system (AI2 Allen Institute for AI), which focuses on basic science knowledge. In NASA's context, systems engineering combines several engineering disciplines, requires extreme coordination, and is prone to human error. This, in combination with the lack of efficient knowledge transfer of generic lessons-learned, makes most technology-based missions risk-averse. Thus, a comprehensive commonsense engine can significantly enhance the productivity of any mission by letting the experts focus on what they do best.

Concept Recognition (CR) is a task identical to the traditional Named Entity Recognition (NER) problem. A typical NER task seeks to identify entities such as the name of a person ('Shakespeare'), a geographical location ('London'), or the name of an organisation ('NASA') in unstructured text. A supervised NER dataset consists of such entities annotated at the word-token level using labelling schemes such as BIO, which provides beginning (B), continuation or inside (I), and outside (O) representations for each word of an entity. (Baevski et al. 2019) is the current top-performing NER model for the CoNLL-2003 shared task (Sang and De Meulder 2003). Off-the-shelf named entity extractors do not suffice in the SE common-knowledge scenario because the entities we want to extract are domain-specific concepts such as 'system architecture' or 'functional requirements' rather than physical entities such as 'Shakespeare' or 'London'. This requires defining new labels and fine-tuning.

Relation extraction tasks extract semantic relationships from text. These extractors aim to connect named entities such as 'Shakespeare' and 'England' using relations such as 'born-in'. Relations can be as simple as hand-built patterns or as challenging as unsupervised methods like Open IE (Etzioni et al. 2011), with bootstrapping, supervised, and semi-supervised methods in between. (Xu and Barbosa 2019) and (Soares et al. 2019) are some of the high-performing models that extract relations from the New York Times Corpus (Riedel, Yao, and McCallum 2010) and the TACRED challenge (Zhang et al. 2017), respectively. Hyponyms represent hierarchical connections between entities of a domain and are important relationships. For instance, the well-known work by (Hearst 1992) uses syntactic patterns such as [Y such as A, B, C], [Y including X], or [Y, including X] to extract hyponyms. Our goal is to extract preliminary hyponym relations from the concepts extracted by the CR and to connect the entities through verb phrases.

CONCEPT RECOGNITION

SE concepts are less ambiguous compared to generic natural language text. A word usually denotes one concept. For example, the word 'system' usually means the same thing when referring to a 'complex system', 'system structure', or 'management system' in the SE domain. In generic text, the meaning of terms like 'evaluation', 'requirement', or 'analysis' may differ with context. We would like domain-specific phrases such as 'system evaluation', 'performance requirement', or 'system analysis' to be single entities. Based on the operational and system concepts described in (Krishnan, Coronado, and Reed 2019), we carefully construct a set of concept labels for the SE handbook, which is shown in the next section.

BIO Labelling Scheme

1. abb: represents abbreviations, such as TRL for Technology Readiness Level.
2. grp: represents a group of people or an individual, such as Electrical Engineers, Systems Engineers, or a Project Manager.
3. syscon: represents system concepts such as engineering unit, product, hardware, software, etc. These mostly represent physical concepts.
4. opcon: represents operational concepts such as decision analysis process, technology maturity assessment, system requirements review, etc.
5. seterm: represents generic terms that are frequently used in SE text and that do not fall under syscon or opcon, such as project, mission, key performance parameter, audit, etc.
6. event: represents event-like information in SE text, such as Pre-Phase A, Phase A, Phase B, etc.
7. org: represents an organization, such as 'NASA', 'aerospace industry', etc.
8. art: represents names of artifacts or instruments, such as 'AS1300'.
9. cardinal: represents numerical values such as '1', '100', 'one', etc.
10. loc: represents location-like entities such as component facilities or centralized facility.
11. mea: represents measures, features, or behaviors such as cost, risk, or feasibility.

Abbreviations

Abbreviations are used frequently in SE text. We automatically extract abbreviations using simple pattern matching around parentheses. Given below is a sample regex that matches most abbreviations in the SE handbook:

    r"\([ ]*[A-Z][A-Za-z]*[ ]*\)"

An iterative regex-matching procedure using this pattern over the preceding words produces the full phrase of the abbreviation. 'A process to determine a system's technological maturity based on Technology Readiness Levels (TRLs)' produces the abbreviation TRL, which stands for Technology Readiness Levels. 'Define one or more initial Concept of Operations (ConOps) scenarios' produces the abbreviation ConOps, which stands for Concept of Operations. We pre-label these abbreviations as concept entities. Many of these abbreviations are also provided in the Appendix section of the handbook, which is likewise extracted and used as concepts.

Common-Knowledge Definitions

Various locations of the handbook and the glossary provide definitions of several SE concepts. We collect these and compile a comprehensive definitions document, which is also used for the concept recognition task. An example definition and its description is shown below:

Definition: Acceptable Risk
Description: The risk that is understood and agreed to by the program/project, governing authority, mission directorate, and other customer(s) such that no further specific mitigating action is required.

CR Dataset Construction and Pre-processing

Using Python tools such as PyPDF2, NLTK, and RegEx, we build a pipeline to convert PDF to raw text along with extensive pre-processing, which includes joining sentences that are split, removing URLs, shortening duplicate non-alpha characters, and replacing full forms of abbreviations with their shortened forms. We assume that the SE text is free of spelling errors. For the CR dataset, we select coherent paragraphs and full sentences, avoiding headers and short blurbs. Using domain keywords and a domain expert, we annotate roughly 3700 sentences at the word-token level. An example is shown in Figure 2 and the unique tag count is shown in Table 1.

Figure 2: A Snippet of the concept-labelled dataset

    O          73944    B-cardinal   414    I-grp      132
    B-opcon     5530    B-abb        354    B-org       87
    B-syscon    1640    B-event      350    I-seterm    26
    B-seterm    1431    I-event      218    B-art       17
    I-opcon     1334    I-syscon     201    I-org       12
    B-mea       1117    I-abb        156    I-loc        3
    B-grp        499    I-mea        145    B-loc        2

Table 1: Unique Tag Count from the CR dataset

Fine tuning with BERT

Any language model can be used for the purpose of customizing an NER problem to CR. We choose BERT (Devlin et al. 2018) because of its general-purpose nature and its use of contextualized word embeddings. In the hand-labelled dataset, each word gets a label. The idea is to perform multi-class classification using BERT's pre-trained cased language model. We use PyTorch transformers and Hugging Face as per the tutorial by (Huang 2019), which uses BertForTokenClassification. The text is embedded as tokens and masks with a maximum token length. These embedded tokens are provided as input to the pre-trained BERT model for full fine-tuning. The model gives an F1-score of 0.89 for the concept recognition task. An 80-20 data split is used for training and evaluation. Detailed performance of the CR is shown in Tables 2 and 3. Additionally, we also implemented CR using spaCy (Honnibal and Johnson 2015), which produced similar results.

                     precision   recall   f1-score   support
    syscon             0.94       0.89      0.91       320
    opcon              0.87       0.91      0.89      1154
    seterm             0.98       0.94      0.96       287
    mea                0.91       0.90      0.90       248
    grp                0.94       0.93      0.94        89
    org                1.00       0.11      0.21        26
    cardinal           0.90       0.92      0.91        71
    event              0.71       0.78      0.76        77
    abb                0.82       0.58      0.68        79
    art                0.00       0.00      0.00         4
    loc                0.00       0.00      0.00         1
    micro/macro-avg    0.90       0.88      0.88      2356

Table 2: Performance of different labels

    F1-Score   Accuracy   Accuracy without 'O'-tag
    0.89       0.97       0.86

Table 3: Overall Performance of CR; for fairness, we also provide the accuracy when the most common 'O'-tag is excluded from the analysis.

RELATION EXTRACTION

In this work, for relation extraction, we focus on hyponyms and verb phrase chunking. Hyponyms are more specific concepts, such as earth to planet or rose to flower. Verb phrase chunking connects the named entities recognized by the CR model through verbs.

Hyponyms from Definitions

The definitions document consists of 241 SE definitions and their descriptions. We iteratively construct entities in increasing order of the number of words in the definitions, with the help of their part-of-speech tags. This helps in creating a subset-of relation between a lower-word entity and a higher-word entity. Each root entity is lemmatized so that entities like 'processes' and 'process' appear only once.

Hyponyms from POS tags

Using the words (especially nouns) that surround an already identified named entity, more specific entities can be identified. This is performed on a few selected entity tags such as opcon and syscon. For example, consider the sentence 'SE functions should be performed'. 'SE' has tag NNP and 'functions' has tag NNS. We create a relation called subset-of between 'SE functions' and 'SE'.

Relations from Abbreviations

Relations from abbreviations are simple direct connections between the abbreviation and its full form described in the abbreviations dataset. Figure 3 shows a snippet of the knowledge graph constructed using stands-for and subset-of relationships. Larger graphs are shown in the demo.

Figure 3: A snippet of the knowledge graph generated

Relation Extraction using Verb Phrase Chunking

Finally, we explore creating contextual triples from sentences using all the entities extracted by the CR model and the entities from definitions. Only those phrases that connect two entities are selected for verb phrase extraction. Using NLTK's regex parser and chunker, a grammar over part-of-speech tags with at least one verb, such as

    VP: {<MD|TO|RB.*>*<VB.*>+<RB.*|IN|TO|JJ.*|DT>*}

can extract relation-like phrases from the phrase that links two concepts (the exact grammar is available in the implementation). An example is shown in Figure 4. Further investigation of relation extraction from the SE handbook is left as future work.

Figure 4: Relation Extraction using Verb Phrase

CONCLUSION AND FUTURE WORK

We presented a common-knowledge concept extractor for the Systems Engineer's Virtual Assistant (SEVA) system and showed how it can benefit downstream tasks such as relation extraction and knowledge graph construction. We constructed a word-level annotated dataset with the help of a domain expert by carefully defining a labelling scheme to train a sequence labelling model to recognize SE concepts. Further, we also constructed some essential datasets from the SE domain which can be used for future research. Future directions include constructing a comprehensive common-knowledge relation extractor from the SE handbook and incorporating such human knowledge into a more comprehensive machine-processable commonsense knowledge base for the SE domain.

References

AI2 Allen Institute for AI. Aristo: An intelligent system that reads, learns, and reasons about science. https://allenai.org/aristo/. Accessed: 2019-08-12.

Baevski, A.; Edunov, S.; Liu, Y.; Zettlemoyer, L.; and Auli, M. 2019. Cloze-driven pretraining of self-attention networks. arXiv preprint arXiv:1903.07785.

Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Elaasar, M. 2019. OpenCaesar: The case for integrated model centric engineering.

Etzioni, O.; Fader, A.; Christensen, J.; Soderland, S.; and Mausam. 2011. Open information extraction: The second generation. IJCAI, 3-10.

Hart, L. E. 2015. Introduction to model-based system engineering (MBSE) and SysML. http://www.incose.org/docs/default-source/delaware-valley/mbse-overview-incose-30-july-2015.pdf. Accessed: 11-09-2017.

Hearst, M. A. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th Conference on Computational Linguistics, Vol. 2, 539-545. ACL.

Honnibal, M., and Johnson, M. 2015. An improved non-monotonic transition system for dependency parsing. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 1373-1378. ACL.

Huang, B. 2019. NER with BERT in action.

JPL. 2016. IMCE ontological modeling framework.

Krishnan, J.; Coronado, P.; and Reed, T. 2019. SEVA: A systems engineer's virtual assistant. In AAAI-MAKE Spring Symposium.

NASA. 2017. NASA systems engineering handbook.

Panton, K.; Matuszek, C.; Lenat, D.; Schneider, D.; Witbrock, M.; Siegel, N.; and Shepard, B. 2006. Common Sense Reasoning - From Cyc to Intelligent Assistant. Berlin, Heidelberg: Springer Berlin Heidelberg. 1-31.

Riedel, S.; Yao, L.; and McCallum, A. 2010. Modeling relations and their mentions without labeled text. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 148-163. Springer.

Sang, E. F., and De Meulder, F. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. arXiv preprint cs/0306050.

Soares, L. B.; FitzGerald, N.; Ling, J.; and Kwiatkowski, T. 2019. Matching the blanks: Distributional similarity for relation learning. arXiv preprint arXiv:1906.03158.

Xu, P., and Barbosa, D. 2019. Connecting language and knowledge with heterogeneous representations for neural relation extraction. arXiv preprint arXiv:1903.10126.

Zellers, R.; Bisk, Y.; Schwartz, R.; and Choi, Y. 2018. SWAG: A large-scale adversarial dataset for grounded commonsense inference. CoRR abs/1808.05326.

Zhang, Y.; Zhong, V.; Chen, D.; Angeli, G.; and Manning, C. D. 2017. Position-aware attention and supervised data improve slot filling. In EMNLP, 35-45.
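To make the abbreviation-extraction step concrete, the following sketch applies the paper's parenthesis regex and then walks backwards over the preceding words, one word per capital letter. The function name and the backward-matching heuristic are illustrative assumptions, not the paper's actual implementation (which handles harder cases such as mixed-case abbreviations like ConOps):

```python
import re

# Regex from the paper: a parenthesized token starting with a capital letter.
ABBR_PATTERN = re.compile(r"\(\s*([A-Z][A-Za-z]*)\s*\)")

def extract_abbreviations(sentence):
    """Return {abbreviation: full phrase} pairs found in one sentence.

    For each parenthesized candidate, take one preceding word per capital
    letter of the abbreviation (ignoring a trailing plural 's', as in
    'TRLs'), and keep the match only if the word initials spell it out.
    """
    results = {}
    for match in ABBR_PATTERN.finditer(sentence):
        abbr = match.group(1)
        core = abbr[:-1] if abbr.endswith("s") and abbr[:-1].isupper() else abbr
        preceding = sentence[:match.start()].split()
        n = len(core)  # e.g. TRL -> the 3 preceding words
        if len(preceding) >= n:
            phrase_words = preceding[-n:]
            if [w[0].upper() for w in phrase_words] == list(core):
                results[abbr] = " ".join(phrase_words)
    return results

print(extract_abbreviations(
    "A process to determine a system's technological maturity "
    "based on Technology Readiness Levels (TRLs)"))
```

A mixed-case abbreviation such as ConOps would need a looser backward scan (the paper's iterative matching procedure), which this simplified initial-letter check does not cover.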
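The word-level BIO annotation used for the CR dataset can be illustrated with a small encoder that turns concept-span annotations into per-token B-/I-/O labels. This is a generic sketch of the labelling scheme, with hypothetical span inputs, not the paper's annotation tooling:

```python
def bio_encode(tokens, spans):
    """Convert concept annotations into BIO labels.

    tokens: list of words in the sentence.
    spans:  list of (start, end, label) with end exclusive, e.g.
            (0, 3, 'opcon') marks tokens[0:3] as one operational concept.
    """
    labels = ["O"] * len(tokens)
    for start, end, label in spans:
        labels[start] = "B-" + label           # beginning of the concept
        for i in range(start + 1, end):
            labels[i] = "I-" + label           # inside the concept
    return labels

tokens = ["decision", "analysis", "process", "reduces", "risk"]
spans = [(0, 3, "opcon"), (4, 5, "mea")]
print(list(zip(tokens, bio_encode(tokens, spans))))
```

Multi-word phrases such as 'decision analysis process' thus become a single opcon entity, which is exactly why the token classifier can keep domain phrases intact.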
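The iterative construction of subset-of relations from definitions (shorter entities first, longer entities linked to a matching suffix, with lemmatized roots) can be sketched as below. The suffix-matching rule and the crude plural-stripping lemmatizer are simplifying assumptions; the paper uses part-of-speech tags and proper lemmatization:

```python
def lemma(word):
    """Very crude lemmatizer for illustration (the paper uses NLTK)."""
    if word.endswith("sses"):                       # processes -> process
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss"):  # risks -> risk
        return word[:-1]
    return word

def subset_of_relations(entities):
    """Link each multi-word entity to the shorter entity it extends.

    Entities are processed in increasing order of word count; an entity
    whose longest trailing-word suffix matches an already-seen shorter
    entity gets a subset-of edge to it.
    """
    seen = set()
    relations = []
    for ent in sorted(entities, key=lambda e: len(e.split())):
        words = ent.lower().split()
        words[-1] = lemma(words[-1])                # lemmatize the root word
        for k in range(len(words) - 1, 0, -1):      # longest suffix first
            suffix = " ".join(words[-k:])
            if suffix in seen:
                relations.append((" ".join(words), "subset-of", suffix))
                break
        seen.add(" ".join(words))
    return relations

print(subset_of_relations(
    ["process", "assessment processes", "technical assessment process"]))
```

Lemmatizing the root ensures 'assessment processes' and 'assessment process' collapse to one entity before linking.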
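The hyponyms-from-POS-tags step ('SE functions' subset-of 'SE') can be sketched over pre-tagged tokens. The function name and the adjacent-noun rule are illustrative; the input tuples stand in for the output of a POS tagger such as NLTK's:

```python
def hyponyms_from_pos(tagged, entities):
    """Extend a recognized entity with an adjacent noun to form a more
    specific entity, linked back to the original by subset-of.

    tagged:   list of (word, POS) pairs, e.g. from a POS tagger.
    entities: surface forms already recognized by the CR model.
    """
    relations = []
    for i, (word, tag) in enumerate(tagged[:-1]):
        nxt_word, nxt_tag = tagged[i + 1]
        # A noun-tagged entity followed by another noun forms a hyponym.
        if word in entities and tag.startswith("NN") and nxt_tag.startswith("NN"):
            relations.append((f"{word} {nxt_word}", "subset-of", word))
    return relations

# The paper's example: 'SE functions should be performed',
# where 'SE' is tagged NNP and 'functions' NNS.
tagged = [("SE", "NNP"), ("functions", "NNS"),
          ("should", "MD"), ("be", "VB"), ("performed", "VBN")]
print(hyponyms_from_pos(tagged, {"SE"}))
```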
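The verb-phrase-chunking idea (keep the phrase between two recognized concepts only when it contains at least one verb) can be sketched without NLTK by matching verb tags in the span between the two concepts. This pure-Python stand-in for the RegexpParser grammar is an assumption for illustration, not the paper's chunker:

```python
import re

def verb_phrase_between(tagged, left_end, right_start):
    """Extract a verb-centered connector phrase between two concepts.

    tagged:      (word, POS) pairs for the sentence.
    left_end:    index just past the last token of the first concept.
    right_start: index of the first token of the second concept.
    Returns the connecting phrase if it contains at least one verb
    (VB, VBD, VBG, VBN, VBP, VBZ), else None.
    """
    between = tagged[left_end:right_start]
    tag_string = " ".join(tag for _, tag in between)
    if re.search(r"\bVB[DGNPZ]?\b", tag_string):
        return " ".join(word for word, _ in between)
    return None

# 'The project manager performs a technical assessment':
# concept 1 = tokens 1-2, concept 2 = tokens 5-6.
tagged = [("The", "DT"), ("project", "NN"), ("manager", "NN"),
          ("performs", "VBZ"), ("a", "DT"),
          ("technical", "JJ"), ("assessment", "NN")]
print(verb_phrase_between(tagged, 3, 5))
```

The resulting triple here would be ('project manager', 'performs a', 'technical assessment'), mirroring the contextual triples described above.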
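Finally, the knowledge-graph assembly from the two relation sources (stands-for edges from the abbreviations dataset, subset-of edges from the hyponym extraction) reduces to collecting triples; the sketch below uses a plain triple list rather than whatever graph representation the demo uses:

```python
def build_graph(abbreviations, subset_relations):
    """Assemble knowledge-graph triples from both relation sources.

    abbreviations:    {abbreviation: full form} from the abbreviations dataset.
    subset_relations: (child, 'subset-of', parent) triples from hyponyms.
    """
    triples = [(abbr, "stands-for", full) for abbr, full in abbreviations.items()]
    triples.extend(subset_relations)
    return triples

graph = build_graph(
    {"TRL": "Technology Readiness Level"},
    [("SE functions", "subset-of", "SE")])
print(graph)
```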