Automated Requirement Formalization Using Product Design Specifications

Robin Gröpler1, Libin Kutty1, Viju Sudhi1 and Daran Smalley2

1 ifak - Institut für Automation und Kommunikation e.V., Werner-Heisenberg-Str. 1, 39106 Magdeburg, Germany
2 ALSTOM, Östra Ringvägen 2, 721 73 Västerås, Sweden

Abstract
Assuring the quality of complex and highly configurable software systems is a demanding and time-consuming process. Especially for safety-critical systems, extensive testing based on requirements is necessary. Methods for model-based test automation in agile software development offer the possibility to overcome these difficulties. However, it is still a major effort to create formal models from functional requirements in natural language on a large scale. In this paper, we present and evaluate automated support for the requirements formalization process to reduce cost and effort. We present a new approach based on Natural Language Processing (NLP) and textual similarity that uses requirements and product design specifications to generate human- and machine-readable models. The method is evaluated on an industrial use case from the railway domain. The recommended requirement models for the considered propulsion system show an average accuracy of more than 90% and an exact match of the entire models of about 55%. These results show that our approach can support the requirements formalization process, which can be further used for test case generation and execution, as well as for requirements and design verification.

Keywords
Requirements engineering, requirements modeling, natural language processing, textual similarity

1. Introduction
Increasingly complex and highly configurable software systems also increase the effort required for their quality assurance. The rapid and simultaneously high-quality development of industrial software products demands an increasingly effective test process. Especially for safety-critical systems, such as in the automotive and railway domains, extensive testing based on requirements is necessary. However, any manual processing of textual requirements, such as requirements verification and test generation, is time-consuming and error-prone, and also requires a lot of expert knowledge. Methods for model-based test automation in agile software development aim to overcome these difficulties [1]. Formal models serve as the basis for automating a large number of further process steps.

In previous works, a model-based testing tool chain has been developed to enable efficient test processes [2, 3]. Methods for model synthesis, model-based test generation and prioritization are used to systematically and efficiently create a test suite that contains suitable test cases.
This approach is based on behavioral requirements that serve as input for further processing. The only time-consuming manual step is the creation of requirement models from functional requirements in natural language.

Recent advances in natural language processing (NLP) show promising results for supporting a wide range of requirements analysis tasks [4]. Consequently, there is growing interest in applying NLP techniques to automate various software testing activities such as model and test case generation. A considerable number of NLP approaches and tools have been investigated in recent years aiming to generate models or test cases from underlying requirements documents [5, 6].

However, due to the shortcomings of natural language, most of the existing NL-based approaches generate abstract models and test cases that lack information about real entities, their relations and design [6]. The links to real entities such as components, signals and parameters are usually added only during test execution, which in turn is an error-prone and time-consuming manual process. At this late stage of the testing process, issues have to be resolved, e.g., that parts of the requirements are not verifiable. This makes the model-based approach somewhat inefficient, since the requirements and the design specification then have to be revised again. Besides mapping abstract to real entities, there are usually many specific details in the system design from which the requirement formalization process would benefit. The system architecture not only describes the structure, but also includes architectural design decisions and is therefore closely related to the requirements [7].

In this work, we take product design specifications into consideration for a much more precise requirements formalization process. Our new approach utilizes NLP techniques to automatically generate requirement models from natural language requirements and design specifications. We perform textual similarity and contradiction analysis between the requirements and entity descriptions using classical to modern NLP algorithms. The generated models are represented in a simple, self-created, machine- and human-readable language. The main contributions of this work are i) a new approach for requirement formalization using product design specifications and ii) the evaluation of various algorithms on an industrial use case. The integration of information from the design specifications at this early stage of the testing process shall provide much faster feedback to the requirements or test engineer as to whether the requirements are verifiable and correctly designed.

2. Related work
There is a wide range of sophisticated methods and tools for requirement formalization, see e.g. the surveys of Zhao et al. [4], Buzhinsky [5] or Brunello et al. [8]. Recently, Giannakopoulou et al. [9] proposed a structured natural language called FRETISH using semantic templates, which has been evaluated in several NASA projects. These methods focus purely on the requirements texts without considering further information about the system architecture.

There are only a few works that take architecture and design specifications into account during the formalization of requirements. Bernaerts et al. [10] addressed the early integration of the design process, but focus more on temporal logic while assuming a manual interpretation of natural language texts. Stachtiari et al. [11] addressed the early assurance of consistency between the requirements and design correctness using a pattern-based approach.
Note that in a previous study, we pointed out that a predefined list of semantic entities is helpful to generate more accurate models from requirements [2]. Wang et al. [12] developed a pattern-based approach called PASER for checking the consistency between the generated models and the implementation, i.e., considering the design at the process modeling stage. During test execution, many authors use a mapping table or test case specifications [6]. However, all of these approaches require large manual effort and expert knowledge of system architecture and design to create such documents.

The use of architecture and design specifications has been investigated for various other requirements analysis tasks. Leitão and Medeiros [13] developed an NLP-based method that can extract and associate components from product design specifications and system requirements. Sharma et al. [14] proposed a recommender system for selecting a suitable architectural pattern for a given set of software requirements using textual entailment. Niklas et al. [15] developed an approach for checking the consistency of design specifications against natural language requirements based on noun extraction and graph-based modeling. Yet, none of these approaches can be used for the formalization of requirements.

The application of similarity approaches is widespread in requirements analysis. For example, several techniques for identifying requirement duplicates and interdependencies have been studied [16, 17]. Furthermore, it was shown that measuring the similarity between new and old requirements of software projects helps to identify reusable software artifacts such as design, code and test cases [18]. Recently, Abbas et al. [19] investigated the semantic similarity of customer requirements to generate reuse recommendations for software product line assets. However, most previous work has operated at the level of the entire requirement, which has the advantage that it can be processed at once and most NLP techniques are directly applicable. In contrast, our approach goes into much more detail on the individual entities and their dependencies between the requirements and design descriptions. In addition, not only similarity but also contradiction is studied to identify specific properties of the requirement model. To the best of our knowledge, the application of NLP-based methods for automated requirements formalization using requirements and design specifications has not yet been considered in the literature.

3. Use case
In order to conduct experiments and evaluate our approach, we consider a use case from the rail industry. We use the data of the Propulsion Control (PPC) system of Bombardier Transportation, an Alstom Group Company. The PPC is part of a large, complex, safety-critical system. It handles the control of the entire propulsion system, including both the control software and the electrical functions.

The requirements are written in textual form in the requirements management tool IBM DOORS. They are written in English and may contain several sentences. They do not follow a prescribed format, so that authors do not have to focus on syntax when writing them. To meet the standards of Safety Integrity Level 2 (SIL2), a design document is written in textual format and is also handled in DOORS. The requirements and the design document are created by hand and can be manually linked. The system architecture and software modules are modeled as blocks in Matlab Simulink using a model-based design approach [20]. The two underlying documents for our study are provided as exports from DOORS and Simulink.

Figure 1: Class diagram of the input data provided in the documents. The requirement specification contains Requirements (+ID, +Description, +LinkedComponents); the design specification contains Components (+Name, +Description, +Parameters, +Signals), each of which is linked to one or more Signals (+Name, +Description, +DataType, +Direction) and Parameters (+Name, +Description, +DataType, +Value).

We assume the following tool and use case specific characteristics of the requirements and design specifications, cf. Figure 1: i) components are not mentioned in the textual requirement descriptions but are defined in the design specification and linked to the requirements, ii) all signals and parameters are defined with textual descriptions in the design specification and linked to the corresponding components, and iii) requirement descriptions have a clear and consistent structure. In particular, we assume that a short description in natural language is available for each entity. This assumption should hold true in most industrial product developments, since requirements engineers also need to identify entities by some textual description. This information can be written inside the implemented code, in the company's internal standardization documentation, or more structured as a table in a design specification document.
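To make the assumed input structure concrete, the following minimal sketch (ours, not part of the original tooling) renders the entities of Figure 1 as Python dataclasses; the class and field names mirror the diagram, while the comments and defaults are illustrative.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Signal:
    name: str          # e.g. "Device_Temp"
    description: str   # short natural language description
    data_type: str     # e.g. "boolean", "single"
    direction: str     # input/output direction of the signal

@dataclass
class Parameter:
    name: str          # e.g. "T_ExHi"
    description: str
    data_type: str
    value: str         # configured value, kept as text here

@dataclass
class Component:
    name: str
    description: str
    signals: List[Signal] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)

@dataclass
class Requirement:
    req_id: str
    description: str   # free-form English text, possibly several sentences
    linked_components: List[Component] = field(default_factory=list)
```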
4. Methodology
In the following, we investigate suitable NLP methods to automate the process of creating requirement models from the requirements and design specification. Figure 2 demonstrates the pipeline of our approach using an exemplary requirement with related signal descriptions. Note that the requirement is shown somewhat generalized, as the data we use is confidential. The process can be divided into the following steps. The requirements in natural language and the design specification serve as inputs to the pipeline. The requirement is initially decomposed into sub-requirement clauses (Section 4.1). Each clause is compared to the signal descriptions in the design specification to identify the signals (Section 4.2), and then the corresponding parameters are identified (Section 4.3). A relation is formed for each clause, which is then translated into a logical structure and finally into a requirement model (Section 4.4).

The ground truth for the individual steps was created manually and reviewed by a requirements engineer with expert knowledge. While presenting the methodology, we show experimental results for the individual steps and discuss which methods are most suitable.

Figure 2: Exemplary requirement with an overview of the pipeline (Decomposition, Signal Detection, Parameter Detection, Model Generation).

Requirement specification (clauses with identified relations):
  The valve shall be closed                                             CMD_Close_Valve = true
  and event 'Extremely high device temperature' shall be indicated,    EVT_Ex_Hi_Temp = true
  when the device temperature exceeds [T_ExHi] °C                      Device_Temp > T_ExHi
  until the device temperature falls below the reset value [T_RST] °C. Device_Temp < T_RST

Design specification (excerpt):
  Signal           Data Type  Description
  CMD_Close_Valve  boolean    Command: close valve due to extremely high device temperature
  EVT_Ex_Hi_Temp   boolean    Event: extremely high device temperature
  EVT_Hi_Temp      boolean    Event: high device temperature
  Device_Temp      single     Device temperature, degC

Resulting requirement model:
  if (Device_Temp > T_ExHi)
  then (CMD_Close_Valve = true and EVT_Ex_Hi_Temp = true)
  until (Device_Temp < T_RST)

4.1. Decomposition
It is imperative to initially decompose the long, complex industrial requirements into shorter requirement clauses that each describe an elementary instruction. This process requires understanding both the structure (syntax) and meaning (semantics) of the requirement text. In our first trials to decompose the requirements, we used a naive keyword-based approach. The conjunctions introducing conditional clauses like if, when, while and until and those connecting clauses like and and or were identified in the span of the requirement text and were considered the boundaries of clauses. In the considered use case with well structured requirements, there were only a few cases where this approach did not work. For example, the phrases "between sensor 1 and 2" or "less than or equal" should not be considered a boundary despite the presence of a conjunction. To decompose such requirements correctly, we considered the syntactic dependencies1 of the requirement text (e.g., using the ancestor tokens of the conjunctions) in addition to the keywords. Fig. 2 illustrates the decomposition of a requirement (by line breaks) from our use case. For our use case, the simple decomposition algorithm proposed here works correctly. However, it could be extended with more linguistic patterns as proposed in [21] or with more sophisticated algorithms such as the one proposed in [22]. For the further steps in our pipeline, decomposition into requirement clauses is best suited: a shorter chunk of text (with just the noun phrases) leads to a loss of information, and a longer unit (a whole sentence) leads to erroneous detection of signals and parameters.

1 using spaCy, https://spacy.io/
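To illustrate the clause splitting just described, here is a minimal sketch of the keyword-plus-dependency heuristic using spaCy (footnote 1). The boundary test is a simplification of our implementation, and the example sentence paraphrases the generalized requirement of Figure 2 with concrete numbers.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

CONDITIONALS = {"if", "when", "while", "until"}
CONNECTORS = {"and", "or"}

def is_clause_boundary(tok) -> bool:
    if tok.lower_ in CONDITIONALS:
        # if/when/while/until reliably open a conditional clause in our data
        return True
    if tok.lower_ in CONNECTORS:
        # 'and'/'or' only split clauses when they coordinate verbal heads,
        # not noun phrases like "between sensor 1 and 2" or "less than or equal"
        return tok.dep_ == "cc" and tok.head.pos_ in ("VERB", "AUX")
    return False

def decompose(requirement: str):
    doc = nlp(requirement)
    clauses, start = [], 0
    for tok in doc:
        if tok.i > start and is_clause_boundary(tok):
            clauses.append(doc[start:tok.i].text)
            start = tok.i  # keep the conjunction with the new clause
    clauses.append(doc[start:].text)
    return [c.strip(" ,.") for c in clauses if c.strip(" ,.")]

print(decompose(
    "The valve shall be closed and event 'Extremely high device temperature' "
    "shall be indicated, when the device temperature exceeds 90 °C "
    "until the device temperature falls below 70 °C."))
```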
Table 1
Evaluation results for signal detection

  Class                  Method                                    Accuracy
  Classical methods      Term Frequency (TF)                       78.5%
                         Term Frequency with stemming (TF-stem)    86.0%
  String matching        FuzzyWuzzy Partial Ratio                  67.3%
                         FuzzyWuzzy Token Set Ratio (Fuzz-TSR)     85.0%
  Static embeddings      GloVe                                     74.0%
                         fastText                                  57.0%
  Contextual embeddings  Sentence-BERT (SBERT)                     90.7%
  Ensembles              TF-stem + Fuzz-TSR                        90.7%
                         SBERT + Fuzz-TSR                          91.6%

4.2. Signal detection
Once the requirement is decomposed, we compute the cosine similarity between the requirement clause and all the signal descriptions in the design specification. The signal with the highest similarity score is then retrieved as the most relevant. To obtain a vector representation for each requirement clause and signal description, we apply classical techniques including Term Frequency (TF), static word embeddings including GloVe2 and fastText3, and contextual embeddings from Sentence-BERT4 (SBERT). We also use FuzzyWuzzy5, which looks for partial (or inexact) matches between sentence pairs. We also consider ensembles of these methods by taking the average of the individual similarity scores. The evaluation results for signal detection using a total of 107 clauses and 207 signals from the use case are shown in Table 1.
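As an illustration of the best-performing ensemble (SBERT + Fuzz-TSR), the following sketch scores a clause against signal descriptions using the sentence-transformers model from footnote 4 and FuzzyWuzzy (footnote 5). The equal weighting of the two scores is a simplifying assumption, and the signal catalog is the excerpt from Figure 2.

```python
from sentence_transformers import SentenceTransformer, util
from fuzzywuzzy import fuzz

model = SentenceTransformer("paraphrase-distilroberta-base-v1")

signals = {
    "CMD_Close_Valve": "Command: close valve due to extremely high device temperature",
    "EVT_Ex_Hi_Temp": "Event: extremely high device temperature",
    "EVT_Hi_Temp": "Event: high device temperature",
    "Device_Temp": "Device temperature, degC",
}

def detect_signal(clause: str) -> str:
    """Return the signal whose description scores highest under an
    averaged SBERT cosine + FuzzyWuzzy token-set-ratio ensemble."""
    clause_emb = model.encode(clause, convert_to_tensor=True)
    best_signal, best_score = None, -1.0
    for name, description in signals.items():
        desc_emb = model.encode(description, convert_to_tensor=True)
        sbert = util.cos_sim(clause_emb, desc_emb).item()          # in [-1, 1]
        fuzzy = fuzz.token_set_ratio(clause, description) / 100.0  # in [0, 1]
        score = (sbert + fuzzy) / 2.0
        if score > best_score:
            best_signal, best_score = name, score
    return best_signal

print(detect_signal("event 'Extremely high device temperature' shall be indicated"))
```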
We use TF instead of TF-IDF as we observed that the IDF weights do not change the result significantly (probably due to the shortness of the texts). In agreement with this observation, we found that removing stop words deteriorated the similarity scores. However, stemming6 the text before vectorizing with TF gives better results. Pre-trained static embedding models did not perform well for detecting signals. We observed that these models have difficulty distinguishing between, e.g., "extremely high temperature" and "too high temperature" within the signal description. Though the SBERT model performs very well, there are also some cases where the model could not distinguish between "device on fault" and "device off fault". This could be attributed to the nature of their training objective, which aims to bring the embeddings of similar words as close as possible in the representation space. As a result, the model gives very high cosine similarity scores for such pairs and occasionally detects an incorrect signal. Though FuzzyWuzzy individually did not perform as well as the other methods, the best results were obtained when it was combined with the relatively better performing TF-stem and SBERT models.

2 using glove.42B.300d from https://nlp.stanford.edu/projects/glove/
3 using wiki-news-300d-1M.vec from https://fasttext.cc/docs/en/english-vectors.html
4 using paraphrase-distilroberta-base-v1 from https://github.com/UKPLab/sentence-transformers
5 https://github.com/seatgeek/fuzzywuzzy
6 using Porter stemmer from NLTK, https://www.nltk.org/_modules/nltk/stem/porter.html

Table 2
Evaluation results for parameter detection with boolean data type

  Class       Method                                           TPR     TNR     Balanced Accuracy
  Antonyms    WordNet                                          50.7%   33.3%   42.1%
              MoE-ASD trained on all POS tags                  55.2%   61.1%   58.2%
  Paraphrase  Term Frequency with stemming (TF-stem)           50.7%   55.6%   53.2%
              FuzzyWuzzy Token Set Ratio (Fuzz-TSR)            71.6%   22.2%   46.9%
              Sentence-BERT (SBERT)                            73.1%   66.7%   69.9%
  Inference   Sentiment Analysis                               79.1%   50.0%   64.6%
              Textual Entailment (TE)                          88.1%   61.1%   74.6%
              Textual Entailment with negation rule (TE-neg)   95.5%   72.2%   83.9%

4.3. Parameter detection
After detecting the signals, we need to determine the parameters that are assigned or compared to the signal values. When parameter names are used within the requirement (e.g., T_ExHi), they can be easily identified by simple pattern matching in the design specification. In many cases, however, the parameters are of boolean data type and can only be identified by a semantic comparison between the two textual statements. In these cases, we need to determine whether the requirement clause agrees with or contradicts the corresponding signal description in order to obtain the parameter. For example, the requirement clause "The valve shall be opened" and the signal description "close valve .." contradict each other, so the parameter is set to false. Conversely, if the signal description agrees with the requirement clause, the parameter should be detected as true. Such NLP tasks for detecting contradictions are known to be difficult [23].

Our first trial was antonym-synonym detection, where we checked whether the verb in the requirement clause falls into the WordNet7 antonym sets of the verb in the signal description. Similar to this approach, we used the Mixture-of-Experts model for Antonym-Synonym Discrimination (MoE-ASD) [24]. While these methods handle pairs like "activated" and "deactivated" reasonably well, they fail for phrasal negations like "not activated". To overcome this issue and avoid the ambiguity of model inferences that depend only on the verb, we trained MoE-ASD on all POS tags and inferred with the sentence representation as input. We also utilize the similarity scores from the signal detection step and identify the parameter as true if the scores are above a threshold (empirically set to 0.6) and false otherwise. This is motivated by the idea that a high paraphrase similarity between two sentences indicates agreement, while a low one indicates a contradiction. Since textual inference methods can classify a pair of sentences as either agreement or contradiction, we also resorted to this class of methods. We used sentiment analysis8 to assess the sentiment of the pair and determine the parameter from it, and we inferred the entailment relationship using a pre-trained model9 trained with 3 classification labels. We considered the softmax probabilities for the labels agreement and contradiction and predicted the parameter accordingly. In neutral cases, we employed a negation rule that checks for the presence of words that indicate a negation.

Table 2 shows our evaluation results for parameter detection using 85 signals with boolean parameters, of which 67 are actually true and 18 are false. It is evident from the table that textual entailment combined with the negation rule gives the best result.

7 https://wordnet.princeton.edu/
8 using GLoVe-LSTM from https://demo.allennlp.org/sentiment-analysis/glove-sentiment-analysis
9 using ELMo-based Decomposable Attention from https://demo.allennlp.org/textual-entailment/elmo-snli
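A rough sketch of the TE-neg scheme is given below, with two stated substitutions: we use a publicly available MNLI model from Hugging Face in place of the ELMo-based SNLI model from footnote 9, and the negation word list and the premise/hypothesis ordering are our own assumptions.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Substitute NLI model; its labels are contradiction (0), neutral (1), entailment (2)
MODEL_NAME = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

NEGATIONS = {"not", "no", "never", "without"}  # illustrative word list

def detect_boolean_parameter(clause: str, signal_description: str) -> bool:
    """Map entailment -> True, contradiction -> False; in neutral cases,
    fall back to the negation rule described above."""
    inputs = tokenizer(signal_description, clause, return_tensors="pt")
    probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze()
    label = model.config.id2label[int(probs.argmax())].lower()
    if label == "neutral":
        # negation rule: a negated clause contradicts the signal description
        return not any(w in clause.lower().split() for w in NEGATIONS)
    # compare the probabilities of entailment (index 2) and contradiction (index 0)
    return bool(probs[2] > probs[0])

print(detect_boolean_parameter(
    "The valve shall be opened",
    "Command: close valve due to extremely high device temperature"))  # -> False
```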
To overcome this issue and avoid the ambiguity of model inferences that depend only on the verb, we trained MoE-ASD with all the POS tags and tried to infer with the sentence representation as input. We also utilize the similarity scores from the signal detection step and identify the parameter as true if the scores are above a threshold (empirically set to 0.6) and false otherwise. This is motivated by the idea that when paraphrasing a sentence into another, one could identify whether it agrees or contradicts with the other sentence. Since textual inference methods can reliably classify a pair of sentences as either agreement or contradiction, we also resort to sentiment analysis8 to assess the sentiment of the pair and determine the parameter from it. We also tried to infer the entailment relationship using a pre- trained model9 trained with 3 classification labels. We considered the softmax probabilities for 7 https://wordnet.princeton.edu/ 8 using GLoVe-LSTM from https://demo.allennlp.org/sentiment-analysis/glove-sentiment-analysis 9 using ELMo-based Decomposable Attention from https://demo.allennlp.org/textual-entailment/elmo-snli the labels agreement and contradiction and predicted the parameter accordingly. In neutral cases, we employed a negation rule that checks for the presence of words that identify a negation. Table 2 shows our evaluation results for parameter detection using 85 signals with boolean parameters, of which 67 are actually true and 18 are false. It is evident from the table that textual entailment combined with the negation rule gives the best result. 4.4. Model generation Once the signals and parameters are identified for each requirement clause, we formulate a relation (of the form Signal-Operator-Parameter) for each of these clauses. We rely on a dictionary-based approach10 to identify comparison operators (if any) in the clauses. In Fig. 2 the relations are illustrated for the exemplary requirement. To generate requirement models from these relations, we introduce a domain-specific language (DSL) with abstract logical blocks. It maps the relations coming from a conditional clause (if /when/while) to an if-block, those coming from main clauses to a then-block and those coming from an until-clause to an until-block. Conjunctions (and/or) identified between relations are also accommodated in these DSL blocks. Though this mapping appears rather trivial, our aim is to make this translation simple and flexible, so that ways are open for integration with other sophisticated languages. The resulting models can also be further transformed into Matlab Simulink models or UML sequence diagrams, depending on what the end user desires. 5. Evaluation To evaluate the entire pipeline against the 31 requirements from our use case, we combine all the individual steps from above. We take the requirement texts as input and generate requirement models with the help of the design specification. The evaluation results are shown in Table 3. The average accuracies are calculated as follows. For each requirement, we calculate the percentage of correctly identified relations. Similarly, we calculate the accuracy of the logic by counting the correctly connected relations with logical conjunctions (and/or) and the correctly assigned relations to the logical blocks (if /then/until). For the accuracy of the model, we count all correctly identified signals, parameters, relations, and logic. Then, the macro-average over all requirements is calculated and shown in Table 3. 
5. Evaluation
To evaluate the entire pipeline against the 31 requirements from our use case, we combine all the individual steps from above. We take the requirement texts as input and generate requirement models with the help of the design specification. The evaluation results are shown in Table 3.

The average accuracies are calculated as follows. For each requirement, we calculate the percentage of correctly identified relations. Similarly, we calculate the accuracy of the logic by counting the correctly connected relations with logical conjunctions (and/or) and the correctly assigned relations to the logical blocks (if/then/until). For the accuracy of the model, we count all correctly identified signals, parameters, relations, and logic. Then, the macro-average over all requirements is calculated and shown in Table 3. The exact match counts the percentage of completely correct models. While this is a strict metric, it shows the extent to which the requirements formalization process can be fully automated.

We observe the best results with the two best ensembles for signal detection together with the best method (TE-neg) for parameter detection. This combination yields an average accuracy of 90.7% in evaluating the model across all requirements. The rather low exact match value of 54.8% reflects the strictness of the metric, which invalidates a requirement model if even a single constituent (in our case, mostly one parameter) is misidentified, although all remaining constituents are correctly identified. Some generalized resources of this paper can be found on GitHub11.

11 https://github.com/ifak-prototypes/nlp_reform

Table 3
Evaluation results for the whole pipeline on the propulsion system

  Methods                                   Average Accuracy            Exact Match
  Signal detection     Param. detection     Relations  Logic   Model    Model
  TF-stem + Fuzz-TSR   SBERT                69.6%      69.9%   86.4%    25.8%
  SBERT + Fuzz-TSR     SBERT                70.4%      69.9%   86.4%    25.8%
  TF-stem + Fuzz-TSR   TE-neg               84.4%      82.8%   90.7%    54.8%
  SBERT + Fuzz-TSR     TE-neg               84.4%      82.8%   90.7%    54.8%

Although our approach shows promising results, it has several limitations. One of the main limitations of our pipeline is the simple decomposition algorithm, which works for our specific requirements; for other cases, we referred to more elaborate algorithms above. The model language is also very limited and contains only the basic behavioral elements. Finally, we assume that links are provided between artifacts, especially to the components.

6. Conclusion and outlook
In this work, we presented an NLP-based approach for automated requirements formalization using natural language requirements and design specifications. We investigated various NLP methods for the individual steps of our pipeline and evaluated them on an industrial use case from the railway domain. We have shown that for signal detection we obtain the best results using Sentence-BERT combined with FuzzyWuzzy-TSR, and for (boolean) parameter detection using Textual Entailment supported by a negation rule. When evaluating the entire pipeline, we found that the requirement models generated using the combination of the two aforementioned methods yield the highest average accuracy of more than 90% and an exact match of about 55%. These results show that our approach can automate the requirements formalization process to a high degree, which can support the requirements engineer in, e.g., requirements verification and test case generation.

In the future, we plan to integrate this approach into our model-based testing pipeline. Furthermore, there are several features that could improve our methodology, such as handling non-functional properties (like durations), prioritizing methods with lower execution time, domain-specific pre-training and fine-tuning of available models (like SBERT), or learning from corrected predictions of a domain expert (online learning). Another interesting enhancement would be to support the requirements design process and the manual creation of the design specification by providing recommendations for related components and signals in advance.

Acknowledgments
This research was funded by the ITEA3 project XIVT. The German partner was funded by the BMBF (grant no. 01IS18059E).
We thank Bombardier Transportation, an Alstom Group Company, for providing an industrial use case for the evaluation of the presented method.

References
[1] M. Utting, B. Legeard, Practical Model-Based Testing: A Tools Approach, Elsevier, 2010.
[2] R. Gröpler, V. Sudhi, E. J. Calleja García, A. Bergmann, NLP-based Requirements Formalization for Automatic Test Case Generation, in: CS&P'21, 2021, pp. 18–30.
[3] M. Reider, S. Magnus, J. Krause, Feature-based testing by using model synthesis, test generation and parameterizable test prioritization, in: ICSTW, 2018, pp. 130–137.
[4] L. Zhao, W. Alhoshan, A. Ferrari, K. Letsholo, M. Ajagbe, E.-V. Chioasca, R. T. Batista-Navarro, Natural Language Processing (NLP) for Requirements Engineering (RE): A Systematic Mapping Study, ACM Computing Surveys (2020).
[5] I. Buzhinsky, Formalization of natural language requirements into temporal logics: a survey, in: INDIN, 2019, pp. 400–406.
[6] C. Wang, F. Pastore, A. Goknil, L. Briand, Automatic Generation of Acceptance Test Cases from Use Case Specifications: an NLP-based Approach, TSE (2020) 1–38.
[7] R. C. De Boer, H. Van Vliet, On the similarity between requirements and architecture, Journal of Systems and Software 82 (2009) 544–550.
[8] A. Brunello, A. Montanari, M. Reynolds, Synthesis of LTL formulas from natural language texts: State of the art and research directions, in: TIME, 2019, pp. 17:1–17:19.
[9] D. Giannakopoulou, T. Pressburger, A. Mavridou, J. Schumann, Automated formalization of structured natural language requirements, Information and Software Technology 137 (2021) 106590.
[10] M. Bernaerts, B. Oakes, K. Vanherpen, B. Aelvoet, H. Vangheluwe, J. Denil, Validating industrial requirements with a contract-based approach, in: MODELS-C, 2019, pp. 18–27.
[11] E. Stachtiari, A. Mavridou, P. Katsaros, S. Bliudze, J. Sifakis, Early validation of system requirements and design through correctness-by-construction, J. Syst. Softw. 145 (2018) 52–78.
[12] Y. Wang, T. Wang, J. Sun, PASER: a pattern-based approach to service requirements analysis, Int. J. Softw. Eng. Knowl. Eng. 29 (2019) 547–576.
[13] V. Leitão, I. Medeiros, SRXCRM: Discovering Association Rules Between System Requirements and Product Specifications, in: NLP4RE'21, 2021, pp. 1–9.
[14] S. Sharma, B. Sodhi, APR: architectural pattern recommender, in: SAC'17, 2017, pp. 1225–1230.
[15] K. Niklas, S. Gärtner, K. Schneider, Consistency checks of design specifications against requirements using graph-based linguistic analysis, in: SAC'16, 2016, pp. 1546–1549.
[16] J. Natt, B. Regnell, P. Carlshamre, M. Andersson, J. Karlsson, Evaluating automated support for requirements similarity analysis in market-driven development, in: REFSQ'01, 2001, pp. 190–201.
[17] D. Falessi, G. Cantone, G. Canfora, Empirical principles and an industrial case study in retrieving equivalent requirements via natural language processing techniques, TSE 39 (2011) 18–44.
[18] M. Ilyas, J. Küng, A comparative analysis of similarity measurement techniques through SimReq framework, in: FIT'09, 2009, pp. 1–6.
[19] M. Abbas, M. Saadatmand, E. Enoiu, D. Sundamark, C. Lindskog, Automated reuse recommendation of product line assets based on natural language requirements, in: ICSR, 2020, pp. 173–189.
[20] E. Simonson, A Model-Based Design Adoption Story from Bombardier Transportation, in: MATLAB EXPO 2018 Sweden, 2018, pp. 1–16.
[21] M. Liu, X. Peng, A. Marcus, C. Treude, X. Bai, G. Lyu, J. Xie, X. Zhang, Learning-based extraction of first-order logic representations of API directives, in: ESEC/FSE, 2021, pp. 491–502.
[22] J. Malin, C. Millward, F. Gomez, D. Throop, Semantic annotation of aerospace problem reports to support text mining, IEEE Intelligent Systems 25 (2010) 20–26.
[23] R. Rivera, P. Martinez, The Impact of Pretrained Language Models on Negation and Speculation Detection in Cross-Lingual Medical Text: Comparative Study, JMIR Medical Informatics 8 (2020).
[24] Z. Xie, N. Zeng, A Mixture-of-Experts Model for Antonym-Synonym Discrimination, in: Proceedings of ACL-IJCNLP, 2021, pp. 558–564.