Research on Test Case Generation Method of Airborne Software Based on NLP

Cong Chao, Qinghua Yang, Xiaowei Tu

Shanghai University, Shanghai, 200072, China

Abstract

Software testing is a key stage in the life cycle of airborne software development. At present, airborne software test cases are developed manually, so preparing them costs considerable time and labor and is prone to human error. To address this problem, this paper builds on Long Short-Term Memory (LSTM) and proposes an automatic test case generation algorithm for airborne software based on a Bi-LSTM-CRF named entity recognition model and Part-Of-Speech (POS) tagging. First, the airborne software requirements document is preprocessed: testable variable names are replaced and untestable requirement statements are filtered out. Then, a Bi-LSTM-CRF model is trained on an airborne software domain corpus to obtain the named entity recognition model. Finally, the model generates a tag sequence from each requirement statement, and test cases are generated by the triplet generation algorithm and the coverage criteria processing algorithm. The experiment uses the requirements document of engine indication software to verify the effect. The results show that, compared with the traditional Bi-LSTM-CRF model, training with Part-Of-Speech tagging achieves higher recall and F1-measure, and the accuracy of the final test case generation can exceed 80%.

Keywords

Airborne Software, Named Entity Recognition, Bi-LSTM-CRF, Test Case Generation

1. Introduction

Software testing evaluates software against the requirements collected from the system specifications[1]. Because airborne software has high safety and reliability requirements, ensuring its quality and correctness is very important. In addition to strict control of the software development process, the software testing process is also of great significance[2].
It is estimated that software testing accounts for 50% of the total development cost, while testing activities consume about 40% of the overall development time[3]. The requirements-based testing process mainly addresses two problems: (1) verifying that the requirements are correct, complete, clear and logically consistent; (2) designing necessary and sufficient test cases according to the requirements. Requirements documents for airborne software are written in natural language, so they need to be translated into computer-readable patterns to facilitate automated test case generation. NLP can transform sentences expressed in natural language into a form whose syntax and semantics can be processed, from which the corresponding test cases can be generated.

At present, airborne software test cases are developed manually, which causes some serious problems[4]. To improve the efficiency and effectiveness of testing, testers need to create high-quality test cases. However, writing test cases is a long and tedious task that is prone to human error. Therefore, a method is needed to generate high-quality test cases automatically.

ICBASE2022@3rd International Conference on Big Data & Artificial Intelligence & Software Engineering, October 21-23, 2022, Guangzhou, China
EMAIL: chaocong1@163.com (Cong Chao); yangqinghua@shu.edu.cn (Qinghua Yang); tuxiaowei@shu.edu.cn (Xiaowei Tu)
ORCID: 0000-0003-1314-8605 (Cong Chao); 0000-0001-7084-4784 (Qinghua Yang); 0000-0003-3219-3719 (Xiaowei Tu)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

This paper describes the process of automatically generating test cases from natural language requirements. The proposed method takes requirements documents as input and produces test case files as output.

2. Related Work

Requirements-based testing involves multiple manual processes: software testers must define test standards, design test cases according to requirements, build test cases, execute them and verify whether the software requirements are met. If the requirements-based test process is not carried out correctly or consistently, the test cases may not provide the expected effect, and the testing time may increase significantly[1].

Automatic generation of test cases is not a new concept. Dwarakanath and Sengupta[5] developed the Litmus tool for generating test cases from requirements documents. The tool processes each requirement statement and generates one or more test cases through a five-step process. Morgan et al.[6] developed an automated test case generator (ATCG), which also takes requirement statements as input and produces test cases as output. However, the test cases that these methods generate from requirement statements do not achieve complete coverage, and the methods are not directly applicable to airborne software.

3. Implementation of Automatic Test Case Generation

In this paper, we first preprocess the requirements document, and then train a model on requirement statements from the airborne software domain using natural language processing and Part-Of-Speech (POS) tagging. The requirement statements for which test cases are needed are then converted into named entities by the Bi-LSTM-CRF named entity recognition model, and test cases are generated from standard templates through the triplet generation algorithm and the coverage criteria. The test case generation process is shown in Figure 1.

Figure 1: Test case generation process

3.1. Document Preprocessing

Not all requirement statements in the requirements document can be transformed into test cases; some are explanations or definitions of terms, for example, "the engine indication software includes normal and compression modes".
Therefore, untestable requirement statements need to be filtered out. A requirement is a contract that specifies what the user or agent does to the system and how the system responds. A testable sentence can therefore be defined as one that has a subject, an action and an optional object. The constants and variables of the requirements document correspond one-to-one with those of the airborne software model through relationship tables, and a test case for airborne software verifies the assignment of software model variables. The requirements document can therefore be filtered by extracting the constants and variables in the relationship table and replacing the corresponding constant and variable names in the requirements document. Figure 2 shows an example of a requirements document before and after preprocessing.

Figure 2: Comparison of requirements documents before and after preprocessing

3.2. Requirement Statement Extraction

To process the requirements item by item, the statements in the requirements document need to be extracted separately. Testable requirements can be divided into single-line and multi-line requirements. A single-line requirement consists of only one sentence. A multi-line requirement contains multiple statements connected by logical symbols such as "AND" or "OR". The last example in Figure 2 is a multi-line requirement.

A single-line requirement needs little processing and can be taken directly from the requirements document. A multi-line requirement must be split at each line and logical symbol into multiple statements, which are then processed one by one. For example, the requirement in Figure 2 (translated from Chinese) can be divided into "When the following conditions are met, the left engine N1 vibration reading shall be displayed in white", "ip_state is 4", "-AND-", "ip_state is 4 and ip_value is 0", "-OR-", "ip_value is 1", which are then processed one by one.

3.3. Named Entity Recognition Model

The purpose of named entity recognition is to identify all entities in a requirement statement. The input of the model is the requirement statement, and the output is the named entity tag sequence. The requirement statement is passed through the Bi-LSTM-CRF neural network model. First, the bidirectional LSTM runs forward and backward passes to obtain the output scores of the tags. Then, the CRF layer is run to compute the gradients of the network output and of the state transition edges. Finally, the network parameters are updated, including the state transition matrix and the parameters of the bidirectional LSTM.

To improve the accuracy of named entity tagging, two questions must be answered: (1) what labels should the model be trained on for requirement statements, and (2) how can the performance of the model be improved? For the first question, the SPO (subject-predicate-object) labeling method is used: the target element, operation instruction and interaction information of the test case are tagged as the three label types of the training model. For the second question, Part-Of-Speech tags are used as a feature of the training model alongside the triplet tags. The named entity recognition model proposed in this paper is thus a Bi-LSTM-CRF model with added POS features.

3.3.1. LSTM Network Structure

A Recurrent Neural Network (RNN) is a neural network for processing sequence data. It is effective for data with sequential characteristics and enables the trained model to make predictions from long-distance features[7]. In theory, an RNN can learn long-term dependencies, but in practice it handles long-term memory poorly because of vanishing and exploding gradients[8], so it tends to rely on recent states.
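The tendency to favor recent states can be made concrete with a standard back-of-the-envelope bound. This is a textbook sketch rather than part of the method in this paper; the plain tanh recurrence and the symbols $W$, $U$, $b$ and $\gamma$ are assumptions introduced only for illustration.

```latex
% Assumed vanilla RNN (illustrative only): h_t = \tanh(W h_{t-1} + U x_t + b)
\frac{\partial h_T}{\partial h_t}
  = \prod_{j=t+1}^{T} \frac{\partial h_j}{\partial h_{j-1}}
  = \prod_{j=t+1}^{T} \operatorname{diag}\!\left(1 - h_j^{2}\right) W,
\qquad
\left\lVert \frac{\partial h_T}{\partial h_t} \right\rVert \le \gamma^{\,T-t}
\quad \text{if } \left\lVert \operatorname{diag}\!\left(1 - h_j^{2}\right) W \right\rVert \le \gamma \text{ for all } j .
```

Under this assumption, $\gamma = 0.9$ and a distance of $T - t = 50$ steps give a bound of about $0.9^{50} \approx 0.005$, so the influence of a state 50 steps back is almost lost; when the factors exceed 1, the same product can grow without bound instead.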
Long Short-Term Memory (LSTM) adds a memory cell to the RNN to filter past states, so that it can choose which states have more influence on the current one and better discover and exploit the dependencies in the data, instead of simply relying on nearby states. The LSTM module at moment t is shown in Figure 3.

Figure 3: LSTM module at moment t

LSTM adjusts the values of the input and hidden layers through a gate structure composed of a forget gate, a memory gate and an output gate. Here, $\sigma$ denotes the sigmoid function, whose output lies between 0 and 1, and $\tanh$ is the hyperbolic tangent function, whose values lie between -1 and 1.

The forget gate determines the proportion of the cell state $C_{t-1}$ from the previous moment that is forgotten:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{1}$$

The memory gate obtains the weight $i_t$ of the new memory through the $\sigma$ layer, and then adds the weighted new memory $i_t * \tilde{C}_t$ to the existing state, which realizes the update from $C_{t-1}$ to $C_t$:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{2}$$

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \tag{3}$$

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \tag{4}$$

Finally, the short-term memory $h_t$ is obtained through the output gate:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{5}$$

$$h_t = o_t * \tanh(C_t) \tag{6}$$

Both RNN and LSTM can only predict the output at the next moment from the preceding sequence information, but in named entity recognition the output is related not only to the previous state but also to the future state. Therefore, this paper uses a bidirectional LSTM to predict named entities from context. Words are taken as the minimum unit, the word vector sequence $x_t$ generated by the embedding layer is the input at each moment of the LSTM, and the hidden state outputs of the forward and backward LSTMs at each position are concatenated, that is, $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$. The new sequence contains both historical and future information, which can further improve recognition accuracy.

3.3.2. CRF Layer

A Conditional Random Field (CRF) is a probabilistic model that gives the conditional probability distribution of an output sequence given an input sequence. It is widely used in word segmentation, part-of-speech tagging and named entity recognition. The label scores obtained from the bidirectional LSTM are fed to the CRF layer, which outputs the label corresponding to each word. Without the CRF layer as a constraint, the tag with the highest probability for each word is taken as the output, which can easily produce tag sequences that do not make sense, for example a subject tag that follows a predicate tag. By taking the transition probabilities between tags into account, the CRF can filter out such erroneous outputs.

In this paper, the CRF models the output for the whole sentence: it scores the label sequences of the words in the sentence and outputs the tag sequence with the highest score. The score can be expressed as:

$$s(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i} \tag{7}$$

In Formula (7), $A$ is the transition matrix of size $(k+2) \times (k+2)$, and $A_{y_i, y_{i+1}}$ is the transition probability from tag $y_i$ to tag $y_{i+1}$, where $k$ is the number of tag categories. $P$ is the emission matrix of size $n \times (k+2)$, and $P_{i, y_i}$ is the emission probability of tag $y_i$ for word $x_i$, where $n$ is the length of the sequence.

3.3.3. Word-based Tagging

Each element of a triplet has a corresponding role in the requirement statements of airborne software. The subject is usually a variable name in the airborne software, the predicate is generally a verb, and the object is the data related to the subject. Based on this analysis, this paper uses the POS feature together with the BIO annotation method to build the neural network. Training the network to learn the relationship between triplets and parts of speech can effectively improve the named entity recognition performance of the model.
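Returning briefly to Formula (7) of Section 3.3.2, the following minimal sketch shows how a sequence-level score can overrule per-word choices. It is an illustration under stated assumptions, not the trained model: the tag names `B-S`, `B-P`, `B-O`, the start and end markers, and all numeric scores are hypothetical, unnormalized scores are used in place of probabilities, and the best path is found by brute force rather than by the Viterbi algorithm a real CRF layer would use.

```python
from itertools import product

# Hypothetical tag set; START and END are the two extra tags behind the (k+2) size.
TAGS = ["B-S", "B-P", "B-O"]
START, END = "<s>", "</s>"


def path_score(emissions, transitions, tags):
    """Formula (7): sum of transition scores plus sum of emission scores."""
    padded = [START] + list(tags) + [END]
    trans = sum(transitions.get((a, b), 0.0) for a, b in zip(padded, padded[1:]))
    emit = sum(emissions[i][tag] for i, tag in enumerate(tags))
    return trans + emit


def best_path(emissions, transitions):
    """Brute-force argmax over all tag sequences (only viable for tiny inputs)."""
    candidates = product(TAGS, repeat=len(emissions))
    return max(candidates, key=lambda y: path_score(emissions, transitions, y))


def greedy_path(emissions):
    """Per-word argmax, i.e. what is output when no CRF constraint is applied."""
    return tuple(max(TAGS, key=lambda tag: scores[tag]) for scores in emissions)


# Toy emission scores for a three-word statement such as "ip_state is 3".
emissions = [
    {"B-S": 2.0, "B-P": 0.1, "B-O": 0.5},
    {"B-S": 1.2, "B-P": 1.0, "B-O": 0.1},  # per-word argmax wrongly prefers B-S
    {"B-S": 0.3, "B-P": 0.2, "B-O": 1.5},
]
# Toy transition scores: reward subject -> predicate -> object, penalize S -> S.
transitions = {
    (START, "B-S"): 1.0,
    ("B-S", "B-P"): 1.0,
    ("B-P", "B-O"): 1.0,
    ("B-O", END): 1.0,
    ("B-S", "B-S"): -2.0,
}

print(greedy_path(emissions))             # tags chosen word by word
print(best_path(emissions, transitions))  # tags chosen for the whole sentence
```

With these toy numbers the per-word choice yields two consecutive subject tags, while the sequence-level score selects the subject, predicate, object order, which is the kind of correction the CRF layer is meant to provide.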
We use jieba to tag the part of speech of each requirement statement. Since jieba supports custom dictionaries, we can add a custom dictionary to cover the vocabulary of the specific domain more comprehensively and improve the tagging accuracy of the tool. BIO tagging is a joint tagging scheme: B-X indicates that the element is of type X and is at the beginning of a segment of this type, I-X indicates that the element is of type X and is in the middle or at the end of a segment of this type, and O indicates that the element does not belong to any entity type that needs to be tagged.

3.3.4. Model Training

During model training, the tagged words are one-hot encoded and fed in as samples, and the embedding layer converts the encoding into low-dimensional dense vectors to solve the feature sparsity problem. To avoid overfitting, we add dropout[9] to the LSTM layer with the parameter set to 0.5. The optimizer is Adam[10], a stochastic gradient-based optimization algorithm, with a learning rate of 0.001, and the model is trained for 100 epochs. The Bi-LSTM-CRF training model is shown in Figure 4.

Figure 4: Bi-LSTM-CRF model

3.4. Triplet Generation Algorithm

Since the model outputs a sequence of BIO tags during prediction, the tags need to be converted into the corresponding triplets. The statements in the requirements document have a "subject, predicate, object" or "subject, predicate" structure. To extract the triplets of requirement statements, a verb-centered algorithm is established to extract the relationships in complex statements. The input of the algorithm is a requirement statement, from which single or multiple relationships between entities are extracted as triplets. For example, for the statement "ip_state is 3 and ip_value is 1." (translated from Chinese), the extracted triplets are (ip_state, is, 3) and (ip_value, is, 1).

3.5. Coverage Criteria Processing Algorithm

In view of the high safety and reliability requirements of large aircraft, DO-178C divides airborne software into Levels A, B, C, D and E[11], and different levels correspond to different coverage criteria. Since the engine indication software is Level B software and must satisfy Decision Coverage (DC), this paper designs a DC-based processing algorithm.

3.5.1. Decision Coverage

The basic idea of decision coverage is to design enough test cases that each decision in the program evaluates to "true" at least once and to "false" at least once, that is, each true and false branch is executed at least once; it is therefore also called branch coverage.

To achieve decision coverage, this paper converts each requirement into two test cases: in one, all parameters are true, and in the other, all parameters are false. In a multi-line requirement, two statements connected by "OR" may set the same parameter to different values, so this paper adopts the strategy that if the same parameter has already appeared earlier, its value remains unchanged.

3.5.2. Keyword Mapping Table

The same verb semantics in a requirement statement may be expressed in several ways; for example, "is" and "equal to" both indicate setting a value. This paper therefore replaces expressions that have the same semantics with keywords. EQ, NEQ, GR, LE, GRE and LEE correspond to equal to, not equal to, greater than, less than, greater than or equal to, and less than or equal to, respectively; they replace expressions with the same semantics and facilitate the processing of triplets.

3.6. Test Case Generation

Test cases must be generated in the corresponding format before they can be used for testing and for generating test scripts.
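The steps of Sections 3.4 and 3.5 that feed this formatting stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm used in the experiments: the tag names `B-S`, `B-P`, `B-O`, the contents of `KEYWORDS`, and the function names are hypothetical, and English stand-ins are used for the requirement tokens. Only the all-true (positive) test case is produced, because the text does not specify how the false values of the second test case are chosen.

```python
# Hypothetical keyword table in the spirit of Section 3.5.2:
# surface predicates are mapped to normalized keywords.
KEYWORDS = {
    "is": "EQ",
    "equal to": "EQ",
    "is not": "NEQ",
    "greater than": "GR",
    "less than": "LE",
}


def tags_to_triplets(tokens, tags):
    """Group BIO-tagged tokens into (subject, predicate, object) triplets.

    A new triplet starts at every B-S tag; later B-*/I-* tokens are appended
    to the matching role of the current triplet. O tokens are ignored.
    """
    triplets, current = [], None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            continue
        prefix, role = tag.split("-", 1)
        if prefix == "B" and role == "S":
            current = {"S": [], "P": [], "O": []}
            triplets.append(current)
        if current is None:
            continue  # predicate or object before any subject: skipped here
        current[role].append(token)
    return [tuple(" ".join(t[role]) for role in "SPO") for t in triplets]


def normalize(triplet):
    """Replace the predicate with its keyword when one is known."""
    subject, predicate, obj = triplet
    return subject, KEYWORDS.get(predicate, predicate), obj


def positive_case(triplets, expected):
    """Fill a minimal SET/VERIFY template for the all-true test case."""
    lines, seen = [], set()
    for subject, keyword, value in map(normalize, triplets):
        if keyword != "EQ" or subject in seen:
            # Only EQ is handled in this sketch; a parameter that already
            # appeared keeps its earlier value.
            continue
        seen.add(subject)
        lines.append(f"SET {subject} {value}")
    lines.append(f"VERIFY {expected}")
    return lines


tokens = ["ip_state", "is", "3", "and", "ip_value", "is", "1"]
tags = ["B-S", "B-P", "B-O", "O", "B-S", "B-P", "B-O"]
triplets = tags_to_triplets(tokens, tags)
case = positive_case(triplets, "the reading shall be displayed in white")
print(triplets)
print("\n".join(case))
```

Keeping the first value seen for a repeated parameter mirrors the strategy described for statements joined by "OR"; handling range predicates such as GRE and LEE would need an extra rule for picking a concrete value, which is left out here.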
A complete test case should include a start flag, the requirement number, the requirement content, the test case content, an end flag and so on. This paper fills in a test case template, placing the requirement number, requirement content and test case content in the corresponding positions.

4. Effect verification

4.1. Model effect verification

The evaluation indexes of the model experiment are P (Precision), R (Recall) and F1 (F-measure)[12]. F1 is the harmonic mean of Precision and Recall and is used as the comprehensive evaluation index of the model. P, R and F1 are calculated as:

$$P = \frac{\text{Number of correctly recognized tags}}{\text{Number of tags recognized}} \times 100\% \tag{8}$$

$$R = \frac{\text{Number of correctly recognized tags}}{\text{Total number of tags}} \times 100\% \tag{9}$$

$$F1 = \frac{2 \times P \times R}{P + R} \times 100\% \tag{10}$$

In this paper, we compare two tagging methods, BIO and BIO-POS, in identifying the triplets of requirement statements. Table 1 shows that the recall of the BIO-POS tagging method is significantly higher than that of the traditional BIO tagging method (96.18% versus 92.53%) and that its F1-measure is also higher (93.23% versus 91.85%), although its precision is slightly lower (90.46% versus 91.18%). This indicates that the BIO-POS tagging method performs better overall in identifying triplets of airborne software requirements.

Table 1
Comparison of recognition effects under different tagging methods

Tagging    Precision   Recall    F1-measure
BIO        91.18%      92.53%    91.85%
BIO-POS    90.46%      96.18%    93.23%

4.2. Test case generation effect verification

In this paper, representative test case results are selected for effect verification, and the generation effect is shown in Table 2. To save space, only the positive test case is shown for each requirement.

Table 2
Results of some test cases generated (requirement text translated from Chinese)

Requirement 1: When the validity of ipFL is VALID, the red-line mark on the left engine analog dial shall be displayed in white.
Test Case:
    SET ipFL_state 3
    VERIFY the red-line mark on the left engine analog dial shall be displayed in white

Requirement 2: When the validity of ipEC is VALID and the value of ipEC is TRUE, the engine display software shall be in compression mode.
Test Case:
    SET ipEC_state 3
    SET ipEC_value 1
    VERIFY the engine display software shall be in compression mode

Requirement 3: When the following conditions are met, the left engine N1 pointer shall be displayed in white:
    1. The validity of ipFL is VALID
    -AND-
    2a. The validity of ipFE is INVALID
    -OR-
    2b. The validity of ipFE is VALID and the value of ipFE is TRUE
Test Case:
    SET ipFL_state 3
    SET ipFE_state 4
    SET ipFE_value 1
    VERIFY the left engine N1 pointer shall be displayed in white

Requirement 4: When the value of ipFL, which is subject to display hysteresis, increases, the left engine pointer shall rotate clockwise.
Test Case:
    Test case generation failed

Requirement 5: When the N1 value is equal to or greater than 100%, the N1 indication shall be displayed in XXX.
Test Case:
    SET N1_value 100
    VERIFY the left engine pointer shall rotate clockwise

Requirement 6: When the validity of ipFRT is VALID and 1 <= the value of ipFRT <= 11, the corresponding thrust mode shall be displayed.
Test Case:
    SET ipFRT_state 3
    SET ipFRT_value 11
    VERIFY the corresponding thrust mode shall be displayed

Requirement 7: When the validity of ipHP is VALID and the value of ipHP is within the range [33°, 35°], the flap detent shall be displayed as 4.
Test Case:
    SET ipHP_state 3
    SET ipHP_value 34
    VERIFY the flap detent shall be displayed as 4

Requirement 8: When the validity of ipFC becomes INVALID and remains so for 1.2 s, the communication flag FMS shall not be displayed.
Test Case:
    SET ipFC_state 4
    WAIT 1.2
    VERIFY the communication flag FMS shall not be displayed

The results show that most requirement statements are converted well. In Requirement 5, since there is no reference value for the relevant parameters of a single statement, the corresponding value cannot be set. The next step is to set initial values or draw on the context to solve such problems.

Analysis of the requirements document shows that the forms in Table 3 cover more than 80% of the testable requirement statements; that is, as long as the relevant generation algorithms handle these forms well, most conversion results can be generated correctly.

5. Conclusions

This paper explores the application of an NLP-based automatic test case generation method to airborne software. The strong robustness of the Bi-LSTM-CRF named entity recognition model and its ability to use past and future features effectively are taken into account.
For the specific corpus of the airborne software field, the BIO-POS tagging method is used to train the model, and good named entity recognition results are obtained. Based on the recognition results, a verb-centered triplet generation algorithm and a triplet-based coverage criteria processing algorithm are proposed. Experiments show that the correct rate of the test cases generated by the algorithm in this paper is more than 80%. However, when the variables in a sentence have no corresponding reference values or the sentence pattern is unusual, the method cannot handle the sentence effectively. The next step is therefore to study requirement statements that need context to obtain initial values, as well as special sentence patterns, so as to further improve test case generation.

6. Acknowledgements

First of all, I would like to give my heartfelt thanks to all the people who have helped me with this paper. My sincere thanks go firstly to my supervisor, Mr. Yang Qinghua, whose suggestions and encouragement have given me much insight into these studies. In addition, thanks to COMAC Shanghai Aircraft Design and Research Institute and my corporate mentor, Mr. Yi Zichun, for providing me with research direction and experimental materials. Finally, I am grateful to all those who devote time to reading this paper and give me advice, which will benefit my later study.

7. References

[1] Veera, Prm, et al. "Req2Test - Graph Driven Test Case Generation for Domain Specific Requirement." (2018).
[2] Gao, Y., G. An, and C. Zhi. Verification and validation of flight control system airborne software. 2021.
[3] Nagpal, K., and R. Chawla. "Improvement of Software Development Process: A New SDLC Model."
[4] Kulkarni, P., and Y. Joglekar. "Generating and Analyzing Test cases from Software Requirements using NLP and Hadoop."
[5] Dwarakanath, A., and S. Sengupta. "Litmus: Generation of Test Cases from Functional Requirements in Natural Language." Springer Berlin Heidelberg, 2012.
[6] Morgan, Charles P., et al. "ATCG: An Automated Test Case Generator." Journal of Information Technology Management 27.3 (2016): 112-120.
[7] Long Qiuxian. Automatic Function Testing Algorithm Based on Deep Learning. MA thesis, Tianjin University, 2018. DOI:10.27356/d.cnki.gtjdu.2018.002016.
[8] Lai, Siwei, et al. "Recurrent convolutional neural networks for text classification." Twenty-ninth AAAI conference on artificial intelligence. 2015.
[9] Hinton, G. E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv e-prints (2012).
[10] Cheng Ming, et al. "Fishery standard named entity recognition with integrated attention mechanism and BiLSTM+CRF." Journal of Dalian Ocean University 35.02 (2020): 296-301. doi:10.16535/j.cnki.dlhyxb.2019-289.
[11] Wang, L. Z., Y. Wang, and X. U. Zhang-Hou. "Software Verification and Validation of Tokamak Safety-critical Instrumentation and Control System." Journal of Changchun Normal University (2018).
[12] Chen Yanyu, and Du Ming. "Insurance Named Entity Recognition based on Bi-LSTM-CRF." Intelligent Computer and Applications 8.03 (2018): 111-114.