Identifying scientific problems and solutions: Semantic network analytics and deep learning

Lu Huang1,2, Xiaoli Cao3,4,*, Hang Ren1, Chunze Zhang1,5 and Zhenxin Wu3,4

1 School of Economics, Beijing Institute of Technology, Beijing, China, 100081
2 Digital Economy and Policy Intelligentization Key Laboratory of Ministry of Industry and Information Technology, Beijing Institute of Technology, Beijing, China, 100081
3 National Science Library, Chinese Academy of Sciences, Beijing, China, 100190
4 Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing, China, 100190
5 Zhejiang Sineva Intelligent Technology Co., Ltd, Zhejiang, China, 314499

Abstract
As critical building blocks of scientific research, scientific problems and solutions are put forward to reveal the existing issues and primary methods in scientific and technological practice. In this paper, we propose a novel method for identifying scientific problems and solutions using semantic network analytics and deep learning. First, a BERT-CRF model is constructed and combined with BIO tagging to identify four entity types: research object, problem, solution, and fundamental principle. Then, the Levenshtein algorithm is applied to align entities, and a knowledge network is constructed that integrates semantic information and co-occurrence associations, comprehensively and accurately depicting the relations between entities. Finally, the correlations between the four entity types are thoroughly explored using semantic network analytics and topological structure analytics. A case study on the artificial intelligence domain demonstrates the reliability of the proposed methodology, and the results provide intelligent support for raising and solving scientific problems in the field.

Keywords
Scientific problems and solutions, Semantic network analytics, BERT-CRF, Knowledge network, Entity identification

Joint Workshop of the 5th Extraction and Evaluation of Knowledge Entities from Scientific Documents and the 4th AI + Informetrics (EEKE-AII2024), April 23~24, 2024, Changchun, China and Online
* Corresponding Author
EMAIL: huanglu628@163.com (Lu Huang); cxl163990307@163.com (Xiaoli Cao); renhang0988@163.com (Hang Ren); zhangchunze@sineva.com.cn (Chunze Zhang); wuzx@mail.las.ac.cn (Zhenxin Wu)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

The rapid increase in scientific articles lays a strong foundation for identifying problems and solutions in a field [1]. The intelligent mining of scientific problems and solutions aims to identify the real-world issues existing in the scientific and technological practices of a field, find corresponding solutions, and explore the underlying theoretical foundations. It facilitates a deep exploration of the intrinsic logical relationships among research objects, problems, solutions, and fundamental principles.

Identifying scientific problems and solutions can help scholars map the scientific field, enhance the speed of information retrieval and processing, and offer reference solutions for real-world issues in industrial practices [2,3]. Some scholars have mentioned that problems and corresponding solutions constitute the "key insights" within scientific articles [4]. Many significant studies focus on extracting key viewpoints (e.g., research problems and solutions) from scientific papers using entity extraction techniques [5,6]. However, these methods usually involve supervised learning on pre-annotated datasets, a process that requires significant resources for domain-specific annotation [7].

Deep learning is an efficient and accurate technology for extracting information from complex unstructured data (e.g., graphics, text) and converting data into vector representations [8,9]. Deep learning is often combined with bibliometrics to address problems in science, technology, and innovation (ST&I) management [10,11]. As an important area of deep learning, text representation learning, which effectively extracts information from text data, has been widely applied in data mining [12]. Additionally, some scholars employ methods such as keyword network analysis and citation analysis to construct academic knowledge graphs, deeply exploring the relationships among knowledge entities in papers [13,14]. For example, Zhang et al. [15] integrate multiple relationships such as co-occurrence, citation, and co-authorship to explore the processes of knowledge creation, knowledge transfer, and other knowledge evolution dynamics. However, these studies ignore the specific semantic functions of keywords in different contexts, leading to a lack of accuracy in the representation of knowledge structures [16]. Semantic network analysis incorporates the rich semantic information of keywords into network analysis, providing a more intensive and accurate analysis for the mining of scientific problems and solutions [17,18].

To address these concerns, we propose a novel framework for identifying problems and solutions using semantic network analytics and deep learning. The proposed method advances the fields of entity extraction and knowledge graph analysis by delineating three specific functions: 1) the BERT-CRF model is constructed to generate textual representations with enhanced semantics, and it is combined with BIO tagging to identify four entity types: research object, problem, solution, and fundamental principle, improving the accuracy of identifying entities; 2) the Levenshtein algorithm is applied to align entities, and the semantic relations and co-occurrence associations are integrated to construct a knowledge network, comprehensively and accurately revealing the relationships between entities; 3) the combination of semantic network analytics and topological structure analytics is applied to thoroughly explore the correlations between the four entity types. We use a case study on the artificial intelligence domain to demonstrate the reliability of our proposed method.

2. Method

The framework of identifying scientific problems and solutions is shown in Figure 1.

Figure 1: Framework of identifying scientific problems and solutions

2.1. Entity identification of scientific problems and solutions

2.1.1. Entity concept construction based on Structural Topic Model

The paper data are retrieved from the Web of Science (WoS) and pre-processed with VantagePoint (VP) [19].

Then, four abstract entity concepts are constructed based on the concept of the Structural Topic Model (STM), namely "research object", "problem", "solution" and "fundamental principle". These entities serve as a structured representation of knowledge, characterizing scientific problems and solutions. The STM, an advancement over the Latent Dirichlet Allocation (LDA) topic model, extracts topics from document-level metadata and establishes latent connections between these topics and the document data [20]. This approach facilitates the discovery of hidden knowledge structures within texts and the accurate delineation of implicit relationships among them [21]. Therefore, this study employs the STM to construct entity concepts.

Within this knowledge structure, "research object" refers to the research subfields, serving as the starting point of the research; "problem" focuses on the scientific issues to be resolved and the goals to be achieved, jointly defining the problem space with the research object; "solution" describes the overall solution to the problem, representing the key steps towards achieving the goals; "fundamental principle" refers to the theoretical foundation underlying the solution methods. The four-entity concept provides a comprehensive analytical framework reflecting the essence of research literature. By capturing and mining these four key entities, this study can excavate the scientific problems and solutions within the paper data, unveiling research hotspots in the domain.

Finally, the four types of entities in a small number of papers are manually identified, and a pre-training dataset is generated based on BIO tagging [22].

2.1.2. Text vector acquisition based on BERT-CRF model

This section aims to construct an enhanced semantic BERT-CRF model, transforming texts into feature vector matrices. Bidirectional Encoder Representations from Transformers (BERT), a deep learning technology based on the bidirectional transformer architecture [23], can capture contextual semantic information and latent relationships from large-scale corpora, achieving more precise textual semantic representations [24]. BERT can process vast amounts of textual corpus data while needing only a minimal training dataset. Combined with the Conditional Random Field (CRF) model, it can effectively improve the efficiency and quality of text sequence labelling [25].

First, the BERT model is trained on each of the four types of entities by using the pre-training dataset and setting the model parameters. To acquire enhanced vector representations that include contextual positional information, this paper integrates the CRF model [26] with BERT based on the embedding during the training process, further training and optimizing the configuration of the feature functions. Moreover, the multi-head self-attention mechanism of BERT is applied to better capture contextual semantic information [27], obtaining enhanced textual semantic representation vectors. When the model converges, the well-trained BERT-CRF model is generated.

Then, the well-trained model is used to transform the dataset into vector representations with enhanced contextual positional information and multiple semantic information.

2.1.3. Entity extraction

The purpose of this section is to extract scientific problem and solution entities by transforming textual semantic vectors into probabilistic representations of text sequence labeling using the BERT-CRF model and BIO tagging.

First, the well-trained BERT-CRF model is applied to process the text vectors using the SoftMax function [28], generating predicted labels corresponding to the text sequence. Assuming the sentence length is $n$ and the input text sequence is represented as $X = (x_1, x_2, \cdots, x_n)$, the corresponding predicted label sequence is represented as $Y = (y_1, y_2, \cdots, y_n)$. The final prediction score $score(X, Y)$ for the text sequence $X$ is calculated as:

$$score(X, Y) = \sum_{i=1}^{n} P_{x_i, y_i} + \sum_{i=2}^{n} T_{y_{i-1}, y_i} \tag{1}$$

where $P_{x_i, y_i}$ represents the probability of the text sequence element $x_i$ being predicted as $y_i$, and $T_{y_{i-1}, y_i}$ is the score for the transition from label $y_{i-1}$ to label $y_i$.

Thus, the probability distribution matrix corresponding to the text sequences is obtained. Then, this paper integrates BIO tagging [22] into the CRF layer to generate four sequence labeling matrices of "research object", "problem", "solution" and "fundamental principle" based on the probability distribution matrix. Finally, the entity categories corresponding to the text sequences are identified based on the sequence annotation results.

2.2. Knowledge network construction

After identifying the four types of entities corresponding to scientific problems and solutions, a knowledge network containing multiple semantic and structural information between entities is constructed. This part includes two sections: 1) Entity alignment based on Levenshtein algorithm and 2) Constructing knowledge network integrating multiple relations.

2.2.1. Entity alignment based on Levenshtein algorithm

Considering that entities extracted from different literature may have multiple names for the same entity, we apply the Levenshtein algorithm, a method for measuring the difference between two sequences, to disambiguate them. The Levenshtein algorithm scores the string-level similarity between entity names, enhancing the accuracy of entity alignment [29]. Furthermore, this paper constructs an entity dictionary based on expert knowledge, which is used for further checking and proofreading of entity alignment results.

2.2.2. Constructing knowledge network integrating semantic and co-occurrence relations

The purpose of this section is to construct a heterogeneous knowledge network including four types of entities, integrating semantic and co-occurrence information among entities, and improving the accuracy of relationship identification between entities.

First, the cosine similarity between entity vectors is used to measure the semantic similarity between entities. The semantic similarity $sim(a, b)$ between entities $a$ and $b$ is calculated as:

$$sim(a, b) = \frac{U(a)^{T} U(b)}{\|U(a)\|_2 \cdot \|U(b)\|_2} \tag{2}$$

where $U(a)$ and $U(b)$ denote the textual vectors of entities $a$ and $b$, respectively.

Then, the co-occurrence relation between entities is obtained based on the literature data. The co-occurrence association between entities $a$ and $b$, denoted $co\_entity(a, b)$, is the number of co-occurrences of $a$ and $b$.

Finally, the entropy weight method [30] is introduced to integrate the semantic information and co-occurrence information between entities and obtain the relation between entities in the knowledge network. The link weight $weight(a, b)$ between entities $a$ and $b$ is calculated as:

$$weight(a, b) = \alpha \cdot sim(a, b) + \beta \cdot co\_entity(a, b) \tag{3}$$

where $\alpha$ and $\beta$ are the coefficients of semantic similarity and co-occurrence correlation, respectively, obtained by the entropy weight method, and $\alpha + \beta = 1$.

In this section, we generate a knowledge network $G = (V, E, W)$ containing rich semantic and structural information among entities, where $V$, $E$, and $W$ denote the entities, edges, and edge weights in $G$, respectively.

2.3. Scientific problems-solutions correlation analysis

This part aims to identify the primary research problems corresponding to the core research objects and find the main solutions and theoretical basis based on the topological structure analysis of knowledge networks.

2.3.1. Core research objects identification based on PageRank algorithm

In this section, the PageRank algorithm is used to measure the importance score of research objects and thus identify core research objects in the knowledge network. This algorithm considers multiple factors, including the local topological structure and semantic information of the target node and the importance of the nodes connected with it [31], and it has been widely applied to identify core nodes in various complex knowledge networks [32]. Therefore, we use the PageRank algorithm to rank the importance of research objects in the knowledge network. The importance score $PR(a)$ of research object $a$ is calculated as:

$$PR(a) = d \times \sum_{j=1}^{n} \frac{PR(T_j)}{C(T_j)} + (1 - d) \tag{4}$$

where $d$ is the damping factor ($0 \le d \le 1$), generally 0.85, $T_j$ denotes the entity linked to the research object $a$, $C(T_j)$ is the number of entities linked with $T_j$, and $n$ is the number of entities linked with research object $a$.

Finally, we sort the research objects in the knowledge network based on the importance score and select the top-K research objects as the core research objects. The core research objects set $O$ is represented as:

$$O = \{O_1, \cdots, O_i, \cdots, O_K\} \tag{5}$$

where $O_i$ denotes the i-th core research object in the knowledge network and $K$ is the number of core research objects identified.

2.3.2. Knowledge structure-based entity correlation analysis

After identifying the core research objects within the knowledge network, this section analyzes the correlation between entities in the domain based on the topological structure analysis of the knowledge network.

First, based on the link weights between the core research object $O_i$ and the research problems, the primary problems corresponding to $O_i$ are identified. The primary problems set $P$ is represented as:

$$P = \{P_1, \cdots, P_i, \cdots, P_m\} \tag{6}$$

where $P_i$ denotes the i-th primary problem corresponding to $O_i$ and $m$ is the number of primary problems.

Similarly, this paper identifies the main solutions corresponding to the primary problem $P_i$ and the main fundamental principle for the corresponding solutions.

Finally, we generate a series of complete chains of scientific problems and solutions, which can be represented as "research object - problem - solution - fundamental principle".

3. Case study

Artificial Intelligence (AI) is a multidisciplinary domain composed of a diverse and heterogeneous network of innovations. It has emerged as a significant force driving technological innovation [33]. This field encompasses many emerging research questions and research methods, offering extensive data support for empirical analysis. Therefore, this paper analyzes the scientific problems and solutions in the AI domain in depth to verify the effectiveness of the proposed method.

3.1. Entity identification based on BERT-CRF model

Following the study of Liu et al. [34], a total of 375,608 papers published between 2021 and 2023 were retrieved from the Web of Science (WoS). Then, VantagePoint (VP) was used to process the titles and abstracts of the papers. Finally, a total of 310,456 papers were retained as the textual corpus, and a total of 3,000 papers were randomly selected in proportion to the publication year as the pre-training dataset. Based on BIO tagging, the titles and abstracts of the pre-training dataset were annotated with "research object", "problem", "solution", and "fundamental principle".

Following the design in Section 2.1.2, this study constructed an enhanced semantic BERT-CRF model based on the textual dataset to transform text data into feature vectors. During the experimental process, the performance of the model was assessed based on evaluation metrics (Precision, Recall, and F1-score) [35], with model parameters being continuously adjusted. The optimal model was determined when the evaluation metrics reached their maximum values. The optimal model was obtained at a Precision of 89.2%, a Recall of 87.4%, and an F1-score of 88.3%. The results show that the trained BERT-CRF model performs well.

Finally, the trained BERT-CRF model and BIO tagging method were used to identify four types of entities. We identified 24,254 "research object" entities, 23,839 "problem" entities, 20,670 "solution" entities, and 17,550 "fundamental principle" entities from the dataset.

3.2. Constructing knowledge network in AI

After identifying the four types of entities corresponding to scientific problems and solutions from the text dataset, this paper comprehensively considered the semantic similarity between entities and expert knowledge to achieve entity synonym alignment. The Levenshtein algorithm was employed for aligning entities within the same category and across different categories. Finally, a total of 887 "research object" entities, 4,136 "problem" entities, 13,858 "solution" entities, and 5,518 "fundamental principle" entities were obtained.

Then, we integrated semantic similarity and structural similarity between entities to build a knowledge network in the AI domain. The statistical information on the edges between each type of entity in the knowledge network is shown in Table 1.

Table 1
Descriptive statistics of edges in knowledge network

| | Research object | Problem | Solution | Fundamental principle |
|---|---|---|---|---|
| Research object | / | 12683 | 9105 | 8468 |
| Problem | 12683 | / | 17691 | 10565 |
| Solution | 9105 | 17691 | / | 13783 |
| Fundamental principle | 8468 | 10565 | 13783 | / |

3.3. Entity correlation analysis

The next stage was to analyze the correlation between entities. Following Section 2.3.1, the PageRank algorithm was applied to calculate the importance scores of research objects and thus identify the core research objects in the knowledge network. The hot research topics in the artificial intelligence domain can be explored according to the core research objects.

According to the results of the PageRank algorithm, the hot research objects in the artificial intelligence domain mainly include deep learning, neural network, medical image, facial image, robot, and electric system. Following the design in Section 2.3.2, the top-2 problems corresponding to the core research objects within the knowledge network were identified.

Finally, we explored the correlations between entities and generated a series of complete chains including four types of entities. The partial entity correlation results of the Top-6 core research objects are shown in Figure 2.
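The ranking and chain-generation procedure of Sections 2.3.1 and 2.3.2, as applied in this section, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The entity names are borrowed from the chains reported in this paper, but the link weights, the function names `pagerank`, `strongest_link`, and `build_chain`, and the fixed iteration count are assumptions made for illustration. Equation (4) is iterated exactly as printed, that is, over undirected links and without using the link weights.

```python
from collections import defaultdict

# Hypothetical toy network: entity names follow the paper's examples,
# but every link weight below is invented for illustration.
TYPES = {
    "classification": "object",
    "clustering": "object",
    "image classification": "problem",
    "deep clustering performance": "problem",
    "neural network": "solution",
    "dual convolutional autoencoder": "solution",
    "feature extraction": "principle",
    "multi-level feature fusion": "principle",
}
WEIGHTS = {
    ("classification", "image classification"): 0.9,
    ("classification", "deep clustering performance"): 0.2,
    ("clustering", "deep clustering performance"): 0.8,
    ("image classification", "neural network"): 0.7,
    ("image classification", "dual convolutional autoencoder"): 0.3,
    ("deep clustering performance", "dual convolutional autoencoder"): 0.6,
    ("neural network", "feature extraction"): 0.5,
    ("dual convolutional autoencoder", "multi-level feature fusion"): 0.4,
}


def pagerank(edges, d=0.85, iterations=100):
    """Iterate Eq. (4): PR(a) = d * sum_j PR(T_j) / C(T_j) + (1 - d)."""
    neighbors = defaultdict(set)
    for a, b in edges:  # links are treated as undirected
        neighbors[a].add(b)
        neighbors[b].add(a)
    scores = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        scores = {
            a: d * sum(scores[t] / len(neighbors[t]) for t in neighbors[a]) + (1 - d)
            for a in neighbors
        }
    return scores


def strongest_link(node, wanted_type):
    """Return the neighbour of `node` of the given type with the largest link weight."""
    candidates = []
    for (a, b), weight in WEIGHTS.items():
        if node in (a, b):
            other = b if a == node else a
            if TYPES[other] == wanted_type:
                candidates.append((weight, other))
    return max(candidates)[1] if candidates else None


def build_chain(research_object):
    """Follow the strongest links: object -> problem -> solution -> principle."""
    problem = strongest_link(research_object, "problem")
    solution = strongest_link(problem, "solution")
    principle = strongest_link(solution, "principle")
    return (research_object, problem, solution, principle)


K = 2  # number of core research objects to keep
scores = pagerank(WEIGHTS.keys())
objects = [node for node, kind in TYPES.items() if kind == "object"]
core_objects = sorted(objects, key=scores.get, reverse=True)[:K]
chains = [build_chain(o) for o in core_objects]

for chain in chains:
    print(" - ".join(chain))
```

The sketch keeps only the single strongest problem per research object, whereas the case study retains the top-2 problems; extending `strongest_link` to return the m highest-weighted neighbours would mirror the set P in Equation (6).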
Figure 2: Entity correlation results

Several observations can be made based on the above results. The research object represents a subfield, where the problems refer to the issues contained within that subfield, the solution refers to the methods or technologies required to solve the problems, and the fundamental principles refer to the inherent principles involved in the implementation process of the methods and technologies. The "research object" and "problem" together constitute the complete scientific problem, and the "solution" and "fundamental principle" together constitute the complete solution. For example, the identified chain "classification - image classification - neural network - feature extraction" means that the neural network can be used to solve image classification problems through feature extraction [36].

This paper identifies a complete chain of "research object - problem - solution - fundamental principle". On the one hand, it can identify the core research objects and corresponding primary problems in the field of artificial intelligence. On the other hand, it is able to detect the corresponding solutions to the real problems in scientific and technological practice and explore the theoretical basis behind them, and thus realize the in-depth excavation of the intrinsic logical connection among scientific problems, solutions and fundamental principles.

3.4. Validation

In this part, we used quantitative and qualitative methods to verify the reliability of our proposed method and entity identification results.

3.4.1. Verification of the trained model

To quantitatively verify the advantages of the combination of the BERT-CRF model trained in this paper and the BIO tagging method, we selected three advanced models, ALBERT [37], SciBERT [38], and XLNet [39], for comparison experiments. Referencing the model parameters of BERT-CRF in this paper, the three models were fine-tuned respectively to achieve the optimal effect. The specific parameter settings of the models are shown in Table 2.

Table 2
Parameter configurations of models

| | Our method | ALBERT | SciBERT | XLNet |
|---|---|---|---|---|
| Maximum input length | 64 | 64 | 64 | 64 |
| Training epoch | 30 | 40 | 30 | 35 |
| Batch size | 4 | 16 | 4 | 8 |
| Number of layers | 12 | 12 | 12 | 12 |
| Learning rate | 1e-5 | 5e-6 | 1e-5 | 1e-6 |
| CRF learning rate multiplier | 100 | / | 50 | / |

Then, the performance of our method was validated based on Recall and Precision by comparing it with the three state-of-the-art methods. The comparison results are given in Table 3.

Table 3
The comparison of prediction performance

| Methods | Research objects (Recall) | Research objects (Precision) | Problems (Recall) | Problems (Precision) | Solutions (Recall) | Solutions (Precision) | Fundamental principles (Recall) | Fundamental principles (Precision) |
|---|---|---|---|---|---|---|---|---|
| Our method | 0.980 | 0.965 | 0.964 | 0.942 | 0.856 | 0.877 | 0.724 | 0.759 |
| ALBERT | 0.934 | 0.936 | 0.924 | 0.896 | 0.848 | 0.815 | 0.638 | 0.702 |
| SciBERT | 0.928 | 0.955 | 0.906 | 0.883 | 0.834 | 0.827 | 0.704 | 0.740 |
| XLNet | 0.962 | 0.919 | 0.944 | 0.907 | 0.838 | 0.859 | 0.680 | 0.741 |

It can be seen that our method outperforms the baseline methods on both evaluation indicators. The gains below are stated in percentage points over ALBERT, SciBERT, and XLNet, respectively. In the identification of research objects, the Recall of our method is higher by 4.6, 5.2, and 1.8, and the Precision is higher by 2.9, 1.0, and 4.6. In the identification of problems, the Recall is higher by 4.0, 5.8, and 2.0, and the Precision is higher by 4.6, 5.9, and 3.5. In the identification of solutions, the Recall is higher by 0.8, 2.2, and 1.8, and the Precision is higher by 6.2, 5.0, and 1.8. In the identification of fundamental principles, the Recall is higher by 8.6, 2.0, and 4.4, and the Precision is higher by 5.7, 1.9, and 1.8. These results demonstrate that the combination of the BERT-CRF model and BIO tagging used in this paper achieves good performance on our dataset.

3.4.2. Verification of entity identification

In this section, a qualitative method was applied to verify the reliability of the entity identification results by searching for relevant articles published in 2021 and beyond. Table 4 shows the detailed empirical evidence for part of the entity identification results.

Table 4
Relevant documentary proof of partial entity identification results

| No | Research object - problem - solution - fundamental principle | Relevant documentary proof |
|---|---|---|
| 1 | classification - image classification - neural network - feature extraction | In 2021, Nadendla et al. proposed a neural network-based classifier by feature extraction and classification to solve the image classification problem [36]. |
| 2 | clustering - deep clustering performance - dual convolutional autoencoder - multi-level feature fusion | In 2022, Hou et al. used a dual convolutional autoencoder to extract features of multi-levels and fuse them to improve the performance of deep clustering [40]. |
| 3 | medical image - medical image segmentation - convolutional neural network - optimal network topology | In 2024, Tamilmani et al. used the convolutional neural network with the optimal network topology to solve the problem of medical image segmentation [41]. |
| 4 | facial image - face recognition - convolutional neural network - feature extraction | In 2022, Wei-Jie et al. applied the convolutional neural network by extracting the masked face features to solve the problem of masked facial recognition [42]. |
| 5 | robot - local motion planning - deep reinforcement learning - reward model | In 2023, Garrote et al. proposed a deep reinforcement learning strategy based on a reward model to solve the local motion planning problem of robots [43]. |
| 6 | electrical power system - stability assessment - graph neural network - self-attention mechanism | In 2022, Gu et al. used the graph neural network based on the self-attention mechanism to evaluate the stability of the power system [44]. |

Table 4 demonstrates the alignment between our entity identification results and the literature. Therefore, the four types of entities identified and the relations between entities in this paper are reliable, and the effectiveness of the proposed method has been further verified.

4. Conclusion

In this paper, we proposed a novel methodology to identify scientific problems and solutions using semantic network analytics and deep learning. First, the deep learning method was applied to extract textual semantic information and identify entities, effectively capturing the hidden semantic association in different textual contexts and improving the accuracy of entity recognition. Then, the machine learning method was used to construct the knowledge network, fully considering the knowledge structure and semantic structure between entities and thus containing more abundant information. Finally, the PageRank algorithm and semantic network analytics were introduced to deeply explore the linkages among research objects, problems, solutions, and fundamental principles.

Semantic network analytics and deep learning were combined to identify scientific problems and solutions from scientific text, which provides technical intelligence for scientific innovation and industrial technology upgrading in the field. In addition, this method can not only explore the association between entities and reveal the primary research problems and corresponding solutions in the field of artificial intelligence, but also discover the knowledge structure in this field and promote the development of scientific knowledge network analysis methods.

Several limitations of our proposed method require further improvement: 1) The scientific and technological output of a certain field includes not only papers but also patents and product data. Further research should be conducted based on more data sources; 2) The methodology of entity alignment can be further optimized. More advanced methods and professional expert knowledge could be introduced in the future to improve the efficiency and quality of entity alignment; 3) The evolution mechanism of "research object - problem - solution - fundamental principle" needs to be further explored.

5. Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 72274013) and the Fundamental Research Funds for the Central Universities.

6. References

[1] Y. Zhang, M. Wang, M. Saberi, et al, From big scholarly data to solution-oriented knowledge repository. Frontiers in Big Data, 2: 38, 2019.
[2] P. Li, W. Lu, Q. Cheng, Generating a related work section for scientific papers: an optimized approach with adopting problem and method information. Scientometrics, 127(8): 4397-4417, 2022.
[3] Z. Luo, W. Lu, J. He, et al, Combination of research questions and methods: A new measurement of scientific novelty. Journal of Informetrics, 16(2): 101282, 2022.
[4] G. Chen, J. Peng, T. Xu, et al, Extracting entity relations for "problem-solving" knowledge graph of scientific domains using word analogy. Aslib Journal of Information Management, 75(3): 481-499, 2023.
[5] V. Giordano, G. Puccetti, F. Chiarello, et al, Unveiling the inventive process from patents by extracting problems, solutions and advantages with natural language processing. Expert Systems with Applications, 229: 120499, 2023.
[6] R. B. Mishra, H. Jiang, Classification of problem and solution strings in scientific texts: evaluation of the effectiveness of machine learning classifiers and deep neural networks. Applied Sciences, 11(21): 9997, 2021.
[7] H. Liu, T. Brailsford, J. Goulding, et al, Towards idea mining: problem-solution phrase extraction from text. International Conference on Advanced Data Mining and Applications (pp. 3-14). 2022.
[8] Y. Zhang, J. Lu, F. Liu, et al, Does deep learning help topic extraction? A kernel k-means clustering method with word embedding. Journal of Informetrics, 12(4): 1099-1117, 2018.
[9] R. Xiang, E. Chersoni, Q. Lu, et al, Lexical data augmentation for sentiment analysis. Journal of the Association for Information Science and Technology, 72(11): 1432-1447, 2021.
[10] X. Chen, P. Ye, L. Huang, et al, Exploring science-technology linkages: A deep learning-empowered solution. Information Processing & Management, 60(2): 103255, 2023.
[11] J. Chen, Y. Chen, Y. He, et al, A classified feature representation three-way decision model for sentiment analysis. Applied Intelligence, 52(7): 7995-8007, 2022.
[12] X. Xi, F. Ren, L. Yu, et al, Detecting the technology's evolutionary pathway using HiDS-trait-driven tech mining strategy. Technological Forecasting and Social Change, 195: 122777, 2023.
[13] J. Wang, Q. Cheng, W. Lu, et al, A term function-aware keyword citation network method for science mapping analysis. Information Processing & Management, 60(4): 103405, 2023.
[14] G. Garechana, R. Río-Belver, E. Zarrabeitia, et al, TeknoAssistant: a domain specific tech mining approach for technical problem-solving support. Scientometrics, 127(9): 5459-5473, 2022.
[15] X. Zhang, Q. Xie, C. Song, et al, Mining the evolutionary process of knowledge through multiple relationships between keywords. Scientometrics, 127(4): 2023-2053, 2022.
[16] X. Cao, X. Chen, L. Huang, et al, Detecting technological recombination using semantic analysis and dynamic network analysis. Scientometrics, Doi: 10.1007/s11192-023-04812-4, 2023.
[17] J. Liu, Z. Zhou, M. Gao, et al, Aspect sentiment mining of short bullet screen comments from online TV series. Journal of the Association for Information Science and Technology, 74(8): 1026-1045, 2023.
[18] J. Won, D. Lee, J. Lee, Understanding experiences of food-delivery-platform workers under algorithmic management using topic modeling. Technological Forecasting and Social Change, 190: 122369, 2023.
[19] L. Huang, X. Chen, Y. Zhang, et al, Identification of topic evolution: network analytics with piecewise linear representation and word embedding. Scientometrics, 127(9): 5353-5383, 2022.
[20] S. Bai, D. Yu, C. Han, et al, Enablers or inhibitors? Unpacking the emotional power behind in-vehicle AI anthropomorphic interaction: A dual-factor approach by text mining. IEEE Transactions on Engineering Management, Doi: 10.1109/TEM.2023.3327500, 2023.
[21] Z. Zhang, H. Mu, S. Huang, Playing to save sisters: how female gaming communities foster social support within different cultural contexts. Journal of Broadcasting & Electronic Media, 67(5): 693-713, 2023.
[22] J. Wei, T. Hu, J. Dai, et al, Research on named entity recognition of adverse drug reactions based on NLP and deep learning. Frontiers in Pharmacology, 14: 1121796, 2023.
[23] Z. Xue, G. He, J. Liu, et al, Re-examining lexical and semantic attention: Dual-view graph convolutions enhanced BERT for academic paper rating. Information Processing & Management, 60(2): 103216, 2023.
[24] X. Zhu, Z. Kuang, L. Zhang, A prompt model with combined semantic refinement for aspect sentiment analysis. Information Processing & Management, 60(5): 103462, 2023.
[25] C. Zhang, Improved word segmentation system for Chinese criminal judgment documents. Applied Artificial Intelligence, 38(1): 2297524, 2024.
[26] K. Gupta, A. Ahmad, T. Ghosal, et al, A BERT-based sequential deep neural architecture to identify contribution statements and extract phrases for triplets from scientific publications. International Journal on Digital Libraries, Doi: 10.1007/s00799-023-00393-y, 2024.
[27] Z. Wang, X. Xu, X. Song, et al, Multigranularity pruning model for subject recognition task under knowledge base question answering when general models fail. International Journal of Intelligent Systems, 2023: 1202315, 2023.
[28] N. Xu, Y. Liang, C. Guo, et al, Entity recognition in the field of coal mine construction safety based on a pre-training language model. Engineering, Construction and Architectural Management, Doi: 10.1108/ECAM-05-2023-0512, 2023.
[29] M. Mansurova, V. Barakhnin, A. Ospan, et al, Ontology-driven semantic analysis of tabular data: an iterative approach with advanced entity recognition. Applied Sciences, 13(19): 10918, 2023.
[30] B. Jiang, W. Tang, M. Li, et al, Assessing land resource carrying capacity in China's main grain-producing areas: Spatial-temporal evolution, coupling coordination, and obstacle factors. Sustainability, 15(24): 16699, 2023.
[31] P. Marjai, A. Kiss, Influential Performance of Nodes Identified by Relative Entropy in Dynamic Networks. Vietnam Journal of Computer Science, 8(1): 93-112, 2021.
[32] T. Liang, C. Li, H. Li, Top-k Learning Resource Matching Recommendation Based on Content Filtering PageRank. Computer Engineering, 43(2): 220-226, 2017.
[33] K. Song, A selection method for industry-university cooperation from the perspective of patentometrics. Library Tribune, 41(11): 19-27, 2021.
[34] N. Liu, P. Shapira, X. Yue, Tracking developments in artificial intelligence research: constructing and applying a new search strategy. Scientometrics, 126(4): 3153-3192, 2021.
[35] L. Lü, T. Zhou, Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, 390(6): 1150-1170, 2011.
[36] H. R. Nadendla, A. Srikrishna, K. G. Rao, Rider and Sunflower optimization-driven neural network for image classification. Web Intelligence. IOS Press, 19(1-2): 41-61, 2021.
[37] J. Li, Q. Huang, S. Ren, et al, A novel medical text classification model with Kalman filter for clinical decision making. Biomedical Signal Processing and Control, 82: 104503, 2023.
[38] S. Shen, J. Liu, L. Lin, et al, SciBERT: A pre-trained language model for social science texts. Scientometrics, 128(2): 1241-1263, 2023.
[39] J. Sirrianni, E. Sezgin, D. Claman, et al, Medical text prediction and suggestion using generative pretrained transformer models with dental medical notes. Methods of Information in Medicine, 61(05/06): 195-200, 2022.
[40] H. Hou, S. Ding, X. Xu, A deep clustering by multi-level feature fusion. International Journal of Machine Learning and Cybernetics, 13(10): 2813-2823, 2022.
[41] G. Tamilmani, C. H. Phaneendra Varma, V. Brindha Devi, et al, Medical image segmentation using grey wolf based U-Net with bi-directional convolutional LSTM. International Journal of Pattern Recognition and Artificial Intelligence, Doi: 10.1142/S0218001423540253, 2023.
[42] L. C. Wei-Jie, S. C. Chong, T. S. Ong, Masked face recognition with principal random forest convolutional neural network (PRFCNN). Journal of Intelligent & Fuzzy Systems, 43(6): 8371-8383, 2022.
[43] L. Garrote, J. Perdiz, U. J. Nunes, Costmap-based local motion planning using deep reinforcement learning. In 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). IEEE (pp. 1089-1095). 2023.
[44] S. Gu, J. Qiao, Z. Zhao, et al, Power system transient stability assessment based on graph neural network with interpretable attribution analysis. In 2022 4th International Conference on Smart Power & Internet Energy Systems (SPIES). IEEE (pp. 1374-1379). 2022.