=Paper= {{Paper |id=Vol-3745/paper7 |storemode=property |title=Identifying Scientific Problems and Solutions: Semantic Network Analytics and Deep Learning |pdfUrl=https://ceur-ws.org/Vol-3745/paper7.pdf |volume=Vol-3745 |authors=Lu Huang,Xiaoli Cao,Hang Ren,Chunze Zhang,Zhenxin Wu |dblpUrl=https://dblp.org/rec/conf/eeke/HuangCRZW24 }} ==Identifying Scientific Problems and Solutions: Semantic Network Analytics and Deep Learning== https://ceur-ws.org/Vol-3745/paper7.pdf
                         Identifying scientific problems and solutions: Semantic network
                         analytics and deep learning
                         Lu Huang1,2, Xiaoli Cao3,4,*, Hang Ren1, Chunze Zhang1,5 and Zhenxin Wu3,4
                         1 School of Economics, Beijing Institute of Technology, Beijing, China, 100081
                         2 Digital Economy and Policy Intelligentization Key Laboratory of Ministry of Industry and Information
                           Technology, Beijing Institute of Technology, Beijing, China, 100081
                         3 National Science Library, Chinese Academy of Sciences, Beijing, China, 100190
                         4 Department of Information Resources Management, School of Economics and Management, University
                           of Chinese Academy of Sciences, Beijing, China, 100190
                         5 Zhejiang Sineva Intelligent Technology Co., Ltd, Zhejiang, China, 314499

                                          Abstract
                                          As critical building blocks of scientific research, scientific problems and solutions are put
                                          forward to reveal the existing issues and primary methods in scientific and technological
                                          practice. In this paper, we propose a novel method for identifying scientific problems and
                                          solutions using semantic network analytics and deep learning. Firstly, the BERT-CRF model
                                          constructed is combined with BIO tagging to identify four entity types: research object, problem,
                                          solution, and fundamental principle. Then, the Levenshtein algorithm is applied to align entities,
                                          and a knowledge network is constructed integrating semantic information and co-occurrence
                                          associations, comprehensively and accurately depicting the relations between entities. Finally,
                                          the correlations between the four entity types are thoroughly explored using semantic network
                                          analytics and topological structure analytics. A case study on artificial intelligence domain
                                          demonstrates the reliability of the proposed methodology, and the results provide intelligent
                                          support for raising and solving scientific problems in the field.

                                          Keywords
                                          Scientific problems and solutions, Semantic network analytics, BERT-CRF, Knowledge
                                          network, Entity identification

* Corresponding Author
Joint Workshop of the 5th Extraction and Evaluation of Knowledge Entities from Scientific Documents and the
4th AI + Informetrics (EEKE-AII2024), April 23~24, 2024, Changchun, China and Online
EMAIL: huanglu628@163.com (Lu Huang); cxl163990307@163.com (Xiaoli Cao); renhang0988@163.com (Hang Ren);
zhangchunze@sineva.com.cn (Chunze Zhang); wuzx@mail.las.ac.cn (Zhenxin Wu)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

    The rapid increase in scientific articles lays a strong foundation for identifying problems and
solutions in a field [1]. The intelligent mining of scientific problems and solutions aims to identify
the real-world issues existing in the scientific and technological practices of a field, find
corresponding solutions, and explore the underlying theoretical foundations. It facilitates a deep
exploration of the intrinsic logical relationships among research objects, problems, solutions, and
fundamental principles. Identifying scientific problems and solutions can help scholars map the
scientific field, enhance the speed of information retrieval and processing, and offer reference
solutions for real-world issues in industrial practices [2,3].
    Some scholars have mentioned that problems and corresponding solutions constitute the "key insights"
within scientific articles [4]. Many significant studies focus on extracting key viewpoints (e.g.,
research problems and solutions) from scientific papers using entity extraction techniques [5,6].
However, these methods usually involve supervised learning on pre-annotated datasets, a process that
requires significant resources for domain-specific annotation [7]. Deep learning is an efficient and
accurate technology for extracting information from complex unstructured data (e.g., graphics, text)
and converting data into vector representations [8,9]. Combining deep learning with bibliometrics is
often used to address problems in science, technology, and innovation (ST&I) management [10,11]. As an
important area of deep learning, text representation learning effectively extracts information from
text data and has been widely applied in data mining [12].
    Additionally, some scholars employ methods such as keyword network analysis and citation analysis to
construct academic knowledge graphs, deeply exploring the relationships among knowledge entities in
papers [13,14]. For example, Zhang et al. [15] integrate multiple relationships such as co-occurrence,
citation, and co-authorship to explore the processes of knowledge creation, knowledge transfer, and
other knowledge evolution dynamics. However, these studies ignore the specific semantic functions of
keywords in different contexts, leading to a lack of accuracy in the representation of knowledge
structures [16]. Semantic network analysis incorporates the rich semantic information of keywords into
network analysis, providing a more intensive and accurate analysis for the mining of scientific
problems and solutions [17,18].
    To address these concerns, we propose a novel framework for identifying problems and solutions using
semantic network analytics and deep learning. The proposed method advances the fields of entity
extraction and knowledge graph analysis by delineating three specific functions: 1) the BERT-CRF model
is constructed to generate textual representations with enhanced semantics, and it is combined with
BIO tagging to identify four entity types: research object, problem, solution, and fundamental
principle, improving the accuracy of identifying entities; 2) the Levenshtein algorithm is applied to
align entities, and the semantic relations and co-occurrence associations are integrated to construct
a knowledge network, comprehensively and accurately revealing the relationships between entities;
3) the combination of semantic network analytics and topological structure analytics is applied to
thoroughly explore the correlations between the four entity types. We use a case study on the
artificial intelligence domain to demonstrate the reliability of our proposed method.

2. Method

    The framework for identifying scientific problems and solutions is shown in Figure 1.




Figure 1: Framework of identifying scientific problems and solutions
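
As a minimal illustration of the BIO tagging scheme on which the framework relies, the Python sketch below turns a token-level tag sequence into typed entity spans. The short labels (OBJ, PROB, SOL, PRIN), the function name `bio_to_spans`, and the example sentence are hypothetical choices made for illustration; the paper does not specify its label inventory beyond the four entity types.

```python
from typing import List, Tuple

# Hypothetical short labels for the four entity types (assumed, not from the paper).
ENTITY_TYPES = {
    "OBJ": "research object",
    "PROB": "problem",
    "SOL": "solution",
    "PRIN": "fundamental principle",
}


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity type, entity text) pairs from a BIO-tagged token sequence."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the entity that is currently open, if any.
        if current_tokens:
            spans.append((ENTITY_TYPES[current_type], " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_type):
            # A new entity starts here (a stray I- tag is treated like B-).
            flush()
            current_type, current_tokens = label, [token]
        elif prefix == "I":
            current_tokens.append(token)
        else:
            # "O": the token lies outside any entity.
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


tokens = ["Neural", "networks", "solve", "image", "classification"]
tags = ["B-SOL", "I-SOL", "O", "B-PROB", "I-PROB"]
spans = bio_to_spans(tokens, tags)
```

In this toy sentence, the first two tokens form one "solution" span and the last two form one "problem" span; the token tagged O belongs to no entity.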



2.1. Entity identification of scientific problems and solutions
2.1.1. Entity concept construction based on Structural Topic Model

    The paper data gathered is acquired from the Web of Science (WoS) and pre-processed via
VantagePoint (VP) [19].
    Then, four abstract entity concepts, namely "research object", "problem", "solution" and
"fundamental principle", are constructed based on the concept of the Structural Topic Model (STM).
These entities serve as a structured representation of knowledge, characterizing scientific problems
and solutions. The STM, an advancement over the Latent Dirichlet Allocation (LDA) topic model,
extracts topics from document-level metadata and establishes latent connections between these topics
and the document data [20]. This approach facilitates the discovery of hidden knowledge structures
within texts and the accurate delineation of implicit relationships among them [21]. Therefore, this
study employs the STM to construct entity concepts.
    Within this knowledge structure, "research object" refers to the research subfields, serving as
the starting point of the research; "problem" focuses on the scientific issues to be resolved and the
goals to be achieved, jointly defining the problem space with the research object; "solution"
describes the overall solution to the problem, representing the key steps towards achieving the goals;
"fundamental principle" refers to the theoretical foundation underlying the solution methods. The
four-entity concept constructed provides a comprehensive analytical framework reflecting the essence
of research literature. By capturing and mining these four key entities, this study can excavate the
scientific problems and solutions within the paper data, unveiling research hotspots in the domain.
    Finally, the four types of entities in a small number of papers are manually identified, and a
pre-training dataset is generated based on BIO tagging [22].

2.1.2. Text vector acquisition based on BERT-CRF model

    This section aims to construct an enhanced semantic BERT-CRF model, transforming scientific texts
into feature vector matrices. The Bidirectional Encoder Representations from Transformers (BERT), a
deep learning technology based on the bidirectional transformer architecture [23], can capture
contextual semantic information and latent relationships from large-scale corpora, achieving more
precise textual semantic representations [24]. BERT can process vast amounts of textual corpus data
while needing only minimal training datasets. Combined with the Conditional Random Field (CRF) model,
it can effectively improve the efficiency and quality of text sequence labelling [25].
    First, the BERT model is trained on each of the four types of entities by using a pre-training
dataset and setting the model parameters. To acquire enhanced vector representations that include
contextual positional information, this paper integrates the CRF model [26] with BERT based on the
embedding during the training process, further training and optimizing the configuration of the
feature functions. Moreover, the multi-head self-attention mechanism of BERT is applied to better
capture contextual semantic information [27], obtaining enhanced textual semantic representation
vectors. When the model converges, the well-trained BERT-CRF model is generated.
    Then, the well-trained model is used to transform the dataset into vector representations with
enhanced contextual positional information and multiple semantic information.

2.1.3. Entity extraction

    The purpose of this section is to extract scientific problem and solution entities by
transforming textual semantic vectors into probabilistic representations of text sequence labeling
using the BERT-CRF model and BIO tagging.
    First, the well-trained BERT-CRF model is applied to process the text vectors using the softmax
function [28], generating predicted labels corresponding to the text sequence. Assuming the sentence
length is $n$ and the input text sequence is represented as $X = (x_1, x_2, \cdots, x_n)$, the
corresponding predicted label sequence is represented as $Y = (y_1, y_2, \cdots, y_n)$. The method
for calculating the final prediction score $\mathrm{score}(X, Y)$ for the text sequence $X$ is:




$$\mathrm{score}(X, Y) = \sum_{i=1}^{n} P_{x_i, y_i} + \sum_{i=2}^{n} T_{y_{i-1}, y_i} \qquad (1)$$
where $P_{x_i, y_i}$ represents the probability of the text sequence element $x_i$ being predicted as
$y_i$, and $T_{y_{i-1}, y_i}$ is the score for the transition from label $y_{i-1}$ to label $y_i$.
    Thus, the probability distribution matrix corresponding to the text sequences is obtained.
    Then, this paper integrates BIO tagging [22] into the CRF layer to generate four sequence labeling
matrices of "research object", "problem", "solution" and "fundamental principle" based on the
probability distribution matrix. Finally, the entity categories corresponding to the text sequences
are identified based on the sequence annotation results.

2.2. Knowledge network construction

    After identifying the four types of entities corresponding to scientific problems and solutions,
a knowledge network containing multiple semantic and structural information between entities is
constructed. This part includes two sections: 1) Entity alignment based on Levenshtein algorithm and
2) Constructing knowledge network integrating multiple relations.

2.2.1. Entity alignment based on Levenshtein algorithm

    Considering that the same entity may appear under multiple names in different papers, we apply
the Levenshtein method, which measures the difference between two sequences, to disambiguate them.
The Levenshtein algorithm can consider both the contextual information and semantic similarity,
enhancing the accuracy of entity alignment [29].
    Furthermore, this paper constructs an entity dictionary based on expert knowledge, which is used
for further checking and proofreading of entity alignment results.

2.2.2. Constructing knowledge network integrating semantic and co-occurrence relations

    The purpose of this section is to construct a heterogeneous knowledge network including four
types of entities, integrating semantic and co-occurrence information among entities, and improving
the accuracy of relationship identification between entities.
    First, the cosine similarity between entity vectors is used to measure the semantic similarity
between entities. The semantic similarity $\mathrm{sim}(a, b)$ between entities $a$ and $b$ is
calculated as:
$$\mathrm{sim}(a, b) = \frac{U(a)^{T} U(b)}{\lVert U(a) \rVert_2 \cdot \lVert U(b) \rVert_2} \qquad (2)$$
where $U(a)$ and $U(b)$ denote the textual vectors of entities $a$ and $b$, respectively.
    Then, the co-occurrence relation between entities is obtained based on the literature data. The
co-occurrence association between entities $a$ and $b$ is denoted as $\mathrm{co\_entity}(a, b)$,
represented by the number of co-occurrences between $a$ and $b$.
    Finally, the entropy weight method [30] is introduced to integrate the semantic information and
co-occurrence information between entities and get the relation between entities in the knowledge
network. The link weight $\mathrm{weight}(a, b)$ between entities $a$ and $b$ is calculated as:
$$\mathrm{weight}(a, b) = \alpha \cdot \mathrm{sim}(a, b) + \beta \cdot \mathrm{co\_entity}(a, b) \qquad (3)$$
where $\alpha$ and $\beta$ are the coefficients of semantic similarity and co-occurrence correlation,
respectively, obtained by the entropy weight method, and $\alpha + \beta = 1$.
    In this section, we generate a knowledge network $G = (V, E, W)$ containing rich semantic and
structural information among entities, where $V$, $E$ and $W$ denote the entities, edges and edge
weights in $G$, respectively.

2.3. Scientific problems-solutions correlation analysis

    This part aims to identify the primary research problems corresponding to the core research
objects and find the main solutions and theoretical basis based on the topological structure analysis
of knowledge networks.
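
The link weighting of Section 2.2.2 (Eqs. 2 and 3) can be sketched in Python as follows. This is only an illustration under stated assumptions: the paper does not detail its entropy weight computation, so the sketch uses the standard formulation (column proportions, normalised entropy, weights proportional to one minus entropy), and it rescales co-occurrence counts to [0, 1] before combining them, which is an added assumption. The function names, the toy vectors, and the counts are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Eq. (2): dot product divided by the product of the L2 norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def entropy_weights(columns: List[List[float]]) -> List[float]:
    """Standard entropy weight method (assumed form); one inner list per indicator.

    Assumes non-negative values, at least two rows, and indicators that are not
    constant across all rows.
    """
    m = len(columns[0])
    divergences = []
    for col in columns:
        total = sum(col)
        proportions = [x / total for x in col]
        entropy = -sum(p * math.log(p) for p in proportions if p > 0) / math.log(m)
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    return [d / norm for d in divergences]


def link_weights(vectors: Dict[str, Sequence[float]],
                 cooccurrence: Dict[Pair, int]) -> Tuple[Dict[Pair, float], float, float]:
    """Eq. (3): weight = alpha * sim + beta * co_entity for every entity pair."""
    pairs = sorted(cooccurrence)
    sims = [cosine_similarity(vectors[a], vectors[b]) for a, b in pairs]
    max_co = max(cooccurrence.values())
    cos = [cooccurrence[p] / max_co for p in pairs]  # assumed rescaling to [0, 1]
    alpha, beta = entropy_weights([sims, cos])
    weights = {p: alpha * s + beta * c for p, s, c in zip(pairs, sims, cos)}
    return weights, alpha, beta


# Hypothetical two-dimensional entity vectors and co-occurrence counts.
vectors = {
    "image classification": [1.0, 0.0],
    "neural network": [0.8, 0.6],
    "feature extraction": [0.0, 1.0],
}
cooccurrence = {
    ("image classification", "neural network"): 12,
    ("neural network", "feature extraction"): 7,
    ("image classification", "feature extraction"): 2,
}
weights, alpha, beta = link_weights(vectors, cooccurrence)
```

The indicator whose values vary more across entity pairs receives the larger coefficient, which is the usual rationale for entropy weighting; the resulting `weights` dictionary plays the role of $W$ in $G = (V, E, W)$.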




2.3.1. Core research objects identification based on PageRank algorithm

    In this section, the PageRank algorithm is used to measure the importance score of research
objects and thus identify core research objects in the knowledge network. This algorithm fully
considers multiple factors, including the local topological structure and semantic information of the
target node and the importance of the nodes connected with it [31], and has been widely applied to
identify core nodes in various complex knowledge networks [32]. Therefore, we use the PageRank
algorithm to rank the importance of research objects in the knowledge network. The importance score
$PR(a)$ of research object $a$ is calculated as:
$$PR(a) = d \times \sum_{j=1}^{n} \frac{PR(T_j)}{C(T_j)} + (1 - d) \qquad (4)$$
where $d$ is the damping factor ($0 \le d \le 1$), generally 0.85, $T_j$ denotes the entity linked to
the research object $a$, $C(T_j)$ is the number of entities linked with $T_j$, and $n$ is the number
of entities linked with research object $a$.
    Finally, we sort the research objects in the knowledge network based on the importance score and
select the top-K research objects as the core research objects. The core research object set $O$ is
represented as:
$$O = \{O_1, \cdots, O_i, \cdots, O_K\} \qquad (5)$$
where $O_i$ denotes the i-th core research object in the knowledge network and $K$ is the number of
core research objects that have been identified.

2.3.2. Knowledge structure-based entity correlation analysis

    After identifying the core research objects within the knowledge network, this section analyzes
in depth the correlation between entities in the domain based on the topological structure analysis
of the knowledge network.
    First, based on the link weights between the core research object $O_i$ and the problems, the
primary problems corresponding to $O_i$ are identified. The primary problem set $P$ is represented
as:
$$P = \{P_1, \cdots, P_i, \cdots, P_m\} \qquad (6)$$
where $P_i$ denotes the i-th primary problem corresponding to $O_i$ and $m$ is the number of primary
problems.
    Similarly, this paper identifies the main solutions corresponding to the primary problem $P_i$
and the main fundamental principles for the corresponding solutions.
    Finally, we generate a series of complete chains of scientific problems and solutions, which can
be represented as "research object - problem - solution - fundamental principle".

3. Case study

    Artificial Intelligence (AI) is a multidisciplinary domain composed of a diverse and
heterogeneous network of innovations. It has emerged as a significant force driving technological
innovation [33]. This field encompasses many emerging research questions and research methods,
offering extensive data support for empirical analysis. Therefore, this paper analyzes the scientific
problems and solutions in the AI domain in depth to verify the effectiveness of the proposed method.

3.1. Entity identification based on BERT-CRF model

    Following the study of Liu et al. [34], a total of 375,608 papers published between 2021 and 2023
were retrieved from the Web of Science (WoS). Then, VantagePoint (VP) was used to process titles and
abstracts of papers. Finally, a total of 310,456 papers were retained as the textual corpus, and a
total of 3,000 papers were randomly selected in proportion to the publication year as the
pre-training dataset. Based on BIO tagging, the titles and abstracts of the pre-training dataset were
annotated with "research object", "problem", "solution", and "fundamental principle".
    Following the design in Section 2.1.2, this study constructed an enhanced semantic BERT-CRF model
based on the textual dataset to transform text data into feature vectors. During the experimental
process, the performance of the model was assessed based on evaluation metrics (Precision, Recall, and
F1-score) [35], with model parameters being continuously adjusted. The optimal model was determined
when the evaluation metrics reached their maximum values. Finally, the optimal model was obtained
with a Precision of 89.2%, a Recall of 87.4%, and an F1-score of 88.3%. The results show that the
trained BERT-CRF model performs well on this task.
    Finally, the trained BERT-CRF model and BIO tagging method were used to identify four types of
entities. We identified 24,254 "research object" entities, 23,839 "problem" entities, 20,670
"solution" entities, and 17,550 "fundamental principle" entities from the dataset.

3.2. Constructing knowledge network in AI

    After identifying the four types of entities corresponding to scientific problems and solutions
from the text dataset, this paper comprehensively considered the semantic similarity between entities
and expert knowledge to achieve entity synonym alignment. The Levenshtein algorithm is employed for
aligning entities within the same category and across different categories. Finally, a total of 887
"research object" entities, 4,136 "problem" entities, 13,858 "solution" entities, and 5,518
"fundamental principle" entities were obtained.
    Then, we integrated semantic similarity and structural similarity between entities to build a
knowledge network in the AI domain. The statistical information on the edges between each type of
entity in the knowledge network is shown in Table 1.

Table 1
Descriptive statistics of edges in knowledge network
                          Research object    Problem    Solution    Fundamental principle
Research object                  /            12683       9105              8468
Problem                        12683            /        17691             10565
Solution                        9105          17691        /               13783
Fundamental principle           8468          10565      13783               /
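
The synonym alignment step used in this section (Section 2.2.1) can be illustrated with a small Python sketch. It implements the classic Levenshtein edit distance and merges names whose normalised distance to an already seen name falls below a threshold. The threshold value 0.2, the lower-casing step, the function names, and the example names are assumptions made for illustration; the paper additionally relies on an expert-built dictionary, which is not modelled here.

```python
from typing import Dict, List


def levenshtein(s: str, t: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def align_entities(names: List[str], max_ratio: float = 0.2) -> Dict[str, str]:
    """Map each non-empty name to the first earlier name within the distance ratio."""
    canonical: List[str] = []
    mapping: Dict[str, str] = {}
    for name in names:
        key = name.lower().strip()
        match = next(
            (c for c in canonical
             if levenshtein(key, c) / max(len(key), len(c)) <= max_ratio),
            None,
        )
        if match is None:
            # No close variant seen so far: this name becomes a canonical form.
            canonical.append(key)
            match = key
        mapping[name] = match
    return mapping


# Hypothetical entity names as they might appear across different papers.
names = ["neural network", "Neural Networks", "neural-network", "image classification"]
aligned = align_entities(names)
```

Here the three spelling variants of "neural network" collapse to one canonical form, while "image classification" stays separate. A purely character-based threshold can also merge distinct entities with similar spellings, which is one reason a dictionary check of the kind described in Section 2.2.1 is useful.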

3.3. Entity correlation analysis

    The next stage was to analyze the correlation between entities. Following Section 2.3.1, the
PageRank algorithm was applied to calculate the importance scores of research objects and thus
identify the core research objects in the knowledge network. The hot research topics in the
artificial intelligence domain can be explored according to the core research objects.
    According to the results of the PageRank algorithm, the hot research objects in the artificial
intelligence domain mainly include deep learning, neural network, medical image, facial image, robot,
and electric system. Following the design in Section 2.3.2, the top-2 problems were identified
corresponding to the core research objects within the knowledge network.
    Finally, we explored the correlations between entities and generated a series of complete chains
including four types of entities. The partial entity correlation results of the top-6 core research
objects are shown in Figure 2.
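
The ranking and selection steps applied here can be sketched in Python. The sketch iterates Eq. (4) as written in Section 2.3.1 (the unnormalised form, on an unweighted undirected network) and then picks the top-2 problems of a core research object by link weight. The toy network, the link weights, and the function names are hypothetical and are not taken from the case-study data.

```python
from typing import Dict, List, Set


def pagerank(neighbors: Dict[str, Set[str]], d: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Iterate Eq. (4): PR(a) = d * sum_j PR(T_j) / C(T_j) + (1 - d)."""
    scores = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        # Every node is updated from the scores of the previous iteration.
        scores = {
            node: d * sum(scores[t] / len(neighbors[t]) for t in linked) + (1 - d)
            for node, linked in neighbors.items()
        }
    return scores


def top_k(weights: Dict[str, float], k: int) -> List[str]:
    """Return the k entities with the largest link weight."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


# Toy undirected network (hypothetical): each entity maps to its linked entities.
neighbors = {
    "deep learning": {"image classification", "object detection",
                      "speech recognition", "neural network"},
    "robot": {"path planning"},
    "image classification": {"deep learning", "neural network"},
    "object detection": {"deep learning"},
    "speech recognition": {"deep learning"},
    "path planning": {"robot"},
    "neural network": {"deep learning", "image classification"},
}
research_objects = ["deep learning", "robot"]

scores = pagerank(neighbors)
core_objects = sorted(research_objects, key=scores.get, reverse=True)[:1]

# Hypothetical link weights between "deep learning" and its problem entities.
problem_weights = {
    "image classification": 0.9,
    "object detection": 0.6,
    "speech recognition": 0.2,
}
primary_problems = top_k(problem_weights, 2)
```

Repeating the `top_k` step from each primary problem to its solutions, and from each solution to its fundamental principles, would yield chains of the form "research object - problem - solution - fundamental principle".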




Figure 2: Entity correlation results


    Several observations can be made based on the above results. The research object represents a
subfield, where the problems refer to the issues contained within that subfield, the solution refers
to the methods or technologies required to solve the problems, and the fundamental principles refer
to the inherent principles involved in the implementation process of the methods and technologies.
The "research object" and "problem" together constitute the complete scientific problem, and the
"solution" and "fundamental principle" together constitute the complete solution. For example, the
identified chain "classification - image classification - neural network - feature extraction"
indicates that the neural network can be used to solve image classification problems through feature
extraction [36].
    This paper identifies a complete chain of "research object - problem - solution - fundamental
principle". On the one hand, it can identify the core research objects and corresponding primary
problems in the field of artificial intelligence. On the other hand, it is able to detect the
corresponding solutions to the real problems in scientific and technological practice and explore
the theoretical basis behind them, thus realizing the in-depth excavation of the intrinsic logical
connection among scientific problems, solutions and fundamental principles.

3.4. Validation

    In this part, we used quantitative and qualitative methods to verify the reliability of our
proposed method and entity identification results.

3.4.1. Verification of the trained model

    To quantitatively verify the advantages of the combination of the BERT-CRF model trained in this
paper and the BIO tagging method, we select three advanced models, ALBERT [37], SciBERT [38], and
XLNet [39], for comparison experiments. Referencing the model parameters of BERT-CRF in this paper,
the three models were fine-tuned respectively to achieve the optimal effect. The specific parameter
settings of the models are shown in Table 2.

Table 2
Parameter configurations of models
Parameter                       Our method   ALBERT   SciBERT   XLNet
Maximum input length            64           64       64        64
Training epoch                  30           40       30        35
Batch size                      4            16       4         8
Number of layers                12           12       12        12
Learning rate                   1e-5         5e-6     1e-5      1e-6
CRF learning rate multiplier    100          /        50        /
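
For readers unfamiliar with the BIO tagging scheme used with the BERT-CRF model, the sketch below shows how a tagged token sequence can be turned back into typed entities. The label names (`OBJ`, `PRO`, `SOL`, `PRI`), the example sentence and the function name `decode_bio` are hypothetical and chosen only for illustration. The paper does not list its exact label set in this section.

```python
def decode_bio(tokens, tags):
    """Collect (entity_type, text) pairs from parallel token and BIO-tag lists."""
    entities = []
    span_type, span_tokens = None, []
    for token, tag in zip(tokens, tags):
        # An I- tag continues the open span only if the types agree.
        if tag.startswith("I-") and tag[2:] == span_type:
            span_tokens.append(token)
            continue
        # Any other tag closes the span that is currently open, if any.
        if span_tokens:
            entities.append((span_type, " ".join(span_tokens)))
        if tag.startswith("B-"):
            span_type, span_tokens = tag[2:], [token]
        else:
            span_type, span_tokens = None, []
    if span_tokens:
        entities.append((span_type, " ".join(span_tokens)))
    return entities


# Hypothetical labels: PRO = problem, SOL = solution, PRI = fundamental principle.
tokens = ["neural", "network", "solves", "image", "classification",
          "through", "feature", "extraction"]
tags = ["B-SOL", "I-SOL", "O", "B-PRO", "I-PRO", "O", "B-PRI", "I-PRI"]
print(decode_bio(tokens, tags))
```

A stray `I-` tag with no matching open span is treated like `O` here. That is one common convention, not necessarily the one used in the paper.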

    Then, the performance of our method was validated in terms of Recall and Precision by comparing it with three state-of-the-art methods. The comparison results are given in Table 3.

Table 3
The comparison of prediction performance
             Research objects      Problems              Solutions             Fundamental principles
Methods      Recall     Precision  Recall     Precision  Recall     Precision  Recall     Precision
Our method   0.980      0.965      0.964      0.942      0.856      0.877      0.724      0.759
ALBERT       0.934      0.936      0.924      0.896      0.848      0.815      0.638      0.702
SciBERT      0.928      0.955      0.906      0.883      0.834      0.827      0.704      0.740
XLNet        0.962      0.919      0.944      0.907      0.838      0.859      0.680      0.741
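
The Recall and Precision values in Table 3 are reported per entity type. The sketch below shows one common way to compute such scores at entity level. It assumes exact matching of entity boundaries and types, which the paper does not state explicitly. The annotations and the function name `precision_recall` are hypothetical.

```python
def precision_recall(gold, predicted):
    """Entity-level Precision and Recall under exact matching."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations: (sentence id, entity type, entity text).
gold = [
    (0, "problem", "image classification"),
    (0, "solution", "neural network"),
    (1, "problem", "deep clustering performance"),
    (1, "solution", "dual convolutional autoencoder"),
    (2, "problem", "medical image segmentation"),
]
predicted = [
    (0, "problem", "image classification"),
    (0, "solution", "neural network"),
    (1, "problem", "deep clustering"),  # boundary error, counted as a miss
    (1, "solution", "dual convolutional autoencoder"),
]

# Score each entity type separately, mirroring the per-type columns of Table 3.
for entity_type in ("problem", "solution"):
    gold_subset = [e for e in gold if e[1] == entity_type]
    predicted_subset = [e for e in predicted if e[1] == entity_type]
    precision, recall = precision_recall(gold_subset, predicted_subset)
    print(f"{entity_type}: precision={precision:.3f} recall={recall:.3f}")
```

Under exact matching, a prediction with a wrong boundary lowers both Precision and Recall, which is why boundary-sensitive entity types tend to score lower.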

    It can be seen that our method outperforms the baseline methods on both evaluation indicators. The gains reported below are absolute differences, in percentage points, over ALBERT, SciBERT and XLNet, respectively. For research objects, the Recall of our method is higher by 4.6, 5.2 and 1.8 points, and the Precision by 2.9, 1.0 and 4.6 points. For problems, the Recall is higher by 4.0, 5.8 and 2.0 points, and the Precision by 4.6, 5.9 and 3.5 points. For solutions, the Recall is higher by 0.8, 2.2 and 1.8 points, and the Precision by 6.2, 5.0 and 1.8 points. For fundamental principles, the Recall is higher by 8.6, 2.0 and 4.4 points, and the Precision by 5.7, 1.9 and 1.8 points. These results demonstrate that the combination of the BERT-CRF model and BIO tagging used in this paper achieves good performance on our dataset.

3.4.2. Verification of entity identification

    In this section, a qualitative method was applied to verify the reliability of the entity identification results by searching for relevant articles published in 2021 and later. Table 4 shows detailed supporting evidence for part of the entity identification results.



Table 4
Relevant documentary proof of partial entity identification results
No.  Research object - problem - solution -        Relevant documentary proof
     fundamental principle
1    classification - image classification -       In 2021, Nadendla et al. proposed a neural network-based
     neural network - feature extraction           classifier by feature extraction and classification to solve the
                                                   image classification problem [36].
2    clustering - deep clustering performance -    In 2022, Hou et al. used a dual convolutional autoencoder to
     dual convolutional autoencoder -              extract features of multi-levels and fuse them to improve
     multi-level feature fusion                    the performance of deep clustering [40].
3    medical image - medical image segmentation -  In 2024, Tamilmani et al. used the convolutional neural
     convolutional neural network -                network with the optimal network topology to solve the
     optimal network topology                      problem of medical image segmentation [41].
4    facial image - face recognition -             In 2022, Wei-Jie et al. applied the convolutional neural
     convolutional neural network -                network by extracting the masked face features to solve the
     feature extraction                            problem of masked facial recognition [42].
5    robot - local motion planning -               In 2023, Garrote et al. proposed a deep reinforcement
     deep reinforcement learning -                 learning strategy based on a reward model to solve the local
     reward model                                  motion planning problem of robots [43].
6    electrical power system -                     In 2022, Gu et al. used the graph neural network based on
     stability assessment -                        the self-attention mechanism to evaluate the stability of the
     graph neural network -                        power system [44].
     self-attention mechanism
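
The four-part chains listed in Table 4 can be held in a small record type. The sketch below is one possible representation, not the authors' data model. The names `Chain` and `parse_chain` are hypothetical. The two properties follow the paper's definition that the research object and problem form the complete scientific problem, while the solution and fundamental principle form the complete solution.

```python
from typing import NamedTuple


class Chain(NamedTuple):
    research_object: str
    problem: str
    solution: str
    fundamental_principle: str

    @property
    def scientific_problem(self):
        # "Research object" and "problem" together form the scientific problem.
        return (self.research_object, self.problem)

    @property
    def complete_solution(self):
        # "Solution" and "fundamental principle" together form the solution.
        return (self.solution, self.fundamental_principle)


def parse_chain(text):
    """Split a 'a - b - c - d' string into a Chain of four entities."""
    # Treat an en dash separator like a hyphen, then split on spaced hyphens
    # so that in-word hyphens such as "self-attention" are left intact.
    parts = [part.strip() for part in text.replace("\u2013", "-").split(" - ")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 entities, got {len(parts)}: {text!r}")
    return Chain(*parts)


chain = parse_chain(
    "electrical power system - stability assessment - "
    "graph neural network - self-attention mechanism"
)
print(chain.scientific_problem)
print(chain.complete_solution)
```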

    Table 4 demonstrates the alignment between our entity identification results and the literature. Therefore, the four types of entities identified and the relations between them in this paper are reliable, and the effectiveness of the proposed method has been further verified.

4. Conclusion

    In this paper, we proposed a novel methodology to identify scientific problems and solutions using semantic network analytics and deep learning. First, a deep learning method was applied to extract textual semantic information and identify entities, effectively capturing the hidden semantic associations in different textual contexts and improving the accuracy of entity recognition. Then, a machine learning method was used to construct the knowledge network, fully considering the knowledge structure and semantic structure between entities and thus capturing richer information. Finally, the PageRank algorithm and semantic network analytics were introduced to explore in depth the linkages among research objects, problems, solutions and fundamental principles.
    Semantic network analytics and deep learning were combined to identify scientific problems and solutions from scientific text, which provides technical intelligence for scientific innovation and industrial technology upgrading in a field. In addition, this method can not only explore the associations between entities and reveal the primary research problems and corresponding solutions in the field of artificial intelligence, but also discover the knowledge structure of this field and promote the development of scientific knowledge network analysis methods.
    Several limitations of our proposed method require further improvement: 1) The scientific and technological output of a field includes not only papers but also patents and product data. Further research should be conducted based on more data sources; 2) The methodology of entity alignment can be further optimized. More advanced methods and professional expert knowledge could be introduced in the future to improve the efficiency and quality of entity alignment; 3) The evolution mechanism of "research object - problem -
solution - fundamental principle" needs to be further explored.

5. Acknowledgements

    This work was supported by the National Natural Science Foundation of China (Grant No. 72274013) and the Fundamental Research Funds for the Central Universities.

6. References

[1] Y. Zhang, M. Wang, M. Saberi, et al, From big scholarly data to solution-oriented knowledge repository. Frontiers in Big Data, 2: 38, 2019.
[2] P. Li, W. Lu, Q. Cheng, Generating a related work section for scientific papers: an optimized approach with adopting problem and method information. Scientometrics, 127(8): 4397-4417, 2022.
[3] Z. Luo, W. Lu, J. He, et al, Combination of research questions and methods: A new measurement of scientific novelty. Journal of Informetrics, 16(2): 101282, 2022.
[4] G. Chen, J. Peng, T. Xu, et al, Extracting entity relations for "problem-solving" knowledge graph of scientific domains using word analogy. Aslib Journal of Information Management, 75(3): 481-499, 2023.
[5] V. Giordano, G. Puccetti, F. Chiarello, et al, Unveiling the inventive process from patents by extracting problems, solutions and advantages with natural language processing. Expert Systems with Applications, 229: 120499, 2023.
[6] R. B. Mishra, H. Jiang, Classification of problem and solution strings in scientific texts: evaluation of the effectiveness of machine learning classifiers and deep neural networks. Applied Sciences, 11(21): 9997, 2021.
[7] H. Liu, T. Brailsford, J. Goulding, et al, Towards idea mining: problem-solution phrase extraction from text. International Conference on Advanced Data Mining and Applications (pp. 3-14). 2022.
[8] Y. Zhang, J. Lu, F. Liu, et al, Does deep learning help topic extraction? A kernel k-means clustering method with word embedding. Journal of Informetrics, 12(4): 1099-1117, 2018.
[9] R. Xiang, E. Chersoni, Q. Lu, et al, Lexical data augmentation for sentiment analysis. Journal of the Association for Information Science and Technology, 72(11): 1432-1447, 2021.
[10] X. Chen, P. Ye, L. Huang, et al, Exploring science-technology linkages: A deep learning-empowered solution. Information Processing & Management, 60(2): 103255, 2023.
[11] J. Chen, Y. Chen, Y. He, et al, A classified feature representation three-way decision model for sentiment analysis. Applied Intelligence, 52(7): 7995-8007, 2022.
[12] X. Xi, F. Ren, L. Yu, et al, Detecting the technology's evolutionary pathway using HiDS-trait-driven tech mining strategy. Technological Forecasting and Social Change, 195: 122777, 2023.
[13] J. Wang, Q. Cheng, W. Lu, et al, A term function–aware keyword citation network method for science mapping analysis. Information Processing & Management, 60(4): 103405, 2023.
[14] G. Garechana, R. Río-Belver, E. Zarrabeitia, et al, TeknoAssistant: a domain specific tech mining approach for technical problem-solving support. Scientometrics, 127(9): 5459-5473, 2022.
[15] X. Zhang, Q. Xie, C. Song, et al, Mining the evolutionary process of knowledge through multiple relationships between keywords. Scientometrics, 127(4): 2023-2053, 2022.
[16] X. Cao, X. Chen, L. Huang, et al, Detecting technological recombination using semantic analysis and dynamic network analysis. Scientometrics, Doi: 10.1007/s11192-023-04812-4, 2023.
[17] J. Liu, Z. Zhou, M. Gao, et al, Aspect sentiment mining of short bullet screen comments from online TV series. Journal of the Association for Information Science and Technology, 74(8): 1026-1045, 2023.
[18] J. Won, D. Lee, J. Lee, Understanding experiences of food-delivery-platform workers under algorithmic management using topic modeling. Technological Forecasting and Social Change, 190: 122369, 2023.
[19] L. Huang, X. Chen, Y. Zhang, et al, Identification of topic evolution: network analytics with piecewise linear representation and word embedding. Scientometrics, 127(9): 5353-5383, 2022.
[20] S. Bai, D. Yu, C. Han, et al, Enablers or inhibitors? Unpacking the emotional power behind in-vehicle AI anthropomorphic interaction: A dual-factor approach by text mining. IEEE Transactions on Engineering Management, Doi: 10.1109/TEM.2023.3327500, 2023.
[21] Z. Zhang, H. Mu, S. Huang, Playing to save sisters: how female gaming communities foster social support within different cultural contexts. Journal of Broadcasting & Electronic Media, 67(5): 693-713, 2023.
[22] J. Wei, T. Hu, J. Dai, et al, Research on named entity recognition of adverse drug reactions based on NLP and deep learning. Frontiers in Pharmacology, 14: 1121796, 2023.
[23] Z. Xue, G. He, J. Liu, et al, Re-examining lexical and semantic attention: Dual-view graph convolutions enhanced BERT for academic paper rating. Information Processing & Management, 60(2): 103216, 2023.
[24] X. Zhu, Z. Kuang, L. Zhang, A prompt model with combined semantic refinement for aspect sentiment analysis. Information Processing & Management, 60(5): 103462, 2023.
[25] C. Zhang, Improved word segmentation system for Chinese criminal judgment documents. Applied Artificial Intelligence, 38(1): 2297524, 2024.
[26] K. Gupta, A. Ahmad, T. Ghosal, et al, A BERT-based sequential deep neural architecture to identify contribution statements and extract phrases for triplets from scientific publications. International Journal on Digital Libraries, Doi: 10.1007/s00799-023-00393-y, 2024.
[27] Z. Wang, X. Xu, X. Song, et al, Multigranularity pruning model for subject recognition task under knowledge base question answering when general models fail. International Journal of Intelligent Systems, 2023: 1202315, 2023.
[28] N. Xu, Y. Liang, C. Guo, et al, Entity recognition in the field of coal mine construction safety based on a pre-training language model. Engineering, Construction and Architectural Management, Doi: 10.1108/ECAM-05-2023-0512, 2023.
[29] M. Mansurova, V. Barakhnin, A. Ospan, et al, Ontology-driven semantic analysis of tabular data: an iterative approach with advanced entity recognition. Applied Sciences, 13(19): 10918, 2023.
[30] B. Jiang, W. Tang, M. Li, et al, Assessing land resource carrying capacity in China's main grain-producing areas: Spatial–temporal evolution, coupling coordination, and obstacle factors. Sustainability, 15(24): 16699, 2023.
[31] P. Marjai, A. Kiss, Influential Performance of Nodes Identified by Relative Entropy in Dynamic Networks. Vietnam Journal of Computer Science, 8(1): 93-112, 2021.
[32] T. Liang, C. Li, H. Li, Top-k Learning Resource Matching Recommendation Based on Content Filtering PageRank. Computer Engineering, 43(2): 220-226, 2017.
[33] K. Song, A selection method for industry-university cooperation from the perspective of patentometrics. Library Tribune, 41(11): 19-27, 2021.
[34] N. Liu, P. Shapira, X. Yue, Tracking developments in artificial intelligence research: constructing and applying a new search strategy. Scientometrics, 126(4): 3153-3192, 2021.
[35] L. Lü, T. Zhou, Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, 390(6): 1150-1170, 2011.
[36] H. R. Nadendla, A. Srikrishna, K. G. Rao, Rider and Sunflower optimization-driven neural network for image classification. Web Intelligence. IOS Press, 19(1-2): 41-61, 2021.
[37] J. Li, Q. Huang, S. Ren, et al, A novel medical text classification model with Kalman filter for clinical decision making. Biomedical Signal Processing and Control, 82: 104503, 2023.
[38] S. Shen, J. Liu, L. Lin, et al, SciBERT: A pre-trained language model for social science texts. Scientometrics, 128(2): 1241-1263, 2023.
[39] J. Sirrianni, E. Sezgin, D. Claman, et al, Medical text prediction and suggestion using generative pretrained transformer models with dental medical notes. Methods of Information in Medicine, 61(05/06): 195-200, 2022.
[40] H. Hou, S. Ding, X. Xu, A deep clustering by multi-level feature fusion. International Journal of Machine Learning and Cybernetics, 13(10): 2813-2823, 2022.
[41] G. Tamilmani, C. H. Phaneendra Varma, V. Brindha Devi, et al, Medical image segmentation using grey wolf based U-Net with bi-directional convolutional LSTM. International Journal of Pattern Recognition and Artificial Intelligence, Doi: 10.1142/S0218001423540253, 2023.
[42] L. C. Wei-Jie, S. C. Chong, T. S. Ong, Masked face recognition with principal random forest convolutional neural network (PRFCNN). Journal of Intelligent & Fuzzy Systems, 43(6): 8371-8383, 2022.
[43] L. Garrote, J. Perdiz, U. J. Nunes, Costmap-based local motion planning using deep reinforcement learning. In 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). IEEE (pp. 1089-1095). 2023.
[44] S. Gu, J. Qiao, Z. Zhao, et al, Power system transient stability assessment based on graph neural network with interpretable attribution analysis. In 2022 4th International Conference on Smart Power & Internet Energy Systems (SPIES). IEEE (pp. 1374-1379). 2022.



