=Paper= {{Paper |id=Vol-3304/paper03 |storemode=property |title=Academic Paper Knowledge Graph, the Construction and Application |pdfUrl=https://ceur-ws.org/Vol-3304/paper03.pdf |volume=Vol-3304 |authors=Xinyu Du,Ning Li }} ==Academic Paper Knowledge Graph, the Construction and Application== https://ceur-ws.org/Vol-3304/paper03.pdf
Academic Paper Knowledge Graph, the Construction and
Application
Xinyu Du and Ning Li
Beijing Information Science & Technology University, Beijing, China

                 Abstract
                 Academic papers in the form of documents are still the primary carrier of academic
                 publications. Nevertheless, it is difficult for such documents to express the papers’ semantic
                 elements and discourse structures directly. Hence, this paper focuses on knowledge units with
                 semantic information and uses them to construct a knowledge graph that supports fast retrieval
                 of knowledge from academic papers. Based on an in-depth analysis of the general narrative
                 conventions of academic papers, we develop an academic paper representation ontology, PEO,
                 that includes 29 classes, 18 relations, and five attributes. The experiment demonstrates that the
                 developed ontology has a strong ability to represent knowledge of academic papers.
                 Additionally, this paper preliminarily constructs the knowledge graph PKG of academic papers
                 based on the PEO ontology and demonstrates its role in semantic retrieval and intelligent
                 question answering. Overall, this study enriches the means of expressing academic knowledge
                 and helps better explore the value of academic papers.

                 Keywords
                 academic papers, ontology, semantic description, knowledge representation, knowledge graph

1. Introduction

    In recent years, knowledge graphs, as a form of structured human knowledge, have attracted
significant research attention in academia and industry and have been widely used in AI tasks such as
natural language understanding, question answering, and recommendation systems [1]. With the digital
transformation of academic work, applying knowledge graphs in knowledge representation, knowledge
mining, knowledge retrieval, and other aspects of the academic literature has become a research hotspot.
However, most of the early research was limited to constructing knowledge graphs for the external
features of academic papers (e.g., title, author, institution, keywords, issues, and publisher), phrases,
key terms, and other knowledge content [2-5]. Recently, some scholars have constructed knowledge
graphs for the semantic knowledge of academic papers (e.g., background, methods, results, and
conclusions), but the semantic knowledge they capture is incomplete and cannot support complex
semantic retrieval and question answering [6-9]. Examples of such questions are “Is there any
literature mentioning that a certain method is used to solve a certain problem?”, “For a certain goal,
what methods have been proposed in existing research, and how effective are they?”, and “What is
the best experimental result of a method?”. Moreover, despite the massive literature resources
available, current knowledge service platforms, e.g., HowNet, Wanfang, and Baidu Academic, support
literature retrieval only by article title, subject, author, affiliation, keywords, abstract, references,
Chinese Library Classification number, and literature source. The retrieval results are therefore whole
documents or full texts, which searchers must still screen manually and then read carefully. This
strategy does not meet researchers’ need to acquire knowledge and information accurately and
efficiently. Thus, to realize the intelligent question answering and retrieval described above, we must
build a dedicated knowledge base that contains the semantic knowledge in academic papers, such as
questions, methods, results, and conclusions. However, the authors of academic manuscripts typically


ICBASE2022@3rd International Conference on Big Data & Artificial Intelligence & Software Engineering, October 21-
23, 2022, Guangzhou, China
              © 2022 Copyright for this paper by its authors.
              Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
              CEUR Workshop Proceedings (CEUR-WS.org)



express their ideas linearly in natural language, so directly obtaining the papers’ semantic knowledge is
challenging. Therefore, this study investigates how to define a suitable ontology for the knowledge
representation of academic papers and how to construct the knowledge graph of papers based on the
ontology.
    As a knowledge representation method, ontology can also serve as the skeleton and
foundation of a knowledge base, describing text information at the semantic and knowledge levels.
Ontology is widely used in knowledge representation of academic literature, knowledge management,
semantic retrieval, scientific argument analysis, and other applications [10]. Following the principles
of knowledge units, ontology, and knowledge graphs, this paper subdivides the content of academic
papers, associates the resulting knowledge units, and forms a knowledge network. To this end, it
analyzes the content of academic papers, determines the knowledge types, and defines the ontology
concepts. Then, the concepts and relationships in the ontology are used to describe the knowledge
units contained in the academic papers and the relationships between them. Finally, a structured
semantic knowledge base (namely, a knowledge graph) is constructed based on the ontology to achieve
semantic retrieval and intelligent question answering for academic papers.

2. Related Work
2.1. Research Status of Academic Knowledge Graph

    Currently, constructing academic knowledge graphs is a hot research topic that has recently been
included in the guideline of national key R&D projects. In 2017, Tsinghua University and Microsoft
Research [2] jointly released the Open Academic Graph (OAG), which combines metadata from 155
million academic papers in the ArnetMiner Academic Graph [3] and 160 million papers in the Microsoft
Academic Graph (MAG) [4]. The employed data types include the paper’s title, author, conference,
year, and abstract. Subsequently, OAG 2.0, released in 2019, added three types of entities (papers,
authors, and publication venues) together with their corresponding matching relationships. OAG
integrates a large amount of paper metadata information, provides intelligent services through data
sharing, and promotes the development of academic knowledge graphs. Bratsas et al. [5] constructed a
scientific knowledge graph by semantically annotating and linking academic research fields, including
all research fields in each scientific field in a standard hierarchy. The above research is of great value
in improving the literature’s retrieval efficiency. In recent years, knowledge graph research for
academic papers has shifted from paper metadata to deep semantic knowledge in papers. For instance,
Auer et al. [6] proposed the Open Research Knowledge Graph (ORKG), which describes research
contributions traditionally described in scientific articles in a structured and semantic manner. Articles
are added to ORKG by retrieving (or manually adding) key metadata for articles from CrossRef via
DOI and then using dedicated input fields to describe the content of the research articles. The
description includes the research questions, the materials and methods used, and the results obtained so
that the research contribution is comparable to other articles addressing the same research question.
Fathalla et al. [7] proposed the SemSur ontology for describing the content of the literature review,
including four core concepts of research questions, methods, implementation, and evaluation. Then,
based on the ontology, a review knowledge graph is generated. Cao and Zhao [8] mined innovative
content by extracting innovative sentences in papers for entity recognition and building a knowledge
graph for innovative content in academic papers. Roa et al. [9] created a Deep Knowledge Graph (DKG)
repository for papers related to deep learning algorithms and methods to help improve the search and
retrieval of relevant information in the academic field.
    The above review shows that most existing research on academic knowledge graphs is limited to
papers’ external features and bibliographic information. Only a few scholars have studied graph
construction for the intrinsic semantic knowledge of academic papers, and the semantic
knowledge they capture is not comprehensive enough to support complex semantic retrieval and
question answering for academic papers.




2.2.    Research Status of Knowledge Representation in Academic Papers

    Many scholars have analyzed the semantic description of literature content from different
perspectives and have proposed different ontologies and models. For example, Groza [11] proposed the
SALT (Semantically Annotated LaTeX) framework for the semantic annotation of documents. This early
framework identifies the rhetorical units of a document and comprises a document ontology, a
rhetorical ontology, and an annotation ontology. The rhetorical ontology extends the ABCDE model [12]
and includes abstract, motivation, scenario, contribution, evaluation, discussion, background,
conclusion, and entity; it also defines 11 rhetorical relations such as antithesis, circumstance, and
concession. SALT defines components at a coarse granularity and therefore cannot describe in detail
the content of each part of an academic paper, but its classification system and relationship
definitions provide a reference for related research. Liakata et al. [13] developed the Core Scientific
Concepts (CoreSCs) scheme, which reflects the structure and type of knowledge in scientific research.
In 2011, W3C (World Wide Web
Consortium) [14] released the Ontology of Rhetorical Blocks (ORB), which creates a general coarse-
grained collection of rhetoric modules for scientific publications and provides fine-grained semantic
entry for document contents and forms. The Pattern Ontology (PO) constructed by Iorio et al. [15]
focused on the attribute description of structural components such as sentences, paragraphs, and
chapters. Ribaupierre et al. [16] proposed a user-centric scientific literature annotation model—
SciAnnotDoc. However, the academic papers’ semantic content description in these studies is not
detailed and comprehensive, and the granularity is relatively coarse. Therefore, other scholars have
built ontologies that describe the semantics of academic paper content at a finer granularity. Shotton
et al. [17] proposed the Discourse Elements Ontology (DEO), which draws on some of the rhetorical
structural elements of the rhetorical ontology in the SALT framework, defines components with
different rhetorical functions such as background, conclusion, and data, and provides a structured
vocabulary for the rhetorical elements in documents. DEO can describe a paper’s rhetorical units in
detail but does not define the relationships between them. The Document Components Ontology (DoCO)
[18][19] provides a structured vocabulary that defines document components such as title, abstract,
chapter, sentence, and paragraph. However, it only provides a fine-grained description of the document
structure. Qin et al. [20] introduced a knowledge element ontology model for the knowledge
representation of scientific literature, which hierarchically represents the contents of papers and defines
the apposition and hierarchical relationships. This model describes the internal and external
characteristics of scientific literature in fine granularity, playing a significant role in the deep knowledge
service of scientific literature. Based on the work of Zhang et al. [22], Wang et al. [21] constructed the
Functional Units Ontology (FUO) of scientific papers, which includes 12 first-level categories and 28
second-level categories. This ontology builds a fine-grained model of the organizational structure of
scientific papers from the perspective of the semantic functions of content components. FUO describes
the content components of scientific papers in more detail and better reveals the semantic functions
of their functional units, which benefits the semantic description of
academic papers. However, the ontology does not define the relationships between
the functional units, and thus it cannot represent the logical relationships among them. Sun
et al. [23] constructed a semantic annotation ontology of academic literature by reusing
existing annotation ontologies (such as DEO, DoCO, C4O [24], FaBiO, and CiTO [25]). Although the
annotation ontology involves the types of academic documents, scientific discourses, structural
elements, and references, it cannot comprehensively and carefully describe the content semantics of
academic documents. Niu and Ou [26] suggested a semantic annotation framework when exploring the
semantic annotation model of scientific papers. This framework realizes the semantic annotation
function of the paper’s physical and argument structure, with the annotation ontology adopting ORB,
scientific experiment ontology (EXPO) [27], the micro-publication ontology [28], and the nano-
publication ontology [29]. Although it covers the paper’s physical and argument structures, this
framework lacks some basic semantic units, such as research background, research questions, and future
work.
    Additionally, some scholars have proposed models or ontologies that divide an article’s content
from the perspective of scientific argumentation. For example, Teufel [30] proposed the
Argumentative Zoning (AZ) model for analyzing the argumentation and rhetorical structure of
scientific papers. Since the annotation experiments for this model were limited to computational
linguistics, Teufel et al. [31] extended and updated AZ to obtain the Argumentative Zoning II (AZ-II)
model. Soldatova et al. [27] proposed EXPO, while Vitali et al. [32] introduced the Argument Model
Ontology (AMO) based on the Toulmin argument model. Wang et al. [33] proposed the scientific paper
argumentation ontology SAO, which reveals the important viewpoints, conclusions, and argumentation
processes of scientific papers. Qu and Ou [34] constructed a sentence-level and entity-level argument
structure ontology for scientific papers. Scientific argumentation is a critical process in an academic
paper, and argument models and ontologies capture its necessary elements. Although they cannot
describe an article’s content comprehensively, they still provide a good reference for the semantic
description of academic paper content.
    Nevertheless, existing research on semantically describing literature content has the following
deficiencies. 1) Describing a document’s content only with coarse-grained rhetorical elements, such as
methods, results, and conclusions, makes it difficult to reveal the document’s semantic units in a
detailed and comprehensive manner. 2) Existing work either does not define the relationships between
semantic units or relies on simplistic definitions that cannot reflect the logical relationships
between the semantic units of academic papers.
    Motivated by the above deficiencies, this paper builds on existing results to develop an academic
paper representation ontology (PEO) that expresses the semantic units in academic papers in a detailed
and comprehensive manner. Moreover, the ontology provides a basis for constructing knowledge graphs
of academic papers and for realizing semantic retrieval and intelligent question answering over
academic resources.

3. Construction of Academic Paper Expression Ontology

    Based on the existing literature on content representation ontologies and models, this paper
determines the types of semantic units through semantic annotation and analysis of
academic paper content. Then, it draws on the argumentation relations in argumentation
structure [33], the rhetorical relations in rhetorical structure [35], the discourse relations in discourse
analysis [36], and the relations defined in existing ontologies, finally arriving at 29 classes, 18
relations, and five attributes.

3.1.    Class Design in PEO

    This paper first refers to the FUO ontology, derives an initial set of coding nodes from the classes
it defines, and establishes an encoding system for the semantic annotation and analysis of academic
papers. During annotation, the encoding nodes are continuously expanded and adjusted according to the
semantic content expressed in the academic papers, and the encoding system is updated accordingly.
Finally, the hierarchical concept classes of PEO are determined, comprising 17 first-level classes, such
as background, research goal, research significance, research content, method, experiment, result,
and conclusion, and 29 second-level classes obtained by further subdividing the first-level classes (see
Table 1).

3.2.    Property Design in PEO

    Table 1 defines the classes and their hierarchical relationships, but this alone is insufficient. To
fully express a paper’s semantic units and their logical relationships, it is necessary to further
describe the structural information of these classes, i.e., their properties. This paper designs both
external and internal properties for the classes. The external properties describe the relationships
between classes (semantic units in the paper), and the internal properties describe the attributes of a
class itself.




3.2.1. External Property Design

    To accurately describe the logical relationships between the above semantic units, this paper
draws on rhetorical relations, argumentation relations, discourse relations, and knowledge element
relations, plus custom relations, to define a total of 18 logical relationships. These constitute the
external property set of the academic paper ontology (see Table 2).

Table 1
Hierarchical concept class design of PEO.
 First-Level Class        Second-Level Class           Co-occurrence Framework
 Background               Background                   SALT, CoreSCs, DEO, FUO
 Theme                    Theme                        SALT, DoCO, FUO
 Problem                  Problem                      DEO
 Research-Goal            Research-Goal                CoreSCs, FUO
 Research-Significance    Research-Significance        FUO
 Research-Content         Research-Content
 Theoretical-Basis        Theoretical-Basis
 Definition               Definition                   SciAnnotDoc, FUO
 Examples                 Examples
 Data                     Data                         DEO
 Conclusion               Conclusion                   SALT, CoreSCs, DEO, FUO
 Future-Work              Future-Work                  DEO, FUO
 Related-Research         Existing-Research            SciAnnotDoc, FUO
                          Research-Value
                          Research-Gap
 Method                   Method-Paper
                          Method-Selection             FUO
                          Method-Description           SciAnnotDoc, CoreSCs, FUO, DEO
                          Method-Advantage
 Experiment               Experiment-Environment
                          Experiment-Purpose
                          Experiment-Settings
                          Experiment-Content           CoreSCs
 Result                   Result-Description           CoreSCs, DEO, FUO
                          Result-Metrics
                          Result-Evaluation            SALT, DEO, FUO
 Discussion               Discussion-Recapitulation    DEO, FUO
                          Discussion-Limitation        FUO
                          Discussion-Contribution      DEO, FUO


3.2.2. Internal Property Design

   Classes in PEO have some basic internal properties, such as a description of the content, the article
a unit belongs to, and label information. In addition, authors of academic papers cite and refer to the
work of others. Therefore, some classes, e.g., background, existing research, and the paper’s method,
carry source information in the representation ontology of academic papers. Authors also hold a
particular attitude or point of view. Therefore, some classes, such as research significance, research
gap, results, and conclusions, often carry sentiment information. Accordingly, this paper defines five
internal properties, with the specific contents listed in Table 3.

Table 2
External property set of PEO.
 Property Name             Explanation                           Reference (Source)
 condition                 A is the condition of B               RST
 background                A is the background of B              customization
 motivation                A is the motivation of B              customization
 leads_to                  A leads to B                          SAO, knowledge element ontology
 review                    A is the review of B                  customization
 introduces                A introduces B                        customization
 improves                  A improves B                          customization
 resolves                  A resolves B                          customization
 argues                    A argues B                            knowledge element ontology
 produces                  A produces B                          SAO
 supports                  A supports B                          SAO
 not support               A does not support B                  customization
 summary                   A is the summary of B                 RST
 purpose-behavior          B is performed to achieve A           discourse relationship
 uses                      A uses B                              SAO
 basis                     A is the basis for B                  customization
 guides                    A guides B                            customization
 elaboration               A is an elaboration of B              RST

Table 3
Internal property set of PEO.
 Property Name    Property Value (Description)                              Reference (Source)
 Description      the content of the sentence (string)                      customization
 Article          the title of the article (string)                         customization
 Label            background, problem, method, result, conclusion, etc.     customization
 Tendency         positive, negative, neutral                               SAO, FUO
 Source           other, own                                                SAO, FUO
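
The definitions in Tables 1-3 can be pictured as a small typed data model. The following Python sketch is illustrative only and is not the authors' implementation: the names `SemanticUnit`, `Relation`, and `EXTERNAL_PROPERTIES` are hypothetical, and the example sentences are invented placeholders rather than data from the annotated corpus.

```python
from dataclasses import dataclass

# The 18 external properties (relations) listed in Table 2.
EXTERNAL_PROPERTIES = {
    "condition", "background", "motivation", "leads_to", "review",
    "introduces", "improves", "resolves", "argues", "produces",
    "supports", "not support", "summary", "purpose-behavior", "uses",
    "basis", "guides", "elaboration",
}


@dataclass(frozen=True)
class SemanticUnit:
    """One annotated knowledge unit with the five internal properties of Table 3."""
    description: str           # content of the sentence
    article: str               # title of the article
    label: str                 # PEO class, e.g. "Method-Description"
    tendency: str = "neutral"  # positive, negative, or neutral
    source: str = "own"        # own or other

    def __post_init__(self):
        if self.tendency not in {"positive", "negative", "neutral"}:
            raise ValueError(f"unknown tendency: {self.tendency}")
        if self.source not in {"own", "other"}:
            raise ValueError(f"unknown source: {self.source}")


@dataclass(frozen=True)
class Relation:
    """A directed edge 'head --name--> tail' between two semantic units."""
    head: SemanticUnit
    name: str
    tail: SemanticUnit

    def __post_init__(self):
        if self.name not in EXTERNAL_PROPERTIES:
            raise ValueError(f"unknown external property: {self.name}")


# Placeholder content, invented for illustration.
problem = SemanticUnit("Placeholder problem statement.", "Example Paper", "Problem")
method = SemanticUnit("Placeholder method description.", "Example Paper",
                      "Method-Description")
edge = Relation(method, "resolves", problem)
```

Validating relation names against a closed set mirrors the idea that only the 18 relations of Table 2 may connect semantic units.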


4. Academic Paper Semantic Annotation Experiment

   To evaluate PEO, this study uses the NVivo data analysis tool [37] with a “deductive”
coding approach. First, the encoding nodes are created according to the classes in the ontology,
establishing the encoding system. Then, the sample data is encoded using the system, and finally, the
annotation results are stored and analyzed. Before annotation, the paper samples in PDF format are
converted into the DOCX format used by Microsoft Word, and diagrams, formulas, English
abstracts, and references are removed. The specific annotation process is illustrated in Figure 1.




4.1.    Selection of Annotated Samples

   Since articles from a specific field make it easier to analyze and compare results, and following
previous work [20-21, 30], we randomly selected 40 research papers published in 2017-2021 in the
journal Computer Science as annotation samples. This journal has a standard format, high quality, and
reasonable article length, making it suitable for annotation experiments on academic papers.




Figure 1: Academic paper semantic annotation process.

4.2.    Annotation Experiment and Encoding Consistency Analysis

   This research adopts a manual annotation strategy, which requires the annotators to read and
understand the content of the academic papers. Therefore, to ensure the reliability of the annotations,
eight papers were randomly selected from the sample of 40 papers for a consistency check, i.e.,
encoding consistency analysis, before the semantic annotation experiments started. Specifically, the
author first annotated these eight papers, which were then annotated again by a second person familiar
with the encoding conventions. Finally, the Kappa coefficient was calculated; it is an indicator used
for consistency testing that can also measure classification effectiveness [38]. The Kappa coefficients
mostly fell between 0.6 and 1, indicating substantial consistency. After that, the author annotated the
remaining 32 papers, completing the annotation of the academic papers.
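
As a rough illustration of the consistency check, the sketch below computes Cohen's kappa for two annotators. It assumes that the Kappa coefficient referred to here is Cohen's kappa and that agreement is measured per text segment; the paper specifies neither detail. The function name `cohen_kappa` and the labels are invented for this example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Invented toy labels, not the study's annotations.
author = ["Background", "Problem", "Method", "Method", "Result", "Conclusion"]
second = ["Background", "Problem", "Method", "Result", "Result", "Conclusion"]
kappa = cohen_kappa(author, second)  # 23/29, roughly 0.79
```

In this toy case the annotators agree on five of six segments, and the resulting value lies in the 0.6 to 1 band that the text describes as substantial consistency.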

4.3.    Annotation Result Analysis

   Next, we statistically analyzed the annotation results from two aspects. On the one hand, ontology
coverage is used to evaluate the coverage of PEO across all papers. On the other hand, text encoding
coverage is used to assess the coverage of PEO within individual papers. Together, these two measures
verify the ability of PEO to represent the semantic units of academic papers and their logical
relationships.

4.3.1. Ontology Coverage

   Ontology coverage refers to the proportion of articles that contain a given ontology category among
the total number of articles. Figure 2 illustrates the number of coding items for each coding node. In
the sample of 40 papers, different categories appear with different frequencies. Among them, nine
categories, such as “background”, “conclusion”, “result evaluation”, “method description”, and
“existing research”, appear in all of the papers, so these categories are regarded as common
categories, illustrating the importance of this taxonomy. In addition, except for “theoretical basis”,
“limitations”, “experimental environment”, “method selection”, and “research objectives”, the coverage
rate of the remaining categories exceeds 70%, which shows that most of the categories in PEO are
representative.
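
The definition above can be sketched in a few lines of Python. The function name `ontology_coverage` and the toy annotations are assumptions made for illustration; they are not the study's data.

```python
def ontology_coverage(paper_categories, category):
    """Share of papers in which `category` was annotated at least once."""
    hits = sum(1 for cats in paper_categories.values() if category in cats)
    return hits / len(paper_categories)


# Invented toy annotations: paper id -> set of categories found in it.
annotations = {
    "p1": {"Background", "Method-Description", "Conclusion"},
    "p2": {"Background", "Conclusion", "Theoretical-Basis"},
    "p3": {"Background", "Method-Description", "Conclusion"},
    "p4": {"Background", "Method-Description"},
}
coverage = {c: ontology_coverage(annotations, c)
            for c in sorted(set().union(*annotations.values()))}
# Categories that appear in every paper correspond to the "common categories".
common = [c for c, v in coverage.items() if v == 1.0]
```

Averaging the per-category values would give a single figure of the kind reported per ontology, assuming that is how the average is formed.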

4.3.2. Text Encoding Coverage

    The encoding coverage of a node is the proportion of a paper’s text length that the node encodes.
By summing the encoding coverage of all categories in a single paper, the text encoding coverage of the
entire paper is obtained, which indicates how well PEO covers each academic paper. The statistical
results are depicted in Figure 3, which reveals that the text encoding coverage is at least 75.33% and at
most 92.57%, with most values falling in the 80.00% to 90.00% range. The average text encoding
coverage of the 40 papers reached 84.64%. Therefore, the classes in PEO can express most of the
academic paper content. To simplify processing, some of the papers’ content, e.g., figures, tables, and
formulas, was deleted before annotation, while some content, e.g., keywords and headings at all
levels, was not annotated. The reported text encoding coverage is therefore not exact, but the actual
coverage should be higher than the results presented in the figure.
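
The summation described above can be sketched as follows. The span representation (half-open character offsets per category) and the function name `text_encoding_coverage` are assumptions for illustration. The sketch also assumes that coded spans do not overlap; otherwise the sum would count the overlapping text twice.

```python
def text_encoding_coverage(spans_by_category, text_length):
    """Per-category coverage (coded characters / total characters) and their sum."""
    per_category = {
        cat: sum(end - start for start, end in spans) / text_length
        for cat, spans in spans_by_category.items()
    }
    return per_category, sum(per_category.values())


# Invented toy paper of 1000 characters with half-open (start, end) spans.
spans = {
    "Background": [(0, 200)],
    "Method-Description": [(250, 600)],
    "Conclusion": [(700, 950)],
}
per_category, total = text_encoding_coverage(spans, 1000)
```

Here 800 of the 1000 characters are coded, so the paper's text encoding coverage is 80%; the uncoded remainder plays the role of headings and keywords in the experiment.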




Figure 2: Statistical results of PEO ontology coverage.




Figure 3: Statistical results of PEO text encoding coverage.

4.3.3. Comparison with Other Ontologies

   To assess the representation ability of PEO against existing work, this paper uses the relatively
mature Scientific Functional Unit Ontology (FUO) and Discourse Elements Ontology (DEO) to annotate the
same 40 sample papers. The results are reported in Table 4: compared with FUO and DEO, the ontology
coverage of the proposed PEO is 27.78 and 17.52 percentage points higher, respectively, and the text
encoding coverage is 16.19 and 21.39 percentage points higher. These findings indicate that, compared
with existing ontologies, PEO has a stronger ability to represent the semantic units of
academic papers.

Table 4
Annotation results based on different ontologies.
    Ontology Name           Ontology Coverage(average)          Text Encoding Coverage(average)
         FUO                            51.96%                                  68.45%
         DEO                            62.22%                                  63.25%
    PEO(this paper)                     79.74%                                  84.64%

5. Knowledge Extraction and Storage of Academic Papers

    Knowledge extraction and storage are important parts of knowledge graph construction. First,
this research uses the GATE (General Architecture for Text Engineering) [39] framework to
semantically annotate two academic papers and obtain the annotated documents in XML format. The
titles of these two articles are “Moves Recognition in Abstract of Research Paper Based on Deep
Learning” and “Masked Sentence Model Based on BERT for Move Recognition in Medical Scientific
Abstracts”, from JDIL and JDIS, respectively. Then, the XML is parsed to obtain a series of instance
data with semantic tags. Finally, the obtained instance data is mapped to the concepts of the ontology
layer, and the Neo4j graph database is used for storage and visualization. Figure 4 visualizes the
knowledge graph of the academic papers.
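
The parse-and-map step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the XML layout below is a hypothetical simplification (real GATE output is structured differently), the `Unit` node label and property names are invented, and the sketch only builds parameterized Cypher statements without connecting to a Neo4j server.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified annotation export; not the actual GATE XML format.
SAMPLE_XML = """
<paper title="Example Paper">
  <unit id="u1" label="Problem">Placeholder problem sentence.</unit>
  <unit id="u2" label="Method-Description">Placeholder method sentence.</unit>
  <relation head="u2" name="resolves" tail="u1"/>
</paper>
"""


def parse_instances(xml_text):
    """Return (units, relations) extracted from the simplified XML."""
    root = ET.fromstring(xml_text.strip())
    title = root.get("title")
    units = [
        {"id": u.get("id"), "label": u.get("label"),
         "description": (u.text or "").strip(), "article": title}
        for u in root.iter("unit")
    ]
    relations = [(r.get("head"), r.get("name"), r.get("tail"))
                 for r in root.iter("relation")]
    return units, relations


def to_cypher(units, relations):
    """Build parameterized Cypher statements as (query, params) pairs."""
    statements = [
        ("MERGE (u:Unit {id: $id}) "
         "SET u.label = $label, u.description = $description, u.article = $article",
         unit)
        for unit in units
    ]
    for head, name, tail in relations:
        # Relationship types cannot be passed as parameters in Cypher, so the
        # name is backtick-quoted after a conservative character check.
        if not name or not all(ch.isalnum() or ch in "_- " for ch in name):
            raise ValueError(f"unsafe relation name: {name}")
        statements.append((
            f"MATCH (a:Unit {{id: $head}}), (b:Unit {{id: $tail}}) "
            f"MERGE (a)-[:`{name}`]->(b)",
            {"head": head, "tail": tail},
        ))
    return statements


units, relations = parse_instances(SAMPLE_XML)
statements = to_cypher(units, relations)
```

Each `(query, params)` pair could then be passed to a Neo4j client session; that step is omitted here to keep the sketch self-contained.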




Figure 4: Example of PKG.

6. PKG Application Exploration

    The knowledge graph constructed in this paper for the content of academic papers is only a prototype.
In the future, the manual steps in the current process will be automated through intelligent
means such as natural language processing and machine learning. Knowledge graph fusion across
multiple papers will also be investigated. On this basis, the application
of the knowledge graph is explored in multiple directions.
    Academic paper knowledge graphs and semantic technologies provide descriptions of the
classification, attributes, and relationships of knowledge units in papers so that search engines can
directly search for knowledge. For example, the user can directly query the “research objectives”,
“background”, “research significance”, and “contribution” in a particular paper. As illustrated in Figure
5, this study presents a preliminary semantic retrieval example based on PKG. Semantic retrieval not
only enables researchers to obtain information efficiently but also supports intelligent services such
as intelligent question answering, decision support, and personalized recommendation.




Figure 5: Application example of PKG-based semantic retrieval.
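
A query of the kind described above, asking for specific semantic units of one paper, might look like the following sketch. The in-memory list of units, the function name `retrieve_units`, and the Cypher string (which assumes hypothetical `Unit` nodes with `article`, `label`, and `description` properties) are illustrative assumptions, not the schema used in the paper.

```python
def retrieve_units(units, article, labels):
    """Return the descriptions of one article's units, grouped by requested label."""
    result = {label: [] for label in labels}
    for unit in units:
        if unit["article"] == article and unit["label"] in result:
            result[unit["label"]].append(unit["description"])
    return result


# Invented placeholder units; "label" holds the PEO class of each unit.
UNITS = [
    {"article": "Example Paper", "label": "Research-Goal",
     "description": "Placeholder goal sentence."},
    {"article": "Example Paper", "label": "Background",
     "description": "Placeholder background sentence."},
    {"article": "Another Paper", "label": "Research-Goal",
     "description": "Placeholder goal of another paper."},
]

# An equivalent Cypher query under the assumed schema.
CYPHER = ("MATCH (u:Unit {article: $article}) WHERE u.label IN $labels "
          "RETURN u.label AS label, u.description AS description")

found = retrieve_units(UNITS, "Example Paper",
                       ["Research-Goal", "Discussion-Contribution"])
```

Labels with no matching unit come back as empty lists, which lets a caller distinguish "not annotated in this paper" from a failed lookup.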

    Automatically selecting or generating responses to questions can improve the automation of
information processing and the efficiency of resource acquisition while saving human
resources and costs. Based on the knowledge graph proposed in this paper, intelligent question
answering for scientific research can be realized for questions such as “Is there any literature
mentioning that a certain method was used to solve a certain problem?”. As depicted in Figure 6, this
study implements this question answering example based on PKG. The primary process is as follows.
First, the question is parsed with natural language processing technology to obtain its semantic
information, which is converted into a structured query. Then, the relevant information is retrieved
from the knowledge graph and returned as the answer. In this way, researchers do not need to spend
time and effort consulting the literature but can quickly obtain relevant information from current
research through the intelligent question answering system, which speeds up scientific research.




Figure 6: Application example of PKG-based intelligent question answering.
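
    The two-step process described above (parse the question into a structured query, then retrieve
from the graph) can be sketched as follows. This is a simplified sketch, not the system of this study:
a single regular-expression template stands in for the natural language processing step, and a list of
(paper, method, problem) tuples stands in for PKG. All names in the code are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class MethodProblemQuery(NamedTuple):
    """Structured form of a 'method used to solve problem' question."""
    method: str
    problem: str


# One question template; a real system would use an NLP parser instead.
QUESTION_PATTERN = re.compile(
    r"is there any literature mentioning that (?P<method>.+?) "
    r"was used to solve (?P<problem>.+?)\??$",
    re.IGNORECASE,
)

Fact = Tuple[str, str, str]  # (paper title, method, problem)


def parse_question(question: str) -> Optional[MethodProblemQuery]:
    """Step 1: convert the question into a structured query, or None."""
    match = QUESTION_PATTERN.match(question.strip())
    if match is None:
        return None
    return MethodProblemQuery(
        match["method"].strip().lower(),
        match["problem"].strip().lower(),
    )


def answer(question: str, facts: List[Fact]) -> List[str]:
    """Step 2: return the sorted titles of papers matching the query."""
    query = parse_question(question)
    if query is None:
        return []
    return sorted(
        {
            paper
            for paper, method, problem in facts
            if method.lower() == query.method and problem.lower() == query.problem
        }
    )
```

    Questions that do not fit the template return an empty list here; covering more question types
would require additional templates or a learned parser.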

7. Conclusion and Outlook

   Based on knowledge unit theory, ontology theory, and knowledge graph theory, and through a detailed
analysis of the content of academic papers, this study constructs an academic paper expression
ontology (PEO), which addresses problems in existing research such as overly coarse modeling
granularity and insufficient ability to represent logical relationships. First, the semantic
annotation experiment on academic papers demonstrates that the PEO ontology can comprehensively and
deeply express the semantic units and their logical relationships in academic papers, verifying its
ability to represent academic papers. Second, we preliminarily construct the knowledge graph of
academic papers (PKG) based on PEO, through manual semantic analysis of the papers’ content using the
GATE text annotation tool, an XML parsing tool, and the Neo4j graph database. Finally, semantic
retrieval and intelligent question answering for academic knowledge are realized based on PKG.
    However, the current research still has some limitations. First, the PEO ontology only describes the
text content semantically and does not consider other forms of content in the paper. Second, the
knowledge graph construction process relies on manual analysis and processing. To address the first
problem, we plan to design a dedicated semantic description model for non-text content and to combine
it with ontologies or models of the paper’s external features, internal features, charts, formulas, and
other information to build a multimodal knowledge graph that covers academic knowledge in both breadth
and depth. To address the second problem, we plan to employ natural language processing and machine
learning for knowledge extraction and fusion, making knowledge graph construction more automatic.
These steps would in turn support intelligent services such as semantic retrieval, intelligent question
answering, intelligent recommendation, and automatic review generation for academic knowledge and
information.

8. Acknowledgements

   This work was supported by the National Natural Science Foundation of China: the Intelligent
Analysis and Optimization Method for Reflowable Documents (61672105). The English language was reviewed
by EditSprings (https://www.editsprings.cn).

9. References

[1] Ji, S. and Pan, S. and Cambria, E. and Marttinen, P. and Yu, P.S. (2021) A survey on knowledge
    graphs: Representation, acquisition, and applications, IEEE Transactions on Neural Networks and
    Learning Systems, 33, 494–514. https://doi.org/10.1109/TNNLS.2021.3070843
[2] Zhang, F. and Liu, S. and Tang, J. and Dong, Y. and Yao, P. and Zhang, J. et al. (2019) OAG:
    Toward linking large-scale heterogeneous entity graphs, Proceedings of the 25th ACM SIGKDD
    International Conference on Knowledge Discovery & Data Mining, Association for Computing
    Machinery, New York, USA, 2585–2595.
[3] Tang, J. and Zhang, J. and Yao, L. and Li, J. and Zhang, L. and Su, Z. (2008) Arnetminer:
    extraction and mining of academic social networks, Proceedings of the 14th ACM SIGKDD
    international conference on Knowledge discovery and data mining, Association for Computing
    Machinery, New York, USA, 990–998.
[4] Sinha, A. and Shen, Z. and Song, Y. and Ma, H. and Eide, D. and Hsu, B. and Wang, K. (2015)
    An overview of microsoft academic service (mas) and applications, Proceedings of the 24th
    international conference on world wide web, Association for Computing Machinery, New York,
    USA, 243–246.
[5] Bratsas, C. and Filippidis, P.M. and Karampatakis, S. and Ioannidis, L. (2018) Developing a
    scientific knowledge graph through conceptual linking of academic classifications, 2018 13th
    International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP),
    IEEE, 113–118.
[6] Auer, S. and Oelen, A. and Haris, M. et al. (2020) Improving access to scientific literature with
    knowledge graphs, Bibliothek Forschung und Praxis, 44, 516–529.
[7] Fathalla, S. and Vahdati, S. and Auer, S. and Lange, C. (2017) Towards a knowledge graph
    representing research findings by semantifying survey articles, International Conference on Theory
    and Practice of Digital Libraries, Springer, Cham, 315–327.
[8] Cao, S. and Zhao, B. (2021) The Construction and Application of Knowledge Graph for the
    Innovative Content of Academic Papers, Journal of Modern Information, 41, 28–37.
[9] Roy, A. and Akrotirianakis, I. and Kannan, A.V. and Fradkin, D. and Canedo, A. and Koneripalli,
    K. and Kulahcioglu, T. (2020) Diag2graph: representing deep learning diagrams in research papers
    as knowledge graphs, 2020 IEEE International Conference on Image Processing (ICIP), IEEE,
    2581–2585.
[10] Wang, Y. (2020) Semantic Model for the Content of Scientific Literature, Journal of Library and
     Information Science in Agriculture, 32, 12–24.
[11] Groza, T. and Handschuh, S. and Möller, K. and Decker, S. (2007) SALT-Semantically
     Annotated LaTeX for Scientific Publications, European Semantic Web Conference, Springer-Verlag,
     Berlin, Heidelberg, 518–532.
[12] de Waard, A. and Tel, G. (2006) The ABCDE Format Enabling Semantic Conference Proceedings,
     Proceedings of 1st workshop: SemWiki2006-from wiki to semantics, Budva, Montenegro, 1–3.
[13] Liakata, M. (2010) Zones of conceptualisation in scientific papers: a window to negative and
     speculative statements, Proceedings of the Workshop on Negation and Speculation in Natural
     Language Processing, Association for Computational Linguistics, USA, 1–4.
[14] W3C. (2011-12-22) Ontology of Rhetorical Blocks (ORB), [EB/OL].
     http://www.w3.org/TR/hcls-orb
[15] Di Iorio, A. and Vitali, F. and Peroni, S. (2013-07-16) The Pattern Ontology: Describing documents
     by means of their structural components, [EB/OL].
     https://sparontologies.github.io/po/current/po.html
[16] De Ribaupierre, H. and Falquet, G. (2013) A user-centric model to semantically annotate and
     retrieve scientific documents, Proceedings of the sixth international workshop on Exploiting
     semantic annotations in information retrieval, Association for Computing Machinery, New York,
     USA, 21–24.
[17] Shotton, D. and Peroni, S. (2015-07-03) The Discourse Elements Ontology (DEO), [EB/OL].
     https://sparontologies.github.io/deo/current/deo.html
[18] Shotton, D. and Peroni, S. (2015-07-03) DoCO, the Document Components Ontology, [EB/OL].
     https://sparontologies.github.io/doco/current/doco.html
[19] Constantin, A. and Peroni, S. and Pettifer, S. and Shotton, D. and Vitali, F. (2016) The document
     components ontology (DoCO), Semantic web, 7, 167–181.
[20] Qin, C.X. and Yang, Z.J. and Zhao, P.W. and Liu, J. (2018) The Knowledge Element Ontology
     Model of Scientific Literature for Knowledge Representation, Library and Information Service, 62,
     94–103.
[21] Wang, X.G. and Li, M.L. and Song, N.Y. (2018) Design and Application of Scientific Paper
     Functional Units Ontology, Journal of Library Science in China, 44, 73–88.
[22] Zhang, L. and Kopak, R. and Freund, L. and Rasmussen, E. (2010) A taxonomy of functional units
     for information use of scholarly journal articles, Proceedings of the American Society for
     Information Science and Technology, 47, 1–10.
[23] Sun, J.J. and Pei, L. and Jiang, T. (2018) Research on Semantic Annotation in Academic Literature,
     Journal of the China Society for Scientific and Technical Information, 37, 1077–1086.
[24] Shotton, D. and Peroni, S. (2018-06-22) C4O, the Citation Counting and Context Characterization
     Ontology, [EB/OL]. https://sparontologies.github.io/c4o/current/c4o.html
[25] Peroni, S. and Shotton, D. (2012) FaBiO and CiTO: ontologies for describing bibliographic
     resources and citations, Journal of Web Semantics, 17, 33–43.
[26] Niu, H.L. and Ou, S.Y. (2020) Design and Application of a Semantic Annotation Framework for
     Scientific Articles, Information studies: Theory & Application, 43, 124.
[27] Soldatova, L.N. and King, R.D. (2006) An ontology of scientific experiments, Journal of the Royal
     Society Interface, 3, 795–803.
[28] Clark, T. and Ciccarese, P.N. and Goble, C.A. (2014) Micropublications: a semantic model for
     claims, evidence, arguments and annotations in biomedical communications, Journal of biomedical
     semantics, 5, 1–33.
[29] Groth, P. and Gibson, A. and Velterop, J. (2010) The anatomy of a nanopublication, Information
     Services & Use, 30, 1–2.
[30] Teufel, S. (1999) Argumentative zoning: Information extraction from scientific text, Ph.D.
     Dissertation. University of Edinburgh, Edinburgh, U.K.
[31] Teufel, S. and Siddharthan, A. and Batchelor, C. (2009) Towards domain-independent
     argumentative zoning: Evidence from chemistry and computational linguistics, Proceedings of the
     2009 conference on empirical methods in natural language processing, 1493–1502.

[32] Vitali, F. and Peroni, S. (2011-05-04) The argument model ontology, [EB/OL].
     https://sparontologies.github.io/amo/current/amo.html
[33] Wang, X.G. and Zhou, H.M. and Song, N.Y. (2020) Scientific Paper Argumentation Ontology and
     Annotation Experiment, Journal of the China Society for Scientific and Technical Information, 39,
     885–895.
[34] Qu, J.B. and Ou, S.Y. (2021) Semantic Modeling for Scientific Paper Argumentation Structure
     Driven By Semantic Publishing, Journal of Modern Information, 41, 48–59.
[35] Mann, W.C. and Thompson, S.A. (1988) Rhetorical structure theory: Toward a functional theory
     of text organization, Text - Interdisciplinary Journal for the Study of Discourse, 8, 243–281.
[36] Chu, X.M. and Xi, X.F. and Jiang, F. and Xu, S. and Zhu, X.M. and Zhou, G.D. (2020) Macro
     Discourse Structure Representation Schema and Corpus Construction, Journal of Software, 31,
     321–343.
[37] Feng, D. (2020) Qualitative Research Data Analysis Tool NVivo 12 Practical Tutorial, Posts &
     Telecom Press, Beijing.
[38] Cohen, J. (1960) A coefficient of agreement for nominal scales, Educational and Psychological
     Measurement, 20, 37–46.
[39] Cunningham, H. and Tablan, V. and Roberts, A. (2013) Getting more out of biomedical documents
     with GATE’s full lifecycle open source text analytics, PLoS Computational Biology, 9, e1002854.
     https://doi.org/10.1371/journal.pcbi.1002854



