=Paper=
{{Paper
|id=Vol-2658/preface
|storemode=property
|title=Preface to the 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents at JCDL 2020
|pdfUrl=https://ceur-ws.org/Vol-2658/preface.pdf
|volume=Vol-2658
|authors=Chengzhi Zhang,Philipp Mayr,Wei Lu,Yi Zhang
|dblpUrl=https://dblp.org/rec/conf/jcdl/Zhang0LZ20a
}}
==Preface to the 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents at JCDL 2020
==
EEKE 2020 - Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents
Preface to the 1st Workshop on
Extraction and Evaluation of Knowledge Entities from
Scientific Documents at JCDL 2020
Chengzhi Zhang1, Philipp Mayr2, Wei Lu3, Yi Zhang4
1. Nanjing University of Science and Technology, Nanjing, China,
zhangcz@njust.edu.cn
2.GESIS-Leibniz-Institute for the Social Sciences, Cologne, Germany,
philipp.mayr@gesis.org
3.Wuhan University, Wuhan, China,
weilu@whu.edu.cn
4.University of Technology Sydney, Sydney, Australia,
Yi.Zhang@uts.edu.au
1. Introduction
The 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific
Documents (EEKE 2020) was launched at the ACM/IEEE Joint Conference on Digital Libraries
(JCDL) on August 1, 2020. The goal of this workshop is to engage the related communities in
open problems in the extraction and evaluation of knowledge entities from scientific documents.
Participants are encouraged to identify knowledge entities, explore feature of various entities,
analyze the relationship between entities, and construct the extraction platform or knowledge base.
Results of this workshop are expected to provide scholars, especially early career researchers, with
knowledge recommendations and other knowledge entity-based services [1].
2. Overview of the papers
This year, 14 papers (including 3 long papers, 6 short papers, 4 posters and 1 demo) were accepted
for presentation and inclusion in the proceedings. In addition, the workshop featured two keynote
talks in the different EEKE-related fields. All workshop contributions are documented in the
workshop website1. The following section briefly lists the various contributions.
2.1 Keynotes
Two keynotes were presented in EEKE2020.
The first one was given by Ming Song: Entitymetrics 2.0: Measuring the Impact of Entities
and Relations Extracted from Scientific Documents. The concept of entitymetrics was first
introduced in 2013, entitymetrics [2] has been applied to measure the impact of entities as well as
to gauge the knowledge usage and transfer anchored on entities for knowledge discovery. In this
talk, the previous studies employing entitiymetrics are summarized and the limitations of the
current approaches are discussed. In addition, the future directions of entitymetrics are suggested.
1
https://eeke2020.github.io/
Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
1
EEKE 2020 - Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents
The second keynote was given by Markus Stocker: Building Scholarly Knowledge Bases with
Crowdsourcing and Text Mining. Building on the Open Research Knowledge Graph
(http://orkg.org) as a concrete research infrastructure, in this talk Dr. Stocker presented how using
crowdsourcing and text mining humans and machines can collaboratively build scholarly
knowledge bases. He discussed some key challenges that human and technical infrastructures face
as well as the possibilities scholarly knowledge bases enable.
2.2 Research papers, Posters and Demo
The following papers were presented in 5 sessions.
Session 1: Knowledge Entity Extraction and Application
-Jennifer D'Souza and Sören Auer
NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions
in Natural Language Processing Literature
This paper describes an annotation initiative to capture the scholarly contributions in natural
language processing (NLP) articles, particularly, for the articles that discuss machine learning
(ML) approaches for various information extraction tasks. They attempted to find a systematic
set of patterns of subject-predicate-object statements for the semantic structuring of scholarly
contributions, and to apply the discovered patterns in the creation of a larger annotated dataset
for training machine readers of research contributions.
-Mengjia Wu and Yi Zhang
Intelligent Bibliometrics for Discovering the Associations between Genes and Diseases:
Methodology and Case study
This paper proposes an adaptable and transferable methodology to extract biomedical entities
including diseases, chemicals, genes and genetic variations from literature data. A
heterogeneous co-occurrence network is constructed and a semantic adjacency matrix is
generated to identify key genes and genetic variants, and capture the emerging disease-gene
associations via a link prediction approach.
Session 2: Entity Extraction from Scientific Documents
-Liangping Ding, Zhixiong Zhang, Huan Liu, Jie Li and Gaihong Yu
Automatic Keyphrase Extraction from Scientific Chinese Medical Abstracts Based on
Character-Level Sequence Labeling
In this paper, authors regard automatic keyphrase extraction from Chinese text as a
character-level sequence labeling task. Unsupervised keyphrase extraction methods including
term frequency (TF), TF-IDF, TextRank and supervised machine learning methods including
Conditional Random Field (CRF), Bi-directional Long Short Term Memory Network (BiLSTM)
and BiLSTM-CRF are used to extract keyphrases from academic papers in medical domain.
The character-level sequence labeling model based on BERT obtains the best result.
2
EEKE 2020 - Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents
-Jin Mao, Shiyun Wang and Xianli Shang
Investigating interdisciplinary knowledge flow through citances
This study attempts to investigate the content of knowledge flow towards an interdisciplinary
field by analyzing the citation sentences (i.e., citances) in the articles of eHealth field. The
associated knowledge phrases between citances and the references are identified and
categorized to analyze the content and categories of knowledge spread from the source
disciplines to the field. In general, this study contributes to the understanding of content
characteristics about interdisciplinary knowledge integration.
-Yu Li, Tao Yue (Speaker) and Wu Zhenxin
IEKM-MD: An Intelligent Platform for Information Extraction and Knowledge Mining in
Multi-Domains
This paper constructs a platform for information extraction and knowledge mining, namely
IEKMMD. Two innovative technologies are proposed: Firstly, a phrase-level scientific entity
extraction model combining neural network and active learning is designed to reduce the
model’s dependence on large-scale corpus. Secondly, a translation-based relation prediction
model is provided, which improves the relation embedding by optimizing loss function. In
addition, the platform integrates the advanced entity recognition model and the keyword
extraction mode, and provides abundant services for fine-grained and multi-dimensional
knowledge.
-Liang Chen, Shuo Xu, Weijiao Shang, Zheng Wang, Chao Wei and Haiyun Xu
What is Special about Patent Information Extraction?
This article aims at exploring the particularity in patent information extraction, thus to point out
the direction for further research. To be more specific, they discuss: (1) what is the special about
labeled patent dataset? (2) What is special about word embedding in patent information
extraction? (3) What kind of method is more suitable for patent information extraction?
Sesson 3: Interactive demos
-Zi Xiong, Yue Qi, Wei Lu and Qikai Cheng
Design and Implementation of an Academic Search System Based on a General Query
Language and Automatic Question Answering
This research designs and implements an academic search system with two major innovations:
1) proposing a general query language SSL to describe the academic search intention in a
unified and standardized way, 2) proposing a user intention recognition method to help improve
traditional automatic question-answering systems. The SSL language and intention recognition
method is applied to a QA-oriented academic system which is innovative compared with
traditional query-based systems.
3
EEKE 2020 - Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents
Session 4: Entity Relation Extraction and Application
-Xin An, Jinghong Li, Shuo Xu, Liang Chen and Sainan Pi
A Novel Approach for Patent Similarity Measurement Based on Sequence Alignment
In order to measure the similarity among different patents, this study proposes a novel approach
on the basis of sequence alignment. The method takes semantic direction of each sequence
structure and the word order information of each component into consideration; an algorithm
for calculating the global importance of each sequence structure is put forward. Extensive
experimental results show that the proposed approach is significantly more accurate and is not
sensitive to several core parameters.
-Fang Tan, Siting Yang, Xiaoyan Wu and Jian Xu
Exploring the Relation between Biomedical Entities and Government Funding
In order to analyze the effect of government funding on the promotion of scientific research,
and to help the government manage research funds more rationally, this study proposes a
framework for analyzing the relationship between entities in the field of medicine and funds.
The results reveal that the field of genetic research is in a period of rapid development and
disease research catch NIH’s continuous attention. However, the stimulating effect of
government funding on the research popularity is decreasing.
-Sahand Vahidnia, Alireza Abbasi and Hussein A. Abbass
Document Clustering and Labeling for Research Trend Extraction and Evolution Mapping
In this study, a method is proposed to extract research trends and their temporal evolution,
throughout discrete time periods. Adapting contextualized word embedding techniques, the
method utilizes published academic documents as knowledge units and clusters them into
groups. Various labeling techniques are explored to evaluate the quality of clusters and explore
their explain ability. The results show that utilization of neural embedding in conjunction with
paragraph-term weights would provide simple and reliable paragraph embedding that can be
used for clustering of the textual data.
Session 5: Poster/ Greeting Notes of EEKE2020
-Qikai Liu, Pengcheng Li, Wei Lu and Qikai Cheng
Long-tail dataset entity recognition based on Data Augmentation
Datasets play an important role in data-driven scientific research. It is important to recognize
dataset entities correctly, especially when it comes to unusual long-tail dataset entities. However,
it is very difficult to obtain high quality training corpus in named entity recognition. This paper
obtained the data based on a distant supervision method along with two data augmentation
methods. A BERT-BiLSTM-CRF model is used to predict long-tail dataset entity.
-Xiaole Li, Yuzhuo Wang
Assessing Impact of Method Entities in a Special Task
Methods play an important role in the research. Identifying and analyzing entities about
research methods can help scholars understand methods used in their field and accelerate the
efficiency of scientific research. This paper takes named entity recognition (NER) as an
4
EEKE 2020 - Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents
example and evaluates the impact of method entities in this domain. They found that
conditional random field (CRF) is the most influential algorithms in NER. Deep learning
algorithms have developed rapidly in the past 5 years. F-measure, precision and recall are the
most widely used indices and measurements. Scholars do not pay enough attention to use tools
and they prefer to use classic datasets.
-Chong Chen, Jingying Zhang, Xiaoyu Chu and Jinglin Zheng
Study on the Difference between Summary Peer Reviews and Abstracts of Scientific Papers
This article proposes primary measurement to compare Summary peer reviews with abstracts
from readability and semantic function types. The results show that summary peer reviews
highlight some distinct function types, and the terminology in peer reviews is not as dense as in
abstracts. Summary peer reviews can be complement to abstracts in literature searches, and can
help readers understanding papers more thoroughly.
-Wei Shao, Hua Bolin, Qiang Ma, Jiaying Liu, Hongwei He, Keqi Chen
An Unsupervised Method for Terminology Extraction from Scientific Text
Finding new terminology is a kind of named entity recognition (NER) problem. However, many
high performance methods need labeled data. This paper proposes an unsupervised method
based on sentence pattern and part of speech. They initialize a few patterns to extract
terminologies in certain sentences, and then try to find the same POS sequences in sentences
not matched by initial patterns with obtained terminologies’ POS sequences. The new patterns
and more terminologies are obtained after several iterations.
3. Outlook and further reading
Currently the EEKE2020 organizers edit the following two Special issues:
-Special Issue on “Extraction and Evaluation of Knowledge Entities from Scientific Documents”
in Journal of Data and Information Science (https://mc03.manuscriptcentral.com/jdis).
-Special Issue on “Scientific Documents Mining and Applications” in Data and Information
Management (https://www.editorialmanager.com/dim/default.aspx).
References
[1] Chengzhi Zhang, Philipp Mayr, Wei Lu, Yi Zhang. (2020). Extraction and Evaluation of
Knowledge Entities from Scientific Documents: EEKE2020. In: Proceedings of the 20th
ACM/IEEE Joint Conference on Digital Libraries (JCDL2020), Wuhan, China, 2020.
https://doi.org/10.1145/3383583.3398504
[2] Ying Ding, Min Song, Jia Han, Qi Yu, Erjia Yan, Lili Lin, Tamy Chambers. Entitymetrics:
measuring the impact of entities. Plos One, 8(8), e71416.
5