<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>A Study of Person Entity Extraction and Profiling from Classical Chinese Historiography</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yihong Ma</string-name>
          <email>yihongma97@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qingkai Zeng</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tianwen Jiang</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liang Cai</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Meng Jiang</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>While historians are interested in demographic and social network information of historical actors in the early Chinese empires (841 BC-1911 AD), very few studies have been done on entity retrieval from classical Chinese historiography. The key challenge lies in the low resource of the language: deep learning requires large amounts of annotated data and becomes impracticable when such data is not available. In this study, we employ domain experts (history professors) to curate a set of person entities and their profile attributes (e.g., courtesy name, place of birth, title) and relations (e.g., father-son, master-disciple) from two books, Records of the Grand Historian and Book of Han. We develop a pattern-based bootstrapping approach to extract the information with a very small number (i.e., 1 or 2) of seed patterns. Experimental results show the effectiveness as well as the limitations of the iterative method. We would appreciate research of digital humanities to address the challenges in entity retrieval from low-resource languages.</p>
        <p>
          <table-wrap id="tab-1">
            <label>Table 1</label>
            <caption>
              <p>The task is to extract person entities and their profiling attributes from classical Chinese text. Our approach can find most of the attribute values correctly (i.e., marked by underline), compared with the ground truth annotated by history professors.</p>
            </caption>
            <table>
              <thead>
                <tr><th>Attribute</th><th>Our approach</th><th>Truth</th></tr>
              </thead>
              <tbody>
                <tr><td>Courtesy name</td><td>长卿</td><td>长卿</td></tr>
                <tr><td>Hometown</td><td>東海蘭陵</td><td>東海蘭陵</td></tr>
                <tr><td>Title(s)</td><td>郎, 丞相掾, 名之</td><td>郎, 丞相掾, 曲臺署長</td></tr>
                <tr><td>Father</td><td>孟卿</td><td>孟卿</td></tr>
                <tr><td>Son</td><td>N/A</td><td>N/A</td></tr>
                <tr><td>Master(s)</td><td>田王孫, 同郡碭田王孫</td><td>田王孫</td></tr>
                <tr><td>Disciple(s)</td><td>沛翟牧子兄, 同郡白光少子, 疏廣, 后蒼</td><td>翟牧, 白光, 趙賓, 焦延壽</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Data mining; Information extraction.</p>
    </sec>
    <sec id="sec-1-1">
      <title>KEYWORDS</title>
      <p>Information extraction, Entity profiling, Classical Chinese, Textual pattern, Bootstrapping</p>
    </sec>
    <sec id="sec-2">
      <title>1 INTRODUCTION</title>
      <p>Historians are interested in classical Chinese literature because it witnessed the rise and fall of the early empires and dynasties [2, 3, 18, 32]. Currently, they have to spend a large amount of time reading those books and digging out who came from where, who did what, who studied from whom, and so on. Before that, they have to spend even longer learning the classical Chinese language, even if they are themselves (modern) Chinese speakers [11, 22, 26]. Therefore, the idea of utilizing digital technologies to extract person entities and their profiling attributes from classical Chinese text has become promising and exciting in the community, as natural language processing (NLP) and entity retrieval are developing and accelerating at an unprecedented speed.</p>
      <p>However, it is a rather challenging task due to the lack of annotated data. Historians write papers and publish books, but rarely build entity databases about ancient China. As we know, NLP is being revolutionized by deep learning with neural networks. However, deep learning requires large amounts of annotated data, and its advantage over traditional statistical methods typically diminishes when such data is not available. How to address the issue of low resource in the task of entity extraction and profiling from classical Chinese text is still an open problem.</p>
      <p>In this study, we collect a ground-truth dataset for evaluation on the task, propose a pattern-based information extraction approach that requires very limited prior knowledge of classical Chinese, and conduct experiments to show its effectiveness and limitations.</p>
      <p>First, we recruit domain experts (history professors) to annotate person entities and their profiles from two classical Chinese books, Records of the Grand Historian (authored by Sima Qian, completed in c. 86 BC) and Book of Han (authored by Ban Gu, completed in 111 AD). Table 1 gives an example of the annotated profile of Meng Xi 孟喜 (90 BC–40 BC). We focus on three attributes (i.e., courtesy name, place of birth, positions/titles) and two relations (i.e., father-son, master-disciple), because (1) these are main contents in the work of Chinese historiography and (2) historians are very interested in how the government mechanisms were influenced by family and master-disciple relationships in the ancient time. The domain experts generated fifty handcrafted patterns to extract the above information. They validated the attribute values and assessed the reliability of the patterns (see Table 4). Moreover,
they annotated 15 complete person profiles (with 158 attribute values) that they feel the most interested in. This dataset can serve as ground truth for evaluating such information extraction methods on the classical Chinese literature.<fn id="fn-1"><p>Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p></fn></p>
      <table-wrap id="tab-2">
        <label>Table 2</label>
        <table>
          <thead>
            <tr><th>Attribute</th><th>Meng Xi 孟喜</th><th>Meng Qing 孟卿</th><th>Yan An Le 顏安樂</th><th>Zhang Yu 張禹</th></tr>
          </thead>
          <tbody>
            <tr><td>Courtesy name</td><td>長卿</td><td>N/A</td><td>公孫</td><td>子文</td></tr>
            <tr><td>Hometown</td><td>東海蘭陵</td><td>東海</td><td>魯國薛</td><td>河內軹</td></tr>
            <tr><td>Title(s)</td><td>郎, 曲臺署長, 丞相掾</td><td>N/A</td><td>齊郡太守丞, 大司農</td><td>郡文學, 光祿大夫, 東平內史</td></tr>
            <tr><td>Father</td><td>孟卿</td><td>N/A</td><td>眭孟姊</td><td>N/A</td></tr>
            <tr><td>Son(s)</td><td>N/A</td><td>孟喜</td><td>N/A</td><td>N/A</td></tr>
            <tr><td>Master</td><td>田王孫</td><td>蕭奮</td><td>眭孟</td><td>施讎</td></tr>
            <tr><td>Disciple(s)</td><td>趙賓, 白光, 翟牧, 焦延壽</td><td>后倉, 閭丘卿, 疏廣</td><td>泠豐, 任公, 冥都, 筦路</td><td>彭宣, 戴崇</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Second, we propose a bootstrapping approach to extract person entities and profiles requiring very little prior knowledge of the language. The algorithm starts from only one or two simple seed patterns, finds the attribute values, and then uses them to discover more complicated patterns. It has an estimator to assess the reliability of patterns during the iterative process. So, the new attribute values extracted from more reliable patterns are more likely to be trustworthy and can be used to infer patterns in next iterations.</p>
      <p>Experiments show that textual patterns achieve an F1 score of 0.851 on 15 ground-truth person profiles. Table 1 shows a comparison between the generated profile (left) and the ground-truth profile (right) of Meng Xi 孟喜. On the other side, the bootstrapping method achieves the highest performance after 7 iterations to find a set of related patterns and extract person-title pairs, while meeting barriers to find more patterns for other attributes and relations.</p>
      <p>We summarize our contributions in this study as follows:</p>
      <list list-type="bullet">
        <list-item><p>New dataset: We recruit history professors to curate a set of person profiles from classical Chinese literature.</p></list-item>
        <list-item><p>New approach: We develop a bootstrapping method based on textual patterns to extract the person entities and attributed information, requiring little prior knowledge.</p></list-item>
        <list-item><p>Effectiveness: Experiments show that textual patterns make an F1 score of 0.851 on 15 person profiles annotated by the domain experts. Limitations are discussed in Section 4.3.</p></list-item>
      </list>
    </sec>
    <sec id="sec-4">
      <title>2 ENTITY EXTRACTION AND PROFILING</title>
      <sec id="sec-3">
        <title>2.1 Person Entity Extraction</title>
        <p>It sounds like a subtask of the standard named entity recognition (NER) task: it narrows down from recognizing multiple types of entities (i.e., persons, locations, organizations) to only one type. However, it has to face a challenge when put into the classical Chinese text: in classical Chinese literature, there are many different ways of mentioning a specific person. A person has first name, last name (family name), and courtesy name; and he is also recognized by his hometown and title/position in the government. For the sake of readability, let us take the President of United States Donald J. Trump as an example. "Donald" is his first name and suppose "John" (J.) can be considered as his courtesy name. (Ancient Chinese people do not have middle name. They have courtesy name.) He was born in New York. So, all the following could be used in the classical Chinese literature to mention President Trump:</p>
        <list list-type="bullet">
          <list-item><p>donald,</p></list-item>
          <list-item><p>trumpdonald,</p></list-item>
          <list-item><p>presidentdonald,</p></list-item>
          <list-item><p>trumpdonaldjohn,</p></list-item>
          <list-item><p>newyorktrumpdonald,</p></list-item>
          <list-item><p>newyorktrumpdonaldjohn.</p></list-item>
        </list>
        <p>Note that there would be no white-space (nor upper-case) to split the words. Complicated patterns have to be designed or recognized in the extraction methods.</p>
        <table-wrap id="tab-3">
          <label>Table 3</label>
          <caption>
            <p>Statistics of the dataset (two books).</p>
          </caption>
          <table>
            <thead>
              <tr><th>Historiography Book</th><th># Sentences</th><th># Words</th></tr>
            </thead>
            <tbody>
              <tr><td>Records of the Grand Historian</td><td>32,564</td><td>615,457</td></tr>
              <tr><td>Book of Han</td><td>40,114</td><td>874,165</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-4-1">
        <title>2.2 Person Entity Profiling</title>
        <p>Given the classical Chinese historiography, the task of person entity profiling aims to extract demographic attributes (e.g., courtesy name, place of birth, title) and social relations (e.g., father-son, master-disciple) and to generate a complete profile for the person entities extracted in Section 2.1.</p>
        <p>Table 2 shows examples of the profiles of Confucians in the Han Dynasty such as Meng Xi 孟喜, Meng Qing 孟卿, Yan An Le 顏安樂, and Zhang Yu 張禹. Some of the attribute values are "N/A" if not available in the text corpora.</p>
        <p>There are two typical challenges of this task. One is the variety of demographic attributes. Each type of attribute needs a set of specific, reliable extractors, which requires prior knowledge of the classical Chinese language. The other challenge is typical of Chinese historiography: Zero Pronoun (ZP), which stands for pronouns that are omitted when they are pragmatically or grammatically inferable from the context. Here is an example taken from Records of the Grand Historian, where the ZPs (denoted as φ) all refer to "Mr. Chunshen" 春申君:</p>
        <disp-quote>
          <p>[春申君] 者，φ 楚人也，φ 名歇，φ 姓黄氏。φ 游學博聞，φ 事楚頃襄王。</p>
          <p>(Translation: [Mr. Chunshen], φ was born in Chu, φ's first name is Xie, φ's family name is Huang. φ travelled over the country and enriched his knowledge, φ served King Qingxiang of Chu.)</p>
        </disp-quote>
        <p>This sentence indicates three attributes (i.e., hometown, first name, last name) and one relation (i.e., master-disciple) about Mr. Chunshen 春申君. However, ZP makes it challenging to link the attributes and relations in the context with the person entity. Moreover, we observe that ZPs occur not only in the same sentence with the mention of the person entity but also across several sentences in the same paragraph. In biographical historiography, each chapter discusses the life story of a certain person. So, we adopt the following assumption (learned from history professors) to resolve
the ZP issue: given a paragraph, as long as a person entity was extracted in the first clause, the ZPs in every clause of the paragraph refer to that person entity. This will help us propose an approach to extract person-attribute/relation pairs when the extractors were only able to find the attributes and relations in local contexts.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3 THE PROPOSED APPROACH</title>
      <p>In this section, we first introduce how the dataset was curated with handcrafted patterns by domain experts. Next, we present a pattern-based bootstrapping method to find the entity information with a small number of seed patterns.</p>
    </sec>
    <sec id="sec-5a">
      <title>3.1 Data Curation with Handcrafted Patterns</title>
      <p>We curate a dataset of two classical Chinese historiography books, Records of the Grand Historian (authored by Sima Qian, completed in c. 86 BC) and Book of Han (authored by Ban Gu, completed in 111 AD). Table 3 lists the statistics of the dataset.</p>
      <sec id="sec-5a-1">
        <title>3.1.1 Patterns for person entity extraction.</title>
        <p>The domain experts we recruited to annotate the data contribute the following patterns to recognize mentions of person entities:</p>
        <list list-type="bullet">
          <list-item><p>$FirstName,</p></list-item>
          <list-item><p>$LastName + $FirstName,</p></list-item>
          <list-item><p>$Title + $FirstName,</p></list-item>
          <list-item><p>$LastName + $FirstName + $CourtesyName,</p></list-item>
          <list-item><p>$Hometown + $LastName + $FirstName,</p></list-item>
          <list-item><p>$Hometown + $LastName + $FirstName + $CourtesyName.</p></list-item>
        </list>
      </sec>
      <sec id="sec-5a-2">
        <title>3.1.2 Patterns for entity profiling.</title>
        <p>Table 4 presents 50 textual patterns that were used to extract a set of candidates of person's attribute or relation values. Some attributes such as hometown and father-son have a small number of patterns. Some such as title and master-disciple have a large number of patterns. The domain experts also annotated whether the attribute values and relations are true or false. For each pattern, we give three numbers associated with its extractions:</p>
        <list list-type="bullet">
          <list-item><p>#Values: The number of (person entity, attribute or relation value)-pairs extracted by the pattern.</p></list-item>
          <list-item><p>#True Values: The number of true pairs annotated by the domain experts.</p></list-item>
          <list-item><p>Reliability: It describes whether a pattern is reliable for extracting true values. It is calculated as
            <disp-formula id="eq-1"><label>(1)</label><tex-math>\mathrm{Reliability} = \frac{\#\text{True Values}}{\#\text{Values}},</tex-math></disp-formula>
            which gives a score between 0 and 1.</p></list-item>
        </list>
        <p>Specifically, for person, pattern [$Person 者] was the first pattern that the domain experts came up with. "者" is a typical symbol in classical Chinese that indicates the appearance of a person. The person entities extracted by pattern [$Person 者] may be in any of the 6 forms of person entity mentions in Section 3.1.1. Another frequent pattern is [$Person 字 $CourtesyName]. Unlike pattern [$Person 者], person entities extracted by pattern [$Person 字 $CourtesyName] strictly follow the form of last name + first name. It is a more reliable pattern. As the table shows, the reliability of pattern [$Person 字 $CourtesyName] is 1 and the reliability of pattern [$Person 者] is only 0.6087 though the number of extracted person-value pairs is smaller (205 vs. 299).</p>
        <p>Most of the patterns for hometown, father-son, and master-disciple are highly reliable (higher-than-0.96 reliability), except patterns [，$Father 子] (ID 38) and [，事 $Master] (ID 46) of reliability 0.8571 and 0.8356, respectively. Among the 28 patterns for attribute title, only 4 patterns have reliability of lower than 0.8 and only one has a reliability score of lower than 0.7, i.e., [至 $Title] (ID 36). Among all the 50 handcrafted patterns, 35 (70%) patterns have reliability score of 1; 5 (10%) patterns have reliability score of lower than 0.8.</p>
        <table-wrap id="tab-4">
          <label>Table 4</label>
          <table>
            <thead>
              <tr><th>Attribute</th><th>Pattern</th><th>Example</th></tr>
            </thead>
            <tbody>
              <tr><td>person</td><td>$Person 者</td><td>陳丞相平 者</td></tr>
              <tr><td>person</td><td>$Person 字 $CourtesyName</td><td>王莽 字巨君</td></tr>
              <tr><td>person</td><td>$Person，$Hometown 人也</td><td>申屠嘉，梁 人也</td></tr>
              <tr><td>person</td><td>$Person，$Hometown 人</td><td>朝鮮王滿，燕 人</td></tr>
              <tr><td>hometown</td><td>，$Hometown 人也</td><td>，陽城 人也</td></tr>
              <tr><td>hometown</td><td>，$Hometown 人</td><td>，高陽 人</td></tr>
              <tr><td>hometown</td><td>徙 $Hometown</td><td>自下邑徙平陵</td></tr>
              <tr><td>courtesy name</td><td>，字 $CourtesyName</td><td>，字長卿</td></tr>
              <tr><td>title</td><td>拜為 $Title</td><td>拜為上卿</td></tr>
              <tr><td>title</td><td>拜 $Person 為 $Title</td><td>拜仁 為郎中令</td></tr>
              <tr><td>title</td><td>遷 $Title</td><td>遷東平太傅</td></tr>
              <tr><td>title</td><td>遷為 $Title</td><td>起遷為國尉</td></tr>
              <tr><td>title</td><td>遷 $Person 為 $Title</td><td>遷廣明 為淮陽太守</td></tr>
              <tr><td>title</td><td>遷至 $Title</td><td>稍遷至栘中廄監</td></tr>
              <tr><td>title</td><td>封為 $Title</td><td>綰封為長安侯</td></tr>
              <tr><td>title</td><td>封 $Person 為 $Title</td><td>孝景後三年封蚡 為武安侯</td></tr>
              <tr><td>title</td><td>召為 $Title</td><td>復召為郎</td></tr>
              <tr><td>title</td><td>召 $Person 為 $Title</td><td>於是上召寧成 為中尉</td></tr>
              <tr><td>title</td><td>補 $Title</td><td>以選除補御史掾</td></tr>
              <tr><td>title</td><td>察… 為 $Title</td><td>以郡吏察廉為樓煩長</td></tr>
              <tr><td>title</td><td>舉為 $Title</td><td>後以御史舉為鄭令</td></tr>
              <tr><td>title</td><td>舉… 為 $Title</td><td>復舉賢良為河南令</td></tr>
              <tr><td>title</td><td>擢為 $Title</td><td>擢為光祿大夫</td></tr>
              <tr><td>title</td><td>擢 $Person 為 $Title</td><td>因擢延壽 為諫大夫</td></tr>
              <tr><td>title</td><td>徵為 $Title</td><td>徵為廄丞</td></tr>
              <tr><td>title</td><td>徵 $Person 為 $Title</td><td>徵由 為大鴻臚</td></tr>
              <tr><td>title</td><td>徙為 $Title</td><td>徙為頻陽令</td></tr>
              <tr><td>title</td><td>徙 $Person 為 $Title</td><td>徙立 為太原太守</td></tr>
              <tr><td>title</td><td>復為 $Title</td><td>後復為淮陽都尉</td></tr>
              <tr><td>title</td><td>以 $Title 察</td><td>以郡吏 察廉為樓煩長</td></tr>
              <tr><td>title</td><td>薦為 $Title</td><td>薦為議郎</td></tr>
              <tr><td>title</td><td>薦 $Person 為 $Title</td><td>薦宣 為長安令</td></tr>
              <tr><td>title</td><td>贖為 $Title</td><td>贖為庶人</td></tr>
              <tr><td>title</td><td>立為 $Title</td><td>自立為代王</td></tr>
              <tr><td>title</td><td>為 $Title</td><td>為駙馬都尉侍中</td></tr>
              <tr><td>title</td><td>$Person 為 $Title</td><td>禹 為丞相史</td></tr>
              <tr><td>title</td><td>至 $Title</td><td>至中大夫</td></tr>
              <tr><td>father-son</td><td>，$Father 子也</td><td>，秦莊襄王 子也</td></tr>
              <tr><td>father-son</td><td>，$Father 子</td><td>，文公 少子</td></tr>
              <tr><td>father-son</td><td>，其父 $Father</td><td>，其父高祖中子</td></tr>
              <tr><td>father-son</td><td>，父 $Father</td><td>，父號孟卿</td></tr>
              <tr><td>father-son</td><td>$Son 父曰 $Father</td><td>悼侯 父曰隱太子友</td></tr>
              <tr><td>master-disciple</td><td>從 $Master 受…</td><td>從太中大夫京房 受易</td></tr>
              <tr><td>master-disciple</td><td>事 $Master 受…</td><td>又事前將軍蕭望之 受論語</td></tr>
              <tr><td>master-disciple</td><td>$Master 授 $Disciple</td><td>常 授梁蕭秉君房</td></tr>
              <tr><td>master-disciple</td><td>，授 $Disciple</td><td>，授翼奉、蕭望之、匡衡</td></tr>
              <tr><td>master-disciple</td><td>，事 $Master</td><td>，事太傅夏侯勝</td></tr>
              <tr><td>master-disciple</td><td>事 $Master 為 $Title</td><td>事梁孝王 為中大夫</td></tr>
              <tr><td>master-disciple</td><td>弟子… 者，$Master</td><td>弟子遂之者，蘭陵褚大，東平嬴公</td></tr>
              <tr><td>master-disciple</td><td>受… 於 $Master</td><td>嘗受韓子、雜家說於騶田生所</td></tr>
              <tr><td>master-disciple</td><td>與 $Person 俱事 $Master</td><td>與顏安樂 俱事眭孟</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>3.2 Pattern-based Bootstrapping</title>
      <p>We propose a new approach to extract person entities and profiles from classical Chinese historiography requiring very little prior knowledge of the language. Generally, it is an iterative method that uses textual patterns to extract attribute or relation values from text data. Figure 1 shows the diagram of one iteration in the pattern-based bootstrapping method. It starts with only one or two simple seed patterns for each attribute. Because the number of seed patterns is small, it would not take much effort to find one or two. For example, [遷為 $Title] (i.e., [relegated to $Title]) and [補 $Title] (i.e., [filled in the position of $Title]) were the two reliable seed patterns for the attribute title. The iterative method runs the following steps until convergence.</p>
      <fig id="fig-1">
        <label>Figure 1</label>
        <caption>
          <p>One iteration of the pattern-based bootstrapping method (panel labels: Seed Patterns; Seed Entities; Generate and rank textual patterns; Expand seed patterns and seed entities; ranking columns: Rank, Pattern, Entities).</p>
        </caption>
      </fig>
      <p>Step 1: Generating pattern candidates. Candidate patterns are generated using contextual features of the target value in the clause. We find that target values are more likely to be at the end of the clause because of the linguistic structure. Therefore, the commonly used skip-gram contextual pattern "w<sub>−1</sub> ____ w<sub>1</sub>" [28] would not work for our task. Instead, we explore two different kinds of contextual features described as follows:</p>
      <list list-type="bullet">
        <list-item><p>$Pattern $Value. The textual pattern is a window of a certain size of Chinese characters before a target value. For example, if the target value is $Title, we can find the pattern candidate [遷為 $Title] (i.e., [relegate to $Title]), when the window size is 2.</p></list-item>
        <list-item><p>$Pattern $Entity $Pattern $Value. Both a window of one Chinese character before $Entity and all characters between $Entity and $Value are selected as the contextual feature. For example, if $Title is the target value and $Person is the entity that has already been extracted in Section 3.1.1, we can find a new pattern candidate [遷 $Person 為 $Title] (i.e., [relegate $Person to $Title]).</p></list-item>
      </list>
      <p>Step 2: Ranking pattern candidates. It is nontrivial to rank the quality of pattern candidates. Considering all the unlabeled entities as false raises two serious issues: (1) it penalizes reliable patterns that extracted true unlabeled values and (2) it cannot penalize unreliable patterns that extracted false unlabeled values. To address these issues, we define the estimation score of pattern reliability as follows:</p>
      <disp-formula id="eq-2"><tex-math>r(p) = w_1 \cdot \frac{\sum_{v \in V_p} \left(1 - \min_{v^+ \in V^+} d(v, v^+)\right)}{\sum_{v \in V_p} \mathrm{freq}(v)} + w_2 \cdot \left(1 - \frac{\max_{v \in V_p} \mathrm{freq}(v)}{\sum_{v \in V_p} \mathrm{freq}(v)}\right) \in [0, 1],</tex-math></disp-formula>
      <p>where p is a textual pattern, v is a value string, v<sup>+</sup> is a true value string, V<sub>p</sub> is the set of unique value strings extracted by pattern p, V<sup>+</sup> is the set of unique true value strings, d(v<sub>1</sub>, v<sub>2</sub>) is the normalized Hamming distance between the two value strings, and freq(v) is the frequency of the value string v. w<sub>1</sub> and w<sub>2</sub> are weights: w<sub>1</sub> + w<sub>2</sub> = 1.</p>
      <p>The estimator includes two kinds of features:</p>
      <p>(1) The textual similarity between the pattern's extracted values and true values: If the value a pattern extracted is very similar to one true value, the value is likely to be true and the pattern is likely to be reliable. For example, suppose "Tai Shou 太守", the name of an official position, has been in the set of true values (as $Title). Then the value "Nan Yang Tai Shou 南陽太守" extracted by a pattern, which means the Tai Shou 太守 ruling a place called Nan Yang 南陽, is likely to be a good value (as $Title). We use Hamming distance as the metric to measure the similarity between two value strings. Hamming distance is defined as the minimum number of substitutions required to change one string into the other.</p>
      <p>(2) Variety of the pattern's extracted values: A pattern would be more reliable if it extracted more true values. Besides the frequency, we try another measurement: we assume that if there was a value whose frequency dominates the set of values one pattern extracted, the pattern would be less reliable. So we use 1 minus the ratio of the count of the most frequent value over all the value counts.</p>
      <p>Step 3: Selecting new patterns and extracting new values for the next iteration. For each pattern, we calculate the reliability score r(p) and the frequency of values that it extracted. For the next iteration, we first filter out the patterns whose frequency is below a threshold and then select the top patterns with the highest r(p). After that, we expand the set of true values V<sup>+</sup> by adding the values extracted by the new patterns.</p>
    </sec>
    <sec id="sec-7">
      <title>4 EXPERIMENTS</title>
      <p>In this section, we first evaluate the quality of handcrafted patterns given by domain experts. Then we evaluate the effectiveness of the bootstrapping method. Finally, we discuss the limitations.</p>
    </sec>
    <sec id="sec-7a">
      <title>4.1 Evaluating the Handcrafted Patterns</title>
      <p>Here we conduct experiments to answer: do the handcrafted patterns extract correct person entities, attributes, and relations?</p>
      <p>Evaluation methods. We use the 15 complete person profiles (with 158 values) as ground truth. We use standard Information Retrieval metrics: Precision, Recall, and F1 score. Precision is the fraction of true attribute or relation values (i.e., values that find a match in the corresponding attribute in ground-truth profiles) among all values extracted by handcrafted patterns. Recall is the fraction of true attribute or relation values among all ground-truth values. F1 score is the harmonic mean of Precision and Recall.</p>
      <p>Evaluation results. Regarding the entity profiles, handcrafted textual patterns achieve a Precision of 0.901, a Recall of 0.803, and an F1 score of 0.851. Table 1 shows a comparison between the generated profile (left) and the ground-truth profile (right) of Meng Xi 孟喜, where most of the values extracted are correct. We also find the following limitations of the handcrafted patterns. First, the different forms of person entity mentions make entity linking (i.e., mention alignment) difficult. For example, in the master-disciple relation of Meng Xi 孟喜, "同郡白光少子", i.e., Bai Guang 白光 whose courtesy name is Shao Zi 少子 from the same ("同") hometown ("郡") as Meng Xi's, extracted by handcrafted patterns should refer to Bai Guang 白光 in the annotation. Second, the ZP problem was resolved in most of the cases but may still assign attributes or relations to wrong persons. For example, in the master-disciple relation, Hou Cang 后蒼 and Shu Guang 疏廣 are indeed disciples of Meng Xi 孟喜's father Meng Qing 孟卿, but are mistakenly regarded as the disciples of Meng Xi 孟喜 due to the assumption.</p>
    </sec>
    <sec id="sec-8">
      <title>4.2 Evaluating the Effectiveness of the Bootstrapping Method</title>
      <p>We conduct experiments to see if the bootstrapping method can find the set of handcrafted patterns with only one or two seed patterns and to see if the attribute values can be accurately extracted.</p>
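      <p>The reliability estimator r(p) defined in Step 2 of Section 3.2 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the handling of strings with unequal length (right alignment with padding) is an assumption, since the text only specifies a normalized Hamming distance.</p>
      <preformat>
```python
from collections import Counter


def normalized_hamming(a, b):
    """Normalized Hamming distance in [0, 1].

    Assumption (not specified in the text): strings of unequal length are
    right-aligned and the shorter one is padded on the left, so every padded
    position counts as one substitution.
    """
    width = max(len(a), len(b))
    if width == 0:
        return 0.0
    a, b = a.rjust(width, "\0"), b.rjust(width, "\0")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / width


def reliability_score(extracted, true_values, w1=0.5, w2=0.5):
    """Estimate r(p) from the values that one pattern extracted.

    extracted   -- list of value strings, one per extraction (with repeats)
    true_values -- set V+ of value strings currently believed to be true
    """
    freq = Counter(extracted)      # freq(v) for each unique v in V_p
    total = sum(freq.values())     # sum of freq(v) over V_p
    if total == 0 or not true_values:
        return 0.0
    # Feature 1: closeness of each unique value to its nearest true value.
    similarity = sum(
        1.0 - min(normalized_hamming(v, t) for t in true_values)
        for v in freq
    )
    # Feature 2: 1 minus the share of the most frequent value.
    variety = 1.0 - max(freq.values()) / total
    return w1 * similarity / total + w2 * variety


if __name__ == "__main__":
    values = ["太守", "南陽太守", "南陽太守", "郎"]
    print(reliability_score(values, {"太守"}))
```
      </preformat>
      <p>With the illustrative input above, the true value 太守 contributes 1.0, 南陽太守 contributes 0.5 and 郎 contributes 0, so the score is 0.5 · 1.5/4 + 0.5 · (1 − 2/4) = 0.4375 under the default weights.</p>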
    </sec>
    <sec id="sec-9">
      <title>Parameter settings.</title>
      <p>We set the window size as 2. The frequency threshold of patterns is 10. The number of top patterns selected per iteration is 10. We run until convergence but just report the first 10 iterations for the sake of space. The weights of pattern reliability features are w<sub>1</sub> = w<sub>2</sub> = 0.5.</p>
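      <p>To make the role of these parameters concrete, the following Python sketch illustrates Step 1 (first feature type) and Step 3 of Section 3.2. It is only an illustration under stated assumptions: the helper names are hypothetical, the text is assumed to be already split into clauses, a value is assumed to run to the end of its clause, and the reliability scores passed to the selection step are assumed to come from the estimator of Step 2.</p>
      <preformat>
```python
from collections import Counter


def generate_candidates(clauses, true_values, window=2):
    """Step 1, first feature type: the `window` characters that directly
    precede a known value at the end of a clause form a candidate pattern."""
    candidates = Counter()
    for clause in clauses:
        for value in true_values:
            prefix_len = len(clause) - len(value)
            if clause.endswith(value) and prefix_len >= window:
                candidates[clause[prefix_len - window:prefix_len]] += 1
    return candidates


def extract_values(clauses, pattern):
    """Apply a pattern: everything after it, up to the clause end, is a value."""
    values = []
    for clause in clauses:
        position = clause.find(pattern)
        if position != -1 and position + len(pattern) != len(clause):
            values.append(clause[position + len(pattern):])
    return values


def select_patterns(scores, frequencies, min_frequency=10, top_k=10):
    """Step 3: drop rare patterns, then keep the top_k by reliability score."""
    frequent = [p for p in scores if frequencies.get(p, 0) >= min_frequency]
    return sorted(frequent, key=lambda p: scores[p], reverse=True)[:top_k]


if __name__ == "__main__":
    clauses = ["起遷為國尉", "擢為光祿大夫", "徵為廄丞", "遷為郎"]
    print(generate_candidates(clauses, {"國尉"}, window=2))
    print(extract_values(clauses, "遷為"))
```
      </preformat>
      <p>In this toy run the known title 國尉 yields the candidate 遷為, which in turn extracts the new value 郎 from another clause; this sharing of values between patterns is what allows an iteration to expand both sets. The defaults of select_patterns mirror the frequency threshold of 10 and the 10 top patterns per iteration stated above.</p>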
    </sec>
    <sec id="sec-10">
      <title>Evaluation methods.</title>
      <p>Here are the metrics for the two tasks.</p>
      <p>Task of pattern extraction: We evaluate the performance on extracting patterns for the title attribute. We use the metric Precision@K, which is the fraction of top-K scored generated patterns that are in the ground-truth pattern set. We also define a new metric Coverage@K for the task, which is the fraction of top-K scored ground-truth patterns that are extracted by the bootstrapping method. The generated patterns were scored by the reliability estimation in Step 2 in Section 3.2 and the ground-truth patterns were scored by the reliability in Table 4. Average precision (AP) computes the average precision value for coverage over 0 to 1.</p>
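      <p>The two pattern-level metrics can be written down directly from their definitions. The sketch below is illustrative only: the function names are hypothetical, both input lists are assumed to be sorted by descending score, and dividing by the number of available items when fewer than K exist is an assumption not stated in the text.</p>
      <preformat>
```python
def precision_at_k(generated_ranked, ground_truth, k):
    """Fraction of the top-k generated patterns found in the ground-truth set."""
    top = generated_ranked[:k]
    if not top:
        return 0.0
    return sum(1 for p in top if p in ground_truth) / len(top)


def coverage_at_k(ground_truth_ranked, generated, k):
    """Fraction of the top-k ground-truth patterns that were generated."""
    top = ground_truth_ranked[:k]
    if not top:
        return 0.0
    return sum(1 for p in top if p in generated) / len(top)


if __name__ == "__main__":
    # Hypothetical rankings, used only to exercise the two functions.
    generated_ranked = ["遷為", "補", "於是"]
    ground_truth_ranked = ["拜為", "遷為", "補", "至"]
    print(precision_at_k(generated_ranked, set(ground_truth_ranked), 3))
    print(coverage_at_k(ground_truth_ranked, set(generated_ranked), 2))
```
      </preformat>
      <p>Precision@K looks at the generated ranking and asks how much of it is correct, whereas Coverage@K looks at the ground-truth ranking and asks how much of it was recovered, so the two can move in opposite directions across iterations.</p>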
      <p>Task of person-title pair extraction: We first assign a confidence score to each person-title pair by weighting the reliability scores of the textual patterns that extract the person and the title, respectively. We evaluate the person-title pairs extracted by the bootstrapping method at different numbers of iterations with Precision-Recall curves. Precision is the fraction of true person-title pairs among all person-title pairs generated by handcrafted patterns. Recall is the fraction of true person-title pairs among all 516 ground-truth person-title pairs. AUC is the area under the curve.</p>
    </sec>
    <sec id="sec-11">
      <title>Results on pattern extraction.</title>
      <p>The bootstrapping method had been improving the performance of pattern extraction since it started, while after certain iterations the performance turned worse. From Table 5, running the bootstrapping algorithm for 3 iterations can increase AP by 42.55%, compared to running only 1 iteration. After around 5 iterations, AP displays a continuous trend of declining and iteration 10 gives the lowest AP of 0.131, which is a decrease of 44.26% from iteration 1. What's more, Coverage@K no longer updates after 7 iterations. It indicates that the bootstrapping may meet certain barriers in extracting more patterns.</p>
      <p>After observing the result patterns, we can infer some limitations on pattern extraction of the pattern-based bootstrapping method:</p>
      <p>First, there exist many patterns with either a relatively low frequency (i.e., less than 20) or a lack of interpretability (i.e., patterns with scarcely any actual semantic meaning but somehow capable of extracting "good" entities, which are still considered "good" by our method) that tend not to be found by our domain experts, which we should be reasonably tolerant of.</p>
      <p>Second, the pattern-based bootstrapping method is not good at abstracting the first type of contextual patterns mentioned in Step 1 in Section 3.2. Human experts can easily generalize patterns with a v. + prep. structure that are composed of different verbs but the same preposition into one super-category prep. For example, it is reasonable for domain experts to find the common feature of patterns like [拜為 $Title], [擢為 $Title], [舉為 $Title], etc., all of which mean [promote to $Title], and generalize them into pattern [為 $Title] (i.e., [to $Title]). However, the bootstrapping method tends not to capture such abstraction of patterns and therefore generates a subset of certain ground-truth patterns, which pulls down the evaluation metric.</p>
      <table-wrap id="tab-5">
        <label>Table 5</label>
        <table>
          <thead>
            <tr><th># of iterations</th><th>P@3</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>0.667</td></tr>
            <tr><td>2</td><td>0.667</td></tr>
            <tr><td>3</td><td>1.000</td></tr>
            <tr><td>4</td><td>0.667</td></tr>
            <tr><td>5</td><td>0.667</td></tr>
            <tr><td>6</td><td>0.333</td></tr>
            <tr><td>7</td><td>0.333</td></tr>
            <tr><td>8</td><td>0.333</td></tr>
            <tr><td>9</td><td>0.000</td></tr>
            <tr><td>10</td><td>0.000</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-11a">
      <title>Results on person-title pair extraction.</title>
      <fig id="fig-2">
        <label>Figure 2</label>
        <caption>
          <p>The performance of the bootstrapping method on person-title pair extraction gradually improved through iterations. AUC increased from 0.107 to 0.207 (in iteration 7) and decreased after the point.</p>
        </caption>
      </fig>
      <p>Running the bootstrapping method for more iterations generally increases the performance of person-title pair extractions, while after certain iterations the performance starts to shrink. From Figure 2, AUC keeps increasing in the first 7 iterations, achieving a maximum of 0.207 in iteration 7, and then begins to decrease in the last 3 iterations.</p>
      <p>Why were the Recall scores consistently low? Pattern ID 11, ID 34 and ID 36 from Table 4 are not found by the bootstrapping method due to the setting of a window size of 2. Therefore, values extracted by those patterns, which occupy 45% of the total true values, will never be found.</p>
      <p>Why were many false person-title pairs included after iteration #8? Domain experts have also designed stop words for each handcrafted pattern, which are capable of screening out common noises with certain patterns. But for the bootstrapping method, those noises extracted by the patterns are still regarded as true.</p>
    </sec>
    <sec id="sec-12">
      <title>4.3 Discussions</title>
      <p>We find that the bootstrapping method can work only on extracting attribute values of $Title. The values of $Title could be shared by multiple patterns' extractions because multiple people can be assigned to the same position in the government. Only when the values are shared can we find one pattern with another by bootstrapping. However, one person cannot have multiple fathers and rarely has multiple masters. By now, we have only investigated the pattern-based bootstrapping method in Section 3.2 on the attribute in the task of attribute discovery. The premise of this method lies in the fact that there should exist some entities that could be extracted by multiple patterns, which makes it possible to find new patterns through pattern generation. However, for the task of relation extraction (e.g., father-son and master-disciple), since each relation pair is unique in the text, there is not a pattern shown in Table 4 that shares even a single common instance that could also be extracted by other patterns in the same category, which makes it hard for an instance-level bootstrapping method to work.
</p>
    </sec>
    <sec id="sec-13">
      <title>5 RELATED WORK</title>
      <p>In this section, we survey three main topics related to our work and point out the uniqueness of our study.</p>
      <sec>
        <title>5.1 Chinese NLP Techniques</title>
        <p>Though robust NLP techniques are often language independent, most NLP techniques for Chinese have their own specific characteristics, and thus advantages, compared to those for English or other Latin-based languages. Unlike Latin-based languages, Chinese does not use white space as a natural delimiter. Therefore, word segmentation is always a key precursor for language processing tasks in Chinese [5, 6, 8, 30, 41, 42]. Moreover, due to the lack of morphological features, Chinese part-of-speech (POS) tagging and dependency parsing are harder than in Latin-based languages like English. Li et al. [24] proposed joint models for Chinese POS tagging and dependency parsing tasks. As neural methods have recently achieved significant performance with large amounts of annotated data, many deep neural models for Chinese POS tagging and dependency parsing have been developed [9, 21, 33]. Zero pronoun (ZP) resolution is also a challenging problem in Chinese. Existing studies utilize heuristic rules to resolve ZP issues in Chinese [10, 36]. Recently, supervised neural approaches have been widely explored on many different tasks [7, 37–39].</p>
        <p>However, all these studies focus on modern Chinese text. Classical Chinese is important, as the majority of precious historical literature was written in classical Chinese hundreds or even thousands of years ago, but it has received little attention. Performing NLP tasks on classical Chinese is more difficult than on modern Chinese because of the very different writing style and the very limited annotated data. Our approach was the first to curate a person entity profiling dataset for such studies, and we proposed a pattern-based bootstrapping method to extract the attributes of historical actors in ancient China. The extracted high-quality profile information would facilitate history studies. Digital humanities need more attention from both the humanities and digital technologies.</p>
      </sec>
      <sec>
        <title>5.2 Textual Pattern-based Entity Information Extraction Techniques</title>
        <p>Given a text corpus, textual patterns leverage statistics (e.g., high frequency) by replacing words, phrases, or entities with symbols such as part-of-speech tags or entity types in order to extract a large collection of tuple-like information [23, 31, 34, 40]. Hearst patterns like “NP such as NP, NP, and NP” were proposed to automatically acquire hyponymy relations from text data [14]. Later, machine learning experts designed the Snowball systems to propagate in plain text for numerous relational patterns [1, 4, 43]. Google’s Biperpedia [12, 13] generated E-A patterns (e.g., “A of E” and “E’s A”) from users’ fact-seeking queries by replacing the entity with “E” and the noun-phrase attribute with “A”. ReNoun [35] generated S-A-O patterns (e.g., “S’s A is O” and “O, A of S,”) from a human-annotated corpus on a predefined subset of the attribute names. Patty used parsing structures to generate relational patterns with semantic types [29]. The recent MetaPAD generated “meta patterns” based on content quality [16]. However, all the patterns in the above methods serve only English. Due to the fundamental grammatical difference between classical Chinese and English, the above methods do not work for our problem. Our work has made the first step in the field of pattern-based entity retrieval that is suitable for classical Chinese text.</p>
      </sec>
      <sec>
        <title>5.3 Neural Entity Information Extraction</title>
        <p>The task of named entity recognition (NER) is typically cast as a sequence labeling problem and solved by supervised learning models. In contrast to statistical learning methods like conditional random fields (CRF) [19], end-to-end neural network methods have been proposed to solve the problem [15, 17, 20, 27]. Recent work used language models as another type of supervision signal [25], which can help models obtain more contextual knowledge from the corpus without extra annotation. Open-source pre-trained models have been widely used in entity information extraction tasks. They improved performance with models pre-trained on massive corpora. Note that all of these models need large amounts of annotated data, which unfortunately are not available for classical Chinese.</p>
      </sec>
    </sec>
    <sec>
      <title>6 CONCLUSIONS</title>
      <p>In this paper, we aimed at extracting and profiling historical actors from classical Chinese literature. We addressed the challenge of a low-resource language. We employed domain experts to curate a ground-truth dataset of person entities and their profile attributes and relations (e.g., courtesy name, place of birth, title, father-son, master-disciple) with handcrafted patterns from two books, Historical Records and Book of Han. We developed a pattern-based bootstrapping approach to extract the information with a very small number of seed patterns. Experimental results showed the effectiveness and limitations of the iterative method.</p>
    </sec>
    <sec>
      <title>ACKNOWLEDGMENTS</title>
      <p>The authors would like to thank all the funding sources for their support. This work was supported in part by Notre Dame Research 2019 Global Gateway Faculty Research Award (RGG) FY19RGG03 373106 and NSF Grant CCF-1901059.</p>
    </sec>
    <sec id="sec-14">
      <title>REFERENCES</title>
      <p>large plain-text collections. In Proceedings of the fifth ACM conference on Digital libraries. ACM, 85–94.</p>
      <p>[2] Liang Cai. 2014. Witchcraft and the Rise of the First Confucian Empire. Albany, NY: State University of New York Press (2014).</p>
      <p>[3] Liang Cai. 2019. Confucians, Social Networks, and Bureaucracy: Donghai Men and Models for Success in the Western Han China (206–9 BCE). Early China (2019).</p>
      <p>[4] Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R Hruschka, and Tom M Mitchell. 2010. Toward an architecture for never-ending language learning. In AAAI.</p>
      <p>[5] Pi-Chuan Chang, Michel Galley, and Christopher D Manning. 2008. Optimizing</p>
      <p>[20] Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. In NAACL.</p>
      <p>[21] Haonan Li, Zhisong Zhang, Yuqi Ju, and Hai Zhao. 2018. Neural character-level dependency parsing for Chinese. In AAAI.</p>
      <p>[30] Fuchun Peng, Fangfang Feng, and Andrew McCallum. 2004. Chinese segmentation and new word detection using conditional random fields. In Proceedings of the 20th international conference on Computational Linguistics. Association for Computational Linguistics, 562.</p>
      <p>[31] Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon. 2015. Representing text for joint embedding of text and knowledge bases. In Proceedings of Empirical Methods on Natural Language Processing. 1499–1509.</p>
      <p>[22] Kaiyuan Li. 2000. The Establishment of Han Dynasty and the Liu Bang Group:</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>A Study of the Meritorious Military Class</article-title>
          . Beijing: San lian shu dian (
          <year>2000</year>
          ). [23]
          <string-name>
            <given-names>Qi</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Meng</given-names>
            <surname>Jiang</surname>
          </string-name>
          , Xikun Zhang, Meng Qu, Timothy P Hanratty,
          <string-name>
            <surname>Jing Gao</surname>
          </string-name>
          , and Ji-
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>awei Han</source>
          .
          <year>2018</year>
          .
          <article-title>Truepie: Discovering reliable paterns in patern-based informa-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>tion extraction</article-title>
          .
          <source>In Proceedings of the 24th ACM SIGKDD International Conference</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>on Knowledge Discovery &amp; Data Mining. ACM</source>
          ,
          <fpage>1675</fpage>
          -
          <lpage>1684</lpage>
          . [24]
          <string-name>
            <given-names>Zhenghua</given-names>
            <surname>Li</surname>
          </string-name>
          , Min Zhang, Wanxiang Che, Ting Liu, Wenliang Chen, and Haizhou
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Li</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Joint models for Chinese POS tagging and dependency parsing</article-title>
          . In
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>for Computational Linguistics</source>
          ,
          <fpage>1180</fpage>
          -
          <lpage>1191</lpage>
          . [25]
          <string-name>
            <surname>Liyuan</surname>
            <given-names>Liu</given-names>
          </string-name>
          , Jingbo Shang, Xiang Ren, Frank Fangzheng Xu, Huan Gui, Jian Peng,
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>and Jiawei</given-names>
            <surname>Han</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Empower sequence labeling with task-aware neural lan-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <article-title>neural networks</article-title>
          .
          <source>In ACL</source>
          .
          <volume>778</volume>
          -
          <fpage>788</fpage>
          . [8]
          <string-name>
            <given-names>Xinchi</given-names>
            <surname>Chen</surname>
          </string-name>
          , Xipeng Qiu, Chenxi Zhu, Pengfei Liu, and
          <string-name>
            <given-names>Xuanjing</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <source>Proceedings of Empirical Methods on Natural Language Processing</source>
          .
          <fpage>1197</fpage>
          -
          <lpage>1206</lpage>
          . [9]
          <string-name>
            <given-names>Yufei</given-names>
            <surname>Chen</surname>
          </string-name>
          , Sheng Huang,
          <string-name>
            <surname>Fang</surname>
            <given-names>Wang</given-names>
          </string-name>
          , Junjie Cao, Weiwei Sun, and Xiaojun
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Wan</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Neural Maximum Subgraph Parsing for Cross-Domain Semantic</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Dependency</given-names>
            <surname>Analysis</surname>
          </string-name>
          .
          <source>In Proceedings of the 22nd Conference on Computational</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Natural Language</surname>
          </string-name>
          <article-title>Learning</article-title>
          . 562-
          <fpage>572</fpage>
          . [10]
          <string-name>
            <surname>Susan</surname>
            <given-names>P</given-names>
          </string-name>
          <string-name>
            <surname>Converse and Martha Stone</surname>
          </string-name>
          <article-title>Palmer. 2006. Pronominal anaphora resolution</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          in Chinese. Citeseer. [11]
          <string-name>
            <surname>Crespigny R. De</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>A Biographical Dictionary of Later Han to the Three</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Kingdoms</surname>
          </string-name>
          (
          <fpage>23</fpage>
          -220 AD).
          <source>Leiden: Brill</source>
          (
          <year>2007</year>
          ). [12]
          <string-name>
            <surname>Rahul</surname>
            <given-names>Gupta</given-names>
          </string-name>
          , Alon Halevy, Xuezhi Wang, Steven Euijong Whang, and Fei Wu. [37]
          <string-name>
            <surname>Qingyu</surname>
            <given-names>Yin</given-names>
          </string-name>
          , Yu Zhang, Weinan Zhang, and Ting Liu.
          <year>2017</year>
          . Chinese Zero Pro-
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          2014.
          <article-title>Biperpedia: An ontology for search applications</article-title>
          .
          <source>VLDB 7</source>
          ,
          <issue>7</issue>
          (
          <year>2014</year>
          ), 505
          <article-title>- noun Resolution with Deep Memory Network</article-title>
          . In EMNLP. Association for Com-
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          516. putational Linguistics, Copenhagen, Denmark,
          <fpage>1309</fpage>
          -
          <lpage>1318</lpage>
          . https://doi.org/10. [13]
          <string-name>
            <surname>Alon</surname>
            <given-names>Halevy</given-names>
          </string-name>
          , Natalya Noy, Sunita Sarawagi,
          <source>Steven Euijong Whang, and Xiao</source>
          <volume>18653</volume>
          /v1/
          <fpage>D17</fpage>
          -1135
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Yu</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Discovering structure in the universe of attribute names</article-title>
          . In WWW. [38]
          <string-name>
            <surname>Qingyu</surname>
            <given-names>Yin</given-names>
          </string-name>
          , Yu Zhang, Weinan Zhang, Ting Liu, and William Yang Wang.
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>International World Wide Web Conferences Steering Committee</source>
          ,
          <fpage>939</fpage>
          -
          <lpage>949</lpage>
          .
          <article-title>Zero Pronoun Resolution with Attention-based Neural Network</article-title>
          . In Proceedings [14]
          <string-name>
            <surname>Marti</surname>
            <given-names>A</given-names>
          </string-name>
          <string-name>
            <surname>Hearst</surname>
          </string-name>
          .
          <year>1992</year>
          .
          <article-title>Automatic acquisition of hyponyms from large text cor- of the 27th International Conference on Computational Linguistics</article-title>
          . Association for
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Computational</given-names>
            <surname>Linguistics</surname>
          </string-name>
          , Santa Fe, New Mexico, USA,
          <fpage>13</fpage>
          -
          <lpage>23</lpage>
          . https://www.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          pora.
          <source>In Proceedings of the 14th conference on Computational linguistics-Volume</source>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          2. Association for Computational Linguistics,
          <fpage>539</fpage>
          -
          <lpage>545</lpage>
          . aclweb.org/anthology/C18-1002 [15]
          <string-name>
            <surname>Zhiheng</surname>
            <given-names>Huang</given-names>
          </string-name>
          , Wei Xu,
          <string-name>
            <given-names>and Kai</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Bidirectional LSTM-CRF models for</article-title>
          [39]
          <string-name>
            <surname>Qingyu</surname>
            <given-names>Yin</given-names>
          </string-name>
          , Yu Zhang,
          <string-name>
            <surname>Wei-Nan</surname>
            <given-names>Zhang</given-names>
          </string-name>
          , Ting Liu, and William Yang Wang.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <article-title>sequence tagging</article-title>
          .
          <source>arXiv preprint arXiv:1508</source>
          .
          <year>01991</year>
          (
          <year>2015</year>
          ).
          <year>2018</year>
          .
          <article-title>Deep Reinforcement Learning for Chinese Zero Pronoun Resolution</article-title>
          . In [16]
          <string-name>
            <surname>Meng</surname>
            <given-names>Jiang</given-names>
          </string-name>
          , Jingbo Shang, Taylor Cassidy, Xiang Ren, Lance M Kaplan, Timo- ACL.
          <article-title>Association for Computational Linguistics</article-title>
          , Melbourne, Australia,
          <fpage>569</fpage>
          -
          <lpage>578</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>thy P Hanratty</surname>
          </string-name>
          , and Jiawei Han.
          <year>2017</year>
          .
          <article-title>Metapad: Meta pattern discovery from https://doi</article-title>
          .org/10.18653/v1/
          <fpage>P18</fpage>
          -1053
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <article-title>massive text corpora</article-title>
          .
          <source>In Proceedings of the ACM SIGKDD International</source>
          Confer- [40]
          <string-name>
            <surname>Wenhao</surname>
            <given-names>Yu</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Zongze</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Qingkai</given-names>
            <surname>Zeng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Meng</given-names>
            <surname>Jiang</surname>
          </string-name>
          .
          <year>2019</year>
          . Tablepedia: Au-
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <source>ence on Knowledge Discovery &amp; Data Mining. ACM</source>
          ,
          <fpage>877</fpage>
          -
          <lpage>886</lpage>
          .
          <source>tomating PDF Table Reading in an Experimental Evidence Exploration and An</source>
          [17]
          <string-name>
            <surname>Tianwen</surname>
            <given-names>Jiang</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Tong</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bing</given-names>
            <surname>Qin</surname>
          </string-name>
          , Ting Liu, Nitesh V Chawla, and
          <source>Meng alytic System. In The World Wide Web Conference. ACM</source>
          ,
          <volume>3615</volume>
          -
          <fpage>3619</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <surname>Jiang</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>The Role of “Condition”: A Novel Scientific Knowledge Graph Rep-</article-title>
          [41]
          <string-name>
            <surname>Qi</surname>
            <given-names>Zhang</given-names>
          </string-name>
          , Xiaoyu Liu, and
          <string-name>
            <given-names>Jinlan</given-names>
            <surname>Fu</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Neural networks incorporating dic-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <article-title>resentation and Construction Model. In Proceedings of the 25th ACM SIGKDD tionaries for chinese word segmentation</article-title>
          .
          <source>In AAAI.</source>
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <source>International Conference on Knowledge Discovery &amp; Data Mining. ACM</source>
          ,
          <fpage>1634</fpage>
          - [42]
          <string-name>
            <surname>Wei</surname>
            <given-names>Zhou</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aiping</surname>
            <given-names>Wang</given-names>
          </string-name>
          , Hua Shu, Reinhold Kliegl, and
          <string-name>
            <given-names>Ming</given-names>
            <surname>Yan</surname>
          </string-name>
          .
          <year>2018</year>
          . Word
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          1642.
          <article-title>segmentation by alternating colors facilitates eye guidance in Chinese reading</article-title>
          . [18]
          <string-name>
            <given-names>Martin</given-names>
            <surname>Kern</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>The “biography of Sima Xiangru” and the question of the Memory &amp; cognition 46,</article-title>
          <issue>5</issue>
          (
          <year>2018</year>
          ),
          <fpage>729</fpage>
          -
          <lpage>740</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <article-title>Fu in Sima Qian's Shiji</article-title>
          .
          <source>Journal of the American Oriental Society 123</source>
          ,
          <volume>2</volume>
          (
          <year>2003</year>
          ), [43]
          <string-name>
            <surname>Jun</surname>
            <given-names>Zhu</given-names>
          </string-name>
          , Zaiqing Nie, Xiaojiang Liu, Bo Zhang, and
          <string-name>
            <surname>Ji-Rong Wen</surname>
          </string-name>
          .
          <year>2009</year>
          . Stat-
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          303-
          <fpage>316</fpage>
          . Snowball:
          <article-title>a statistical approach to extracting entity relationships</article-title>
          .
          <source>In WWW. [19] John Lafferty, Andrew McCallum, and Fernando CN Pereira</source>
          .
          <year>2001</year>
          .
          <string-name>
            <surname>Conditional</surname>
            <given-names>ACM</given-names>
          </string-name>
          ,
          <fpage>101</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>