<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Finding Topic-centric Identified Experts based on Full Text Analysis</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Hanmin</forename><surname>Jung</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Information Service Research Lab</orgName>
								<orgName type="institution">KISTI</orgName>
								<address>
									<country key="KR">Korea</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mikyoung</forename><surname>Lee</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Information Service Research Lab</orgName>
								<orgName type="institution">KISTI</orgName>
								<address>
									<country key="KR">Korea</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">In-Su</forename><surname>Kang</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Information Service Research Lab</orgName>
								<orgName type="institution">KISTI</orgName>
								<address>
									<country key="KR">Korea</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Seung-Woo</forename><surname>Lee</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Information Service Research Lab</orgName>
								<orgName type="institution">KISTI</orgName>
								<address>
									<country key="KR">Korea</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Won-Kyung</forename><surname>Sung</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Information Service Research Lab</orgName>
								<orgName type="institution">KISTI</orgName>
								<address>
									<country key="KR">Korea</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Finding Topic-centric Identified Experts based on Full Text Analysis</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">E42277028BB16EF5E31E2EBE77192E37</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T03:55+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents a method for finding topic-centric experts from open access metadata and full text documents. Topic-centric information, including experts, is served on OntoFrame, a Semantic Web-based academic research information service that supports R&amp;D activities. The URI scheme-based OntoFrame provides three entity pages: topic, person, and event. 'Persons by Topic' in the topic page lists topic-centric identified experts. A SPARQL query retrieves them from the RDF triple store through backward chaining.</p><p>We gathered CiteSeer open access metadata and full text documents for about 110,000 papers. Using about 160,000 topics, OntoFrame now serves topic-centric identified experts and relevant information acquired by full text analysis.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Finding experts is useful when seeking consultants, collaborators, and speakers. It also provides a source of information to supplement or complement academic sources including metadata <ref type="bibr" target="#b6">[7]</ref>, and has thus received increased attention in recent years. However, identity resolution has not been considered seriously, even though this research topic mainly deals with persons. Many studies concentrate only on string-based person names <ref type="bibr" target="#b0">[1]</ref> <ref type="bibr" target="#b1">[2]</ref> <ref type="bibr" target="#b4">[5]</ref> <ref type="bibr" target="#b5">[6]</ref>. The Semantic Web can be a competent solution for managing identified experts through its underlying URI scheme. Another consideration is to guarantee the reliability of the results. Deep analysis based on full text documents is needed because documents that are topically classified with high precision ensure finding the right persons for each topic. On the basis of these considerations, we propose an expert-finding method based on identity resolution and full text analysis, and further extract topic-centric information such as 'Topic Trends' and 'Institutions by Topic'. Section 2 reviews previous studies. Section 3 explains how to acquire topic-centric information based on a Semantic Web framework.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Studies</head><p>The sources for finding experts are various: documents, programs, e-mails, databases, citations, communities, and so on. <ref type="bibr" target="#b0">[1]</ref> proposed finding expertise information from e-mails with four simple binary association methods. <ref type="bibr" target="#b4">[5]</ref> investigated the expertise of users and experts by combining information retrieval techniques. However, such e-mails and communities are insufficient to extract the right experts for a specific topic because they give clues only about relationships and context. <ref type="bibr" target="#b1">[2]</ref> introduced an expert-finding study based on full text documents related to persons and on a set of terms in them. It extracts similar experts by measuring the similarity between term vectors. However, it cannot indicate which topics are related to the experts; it only provides a bundle of persons as the result. ExpertFinder <ref type="bibr" target="#b5">[6]</ref> recommends persons with many documents for a given topic. A keyword phrase is used to retrieve relevant documents, but the results are unsatisfactory because reasonable candidates are not listed within the top three or four candidates in most cases. Its slow response time and incorrect relationships between persons and documents are also problems. Another interesting study <ref type="bibr" target="#b7">[8]</ref> introduced three innovative points: document authority in terms of PageRank, a co-occurrence model, and multiple levels of association between experts and query terms. It finds variants of experts' names for identity recognition, but fails to uniquely identify different persons with the same name.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Acquiring Topic-Centric Information</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">OntoFrame: an Academic Research Information Service</head><p>OntoFrame is a Semantic Web-based service that provides academic research information for supporting R&amp;D activities <ref type="bibr" target="#b2">[3]</ref>. Its two main components are the URI server and OntoReasoner (an inference engine). The latter interacts with user interfaces by receiving SPARQL queries and returning XML results. We adopt SPARQL rather than the less flexible SQL because queries can be constructed with knowledge of the ontology schema alone. OntoReasoner also expands knowledge by forward-chaining inference. The URI server has several functions: ontology schema parsing and loading, DB schema creation, ontology instance loading, and RDF triple generation, as shown in figure <ref type="figure" target="#fig_0">1</ref>. When a new instance is inserted into the server, the triple generator makes triples for the instance. The triples are then stored in the RDF triple store, and are later referred to by OntoReasoner. OntoFrame differs from other academic research information services such as CiteSeer (http://citeseer.ist.psu.edu/) and Google Scholar (http://scholar.google.com/) because it provides information acquired by inference beyond metadata. 'Persons by Topic', 'Topic Trends', and 'Social Network' are representative information served by OntoFrame.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Data Gathering and Refining</head><p>The Open Archives Initiative (OAI, http://www.openarchives.org/) develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content. CiteSeer (http://citeseer.ist.psu.edu/oai.html) also supports OAI, and thus allows its open access metadata, which includes title, authors, publication year, and so on, to be downloaded. Identity resolution is an obligatory task for transforming string-based data into semantic data <ref type="bibr" target="#b3">[4]</ref>. Various forms of institution names in the metadata are mapped to a set of normalized institution names<ref type="foot" target="#foot_2">1</ref>, e.g. "U. Kassel" and "University of Kassel." We also distinguish different persons with the same name. A few metadata fields, such as affiliation, e-mail, and co-authors, are available for distinguishing authors. Affiliations and e-mails can determine whether two authors with the same name are different persons. However, the affiliation and e-mail fields are often not obligatory, as in the CiteSeer metadata. Co-authorship information plays an important role in resolving identity problems because the co-author field is usually filled in, and many authors maintain co-authorship relations regardless of affiliation changes. We consider two authors with the same name to be the identical person when they share identical co-author(s); otherwise they remain different persons. A 'sameAs' relation can compensate for the limited coverage of this co-authorship-based method. When two authors are later connected with a 'sameAs' relation, all of their information, including papers and topics, is merged.</p><p>After identity resolution, we assign a URI to each entity; for example, the paper "A Bayesian Multiple Models Combination Method for Time Series Prediction" with 'http://www.kisti.re.kr/isrl/ResearchRefOntology#ART_00000000000000458673', the topic "markov model" with 'http://www.kisti.re.kr/isrl/ResearchRefOntology#TOP_00000000000000046687', and the person "V. Petridis" with 'http://www.kisti.re.kr/isrl/ResearchRefOntology#PER_00000000000000128292'.</p></div>
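<div xmlns="http://www.tei-c.org/ns/1.0"><p>The co-authorship rule described in this section can be illustrated with a minimal sketch. It is not the OntoFrame implementation: the input shape (a list of paper identifiers with author name strings), the function name resolve_identities, and the identity labels are assumptions made for illustration. The sketch also merges groups transitively when a new paper links two of them, which goes slightly beyond what the text states.</p>
<ab type="code"><![CDATA[
```python
from collections import defaultdict


def resolve_identities(records):
    """Group same-name author mentions that share at least one co-author.

    records: list of (paper_id, [author names]) pairs (assumed input shape).
    Returns a dict mapping (paper_id, name) to a hypothetical identity label.
    """
    # Collect, for every name, each mention together with its co-author set.
    mentions = defaultdict(list)
    for paper_id, authors in records:
        for name in authors:
            coauthors = {a for a in authors if a != name}
            mentions[name].append((paper_id, coauthors))

    identity = {}
    for name, items in mentions.items():
        clusters = []  # each cluster is [co-author set, list of paper ids]
        for paper_id, coauthors in items:
            # Clusters sharing a co-author with this mention are merged.
            hits = [c for c in clusters if c[0] & coauthors]
            clusters = [c for c in clusters if not (c[0] & coauthors)]
            merged = [set(coauthors), [paper_id]]
            for c in hits:
                merged[0] |= c[0]
                merged[1] += c[1]
            clusters.append(merged)
        # Mentions without any shared co-author remain different persons.
        for idx, (_, papers) in enumerate(clusters):
            for paper_id in papers:
                identity[(paper_id, name)] = f"{name}#{idx}"
    return identity
```
]]></ab>
<p>In this sketch, two mentions of the same name end up with the same label only when a chain of shared co-authors connects their papers; a separately supplied 'sameAs' relation would still be needed to join the remaining groups.</p></div>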
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Topic Extraction</head><p>Extracting topics from papers is the most basic task for acquiring topic-centric experts. As both the full text documents and the metadata of CiteSeer are available, we use the full text documents. Extracted topics are assigned to each paper. The stages of the extraction, shown in figure <ref type="figure">2</ref>, are as follows. First, the indexer extracts index terms from a given document. Second, the terms are matched with topic keywords in the topic index DB<ref type="foot">2</ref>. Third, successfully matched terms are ranked by the following algorithms, and then we select the top-n (currently, five) topics for the input document.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head>Fig. 2.</head><label>2</label><figDesc>Fig. 2. Workflow of Topic Extraction based on Full Text Documents</figDesc></figure>
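<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three stages of topic extraction can be sketched as follows. This is a simplified illustration under stated assumptions, not the indexer used in OntoFrame: index terms are approximated by word n-grams, the topic index DB is replaced by an in-memory set, and ranking uses raw term frequency as a stand-in for the ranking formulas of the paper. The names extract_topics, top_n, and max_words are hypothetical.</p>
<ab type="code"><![CDATA[
```python
import re
from collections import Counter


def extract_topics(text, topic_keywords, top_n=5, max_words=3):
    """Return up to top_n topic keywords ranked by term frequency in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    # Stage 1: index terms are approximated by word n-grams (assumption).
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            term = " ".join(words[i:i + n])
            # Stage 2: keep only terms that match a topic keyword.
            if term in topic_keywords:
                counts[term] += 1
    # Stage 3: rank by frequency, break ties alphabetically, keep top_n.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]
```
]]></ab>
<p>Because only terms present in the keyword set are counted, every returned topic is by construction a member of the topic keywords, which mirrors the matching stage described above.</p></div>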
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Finding Experts</head><p>Many factors can be considered for finding experts: the number of papers, the impact factor of sources, the degree of citation, hub persons in a social network, and so on. Currently, we take into account only the number of papers, for several reasons. A large portion of the source field in the CiteSeer open access metadata is empty. Citation information may also be incomplete when compared with the CiteSeer service page. We also do not consider the social network because abundant co-authorship with other persons does not always guarantee expertise on a topic. Acquiring topic-centric experts on OntoFrame requires querying the DBMS-based RDF triple store. 'Persons by Topic' is retrieved directly from the database through a SPARQL query and automatic SPARQL-to-SQL conversion. The query searches for papers (?accomplishment) whose topic area is topicTerm, and then retrieves the authors (?person) of the papers. Figure <ref type="figure" target="#fig_3">3</ref> shows the backward chaining flow starting from topicTerm. 'createdByPerson' is one of the derived properties induced by user-defined inference rules. It shortens the backward path for finding 'Persons by Topic' by going directly to 'Person' rather than passing through 'CreatorInfo' (the dotted line in figure <ref type="figure" target="#fig_3">3</ref>). After retrieving persons, OntoReasoner performs post-processing to rank them in descending order of the number of their papers.</p></div>
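<div xmlns="http://www.tei-c.org/ns/1.0"><p>The backward path from a topic to ranked persons can be illustrated with a small in-memory sketch. It does not reproduce the SPARQL query or the SPARQL-to-SQL conversion of OntoFrame. Triples are plain tuples, the property name hasTopic is hypothetical (only createdByPerson appears in the text), and counting only the papers on the queried topic is an assumption about how the ranking is applied.</p>
<ab type="code"><![CDATA[
```python
from collections import Counter


def persons_by_topic(triples, topic_uri):
    """Rank persons by the number of their papers on topic_uri.

    triples: iterable of (subject, predicate, object) tuples.
    'hasTopic' is a hypothetical property name used for illustration.
    """
    triples = list(triples)
    # Step 1: papers whose topic is topic_uri.
    papers = {s for s, p, o in triples if p == "hasTopic" and o == topic_uri}
    # Step 2: authors of those papers via the derived 'createdByPerson'.
    counts = Counter(
        o for s, p, o in triples if p == "createdByPerson" and s in papers
    )
    # Step 3: descending paper count, ties broken by person identifier.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```
]]></ab>
<p>Using a direct paper-to-person property keeps the traversal to two steps, which is the effect the derived property 'createdByPerson' is described as having.</p></div>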
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions</head><p>We gathered 114,337 papers (2000 to 2006) from CiteSeer open access metadata. They include 161,853 persons and 17,093 institutions. 160,568 topic keywords<ref type="foot" target="#foot_3">3</ref> were extracted from titles and abstracts. Extracting at most five topics from a paper takes about 1.6 seconds on average. Three seconds are enough to generate an entity page including 'Persons by Topic' on OntoFrame<ref type="foot" target="#foot_4">4</ref>.</p><p>This paper showed a method for finding topic-centric identified experts from CiteSeer open access metadata and full text documents. Topic extraction based on full text analysis makes it possible to construct topically classified papers, and inference propagates the topics to persons and institutions. A SPARQL query retrieves URI-based 'Persons by Topic' from the RDF triple store. Our future work includes introducing usability test to</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. OntoFrame Architecture</figDesc><graphic coords="2,143.57,399.98,324.85,225.73" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>( 1 )</head><label>1</label><figDesc>(1) Index term list: the k-th document is represented by its m index terms, where the i-th entry is the i-th index term in the document.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>( 3 )</head><label>3</label><figDesc>(3) TF (term frequency) of an index term: the term frequency of index term t in the document. (4) TF of an index term matched with a topic keyword: the term frequency of index term t found in the topic keyword DB.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Backward Chaining Path for Finding 'Persons by Topic' (Experts for a Topic)</figDesc><graphic coords="6,143.33,204.68,325.33,132.07" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Fig. 4 .</head><label>4</label><figDesc>Fig. 4. Example of Topic Page for 'markov model' ('Persons by Topic' shows ranked experts.)</figDesc><graphic coords="7,143.57,234.98,324.85,323.53" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2">Topic keyword and topic are the same in this study. Successfully matched index terms are also a subset of topic keywords because the terms are always members of the topic keywords in the topic index DB.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">2nd International ExpertFinder Workshop (FEWS2007)</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_2">currently, about 14,000</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_3">Simple and compound nouns were extracted automatically and filtered manually by human dictionary constructors.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_4">The whole system will appear in the Poster/Demo Track of ISWC2007.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Finding Experts and Their Details in E-mail Corpora</title>
		<author>
			<persName><forename type="first">K</forename><surname>Balog</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rijke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th International Conference on World Wide Web</title>
				<meeting>the 15th International Conference on World Wide Web</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Finding Similar Experts</title>
		<author>
			<persName><forename type="first">K</forename><surname>Balog</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rijke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 30th Annual International ACM SIGIR Conference</title>
				<meeting>the 30th Annual International ACM SIGIR Conference</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Semantic Web-Based Services for Supporting Voluntary Collaboration among Researchers Using an Information Dissemination Platform</title>
		<author>
			<persName><forename type="first">H</forename><surname>Jung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Sung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Park</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Data Science Journal</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Construction of Semantic Web-based Knowledge Using Text Processing</title>
		<author>
			<persName><forename type="first">H</forename><surname>Jung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Sung</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 4th International Conference on Information Technology: New Generations</title>
				<meeting>the 4th International Conference on Information Technology: New Generations</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Finding Experts in Community-Based Question-Answering Services</title>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Croft</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Koll</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 14th ACM International Conference on Information and Knowledge Management</title>
				<meeting>the 14th ACM International Conference on Information and Knowledge Management</meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Enterprise Expert and Knowledge Discovery</title>
		<author>
			<persName><forename type="first">D</forename><surname>Mattox</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Maybury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Morey</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 8th International Conference on Human-Computer Interaction</title>
				<meeting>the 8th International Conference on Human-Computer Interaction</meeting>
		<imprint>
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Expert Finding Systems for Organizations: Domain Analysis and the DEMOIR Approach</title>
		<author>
			<persName><forename type="first">D</forename><surname>Yimam</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Beyond Knowledge Management: Sharing Expertise</title>
				<imprint>
			<publisher>MIT Press</publisher>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The Open University at TREC 2006 Enterprise Track Expert Search Task</title>
		<author>
			<persName><forename type="first">J</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rüger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Eisenstadt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Motta</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th Text REtrieval Conference</title>
				<meeting>the 15th Text REtrieval Conference</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
