<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Knowledge Based High-Frequency Question Answering in AliMe Chat</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Shuangyong</forename><surname>Song</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Alibaba Group</orgName>
								<address>
									<postCode>100102</postCode>
									<settlement>Beijing</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Chao</forename><surname>Wang</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Alibaba Group</orgName>
								<address>
									<postCode>100102</postCode>
									<settlement>Beijing</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Haiqing</forename><surname>Chen</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Alibaba Group</orgName>
								<address>
									<postCode>100102</postCode>
									<settlement>Beijing</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Knowledge Based High-Frequency Question Answering in AliMe Chat</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">D00201C90DB8F77046737BD9E86ED983</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T14:07+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Knowledge Graph</term>
					<term>E-commerce Chatbot</term>
					<term>Lucene Index</term>
					<term>Text Matching</term>
					<term>Multiple Answers Generation</term>
					<term>Index of Subgraph</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In AliMe Chat, our online chatbot service, we design a knowledge graph based approach to answering high-frequency chitchat questions. To meet the high questions-per-second (QPS) demand of the online system, we design several solutions that avoid querying a large knowledge graph online. This paper describes those solutions in detail, and the experimental results show their effectiveness and efficiency.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>AliMe Chat, introduced by Alibaba in 2015, has served billions of users and now receives on average ten million user accesses per day <ref type="bibr" target="#b3">[4]</ref>. The AliMe service can be roughly divided into assistance service, customer service and chatting service, and the main idea of this paper is to improve the ability of AliMe Chat with a knowledge graph. A seq2seq based re-ranking and generation method was proposed in <ref type="bibr" target="#b0">[1]</ref> to chat with AliMe users on general topics, such as greetings, jokes and other kinds of chitchat. However, the fact-based and knowledge-based chatting ability of AliMe is still weak. To improve these abilities and at the same time increase the diversity of chatting answers, we design a question-answering framework.</p><p>Since online serving places a very high demand on QPS, our framework is oriented only to high-frequency questions and entities in historical user question logs. We design several methods: 1) for high-frequency questions, we find which of them can be answered with the knowledge graph, and those 'question-answer' pairs are indexed by Lucene for online matching and re-ranking; 2) for high-frequency entities, we extract subgraphs from the complete knowledge graph, and unlike some related work that does this step in real time <ref type="bibr" target="#b2">[3]</ref>, we prepare those subgraphs offline to reduce online processing. We classify questions containing those entities into three kinds: questions with an unambiguous entity, questions with an ambiguous entity and questions with multiple entities. For each kind of question, we design a different answer generation method.</p><p>In the remainder of this article, we describe the details of the proposed framework and report the experimental results.</p><p>Figure <ref type="figure" target="#fig_0">1</ref> shows the proposed framework. We introduce it in two parts: question answering with high-frequency questions and question answering with high-frequency entities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Question answering with high-frequency questions</head><p>Text clustering is applied to the user question log, and representative questions from the top-ranking clusters are extracted as high-frequency questions. In the clustering step, we use the self-adapting clustering method proposed in <ref type="bibr" target="#b4">[5]</ref> and set a strict threshold to ensure that questions in a cluster are very similar to each other. In the representative question extraction step, we consider three factors: cluster-level keywords, question length and distance to the cluster center. A question with more keywords, an average length and a smaller distance to the cluster center has a higher chance of being chosen as the representative one. A classic knowledge graph based question-answering technique <ref type="bibr" target="#b5">[6]</ref> is used to obtain an answer for each representative question, and all questions with knowledge graph based answers are collected into a 'question-answer' index built with Lucene. In the online part, we first use Lucene to coarsely recall the top K candidates and then use a deep learning based text similarity model <ref type="bibr" target="#b1">[2]</ref> to rank those candidates precisely and obtain the final answer.</p></div>
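<div xmlns="http://www.tei-c.org/ns/1.0"><p>The online recall-then-rank step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the deployed system: a token-overlap lookup stands in for the Lucene index, Jaccard similarity stands in for the deep learning based text similarity model, and the names QA_INDEX, tokens, recall_top_k, similarity and answer, as well as the placeholder answers, are hypothetical.</p><p>
```python
# Hypothetical offline 'question-answer' index; the answers are placeholders.
QA_INDEX = {
    "where was joe hisaishi born": "PLACEHOLDER_ANSWER_1",
    "who is older louis koo or andy lau": "PLACEHOLDER_ANSWER_2",
}


def tokens(text):
    """Lowercase the text, drop question marks and commas, split on whitespace."""
    return text.lower().replace("?", " ").replace(",", " ").split()


def recall_top_k(query, index, k):
    """Coarse recall by shared-token count (an assumed stand-in for Lucene)."""
    query_tokens = set(tokens(query))
    scored = [(len(query_tokens.intersection(tokens(q))), q) for q in index]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [q for _, q in scored[:k]]


def similarity(a, b):
    """Jaccard token similarity (an assumed stand-in for the matching model)."""
    ta, tb = set(tokens(a)), set(tokens(b))
    union = ta.union(tb)
    return len(ta.intersection(tb)) / len(union) if union else 0.0


def answer(query, index, k=20, threshold=0.85):
    """Return the answer of the best re-ranked candidate, or None if none passes."""
    candidates = recall_top_k(query, index, k)
    if not candidates:
        return None
    best_score, best_question = max((similarity(query, c), c) for c in candidates)
    return index[best_question] if best_score >= threshold else None


print(answer("Where was Joe Hisaishi born?", QA_INDEX))
print(answer("where is joe", QA_INDEX))
```
</p><p>Returning None when no candidate reaches the threshold corresponds to handing the question back to the other chat components instead of forcing a knowledge-based answer.</p></div>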
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Question answering with high-frequency entities</head><p>Entities with high frequency are extracted from the user question log, and we then categorize them into unambiguous entities and ambiguous entities. For unambiguous entities, we can answer questions such as "where was Joe Hisaishi born" easily with the classic knowledge graph based question-answering technique <ref type="bibr" target="#b5">[6]</ref>. For questions with ambiguous entities, such as "you know Carlos, right?", we can answer with "you mean the Brazilian football player?" or "you mean the Brazilian football player or Carlos the Jackal?". For a user question that contains more than one entity, such as "Who is older, Louis Koo or Andy Lau?", we follow the method proposed in <ref type="bibr" target="#b6">[7]</ref>.</p></div>
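<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three kinds of entity questions described above can be told apart with a lookup against an entity table prepared offline. The sketch below is only an illustration under stated assumptions: the table ENTITY_SENSES, the functions find_entities, classify and clarification, and the substring-based entity spotting are all hypothetical simplifications, and only the example questions and the clarification wording come from the text.</p><p>
```python
# Hypothetical offline table: surface form -> candidate senses in the knowledge graph.
ENTITY_SENSES = {
    "joe hisaishi": ["Joe Hisaishi"],
    "carlos": ["the Brazilian football player", "Carlos the Jackal"],
    "louis koo": ["Louis Koo"],
    "andy lau": ["Andy Lau"],
}


def find_entities(question, senses):
    """Naive substring spotting of known surface forms (a simplification)."""
    text = question.lower()
    return [name for name in senses if name in text]


def classify(question, senses):
    """Label a question by the number and ambiguity of the entities it mentions."""
    found = find_entities(question, senses)
    if not found:
        return "no_entity"
    if len(found) > 1:
        return "multiple_entities"
    return "ambiguous" if len(senses[found[0]]) > 1 else "unambiguous"


def clarification(name, senses):
    """Build a clarifying reply that lists the candidate senses of an entity."""
    return "you mean " + " or ".join(senses[name]) + "?"


print(classify("where was Joe Hisaishi born", ENTITY_SENSES))
print(classify("you know Carlos, right?", ENTITY_SENSES))
print(classify("Who is older, Louis Koo or Andy Lau?", ENTITY_SENSES))
print(clarification("carlos", ENTITY_SENSES))
```
</p></div>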
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Experiments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Dataset and Parameter Settings</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Datasets:</head><p>Question log: we collected the anonymous online user question log from Nov. 1, 2018 to Dec. 31, 2018. This dataset contains 125.9 million user questions; after merging duplicates, we obtain 44.9 million distinct user questions.</p><p>High-frequency questions: 1.26 million questions occur at least 5 times, and these are chosen as high-frequency questions (HFQs).</p><p>QA pairs: we input each HFQ into the knowledge-based QA system, and if an answer is returned, we take this 'HFQ-answer' as a QA pair. In total, we obtained 53,187 QA pairs.</p><p>High-frequency entities: 25,682 entities occur more than 10 times and are chosen as high-frequency entities (HFEs).</p><p>Subgraphs of entities: we extract all subgraphs of HFEs from Wikipedia.</p><p>Text matching training data: to create enough data for training the text matching model, we use the following strategy. We randomly select 10,000 user questions from the chatbot log, and for each of them we obtain the top 15 candidates from a Lucene index of the whole question log. Then 8 service experts labeled those candidates as right or wrong; some examples are shown in Table <ref type="table">1</ref>. The labeled data are severely imbalanced, since just 14.3% of the candidates are labeled as right (positive samples). To balance the data, we randomly extract about 20% of the candidates in the whole dataset, all labeled as wrong, as negative samples.</p></div>
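<div xmlns="http://www.tei-c.org/ns/1.0"><p>The selection of high-frequency questions by occurrence count can be sketched as below. This is an assumed illustration, not the authors' pipeline: the function name high_frequency_questions is hypothetical, and treating questions as duplicates after trimming whitespace and lowercasing is an assumption, since the text does not say how duplicates are merged.</p><p>
```python
from collections import Counter


def high_frequency_questions(question_log, min_count=5):
    """Merge duplicate questions and keep those seen at least min_count times."""
    # Assumed normalization: trim whitespace and lowercase before counting.
    counts = Counter(q.strip().lower() for q in question_log)
    return {q: n for q, n in counts.items() if n >= min_count}


log = ["hello"] * 5 + ["Hello "] + ["bye"] * 4
print(high_frequency_questions(log))
```
</p></div>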
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Parameter settings:</head><p>When choosing the top K candidates from the Lucene index, we empirically set K = 20, which is large enough to recall the real answer and small enough to be processed quickly in the text matching step.</p><p>For the text-matching threshold, we check each decimal in (0,1) with an interval of 0.1, with respect to the F1-value of the final answers obtained, and a threshold of 0.85 gives the best F1-value.</p></div>
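<div xmlns="http://www.tei-c.org/ns/1.0"><p>The threshold search described above amounts to a grid sweep that keeps the threshold with the highest F1-value on labeled data. The sketch below assumes matching scores in [0, 1] and boolean right/wrong labels; the function names f1_at_threshold and best_threshold and the toy scores are hypothetical, and the grid step is left as a parameter.</p><p>
```python
def f1_at_threshold(scores, labels, threshold):
    """F1 of predicting 'right' whenever the matching score reaches the threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if not (s >= threshold) and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels, step=0.1):
    """Sweep thresholds strictly inside (0, 1); return the first one with the best F1."""
    n = round(1 / step)
    grid = [round(i * step, 10) for i in range(1, n)]
    return max(grid, key=lambda t: f1_at_threshold(scores, labels, t))


# Toy data for illustration only.
scores = [0.95, 0.9, 0.6, 0.4]
labels = [True, True, False, False]
print(best_threshold(scores, labels))
```
</p><p>When several thresholds tie on F1, this sketch returns the smallest of them; a production setup might prefer a different tie-breaking rule.</p></div>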
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Experimental Results</head><p>The main purpose of the proposed framework is to increase the coverage of AliMe Chat and reduce the 'no-answer' situations. In real online testing, the coverage of AliMe Chat within the whole AliMe Assist increased from 4.18% to 4.87%, a relative increase of 16.5%. In Fig. <ref type="figure" target="#fig_1">2</ref>, we show several examples of online results of the proposed approach. In the left sub-figure, the first user question is a frequently asked question that can be answered with the knowledge graph, so it has been indexed by Lucene. The second question contains the entity 'East Hope', which has no ambiguity in the knowledge graph, and we can answer it with 'East Hope Group is a company' or 'East Hope Group is in electrolytic aluminum industry', etc. In the right sub-figure, the first user question contains an ambiguous name 'James', which is also a 'half' person name. We can offer the user some choices for this ambiguous half name, and if the user then chooses one of them and asks a related question, we can continue to answer it.</p></div>
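<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a check on the arithmetic, the 16.5% figure reported above is the gain relative to the original coverage, not the difference in percentage points (which is 0.69):</p><p>
```latex
\[
\frac{4.87\% - 4.18\%}{4.18\%} = \frac{0.69}{4.18} \approx 0.165 = 16.5\%
\]
```
</p></div>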
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Future Work</head><p>This paper is only a preliminary work. Knowledge based multi-turn conversation in e-commerce chatbots will be a key point of our future work, and the use of knowledge based named entity disambiguation models, especially for abbreviation disambiguation, is expected to help produce better responses.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. The proposed framework.</figDesc><graphic coords="2,124.70,168.40,345.89,133.70" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Online AliMe Chat serving with the proposed framework.</figDesc><graphic coords="4,145.55,147.40,144.60,256.48" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">AliMe Chat: A Sequence to Sequence and Rerank based Chatbot Engine</title>
		<author>
			<persName><forename type="first">M</forename><surname>Qiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACL&apos;17</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Modelling Domain Relationships for Transfer Learning on Retrieval-based Question Answering Systems in Ecommerce</title>
		<author>
			<persName><forename type="first">J</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Qiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">WSDM</title>
		<imprint>
			<biblScope unit="volume">2018</biblScope>
			<biblScope unit="page" from="682" to="690" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Formal query generation for question answering over knowledge bases</title>
		<author>
			<persName><forename type="first">H</forename><surname>Zafar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Napolitano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ESWC&apos;18</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">AliMe assist: an intelligent assistant for creating an innovative e-commerce experience</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Qiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Jin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CIKM&apos;17</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Summarizing Microblogging Users with Existing Well-defined Hashtags</title>
		<author>
			<persName><forename type="first">S</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Meng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Asian Language Processing</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="111" to="125" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Knowledge-based question answering</title>
		<author>
			<persName><forename type="first">F</forename><surname>Rinaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dowdall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hess</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mollá</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Schwitter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kaljurand</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">KES&apos;03</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">EARL: Joint entity and relation linking for question answering over knowledge graphs</title>
		<author>
			<persName><forename type="first">M</forename><surname>Dubey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Banerjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chaudhuri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ISWC 2018</title>
				<imprint>
			<biblScope unit="page" from="108" to="126" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
