<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">An Automatic Procedure for Generating Datasets for Conversational Recommender Systems</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Alessandro</forename><surname>Suglia</surname></persName>
							<email>alessandro.suglia@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Bari</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Claudio</forename><surname>Greco</surname></persName>
							<email>claudiogaetanogreco@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Bari</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pierpaolo</forename><surname>Basile</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Bari</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Giovanni</forename><surname>Semeraro</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Bari</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Annalina</forename><surname>Caputo</surname></persName>
							<email>annalina.caputo@adaptcentre.ie</email>
							<affiliation key="aff1">
								<orgName type="department">ADAPT Centre</orgName>
								<orgName type="institution">Trinity College Dublin</orgName>
								<address>
									<settlement>Dublin</settlement>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Aldo</forename><surname>Moro</surname></persName>
						</author>
						<title level="a" type="main">An Automatic Procedure for Generating Datasets for Conversational Recommender Systems</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">220ED003110C5C8EC663CD313D5A1DA0</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:28+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Conversational Recommender Systems assist online users in their information-seeking and decision making tasks by supporting an interactive process with the aim of finding the most appealing items according to the user preferences. Unfortunately, collecting dialogues data to train these systems can be labour-intensive, especially for data-hungry Deep Learning models. Therefore, we propose an automatic procedure able to generate plausible dialogues from recommender systems datasets.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>People have information needs of varying complexity, which can be solved by an intelligent agent able to answer questions formulated in a proper way, eventually considering user context and preferences. Conversational Recommender Systems (CRS) assist online users in their information-seeking and decision making tasks by supporting an interactive process <ref type="bibr" target="#b0">[1]</ref> with the aim of finding the most appealing items according to the user preferences.</p><p>Unfortunately, collecting dialogues data required for the training phase of these systems can be really labour-intensive, especially for the latest data-hungry Deep Learning models. For this reason, synthetic dialogue datasets can be extremely useful in order to bootstrap effective dialogue systems able to support a goal-oriented conversation with the user. Therefore, we propose an automatic procedure able to generate plausible dialogues directly from well-known recommender systems datasets exploiting data coming from the Linked Open Data Cloud and contextual information related to the user.</p><p>Given a user u and his/her set of binary preferences, we trained a decision tree from the user u preferences expressed towards items represented using Linked Open Data binary features extracted from the Wikidata<ref type="foot" target="#foot_0">3</ref> knowledge base. In particular, each pair predicate-object is represented as a binary feature which is 1 if</p><p>The dialogue datasets generated from MovieLens 1M and MovieTweetings datasets can be found at: http://github.com/swapUniba/ConvRecSysDataset. The source code of the automatic procedure for generating conversational recommender systems datasets will be released when the paper will be accepted.</p><p>the knowledge base contains the triple (item, predicate, object), 0 otherwise. The considered predicates are wdt:P57 (director), wdt:P161 (cast member), wdt:P136 (genre)<ref type="foot" target="#foot_1">4</ref> . The dialogue generation procedure is an iterative algorithm which is executed until all user preferences have been used. At each step of the dialog generation procedure, a top-n list of items composed by positive and negative items is generated by randomly choosing from positive and negative preferences of the given user u. Then, paths from the root of the decision tree to the consistently classified examples are exploited to generate a sequence of questions, randomly chosen according to a binomial distribution over the item features, to elicit user preferences. Depending from the percentage of positive items in the top-n, a "refine" step is triggered which extends the dialog with additional questions that lead to a list of suggestions which contains only positive items.</p><p>Table <ref type="table" target="#tab_0">1</ref> shows a conversation generated by applying the designed procedure to the well-known MovieLens 1M recommender systems dataset. In the first part of the conversation, utterances with the aim of introducing the user are generated by exploiting the contextual information included in the dataset. In this work we have proposed an automatic procedure able to generate synthetic dialogue datasets starting from well-known datasets in the recommender system field. The presented procedure is completely generic and can be applied on any dataset containing binary user preferences and whose items have a corresponding identifier in the Linked Open Data Cloud.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Conversation generated from the MovieLens 1M dataset. To help reading, Wikidata ids have been replaced with the corrisponding entities.</figDesc><table><row><cell cols="2">Agent Utterance</cell></row><row><cell>user</cell><cell>Hey</cell></row><row><cell>bot</cell><cell>What is your name?</cell></row><row><cell>user</cell><cell>I am Sofia</cell></row><row><cell>bot</cell><cell>How old are you?</cell></row><row><cell>user</cell><cell>I am 25</cell></row><row><cell>bot</cell><cell>What is your occupation?</cell></row><row><cell>user</cell><cell>sales/marketing</cell></row><row><cell>bot</cell><cell>Which are your favourite movies?</cell></row><row><cell>user</cell><cell>I love Speed</cell></row><row><cell>bot</cell><cell>I suggest you Erin Brockovich Witness Pocahontas Four Weddings and a Funeral Anna and the King. Do you like them?</cell></row><row><cell>user</cell><cell>I hate them</cell></row><row><cell>bot</cell><cell>What are your favourite directors?</cell></row><row><cell>user</cell><cell>I like John Waters</cell></row><row><cell>bot</cell><cell>What are your favourite actors?</cell></row><row><cell>user</cell><cell>I like Josef Sommer Wade Williams Marg Helgenberger Jeroen Krabb</cell></row><row><cell>bot</cell><cell>What genres do you like?</cell></row><row><cell>user</cell><cell>My favourite genres are teen film romance film biographical film</cell></row><row><cell>bot</cell><cell>I suggest you Erin Brockovich Witness Ever After Simply Irresistible Hairspray. Do you like them?</cell></row><row><cell>user</cell><cell>I like them</cell></row><row><cell>bot</cell><cell>I am glad that you like them</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">http://www.wikidata.org</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_1">The prefix wdt stands for http://www.wikidata.org/prop/direct/</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Improving recommender systems with adaptive conversational strategies</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mahmood</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ricci</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th ACM conference on Hypertext and hypermedia</title>
				<meeting>the 20th ACM conference on Hypertext and hypermedia</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="73" to="82" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
