<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Definition of User Profiles based on the YAGO Ontology</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Silvia</forename><surname>Calegari</surname></persName>
							<email>calegari@disco.unimib.it</email>
							<affiliation key="aff0">
								<orgName type="department">DISCo</orgName>
								<orgName type="institution">Università degli Studi di Milano-Bicocca</orgName>
								<address>
									<addrLine>vle. Sarca 336/14</addrLine>
									<postCode>20126</postCode>
									<settlement>Milano</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Gabriella</forename><surname>Pasi</surname></persName>
							<email>pasi@disco.unimib.it</email>
							<affiliation key="aff0">
								<orgName type="department">DISCo</orgName>
								<orgName type="institution">Università degli Studi di Milano-Bicocca</orgName>
								<address>
									<addrLine>vle. Sarca 336/14</addrLine>
									<postCode>20126</postCode>
									<settlement>Milano</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Definition of User Profiles based on the YAGO Ontology</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">4727F9BFC5E904CBB023352DB1CD21A4</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T09:02+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this work, we consider the problem of personalizing a user's Web searches to improve the quality of the results. To this aim, we propose a preliminary methodology for defining a conceptual user profile based on the YAGO ontology.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>To overcome the limitations of the "one size fits all" approach of search engines, personalized approaches to Information Retrieval have been proposed. Personalized search relies both on modeling the user's context through a user profile that represents the user's preferences, and on processes that exploit the knowledge in that profile to tailor the search outcome to the user's needs. An accurate definition of the user profile therefore plays a central role in effective personalization. Up to now, user profiles have mainly been represented as bags of words, vectors, or graphs. To improve the quality of the knowledge represented in user profiles, some recent works have used external knowledge sources (e.g., WordNet <ref type="bibr" target="#b1">[2]</ref>, or Web directories such as the ODP <ref type="bibr" target="#b6">[7]</ref> and the Yahoo! Web directory <ref type="bibr" target="#b3">[4]</ref>) to represent the user context in a more structured way. An ontology provides a more structured and expressive knowledge representation than the above-mentioned approaches <ref type="bibr" target="#b2">[3]</ref>. A user profile is defined by analyzing the information that characterizes the user's interests and preferences. Eliciting these interests and preferences is not the focus of this paper; numerous approaches have been proposed in the literature for that purpose <ref type="bibr" target="#b5">[6]</ref>. Our objective is the formal definition of an ontological user profile that uses YAGO as external reference knowledge. YAGO <ref type="bibr" target="#b7">[8]</ref> is a general-purpose ontology containing several million entities and facts. 
Only the entities and facts that match the user's interests are used to derive the user profile. To this aim, we have defined a preliminary methodology for extracting the appropriate fragment of the YAGO ontology. Assuming that the user's interests are specified as a bag of words, the main objective of the research reported in this paper is both to extract the portion of YAGO useful for defining a user profile, and to organize it into a coherent ontological representation expressed in a language such as RDFS.</p><p>The novelty of the research reported in this paper is the use of the YAGO <ref type="bibr" target="#b7">[8]</ref> ontology as external reference knowledge for building a conceptual user profile. YAGO is a general-purpose ontology consisting of more than 1.7 million entities (such as books and movies) and over 14 million facts about them. A triple &lt; entity, relation, entity &gt; is called a fact. All facts are grouped into 99 relations such as FamilyNameOf, subClassOf, and actedIn. To build the YAGO-based user profile, our methodology is articulated in four phases, as sketched in Fig. <ref type="figure" target="#fig_0">1</ref>. Our investigation focuses on the methodology for extracting the part of YAGO related to the user's interests. To produce a bag of words that represents the user's interests, we consider a set of documents residing on the user's PC and related to his/her topical preferences. We then analyze them with standard IR techniques to extract meaningful terms, i.e., terms representative of the user's preferences (interest-terms). Finally, we have developed a strategy that semantically extracts the sub-YAGO ontology starting from the interest-terms. 
A similar approach has been reported in <ref type="bibr" target="#b0">[1]</ref>, where a set of documents is indexed and the resulting index terms are semantically linked to a network of concepts, although with the different aim of automatically constructing hypertexts. Moreover, in <ref type="bibr" target="#b0">[1]</ref> the external knowledge resource is a taxonomy, i.e., the ACM classification, which defines a hierarchy of topics where each topic is a concept. YAGO, instead, is an ontology with millions of entities (concepts plus individuals) and several relations with different semantics; for this reason, several rules covering the possible relations have to be defined to associate the index terms with the right entities.</p><p>Phase 1. The first phase consists in identifying the user's knowledge from which the user's interests are to be extracted. In this specific case, we analyzed a set of documents collected by the user and stored on his/her personal computer.</p><p>Phase 2. Each document is analyzed in two steps: (1) document preprocessing and (2) term frequency analysis. In the first step, standard text processing techniques such as stop-word removal and stemming are applied.</p><p>In the second step, the open source software Lucene is used to index the documents; a standard normalized Tf-Idf formula is adopted to compute the index term weights, but other approaches will be considered in further investigations.</p><p>Phase 3. The outcome of the previous phase is a list of interest-terms, i.e., index terms whose weight is above a given threshold α. To enrich the knowledge of the user's interests, a process of knowledge extraction from the YAGO ontology is performed. This process is articulated in three sub-phases: (1) extraction of individuals and facts, (2) extraction of direct concepts and expansion to their child nodes, and (3) addition of new synonyms. 
The fact extraction process is logically divided into the extraction of non-taxonomic relations and the extraction of taxonomic relations. In the YAGO ontology, non-taxonomic relations are defined over entities that are referred to as individuals, while taxonomic relations can hold between an individual and its parent concept (class), or between two concepts. As previously stated, a fact is a triple &lt; entity, relation, entity &gt;, so the first step of the algorithm consists in locating the facts in which an interest-term (obtained in Phase 2) matches an entity. The outcome of this step is a set of facts and entities extracted from YAGO. Taxonomic relations require a separate treatment, because some facts based on the taxonomic relation subClassOf carry little useful information about the considered term. For example, the fact &lt; relational database systems, subClassOf, database systems &gt; states that "relational database systems" is a subclass of "database systems", which is not very informative. For this reason, in case of a direct concept match, the algorithm takes all the first-level children (individuals) of the matched concept. In the previous example, for the term "relational database systems" instances such as MySQL, Oracle, and PostgreSQL are added to the user's profile.</p><p>It is also possible that a term is not found in YAGO. When this happens, our algorithm looks up WordNet to check for synonyms. If multiple synsets exist, we adopt the methodology used by the authors of YAGO and select the most probable synset (i.e., the synset with the highest probability of occurrence).</p><p>Phase 4. 
At this step, the resulting personal ontology is converted into the ontological language RDFS<ref type="foot" target="#foot_0">1</ref>, and portions of its graph are visualized with the ontology editor Protégé <ref type="bibr" target="#b4">[5]</ref>. In the conversion process, every relation is exported into its own RDFS file, and afterwards all files are merged into a single schema representing the personal ontology. A problem may arise concerning the quality of the obtained profile: index analysis and fact extraction from YAGO unavoidably bring some noise into the final ontology. A first, preliminary solution was to improve its quality manually by using the Protégé editor.</p><p>Preliminary Experiment. A preliminary analysis has been carried out by defining a conceptual user profile based on the YAGO ontology from 35 documents. This set of documents relates to several user interests such as art, literature, music, cinema, and work. The second phase of the proposed methodology was conducted with Lucene; the scoring threshold was set to 0.5, which yielded 306 terms. At the end of Phase 3, the user profile contained 578,950 entities (i.e., individuals plus concepts), with 11 new terms added from WordNet. The last phase consisted in converting the obtained profile into an ontological language (i.e., RDFS) so that it can be improved by, for example, reducing noise or adding relations between terms. For example, if a term is related to the actor "Brad Pitt", all the corresponding information defined in YAGO is extracted, such as the categories he belongs to (e.g., Action film actors, American male model) as well as his non-taxonomic relations (e.g., hasWonPrize, produced, actedIn). By editing this ontological profile in Protégé, the user can delete irrelevant information, for example the facts related to Brad Pitt as a model.</p></div>
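<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration only, the term selection and fact extraction described for Phase 3 can be sketched in Python as follows. The sketch assumes that YAGO facts are available as in-memory triples, that a relation named type links an individual to its class, and that the WordNet lookup is replaced by a plain dictionary listing candidate synonyms from most to least probable. The names Fact, select_interest_terms, and extract_profile, as well as the toy facts and weights, are hypothetical and are not taken from the system described in this paper.</p>
<p><code lang="python"><![CDATA[
```python
from typing import Dict, Iterable, List, NamedTuple, Set


class Fact(NamedTuple):
    """A YAGO-style fact: (entity, relation, entity)."""
    subject: str
    relation: str
    obj: str


# Relation names are assumptions made for this sketch.
SUBCLASS_OF = "subClassOf"  # concept -> parent concept
TYPE = "type"               # individual -> concept


def select_interest_terms(weights: Dict[str, float], alpha: float) -> Set[str]:
    """Keep the index terms whose weight is above the threshold alpha."""
    return {term for term, weight in weights.items() if weight > alpha}


def extract_profile(terms: Iterable[str], facts: List[Fact],
                    synonyms: Dict[str, List[str]]) -> Set[Fact]:
    """Collect the facts related to the interest-terms."""
    entities = {f.subject for f in facts} | {f.obj for f in facts}
    # An entity is treated as a concept if it appears as a class somewhere.
    concepts = {f.obj for f in facts if f.relation in (SUBCLASS_OF, TYPE)}
    concepts |= {f.subject for f in facts if f.relation == SUBCLASS_OF}

    profile: Set[Fact] = set()
    for term in terms:
        entity = term
        if entity not in entities:
            # Term not found: fall back to the first known synonym, if any.
            entity = next((s for s in synonyms.get(term, []) if s in entities), None)
            if entity is None:
                continue
        if entity in concepts:
            # Direct concept match: take its first-level children (individuals)
            # and skip the uninformative subClassOf facts.
            profile |= {f for f in facts if f.relation == TYPE and f.obj == entity}
        else:
            # Individual match: keep every fact that mentions the individual.
            profile |= {f for f in facts if entity in (f.subject, f.obj)}
    return profile


# Toy data, for illustration only.
FACTS = [
    Fact("MySQL", TYPE, "relational database systems"),
    Fact("PostgreSQL", TYPE, "relational database systems"),
    Fact("relational database systems", SUBCLASS_OF, "database systems"),
    Fact("Brad Pitt", "actedIn", "Troy"),
    Fact("Brad Pitt", TYPE, "Action film actors"),
]
WEIGHTS = {"relational database systems": 0.8, "Brad Pitt": 0.6, "weather": 0.2}

terms = select_interest_terms(WEIGHTS, alpha=0.5)
profile = extract_profile(terms, FACTS, synonyms={})
```
]]></code></p>
<p>With these toy inputs, the term "weather" is discarded by the threshold, the concept "relational database systems" contributes its two instances rather than its subClassOf fact, and the individual "Brad Pitt" contributes both its taxonomic and its non-taxonomic facts. A real implementation would also need the matching rules per relation and the synset selection discussed above, which this sketch does not attempt to reproduce.</p></div>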
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Conclusions and Future Works</head><p>The aim of this work is to create user profiles based on the YAGO general-purpose ontology for the purpose of Web search personalization. We believe that ontologies are worth investigating as a support for structuring knowledge in user profiles. To this aim, in a first preliminary application, the documents collected by a user are considered as evidence of his/her interests. We plan to improve the methodology presented in this paper in three main directions: the first is to automatically remove some noise from the profile (e.g., by deleting irrelevant entities and the relations involving them), the second is to add new relations and facts, not defined in YAGO, between terms, and the last is to consider other sources of information (e.g., the user's past queries) to extract the user's interests. Furthermore, we will test the obtained YAGO-based user profile for expanding the user's queries to contextualize his/her Web searches.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. Phases of profile building</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://www.w3.org/TR/PR-rdf-schema</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Tachir: A tool for automatic construction of hypertexts for information retrieval</title>
		<author>
			<persName><forename type="first">M</forename><surname>Agosti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Melucci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Crestani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">RIAO</title>
				<editor>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Funck-Brentano</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Seitz</surname></persName>
		</editor>
		<imprint>
			<publisher>CID</publisher>
			<date type="published" when="1994">1994</date>
			<biblScope unit="page" from="338" to="358" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A content-collaborative recommender that exploits wordnet-based user profiles for neighborhood formation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Degemmis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lops</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Semeraro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">User Model. User-Adapt. Interact.</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="217" to="255" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Ontology-based personalized search and browsing</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gauch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaffee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pretschner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Web Intelligence and Agent Systems</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">3-4</biblScope>
			<biblScope unit="page" from="219" to="234" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Yahoo! as an ontology: Using yahoo! categories to describe documents</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Labrou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">W</forename><surname>Finin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CIKM</title>
				<imprint>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page" from="180" to="187" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The knowledge model of protege-2000: Combining interoperability and flexibility</title>
		<author>
			<persName><forename type="first">N</forename><surname>Noy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fergerson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Musen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">EKAW</title>
				<imprint>
			<date type="published" when="2000">2000</date>
			<biblScope unit="page" from="17" to="32" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Issues in personalizing information retrieval</title>
		<author>
			<persName><forename type="first">G</forename><surname>Pasi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Intelligent Informatics Bulletin</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="3" to="6" />
			<date type="published" when="2010-12">December 2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Ontological user profiles for representing context in web search</title>
		<author>
			<persName><forename type="first">A</forename><surname>Sieg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mobasher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">D</forename><surname>Burke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Web Intelligence/IAT Workshops</title>
				<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="91" to="94" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Yago: A large ontology from wikipedia and wordnet</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Suchanek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kasneci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Weikum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Web Semantics</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="203" to="217" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
