<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Assessing Content Diversity in Medical Weblogs</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Kerstin</forename><surname>Denecke</surname></persName>
							<email>denecke@l3s.de</email>
							<affiliation key="aff0">
								<orgName type="department">L3S Research Center</orgName>
								<address>
									<addrLine>Appelstr. 9a</addrLine>
									<postCode>30167</postCode>
									<settlement>Hannover</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Assessing Content Diversity in Medical Weblogs</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">8F7B9EBA11382E74917A3E57BEC93870</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-19T16:19+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we consider weblogs focusing on medicine and health. Given this general topic, posts can deal with diseases, medical treatments, medications and the like, each of which can in turn be considered from different aspects. Within a diversity-aware medical search engine, knowledge about this kind of diversity can support grouping search results by the diversity dimensions covered by a text. Therefore, a method is introduced for studying topic and content diversity. The approach is based on information extraction technologies and domain knowledge and is applied to a set of medical weblog posts. The diversity of topics described in this dataset is studied in more detail.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Weblogs, or blogs, have become a popular way to share experiences and information, to engage in discussions and to form communities. In this paper, we consider medical weblogs, i.e., blogs whose topical focus is health and medicine. The medical blogging community consists, on the one hand, of healthcare professionals writing about their daily practice and current issues related to medicine and, on the other hand, of patients providing information about health-related issues and their experiences of living with medical conditions. As a result, the content of these blogs is highly diverse. It is of particular interest to find ways to analyse the content diversity in these blog posts automatically in order to enable better search and retrieval facilities. Consider the following scenario:</p><p>A person searching with the query breast cancer might be interested in the disease itself, in possible treatments, in medications and so on. Other dimensions are, for example, the content type (e.g., experience vs. information), the author (e.g., physician, patient) or the polarity. A user might be interested in the experiences of persons living with breast cancer, or in the experiences of physicians who treat patients suffering from breast cancer.</p><p>Current medical weblog search engines such as Medlogs or Medworm only present search results matching the query keywords in a flat list. Sometimes, results can be restricted to posts of specific author groups (e.g., physicians, patients). However, additional content dimensions such as the aspects considered or the sentiment expressed remain hidden in the posts. Methods that detect and analyse the different dimensions would help to present search results according to these dimensions. The work presented in this paper targets the analysis of diversity in medical weblog posts. The focus is on analysing the diversity of content, in particular the topic diversity and the diversity of the aspect considered.</p><p>The remainder of the paper is structured as follows. Section 2 presents related work. Section 3 gives an overview of relevant diversity dimensions in medical texts. Section 4 introduces methods and measures to study topic diversity in medical texts, which are applied to a real-world data set in section 5. The paper finishes with conclusions and remarks on future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>Diversity of search results in text retrieval has been considered as a problem of result diversification, i.e., finding the right balance between having more relevant results of the 'correct' intent and having more diverse results in the top positions. Existing approaches to this problem combine measures of diversity and similarity to improve the recommendation diversity. In order to improve user satisfaction, the top N search results are either ranked by diversity <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b6">7]</ref> or diversified by clustering them according to the different diversity dimensions covered <ref type="bibr" target="#b11">[12]</ref>.</p><p>Clustering of search results is done within the search engines Newssift and FairSpin<ref type="foot" target="#foot_0">2</ref>. Within Newssift, content from major news and business sources is grouped into high-level categories such as Business Topic, Organizations, Place, Person and Theme. It leverages semantic technology, but also relies on manual work. FairSpin collects the latest news and opinion from across the Web and organizes them by political bias based on community votes.</p><p>In faceted classification, a set of category hierarchies is built <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b10">11]</ref>. These capture the different facets, i.e., dimensions or features, relevant to the collection. Facets can, for example, be derived from WordNet or Wikipedia, as shown by <ref type="bibr" target="#b5">[6]</ref>. The work presented in <ref type="bibr" target="#b9">[10]</ref> describes several faceted search algorithms and employs collaborative filtering and personalization techniques to customize the search interface to each user.</p><p>In contrast to existing work, we intend to determine and analyse diversity in more detail and automatically. Instead of providing hierarchical browsing facilities, a potential application would be the grouping of search results according to diversity dimensions. Presenting different topical aspects would enable users to see different aspects of their query at a glance and to get deeper insights into the facets of the topic under consideration. Furthermore, we focus on medical weblog posts since this is a domain from which many people can benefit. For the medical domain, sophisticated ontologies exist. We will show how these can support the analysis of topic diversity.</p><p>In this paper, we study the topic diversity of texts, considering a document as a mixture of topics. Topic models as introduced by Blei et al. also consider documents as a mixture of topics <ref type="bibr" target="#b2">[3]</ref>. Each topic is represented by a set of keywords together with a probability indicating each word's contribution to the topic. In our approach, topics are taken to be medical concepts that are provided by a medical ontology. While automatic labeling of clusters is difficult within approaches such as Latent Dirichlet Allocation <ref type="bibr" target="#b2">[3]</ref>, the use of domain knowledge helps to label document clusters and to describe topics with concrete concept names.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Diversity Dimensions in Medical Weblogs</head><p>In this paper, we focus on processing medical weblogs written in English. A medical weblog deals with diseases, medical treatments, medications or health care politics, i.e., its main topic is medicine or health care. With regard to their authors, medical weblogs can be divided into blogs written by health care professionals and blogs written by non-professionals <ref type="bibr" target="#b4">[5]</ref>.</p><p>Diversity in medical weblogs can be considered along several dimensions or facets, including Time, Author, Location, Resource, Topic, Aspect considered, Information Content, or Information Type. An example of values for these dimensions identified for a medical post is given in Figure <ref type="figure" target="#fig_0">1</ref>. In this paper, we examine the diversity of topic and the diversity of the aspect considered in more detail.</p><p>Given a topic T, topic diversity concerns the correlation between T and other topics that are frequently used together with T. In this paper, a topic is considered to be a medical concept, in particular a UMLS concept (see section 4.1) dealing with diseases, medical treatments or medications. A topic is highly diverse if it co-occurs with a large number of other concepts. Even if posts have the same topic, different medical aspects can be considered. While one post talks about the treatment of asthma, others may focus on its symptoms. We refer to this diversity dimension as diversity of the aspect considered. A post is highly diverse when its content covers different semantic groups (e.g., symptoms, drugs, procedures).</p><p>We now present a method and measures to study diversity in topic and aspect considered. First, medical content is extracted (see section 4.1), which is then exploited for analysing the diversity (see sections 4.2 and 4.3). Since even in a medical weblog, posts can be completely unrelated to health (e.g., dealing with holiday plans or the weather), we exclude posts that are unrelated to medicine and health in a preprocessing step using a classifier based on language models (LingPipe classifier<ref type="foot" target="#foot_1">3</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>[...] Most parents are understandably reluctant to start their kids on psychiatric medications. While caution is indicated, I feel such a rigid rejection of their use in children and teenagers is a mistake and may deprive your child of a therapy that could help a lot. Used appropriately and wisely, with ongoing follow-up and psychotherapy, I have seen such medications make a huge positive difference lifting a child's mood and improving the quality of his</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Extracting Medical Content</head><p>The medical content of a post is determined by extracting medical concepts, i.e., concepts describing diagnoses, treatments or medications. For this purpose, an existing mapping algorithm (MMTx) is exploited. MMTx, the Java implementation of MetaMap <ref type="bibr" target="#b1">[2]</ref>, is based on natural language processing techniques and maps natural language text to concepts of the UMLS Metathesaurus.</p><p>The Unified Medical Language System (UMLS<ref type="foot" target="#foot_2">4</ref>) is a biomedical terminology that consists of around 1.7 million biomedical concepts and integrates several biomedical vocabularies such as SNOMED CT or MeSH. Each concept defined in the UMLS is assigned to at least one of the 135 specified semantic types. The semantic types are in turn grouped into 15 main groups. For example, the concept atrial fibrillation belongs to the semantic types Finding and Pathologic Function, which in turn belong to the main group Disorders.</p><p>In order to determine UMLS concepts for a text, MMTx works in several steps. First, it parses a text into paragraphs, sentences, phrases, lexical elements and tokens. From the resulting phrases, a set of (lexical) variants is generated. For the phrases, candidate concepts from the UMLS Metathesaurus are retrieved and evaluated. The best candidates are organized into a final mapping in such a way as to best cover the text. From the candidates provided by MetaMap, the first proposal in the highest-ranked candidate set is selected as the list of extracted concepts, which is used in our diversity study.</p></div>
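<div xmlns="http://www.tei-c.org/ns/1.0"><p>The relation between concepts, semantic types and main groups described above can be sketched as a small lookup. This is only an illustration: the two dictionaries and the function name main_groups are hypothetical and hand-written, they are neither UMLS data files nor the output format of MMTx or MetaMap. Only the atrial fibrillation entry is taken from the text.</p>
<p><code lang="python"><![CDATA[
```python
# Hypothetical, hand-made excerpt; real assignments come from the UMLS.
SEMANTIC_TYPES = {
    "atrial fibrillation": ["Finding", "Pathologic Function"],
}

# Hypothetical excerpt of the mapping from semantic type to main group.
MAIN_GROUP = {
    "Finding": "Disorders",
    "Pathologic Function": "Disorders",
}


def main_groups(concept):
    """Return the set of main groups reached via the concept's semantic types."""
    types = SEMANTIC_TYPES.get(concept, [])
    return {MAIN_GROUP[t] for t in types if t in MAIN_GROUP}


if __name__ == "__main__":
    print(main_groups("atrial fibrillation"))
```
]]></code></p>
<p>A concept that is missing from the excerpt simply yields an empty set, which makes it easy to see which extracted concepts are not covered by the illustrative tables.</p></div>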
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Determining the Diversity of the Aspect Considered</head><p>The diversity of the aspect considered is studied by means of the semantic types and main groups of the concepts extracted from a post and by applying formulas (2) and (3). The formulas determine the proportion of different semantic types (or main groups) contained in a text relative to the overall number of possible types (or groups). A value close to 1 indicates a high diversity, while a value close to 0 corresponds to a small diversity.</p><p>In addition, we consider the concept diversity $div_{concept}$, i.e., the number of different concepts $co_d$ that a post contains relative to the number of extracted concepts $co$ (formula (1)). A value close to zero indicates that the same concepts are used several times, i.e., the diversity of concepts is small. For example, a post can contain only a few different concepts, but these concepts belong to different semantic types and main groups, i.e., it deals with several aspects. In this case, the concept diversity is small, but a high diversity in semantic types and main groups would be detected. Furthermore, we calculate the diversity for the single main groups Disorders, Procedures and Chemicals and Drugs by considering in formula (1) only the frequency of concepts of one of these three main groups.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Determining the Topic Diversity</head><p>For studying the diversity of topics, we have to identify relevant topics in the data collection under consideration. We assume that a post deals with a main topic to which in turn a set of subtopics is related. Therefore, the concept representation of texts determined by the method described in section 4.1 is used (1) to determine the topics of posts and (2) to identify co-occurring concepts in order to study topical diversity. For this purpose, we follow a three-step approach:</p><p>1. Calculation of concept frequencies per post, 2. Selection of the most frequent concept as "topic concept", 3. Identification of relevant concepts related to the topic concept.</p><p>Concept frequencies are determined by counting the mentions of a concept in a text. Then, the most frequent concept within the post under consideration is selected. We consider this concept as the topic-describing concept (also referred to as "topic") of a post. Assuming that the topic of a medical weblog post deals with a disease, a clinical procedure or a medication, a topic concept needs to belong to one of the UMLS main groups Disorders, Procedures or Chemicals and Drugs.</p><p>Given a document collection, we obtain a list of topics together with the documents for which each topic has been determined. Documents with a joint topic concept are considered in the next step, in which concept co-occurrence pairs are determined for each topic concept. A pair is considered relevant when it occurs at least twice in one document and in at least ten documents of this topic. This results in a set of concept pairs for each topic concept.</p><p>The concept pairs provide information on how diverse a topic is: if a large number of pairs can be identified for one topic within one document collection, this topic is highly diverse. If a topic-describing concept frequently co-occurs with only a few other concepts, its diversity is rather low, i.e., only a few additional aspects are of interest to this topic.</p></div>
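<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three-step approach and the relevance criterion for co-occurrence pairs can be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation used in the study: the input format (a list of (concept, main group) pairs per post), the function names and the parameters min_mentions and min_docs are hypothetical. The criterion "at least twice in one document and in at least ten documents" is read here as: the co-occurring concept is mentioned at least twice in a post, and this holds in at least ten posts of the topic.</p>
<p><code lang="python"><![CDATA[
```python
from collections import Counter, defaultdict

# Main groups from which a topic concept may be chosen (as stated in the text).
TOPIC_GROUPS = {"Disorders", "Procedures", "Chemicals and Drugs"}


def topic_concept(mentions):
    """Steps 1 and 2: count concept mentions and pick the most frequent one.

    mentions: list of (concept, main_group) pairs, one pair per mention.
    Only concepts from TOPIC_GROUPS are admissible; returns None if there
    is none. Ties are broken by first occurrence (Counter ordering).
    """
    counts = Counter(c for c, group in mentions if group in TOPIC_GROUPS)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def relevant_pairs(posts, min_mentions=2, min_docs=10):
    """Step 3: collect, per topic, the concepts that co-occur often enough.

    posts: list of mention lists as accepted by topic_concept.
    Returns a dict mapping each topic to its set of relevant concepts.
    """
    doc_support = defaultdict(Counter)  # topic -> concept -> number of posts
    for mentions in posts:
        topic = topic_concept(mentions)
        if topic is None:
            continue
        freq = Counter(c for c, _ in mentions)
        for concept, n in freq.items():
            if concept != topic and n >= min_mentions:
                doc_support[topic][concept] += 1
    return {
        topic: {c for c, docs in support.items() if docs >= min_docs}
        for topic, support in doc_support.items()
    }


if __name__ == "__main__":
    post = ([("asthma", "Disorders")] * 3
            + [("inhaler", "Devices")] * 2
            + [("cough", "Disorders")])
    print(relevant_pairs([post] * 10))
```
]]></code></p>
<p>Under this reading, the size of the set returned for a topic serves as a simple indicator of its diversity: the more relevant co-occurring concepts, the more diverse the topic.</p></div>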
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Application Example</head><p>Having information about different diversity dimensions at hand allows for creating more sophisticated search engines and result presentations, including grouping search results according to the aspect considered (diagnosis, treatment, medication, ...). Consider the following scenario: A woman just diagnosed with breast cancer enters breast cancer into her favourite search engine and receives results grouped into the clusters disease, medications, treatment and the like. This result structure offers her the opportunity to get a general impression of the different facets of the topic. She can then decide which kind of information she is most interested in. The introduced measures of diversity can also be exploited in ranking, e.g., by ranking more diverse texts higher.</p></div>
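<div xmlns="http://www.tei-c.org/ns/1.0"><p>The scenario above, grouping results by the aspect considered and ranking more diverse texts higher, could look roughly like the following sketch. The result records and their keys ('title', 'aspect', 'diversity') as well as the function name group_and_rank are hypothetical and chosen only for illustration; the text does not prescribe a data format or a particular ranking function.</p>
<p><code lang="python"><![CDATA[
```python
from collections import defaultdict


def group_and_rank(results):
    """Group search results by aspect and rank each group by diversity.

    results: list of dicts with the hypothetical keys 'title', 'aspect'
    and 'diversity' (a precomputed diversity value between 0 and 1).
    Within each group, more diverse texts are listed first.
    """
    groups = defaultdict(list)
    for result in results:
        groups[result["aspect"]].append(result)
    return {
        aspect: sorted(items, key=lambda r: r["diversity"], reverse=True)
        for aspect, items in groups.items()
    }


if __name__ == "__main__":
    hits = [
        {"title": "a", "aspect": "treatment", "diversity": 0.2},
        {"title": "b", "aspect": "medication", "diversity": 0.5},
        {"title": "c", "aspect": "treatment", "diversity": 0.7},
    ]
    for aspect, items in group_and_rank(hits).items():
        print(aspect, [r["title"] for r in items])
```
]]></code></p>
<p>Sorting by a single diversity value is the simplest possible choice; a real search engine would presumably combine it with a relevance score.</p></div>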
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Experiments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Material</head><p>For the analysis in this paper, a set of medical weblogs written in English has been crawled together with all their posts. The resulting data set consists of 5480 posts (4343 patient-written and 1137 physician-written). The weblogs have been selected randomly by collecting addresses of weblogs from the two medical weblog search engines Medworm and Medlogs. For comparison, we use articles from Yahoo! Encyclopedia<ref type="foot" target="#foot_3">5</ref> as an additional data set. 2777 articles on different topics related to illnesses, treatments and drugs have been collected from this resource. The following sections report the results obtained when the introduced methods are applied to these data sets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Diversity of Aspect Considered</head><p>When comparing the concept diversity of the three data sets (see Table <ref type="table" target="#tab_1">1</ref>), it can be seen that the diversity value of the encyclopedia data set (0.52) is considerably smaller than the values for the weblog data sets (0.76 and 0.70). In contrast, the diversity of semantic types and main groups is much higher for the encyclopedia data set. We can conclude that the concepts extracted from the encyclopedia data set belong to a larger number of different semantic types, i.e., a larger variety of thematic aspects is covered. In the weblog data sets, the considered aspects are more restricted. Nevertheless, the values for $div_{type}$ are quite small for all three data sets. This shows that only about one fourth to one third of the 135 possible UMLS semantic types is covered.</p><p>The concept diversity for the categories Disorders, Procedures and Chemicals &amp; Drugs is similar for all three data sets. In particular, only a small diversity could be ascertained for Procedures concepts. For the encyclopedia articles, the diversity in disorder-related concepts (0.53) is higher than for the blogs (0.44 and 0.45). This shows that the spectrum of diseases covered in these articles is broader than in blog posts.</p></div>
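<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three measures compared in this section follow formulas (1) to (3) and can be sketched for a single post as shown below. The sketch makes simplifying assumptions that are not part of the method description: each mention carries exactly one semantic type and one main group (UMLS concepts may have several), the input tuples and the function name diversity are hypothetical, and the type and group labels in the example are illustrative. The constants 135 and 15 are the numbers of semantic types and main groups stated in section 4.1.</p>
<p><code lang="python"><![CDATA[
```python
N_TYPES = 135   # number of UMLS semantic types stated in the text
N_GROUPS = 15   # number of main groups stated in the text


def diversity(mentions):
    """Compute concept, type and group diversity for one post.

    mentions: list of (concept, semantic_type, main_group) tuples,
    one tuple per extracted concept mention (simplified input format).
    """
    if not mentions:
        return {"concept": 0.0, "type": 0.0, "group": 0.0}
    concepts = [m[0] for m in mentions]
    return {
        # formula (1): distinct concepts relative to all extracted concepts
        "concept": len(set(concepts)) / len(concepts),
        # formula (2): distinct semantic types relative to 135 possible types
        "type": len({m[1] for m in mentions}) / N_TYPES,
        # formula (3): distinct main groups relative to 15 possible groups
        "group": len({m[2] for m in mentions}) / N_GROUPS,
    }


if __name__ == "__main__":
    post = [
        ("asthma", "Disease or Syndrome", "Disorders"),
        ("asthma", "Disease or Syndrome", "Disorders"),
        ("salbutamol", "Pharmacologic Substance", "Chemicals and Drugs"),
        ("cough", "Sign or Symptom", "Disorders"),
    ]
    print(diversity(post))
```
]]></code></p>
<p>In the example post, three distinct concepts among four mentions give a concept diversity of 0.75, while the type and group diversities stay small because only three of 135 types and two of 15 groups occur.</p></div>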
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Diversity of Topics</head><p>For studying the diversity of topics in our data collections, we determine the number of different topics identified for each collection as well as the average number of co-occurring concepts per topic and collection. Taking into account the different data set sizes, the largest number of different topics is given by the collection of physician-written posts (385 topics) and the encyclopedia articles (787 topics). Nevertheless, the topics in the encyclopedia articles are more diverse: on average, 60 co-occurring concepts are determined for each topic. For topics in patient-written posts, 43 co-occurring concepts were identified, and for topics in physician-written posts only 24 concepts were related. This might be due to the different text lengths (encyclopedia articles are longer than weblog posts), but it is also an indicator of the larger topical diversity in encyclopedia articles.</p><p>To study the topical diversity in more detail, we selected the topic Diabetes Mellitus, Non-Insulin-Dependent. From the patient-written blogs, 788 co-occurring concepts could be detected that belong to 97 different semantic types. In this dataset, the topic Diabetes is highly diverse, i.e., a lot of different aspects are considered. In contrast, 259 related topics of 58 different semantic types were extracted from the Yahoo! Encyclopedia dataset, and only 60 related concepts of 25 different types were identified in the physician-written posts. We can conclude that, with respect to this topic, the patient-written dataset contains additional information, i.e., covers additional aspects. Figure <ref type="figure" target="#fig_2">2</ref> shows some of the aspects and concepts extracted from the physician-written posts and related to Diabetes.</p><p>In this paper, topic and thematic diversity in medical weblogs have been considered. We described how results of entity extraction together with a domain ontology can be exploited for studying these aspects and applied the methods to a data collection of medical texts. In order to apply the introduced method to documents of other domains, the underlying domain knowledge has to be replaced or the topics need to be discovered by alternative technologies. In future work, we will work in this direction to arrive at a more general approach. Nevertheless, the approach presented here can be considered as a baseline when testing other topic detection algorithms in the medical domain.</p><p>Having information about topic diversity offers the opportunity to get insights into relevant aspects related to a topic that can, for example, be presented to a user. Furthermore, it can help to improve the retrieval of documents: documents that consider the same topic but different aspects can be recommended to a user.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. Example for diversity in medical texts</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Formulas (1), (2), (3)</head><label>(1)-(3)</label><figDesc>(1) $div_{concept} = \frac{co_d}{co}$; (2) $div_{type} = \frac{types}{135}$; (3) $div_{group} = \frac{groups}{15}$</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Related Concepts for the topic "Diabetes" from physician-written posts</figDesc><graphic coords="7,169.35,541.69,276.60,77.55" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>/her life.[...]    </figDesc><table><row><cell>Time: January 2005</cell><cell>Author: Dr. J</cell></row><row><cell>Resource: Blog</cell><cell>Topic: Depression</cell></row><row><cell>Aspect considered: Medication</cell><cell>Information type: experience</cell></row><row><cell cols="2">Information content: 50% informative, 50% affective</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1.</head><label>1</label><figDesc>Diversity values when considering all semantic types.</figDesc><table><row><cell>Measure</cell><cell>Patient</cell><cell>Physician</cell><cell>Encyclopedia</cell></row><row><cell>$div_{concept}$</cell><cell>0.76</cell><cell>0.70</cell><cell>0.52</cell></row><row><cell>$div_{type}$</cell><cell>0.23</cell><cell>0.27</cell><cell>0.37</cell></row><row><cell>$div_{group}$</cell><cell>0.68</cell><cell>0.78</cell><cell>0.87</cell></row><row><cell>$div_{concept}$ (DISO)</cell><cell>0.44</cell><cell>0.45</cell><cell>0.53</cell></row><row><cell>$div_{concept}$ (PROC)</cell><cell>0.18</cell><cell>0.21</cell><cell>0.20</cell></row><row><cell>$div_{concept}$ (CHEM)</cell><cell>0.30</cell><cell>0.33</cell><cell>0.26</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">http://www.newssift.com, http://fairspin.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">http://alias-i.com/lingpipe/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">http://www.nlm.nih.gov/research/umls</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">http://health.yahoo.com/ency/</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Diversifying search results</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gollapudi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Halverson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ieong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">WSDM &apos;09</title>
				<imprint>
			<publisher>ACM</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program</title>
		<author>
			<persName><forename type="first">A</forename><surname>Aronson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc AMIA Symp</title>
				<meeting>AMIA Symp</meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="17" to="21" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Latent Dirichlet allocation</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Blei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">I</forename><surname>Jordan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">JMLR</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Novelty and diversity in information retrieval evaluation</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">L</forename><surname>Clarke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kolla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">V</forename><surname>Cormack</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Vechtomova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ashkan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Büttcher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>MacKinnon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGIR &apos;08</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="659" to="666" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Family medicine meets the blogosphere</title>
		<author>
			<persName><forename type="first">M</forename><surname>Cohen</surname></persName>
		</author>
		<ptr target="http://www.aafp.org/fpm" />
	</analytic>
	<monogr>
		<title level="m">American Academy of Family Physicians. Family Practice Management Web site</title>
				<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Automatic extraction of useful facet hierarchies from text databases</title>
		<author>
			<persName><forename type="first">W</forename><surname>Dakka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">G</forename><surname>Ipeirotis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICDE &apos;08</title>
				<meeting><address><addrLine>Washington, DC, USA</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE Computer Society</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="466" to="475" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">An axiomatic approach for result diversification</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gollapudi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sharma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">WWW &apos;09</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="381" to="390" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Clustering versus faceted categories for information exploration</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Hearst</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Commun. ACM</title>
		<imprint>
			<biblScope unit="volume">49</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="59" to="61" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Representative-based diversity retrieval</title>
		<author>
			<persName><forename type="first">C.-S</forename><surname>Hwang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kuo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Yu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICICIC &apos;08</title>
				<meeting><address><addrLine>Washington, DC, USA</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE Computer Society</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page">155</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Personalized interactive faceted search</title>
		<author>
			<persName><forename type="first">J</forename><surname>Koren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">WWW &apos;08</title>
				<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="477" to="486" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Extending faceted navigation for RDF data</title>
		<author>
			<persName><forename type="first">E</forename><surname>Oren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Delbru</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Decker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ISWC 2006</title>
				<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="559" to="572" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Improving recommendation lists through topic diversification</title>
		<author>
			<persName><forename type="first">C.-N</forename><surname>Ziegler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>McNee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Konstan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lausen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">WWW &apos;05</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
