<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">ADOR: A New Medical Dataset for Sentiment-based IR</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Mohammad</forename><surname>Bahrani</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Queen Mary University of London</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Thomas</forename><surname>Roelleke</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Queen Mary University of London</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">ADOR: A New Medical Dataset for Sentiment-based IR</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">912E3003ADB5BEBCA31E2AE756E1D707</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T03:01+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Semantic Retrieval</term>
					<term>Query Analysis</term>
					<term>Language Modelling</term>
					<term>Benchmark</term>
					<term>TREC</term>
					<term>Query Formulation</term>
					<term>Knowledge Representation</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Sentiment analysis has received attention in retrieval applications. Combining opinions, such as user feelings, with semantics would enhance the performance of these applications, especially when the level of urgency is essential, e.g., in the medical domain. However, no widely used medical benchmark is known for evaluating sentiment-aware IR. In this paper, we create a dataset based on Amazon reviews of medical products and make it publicly available. To assess the compatibility of the benchmark with opinions and concepts, we propose a sentiment-aware extension of TF.IDF and apply it to the dataset. This model is derived from linear combinations of a sentiment-based TF.IDF score with term-based and conceptual TF.IDF scores. The benchmark could help healthcare organizations to effectively detect, rank and filter the most urgent notifications based on patients' health status, narratives and conditions.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><note place="foot">CIKM'21: Fourth Workshop on Knowledge-driven Analytics and Systems Impacting Human Quality of Life, November 01-05, 2021, CIKM, Australia. m.bahrani@qmul.ac.uk (M. Bahrani); t.roelleke@qmul.ac.uk (T. Roelleke)</note><p>Although both sentiment analysis and IR are important for medical applications, work on incorporating sentiments into medical IR is limited, and no well-known benchmark has been established for this task. Many review-based datasets have been released for sentiment analysis, such as the multi-domain Amazon dataset <ref type="bibr" target="#b0">[1]</ref>, INEX social book search <ref type="bibr" target="#b1">[2]</ref> and the IMDB dataset of reviews <ref type="bibr" target="#b2">[3]</ref>. However, researchers need a benchmark which primarily takes into consideration the integration of opinions and medical concepts. This is due to the importance of feelings in detecting the level of urgency in the medical domain. Moreover, biomedical companies need to analyse customers' general feelings about their products. On the other hand, patients want to know the sentiment of product reviews before buying. Therefore, the examination of sentiments would benefit both buyers and suppliers of medical products.</p><p>In this paper, we address this problem by creating and making available a medical benchmark specifically for the task of opinion-aware retrieval.</p><p>Bio-medical benchmarks consider various pillars of semantics in collections and queries, e.g., terms, concepts and attributes. These semantics enable data scientists to develop effective models for different tasks, e.g., filtering and classification.</p><p>Several benchmarks have been published to examine different IR models with respect to medical applications, including OHSUMED <ref type="bibr" target="#b3">[4]</ref> and CLEF-eHealth <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b4">5]</ref>. However, developing a sentiment-focused query set for a dataset such as OHSUMED is not optimal since its documents are generated from medical literature. Although sentiment-bearing terms, e.g., cancer and treatment, are included in the documents, explicit indications of urgency and feelings, e.g., emojis, are rarely found. Table <ref type="table" target="#tab_1">1</ref> gives an overview of well-known medical datasets and lists fundamental statistics of their semantic features.</p><p>Sentiment analysis and opinion mining are popular research fields in natural language processing, data science and text mining. They analyse textual contents based on people's opinions, emotions and attitudes <ref type="bibr" target="#b5">[6]</ref>. In this paper, we create a benchmark that consists of a dataset, a query set and the relevance results. The dataset consists of Amazon reviews for medical products. Additionally, it supports the use of common semantics (terms, concepts and relations) in biomedical retrieval.</p><p>The second contribution of this paper is to apply sentiment-aware models to the dataset. We propose a family of opinion-aware models for ranking medical reviews. These models are semantic instances of a generalizable TF.IDF. The technology of semantic retrieval is of particular importance in medical applications, and the integration of semantics with standard content-based retrieval tools could lead to more intelligent search experiences <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>. The generalization of TF.IDF towards semantic frameworks is discussed in <ref type="bibr" target="#b8">[9]</ref>. 
When compared to retrieval systems built upon only bag-of-words, the integrated methods result in more performant question answering (QA) systems with constraint checking abilities. There has been research on developing conceptual models for medical applications <ref type="bibr" target="#b9">[10]</ref> and <ref type="bibr" target="#b10">[11]</ref>. It could be interesting to leverage sentiments and feelings in these applications. By consolidating the methods for modelling opinions and sentiments in medical ranking, we aim to address the deficiencies in different tasks including but not limited to notification filtering and review filtering. In terms of notification filtering, we know that doctors and patients are overloaded with massive health-related data, and it is critical for health organizations to focus on the most important and urgent cases. In this scenario, the detection of urgency is associated with both ranking and acquisition of sentiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Dataset</head><p>Our work contributes to building the grounds for improving medical review filtering through IR. It is the starting point for developing models that could better meet the needs of bio-medical organizations, companies and individual buyers in analysing the most critical, positive and negative reviews.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">The ADOR Dataset</head><p>The Amazon Dataset of Reviews (ADOR) is based on reviews of bio-medical Amazon products drawn from three super-categories: Medication &amp; Remedies, Diagnostic and Monitoring Tools, and Health-Related Books. We defined a set of sub-category products inherited from the super-categories and subsequently extracted reviews of the top ten related items retrieved by the Amazon search engine. However, in order to achieve a more balanced dataset in terms of polarity, we ignored items without negative reviews. To make the data easily reusable, we followed two steps: firstly, we converted the encoding of the contents to UTF-8; secondly, we defined the schema and the required fields. The essential fields, consisting of the Amazon ASIN number, medical category, star rating, review title, review text and labels (including star rating and helpfulness), have been embedded into the dataset.</p></div>
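The schema described above can be sketched as a small record type. This is a minimal illustration, assuming hypothetical field names (the actual ADOR schema may differ); the star-rating-to-polarity convention shown in the comment is a common heuristic, not something the paper specifies.

```python
from dataclasses import dataclass

# Hypothetical record layout mirroring the fields described in the text:
# ASIN, medical category, star rating, review title, review text and labels.
@dataclass
class AdorReview:
    asin: str            # Amazon ASIN number
    super_category: str  # e.g. "Medication & Remedies"
    sub_category: str    # e.g. "flu tablets"
    star_rating: int     # 1..5
    title: str
    text: str
    helpful_votes: int   # number of "helpful" votes

    @property
    def polarity(self) -> str:
        # A common convention: 1-2 stars negative, 3 neutral, 4-5 positive.
        if self.star_rating <= 2:
            return "negative"
        return "neutral" if self.star_rating == 3 else "positive"

review = AdorReview("B00EXAMPLE", "Medication & Remedies", "flu tablets",
                    5, "Worked quickly", "Relieved my symptoms overnight.", 12)
print(review.polarity)  # positive
```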
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">ADOR Query Set</head><p>We have defined 25 topics based on five purposes. Figure <ref type="figure">3</ref> shows the distribution of queries and number of relevant documents. The five categories of information need are as follows:</p><p>1. The retrieval of positive or negative reviews associated with medical products. 2. Fact-based and non-sentiment-bearing queries which only intend to retrieve medical entities. 3. Ranking the polarity of item-reviews within the sub-categories, e.g. vitiligo cream and flu tablets. 4. Ranking the polarity of item-reviews within the super-categories, e.g. medications or diagnostic tools. 5. The retrieval of extreme (most positive or most negative) reviews given different medical concepts. We used modifiers to give attention to the information need, e.g. Highly negative reviews for books about borderline personality disorder. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Overview of ADOR</head><p>In this section, we briefly present the dataset and provide the statistics of ADOR. Table <ref type="table" target="#tab_3">2</ref> lists the fundamental statistics of the dataset. There are 194790 opinion features and 59442 medical concepts in the dataset, distributed across 44796 documents. We used the VADER lexicon to capture opinions and MetaMap to bind terms to medical concepts. Figure <ref type="figure">2</ref> presents the distribution of document length and query length. The majority of queries (more than 35%) have a length between 9 and 12 words. More than 50% of documents have between 1 and 20 words, whereas 7% of them are longer than 100 words. The statistics regarding the distribution of queries and their relevant documents are shown in Figure <ref type="figure">3</ref>.</p><p>As can be seen, 28% of queries have 1-60 relevant documents, which is the same percentage as for queries with more than 240 relevant documents. The rest of the queries have between 60 and 240 relevant documents. We extracted the average document and collection frequencies of the semantic types (neutral terms, concepts and opinions) of ADOR, which can be found in Figure <ref type="figure" target="#fig_0">1</ref>. Even though the average document frequency of opinions is high, opinions could significantly impact the retrieval quality due to the nature of reviews. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Application of the Benchmark</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Rationales</head><p>Although the use of human judgments could seem ideal for the generation of gold standards, we developed a generic framework which has some advantages, e.g., it can easily be used to build gold standards for new query sets.</p><p>When preparing the data, we provided informative labels, including the star rating, the number of people who found a review helpful and the medical categories of Amazon products. This framework helps to rapidly develop new queries that can be formulated over the provided labels. Considering the example query Why are some customers happy with books about caffeine addiction and narcissistic personality disorder?, the formulated query is: ( Rating=[4,5], Super-Category=[Books], Sub-Category=[NPD, Caffeine Addiction] ). In other words, any review in the dataset that meets the information need expressed by the formulated query could be selected.</p><p>To evaluate the accuracy of models, one approach would be to use existing reviews as queries. However, there are two substantial issues with this approach. Firstly, data scientists need to analyse and classify their experimental results based on the query intent, e.g., fact-based, binary and explorative queries. The use of reviews as queries is not in line with the nature of query intent. Secondly, reviews are strongly focused on opinions. Therefore, generating a robust query set consisting of a balanced combination of concepts, terms and opinions would interfere with the structure of reviews.</p></div>
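The label-based relevance framework above can be sketched as a simple constraint check: a review is relevant to a formulated query if every labelled field falls inside the allowed set. Field names here are illustrative, not the dataset's actual schema.

```python
# Sketch of the label-based gold-standard framework: a formulated query is a
# set of constraints over the labels shipped with each review.
def matches(review: dict, formulated_query: dict) -> bool:
    """A review is relevant iff every constrained field has an allowed value."""
    return all(review.get(field) in allowed
               for field, allowed in formulated_query.items())

# "Why are some customers happy with books about caffeine addiction and
# narcissistic personality disorder?" formulated as label constraints:
query = {
    "rating": [4, 5],
    "super_category": ["Books"],
    "sub_category": ["NPD", "Caffeine Addiction"],
}

reviews = [
    {"id": 1, "rating": 5, "super_category": "Books", "sub_category": "NPD"},
    {"id": 2, "rating": 2, "super_category": "Books", "sub_category": "NPD"},
]
relevant = [r["id"] for r in reviews if matches(r, query)]
print(relevant)  # [1] -- only the 5-star Books review satisfies all constraints
```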
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Baseline Models</head><p>The focus of this paper is to introduce a dataset for the task of semantic retrieval in the medical domain, i.e., sentiment-based and conceptual IR. Therefore, advanced ranking algorithms are the primary baselines. However, the benchmark can also be used for prediction/classification tasks. For example, a review could be considered as a message posted by a patient or a customer. In this case, the evaluation approach is to predict whether it is extreme (very negative) and requires attention by an expert, e.g., a doctor, nurse or company member. The other applicable task is notification systems. In this scenario, users post messages and an algorithm needs to decide who (e.g., which doctor or expert) should be notified to analyse the message or respond to it.</p><p>Furthermore, the framework could be employed by data scientists to predict features provided by the dataset, such as positive/negative and helpful/not helpful. Baselines such as a neural network classifier (e.g., BERT or scikit-learn), a Bayesian predictor, regression and k-NN (k-nearest neighbours) could be used to measure the prediction quality. A k-NN classifier could be applied to retrieve the most similar training reviews (e.g., by cosine similarity), aggregate the evidence and assign a label to the test review.</p></div>
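The k-NN baseline described above can be sketched in a few lines: retrieve the most similar training reviews by cosine similarity over bag-of-words vectors and take the majority label. This is a minimal illustration on toy data, not the paper's evaluated configuration.

```python
import math
from collections import Counter

def bow(text: str) -> Counter:
    # Bag-of-words term-frequency vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_label(test: str, train: list, k: int = 3) -> str:
    # Rank training reviews by similarity to the test review,
    # then vote among the k nearest.
    scored = sorted(train, key=lambda tl: cosine(bow(test), bow(tl[0])),
                    reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

train = [("great product works well", "positive"),
         ("works well highly recommend", "positive"),
         ("terrible waste of money", "negative"),
         ("broke after one day terrible", "negative")]
print(knn_label("product works really well", train, k=3))  # positive
```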
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Processing the New Queries</head><p>To confirm the compatibility of the benchmark with models derived from opinions and concepts, we have developed a naive semantic approach. We briefly describe the methodology and then show the experimental results of comparing the semantic approach with well-known and recent IR methods on ADOR.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.1.">Methodology</head><p>Our approach is to leverage the well-known TF.IDF and capture its semantic extensions built upon opinions and/or concepts. To make the formulations readable, we use type-aware functions, e.g., OF(𝑜, 𝑑) is the opinion frequency of opinion 𝑜 in document 𝑑, and CF(𝜙, 𝑑) is the frequency of concept 𝜙 in the document. Let 𝑞 be a query, 𝑑 be a document and let 𝑐 be the collection; the Retrieval Status Value (RSV) of the opinion-aware model is as follows:</p><p>RSVOF.IDF(𝑑, 𝑞, 𝑐) :=</p><formula xml:id="formula_0">∑︁ 𝑜∈𝑡 OF(𝑜, 𝑞) • OF(𝑜, 𝑑) • IDF(𝑜, 𝑐)<label>(1)</label></formula><p>IDF(𝑜, 𝑐) is the Inverse Document Frequency of the opinion 𝑜 in the collection. 𝑡 is the list of all lexical features in the lexicon whose sentiment polarity is equal to the query polarity. For example, given the query Any useless or poor medications for allergy or cold sore., the query polarity is negative, and consequently, the list 𝑡 comprises all negative opinions in the lexicon.</p><p>Let 𝜙 be a medical concept and let IDF(𝜙, 𝑐) be the Inverse Document Frequency weight of the concept; the conceptual extension of TF.IDF is defined as below:</p><formula xml:id="formula_1">RSVCF.IDF(𝑑, 𝑞, 𝑐) := ∑︁ 𝜙∈𝑞 CF(𝜙, 𝑞) • CF(𝜙, 𝑑) • IDF(𝜙, 𝑐)<label>(2)</label></formula></div>
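Equation (1) can be sketched directly in code. The toy polarity lexicon below stands in for the VADER lexicon used in the paper, and the smoothed IDF variant is an assumption; both are illustrative only.

```python
import math

# A minimal sketch of the opinion-aware RSV in Eq. (1):
# RSV_OF.IDF(d, q, c) = sum over o in t of OF(o, q) * OF(o, d) * IDF(o, c),
# where t restricts the sum to opinions matching the query polarity.
LEXICON = {"useless": "negative", "poor": "negative",
           "great": "positive", "effective": "positive"}

def of(opinion, tokens):
    return tokens.count(opinion)        # opinion frequency OF(o, .)

def idf(opinion, collection):
    df = sum(1 for doc in collection if opinion in doc)
    return math.log((len(collection) + 1) / (df + 1))  # smoothed IDF(o, c)

def rsv_of_idf(doc, query, collection, query_polarity):
    # t: all lexicon entries whose polarity matches the query polarity.
    t = [o for o, pol in LEXICON.items() if pol == query_polarity]
    return sum(of(o, query) * of(o, doc) * idf(o, collection) for o in t)

docs = [["poor", "medication", "useless"],
        ["great", "medication"],
        ["effective", "remedy"]]
query = ["any", "useless", "or", "poor", "medications"]
scores = [rsv_of_idf(d, query, docs, "negative") for d in docs]
print(scores[0] > scores[1] == scores[2])  # the negative review scores highest
```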
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.2.">Evaluation</head><p>In this section, we briefly discuss the evaluation results of the proposed semantic models, TF.IDF, BM25 and neural ranking models when applied to ADOR. We trained neural ranking models including KNRM <ref type="bibr" target="#b14">[15]</ref>, DSSM <ref type="bibr" target="#b15">[16]</ref> and arc-I <ref type="bibr" target="#b16">[17]</ref> on ADOR. We performed 5-fold cross-validation where the final fold in each run was used as the test set. We randomly divided the queries into five folds and reported the average of the fold-level evaluation results. All neural models were developed using MatchZoo <ref type="bibr" target="#b17">[18]</ref> based on TensorFlow with the Adam optimizer, batch size 16 and learning rate 0.001. Using the Lucene framework and Language Modelling with Dirichlet Prior, we retrieved pseudo-relevant documents and subsequently re-ranked the top 100 documents with the models. In addition to OF.IDF and CF.IDF, we conducted experiments on linear combinations of opinion-aware TF.IDF with term-based and conceptual TF.IDF using the aggregation parameter 𝑤 = 0.5. Concerning the concept-based models, we used MetaMap to extract concepts accompanied by their frequencies, semantic types and scores. We counted the 'trigger' attributes of the MetaMap outputs to calculate the corresponding frequencies of semantic types.</p><p>Table <ref type="table" target="#tab_4">3</ref> shows the experimental results on ADOR using four metrics: P@5, P@10, NDCG and Mean Average Precision (MAP). We also conducted the paired t-test with 𝑝 &lt; 0.05 to compute the significance of improvements. The isolated OF.IDF and CF.IDF worked better than TF.IDF, BM25 and the neural models (KNRM, DSSM, arc-I), while the combination of opinions and concepts achieved the best results. 
The interesting finding is that the models based on combinations of opinions with both terms (OF.IDF+TF.IDF) and concepts (OF.IDF+CF.IDF) improved all the measures.</p></div>
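One plausible reading of the linear combination evaluated above (aggregation parameter 𝑤 = 0.5) is a simple interpolation of the opinion-aware score with a term- or concept-based TF.IDF score. The scores below are illustrative placeholders, not values from the paper.

```python
# Sketch of a linear score combination with aggregation parameter w:
# combined = w * OF.IDF score + (1 - w) * TF.IDF (or CF.IDF) score.
def combined_rsv(of_idf_score, other_score, w=0.5):
    return w * of_idf_score + (1 - w) * other_score

# Re-ranking three candidate documents by the combined score,
# given (opinion score, term/concept score) pairs:
candidates = {"d1": (2.0, 1.0), "d2": (0.5, 3.0), "d3": (1.5, 1.5)}
ranked = sorted(candidates,
                key=lambda d: combined_rsv(*candidates[d]), reverse=True)
print(ranked[0])  # d2: 0.5*0.5 + 0.5*3.0 = 1.75 beats d1 and d3 at 1.50
```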
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusion</head><p>In this paper, we introduced a new benchmark, namely ADOR, which is a subset of Amazon reviews. For our research aim, the dataset allows for bringing and testing sentiment-based IR in the medical domain. The dataset focuses on medical products within three categories: medicine, monitoring tools and health-related books. The collection of reviews comes with a structured framework which enables users to automatically generate relevance labels for new topics. Moreover, a query set with relevance results was consolidated into the benchmark. In order to develop this query set, we considered factors such as query intent, the sentiment score of the query and the concept query frequency.</p><p>To measure the suitability of the benchmark for sentiment-based IR, we proposed naive but reproducible opinion-aware models as semantic instances of the generalizable TF.IDF. These models are derived from combinations of sentiment-only TF.IDF with term-only and concept-only TF.IDF. We compared the new approach with well-established and modern retrieval models. Our experiments confirmed that the integration of sentiments with IR improves the quality of ranking on the ADOR dataset. The semantic model based on the combination of OF.IDF and CF.IDF achieved the best results against the gold standards.</p><p>In conclusion, the ADOR benchmark could help researchers to develop and evaluate opinion-aware retrieval models. These models would benefit companies and healthcare organizations to effectively detect, rank and filter urgent notifications based on patients' health status, narratives and conditions. 
The benchmark is available at https://github.com/mb320/ADOR.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Document and collection statistics of the ADOR semantic types: The opinions group has the highest document frequency.</figDesc><graphic coords="2,341.80,520.36,125.01,103.13" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :Figure 3 :</head><label>23</label><figDesc>Figure 2: The distribution of document length and query length.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Dataset Task Reports Number of Queries avg-Opinions-per-query avg-concepts-per-query</head><label></label><figDesc></figDesc><table><row><cell>clef2013 e-health</cell><cell>Task 3: Patients' Questions when Reading Clinical Reports</cell><cell>Overview of the ShARe CLEF eHealth Evaluation Lab 2013 [5]</cell><cell>50</cell><cell>0.3</cell><cell>2.9</cell></row><row><cell>clef2014 e-health</cell><cell>Task 3: use of information, e.g. discharge summary and ontologies, in IR</cell><cell>Overview of the share-clef ehealth evaluation lab 2014 [12]</cell><cell>50</cell><cell>0.34</cell><cell>1.86</cell></row><row><cell>OHSUMED</cell><cell>TREC-9 Filtering: evaluate text filtering systems</cell><cell>OHSUMED [4] - TREC-9 Final Report [13]</cell><cell>63</cell><cell>0.41</cell><cell>4.87</cell></row><row><cell>TREC 2006 Genomics Track</cell><cell>Passage retrieval for genomics question answering</cell><cell>TREC 2006 genomics track overview [14]</cell><cell>27</cell><cell>0.32</cell><cell>6.00</cell></row><row><cell>TREC 2007 Genomics Track</cell><cell>Genomics passage retrieval based on biologists' needs</cell><cell>TREC 2007 genomics track overview</cell><cell>35</cell><cell>0.27</cell><cell>4.6</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 :</head><label>1</label><figDesc>Overview of well established benchmarks for health-related retrieval.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head></head><label></label><figDesc></figDesc><table><row><cell>#Concepts</cell><cell>595442</cell></row><row><cell>#Distinct.Concepts</cell><cell>404748</cell></row><row><cell>#Opinions</cell><cell>194790</cell></row><row><cell>#Distinct.Opinions</cell><cell>163045</cell></row><row><cell>#Query</cell><cell>25</cell></row><row><cell>#Docs</cell><cell>44796</cell></row><row><cell>#Avg.Query Length</cell><cell>9.08</cell></row><row><cell>#Avg.Review.Text Length</cell><cell>35.38</cell></row><row><cell>#Sampling Date</cell><cell>31-03-2020</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 2 :</head><label>2</label><figDesc>The statistics of ADOR.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 3 :</head><label>3</label><figDesc>Ranking performances of the opinion-aware models and the baseline methods: bold denotes the best result for that evaluation metric. 𝛽, 𝜃, 𝜁 indicate statistically significant improvements of the best model over BM25 𝛽 , KNRM 𝜃 and DSSM 𝜁 . Statistical significance is based on the paired t-test with p-value &lt; 0.05.</figDesc><table><row><cell>Model</cell><cell>P@5</cell><cell>P@10</cell><cell>NDCG</cell><cell>MAP</cell></row><row><cell>TF.IDF</cell><cell>0.2480</cell><cell>0.2720</cell><cell>0.2354</cell><cell>0.0833</cell></row><row><cell>BM25</cell><cell>0.3120</cell><cell>0.3160</cell><cell>0.2336</cell><cell>0.0813</cell></row><row><cell>KNRM</cell><cell>0.2320</cell><cell>0.2440</cell><cell>0.2445</cell><cell>0.0906</cell></row><row><cell>DSSM</cell><cell>0.2080</cell><cell>0.2200</cell><cell>0.2422</cell><cell>0.1039</cell></row><row><cell>arc-I</cell><cell>0.3520</cell><cell>0.3040</cell><cell>0.2476</cell><cell>0.0902</cell></row><row><cell>CF.IDF</cell><cell>0.3840</cell><cell>0.4080</cell><cell>0.2619</cell><cell>0.1106</cell></row><row><cell>OF.IDF</cell><cell>0.3680</cell><cell>0.4120</cell><cell>0.2758</cell><cell>0.1250</cell></row><row><cell>OF.IDF+TF.IDF (w=0.5)</cell><cell>0.3600</cell><cell>0.3920</cell><cell>0.2705</cell><cell>0.1175</cell></row><row><cell>OF.IDF+CF.IDF (w=0.5)</cell><cell>0.4640 𝛽𝜃𝜁</cell><cell>0.4280 𝛽𝜃𝜁</cell><cell>0.2825 𝛽𝜃𝜁</cell><cell>0.1274 𝛽𝜃</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Multi-domain sentiment classification</title>
		<author>
			<persName><forename type="first">S</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ACL-08: HLT, Short Papers</title>
				<meeting>ACL-08: HLT, Short Papers</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="257" to="260" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of the inex 2014 interactive social book search track</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Huurdemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Skov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Walsh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Conference &amp; Labs of the Evaluation Forum (CLEF)</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Learning word vectors for sentiment analysis</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">L</forename><surname>Maas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">E</forename><surname>Daly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">T</forename><surname>Pham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Potts</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies-volume 1, Association for Computational Linguistics</title>
				<meeting>the 49th annual meeting of the association for computational linguistics: Human language technologies-volume 1, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="142" to="150" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Ohsumed: an interactive retrieval evaluation and new large test collection for research</title>
		<author>
			<persName><forename type="first">W</forename><surname>Hersh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Buckley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Leone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hickam</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGIR&apos;94</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="1994">1994</date>
			<biblScope unit="page" from="192" to="201" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Overview of the share/clef ehealth evaluation lab</title>
		<author>
			<persName><forename type="first">H</forename><surname>Suominen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Salanterä</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Velupillai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">W</forename><surname>Chapman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Savova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Elhadad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pradhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>South</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>Mowery</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J</forename><surname>Jones</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference of the Cross-Language Evaluation Forum for European Languages</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013. 2013</date>
			<biblScope unit="page" from="212" to="231" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Sentiment analysis and opinion mining</title>
		<author>
			<persName><forename type="first">B</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Synthesis lectures on human language technologies</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="1" to="167" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Effective use of semantic structure in xml retrieval</title>
		<author>
			<persName><forename type="first">R</forename><surname>Van Zwol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Van Loosbroek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Conference on Information Retrieval</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="621" to="628" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">FDCM: Towards balanced and generalizable concept-based models for effective medical ranking</title>
		<author>
			<persName><forename type="first">M</forename><surname>Bahrani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Roelleke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 29th ACM International Conference on Information &amp; Knowledge Management</title>
				<meeting>the 29th ACM International Conference on Information &amp; Knowledge Management</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1957" to="1960" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A schema-driven approach for knowledge-oriented retrieval and query formulation</title>
		<author>
			<persName><forename type="first">H</forename><surname>Azzam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yahyaei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bonzanini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Roelleke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Third International Workshop on Keyword Search on Structured Data</title>
				<meeting>the Third International Workshop on Keyword Search on Structured Data</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="39" to="46" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Conceptual language models for domain-specific retrieval</title>
		<author>
			<persName><forename type="first">E</forename><surname>Meij</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Trieschnigg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>De Rijke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Kraaij</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Processing &amp; Management</title>
		<imprint>
			<biblScope unit="volume">46</biblScope>
			<biblScope unit="page" from="448" to="469" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Concept-based relevance models for medical and semantic information retrieval</title>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Akella</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th ACM International on Conference on Information and Knowledge Management</title>
				<meeting>the 24th ACM International on Conference on Information and Knowledge Management</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="173" to="182" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Overview of the ShARe/CLEF eHealth evaluation lab</title>
		<author>
			<persName><forename type="first">L</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Suominen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Schreck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Leroy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>Mowery</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Velupillai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">W</forename><surname>Chapman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Martinez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zuccon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference of the Cross-Language Evaluation Forum for European Languages</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="172" to="191" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">The TREC-9 filtering track final report</title>
		<author>
			<persName><forename type="first">S</forename><surname>Robertson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Hull</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">TREC</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="344250" to="344253" />
			<date type="published" when="2000">2000</date>
			<publisher>Citeseer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">TREC 2006 genomics track overview</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">R</forename><surname>Hersh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Cohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">M</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">K</forename><surname>Rekapalli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">TREC</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="500" to="274" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">End-to-end neural ad-hoc ranking with kernel pooling</title>
		<author>
			<persName><forename type="first">C</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Callan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Power</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 40th International ACM SIGIR conference on research and development in information retrieval</title>
				<meeting>the 40th International ACM SIGIR conference on research and development in information retrieval</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="55" to="64" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Learning deep structured semantic models for web search using clickthrough data</title>
		<author>
			<persName><forename type="first">P.-S</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Acero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Heck</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd ACM international conference on Information &amp; Knowledge Management</title>
				<meeting>the 22nd ACM international conference on Information &amp; Knowledge Management</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="2333" to="2338" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Convolutional neural network architectures for matching natural language sentences</title>
		<author>
			<persName><forename type="first">B</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="2042" to="2050" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Pang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Cheng</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1707.07270</idno>
		<title level="m">Matchzoo: A toolkit for deep text matching</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
