<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Harmony Assumptions: Extending Probability Theory for Information Retrieval</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Thomas</forename><surname>Roelleke</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Queen Mary University of London</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Harmony Assumptions: Extending Probability Theory for Information Retrieval</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">E4291DAC99E434AF72C173A184972091</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-19T15:21+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In many applications, independence of event occurrences is assumed, even if there is evidence for dependence. Capturing dependence leads to complex models, and even where the complex models are superior, they cannot match the simplicity and scalability of the independence assumption. Therefore, many models assume independence and apply heuristics to improve results. Theoretical explanations of these heuristics are seldom given, and those that are given rarely generalise.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p><ref type="bibr" target="#b0">[1]</ref> reports that some of these heuristics can be explained as encoding dependence in an exponent based on the generalised harmonic sum. Under independence, the probability of a sequence of occurrences of an event is the product of the single-event probabilities; under harmony, it is a product in which the exponent of each further occurrence decays.</p>
<p>For independence, the probability of a sequence of n occurrences is $p^{1+1+\dots+1} = p^n$. For harmony, the probability is $p^{1+1/2+\dots+1/n} \approx p^{1+\log(n)}$. The generalised harmonic sum is the exponent of p, and this leads to a spectrum of harmony assumptions. We will discuss how settings of the term frequency (TF) in IR correspond to harmony assumptions. We will focus on four settings of the TF:</p>
<formula xml:id="formula_0">\mathrm{TF}(t, d) := \begin{cases} \mathrm{tf}_d \quad \text{total TF: corresponds to assuming independence} \\ \sqrt{\mathrm{tf}_d + 1} - 1 \quad \text{sqrt TF: middle between total TF and log-TF} \\ \log(\mathrm{tf}_d + 1) \quad \text{log-TF: assumes a form of harmony} \\ \mathrm{tf}_d / (\mathrm{tf}_d + K_d) \quad \text{BM25 TF: assumes a strong form of harmony} \end{cases}</formula>
<p><ref type="bibr" target="#b0">[1]</ref> shows series-based explanations of the TF settings, and these lead to new insights regarding the relationships between IR and probability theory. From an IR point of view, an exciting finding is that the BM25-TF is the harmonic sum of Gaussian sums, i.e. the sum of the reciprocals of the sums 1 + 2 + ... + k. For K_d = 1:</p>
<formula xml:id="formula_1">\frac{\mathrm{tf}_d}{\mathrm{tf}_d + 1} = \frac{1}{2} \cdot \left( 1 + \frac{1}{1 + 2} + \dots + \frac{1}{1 + 2 + \dots + \mathrm{tf}_d} \right)</formula>
<p>This finding provides a probabilistic interpretation of the BM25-TF quantification.</p>
<p>An experimental study for IR and social media investigates assumptions that explain the dependence between term occurrences. Interestingly, the sqrt-harmony assumption, i.e. the middle between total TF and log-TF, is on average a better assumption than independence or the strong harmony assumptions corresponding to log-TF and BM25-TF.</p>
<p>The potential impact of harmony assumptions lies beyond IR, since many scientific disciplines and applications rely on probability theory and apply heuristics to compensate for the independence assumption. Given the concept of harmony assumptions, the dependence between multiple occurrences of an event can be reflected in an intuitive and effective way.</p></div>		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Harmony assumptions in information retrieval and social networks</title>
		<author>
			<persName><forename type="first">Thomas</forename><surname>Roelleke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andreas</forename><surname>Kaltenbrunner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ricardo</forename><forename type="middle">A</forename><surname>Baeza-Yates</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Comput. J.</title>
		<imprint>
			<biblScope unit="volume">58</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="2982" to="2999" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
