<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Clustering multi-relationnal TV data by diverting supervised ILP</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Vincent</forename><surname>Claveau</surname></persName>
							<email>vincent.claveau@irisa.fr</email>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">IRISA</orgName>
								<orgName type="institution" key="instit2">CNRS Campus de Beaulieu</orgName>
								<address>
									<settlement>Rennes</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Clustering multi-relationnal TV data by diverting supervised ILP</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">CC93DAD8EF2DC67A2D85020F88FF2EEE</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T21:09+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Traditionally, clustering operates on data described by a fixed number of (usually numerical) features; this description schema is said propositional or attribute-value. Yet, when the data cannot be described in that way, usual data-mining or clustering algorithms are no longer suitable. In this paper, we consider the problem of discovering similar types of programs in TV streams. The TV data have two important characteristics: 1) they are multi-relational, that is to say with multiple relationships between features; 2) they require background knowledge external to their interpretation. To process the data, we use Inductive Logic Programming (ILP) <ref type="bibr" target="#b8">[MD94]</ref>. In this paper, we show how to divert ILP to work unsupervised in this context: from artificial learning problems, we induce a notion of similarity between broadcasts, which is later used to perform the clustering. Experiments presented show the soundness of the approach, and thus open up many research avenues.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Many TV services require the TV stream to be correctly segmented and tagged (thematic corpora from archives, TV on demand...). Thus, one needs a complete TV guide, also documenting inter-program (short spots between main programs, such as ads, trailers...), with a very high precision (at the frame level). Such guides usually do not exist, which makes their automatic building necessary. This task is at the heart of automatic structuring of TV streams. Several approaches have been proposed; some relies on meta-data <ref type="bibr" target="#b11">[Pol08]</ref> or audio/video clues <ref type="bibr" target="#b10">[NG08,</ref><ref type="bibr" target="#b7">MB10,</ref><ref type="bibr" target="#b5">IG11]</ref>. They all rely on a supervised classification step (assign a class to each TV segment), thus requiring a priori knowledge (the user need to define the classes) and also too many manually annotated data to be actually usable. In this paper, we propose to reduce this important a priori involvement of the user by tackling the problem as a non-supervised one, that is as clustering. The remaining role of the user would then be to tag the clusters.</p><p>As with the well-known k-means, clustering techniques rely on a simple representation of the data and on a distance notion operating of these representations which has to be provided by the user <ref type="bibr" target="#b6">[Jai10]</ref>. In our case, this leads to two problems. First, our data need to be represented in a complex way, as they are multi-relational. Second, we do not know how to define a priori a relevant distance over these complex representations. In this </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">ILP and multi-relational data</head><p>For classification problems, objects are usually described in a propositional form, also said attribute-value or vector-based. In this representation, objects must have the same number of features, and the features are to be considered independently (relations between features are not exploited). In our case, each object is a segment of TV-streams corresponding to a program or an inter-program. But each object may have several occurrences, such as a particular ad which is repeated several times in the stream. The number of occurrences vary from one object to another, which makes the attribute-value description impossible. Moreover, certain relations between occurrences may be very relevant (eg. two occurrences are broadcast on different TV channels, two occurrences are broadcast in less than 1 day...). This multi-relational aspect of our data is thus important to consider for the clustering task. Figure <ref type="figure" target="#fig_0">1</ref> shows these different relations between occurrences and their feature as arrows with different colors (in gray: the class of broadcast, which is unknown in our problem).</p><p>ILP is usually used as supervised machine learning technique able to infer rules (eg. Horn clauses) H from examples (E + ) and counter-examples E − of a concept, and with the help of background knowledge B [MD94]. Figure <ref type="figure" target="#fig_2">2</ref> shows how a program can be described in B (with standard Prolog). One can see how the relations between the occurrences are easily encoded with predicates next occ/2 and next in stream/2.</p><p>In B we also define the predicates that can be used to infer rules in H, such as prev occ/2 which indicate two occurrences of the same program, one occurring after the other, or such as interval/3 wich indicates the time interval between two occurrences of two program. Here is an example of rule that can be inferred : This rule highlights the interest of the multi-relational representation: it covers every broadcast A having two occurrences B and C, lasting 3 seconds, such as these two occurrences are followed by two occurrences (D,E) from a same program (F). This rule typically covers sponsoring broadcast always appearing before a program.</p><p>3 From supervised to unsupervised</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Principles</head><p>Our approach aims at deducing distances (or similarities) between two programs from repeated random classifications problems with ILP. For a given random classification problem, if the two programs are covered by H, it tends to show that they are related. If this is the case for every random classification problem, it means that they are very similar. Algorithm 1 gives an overview of the process. As for bagging <ref type="bibr" target="#b0">[Bre96]</ref>, classification is repeated many times with different learning parameters: examples (step 3 wichich divides the data into positive E + train and E OoB , a out-of-bag set used later), counter-examples (step 4), the hypothesis language (step 5). At each iteration, we record the pairs of programs (x i , x j ) that are covered by the same inferred clauses (called  The strategy at the heart of this approach is to vary the learning biases at each iteration. The first bias is the set of examples used. In our experiments we use 1/10 of the programs to be used as positive examples. The inferred rules are then applied on the 9/10 remaining programs to find which one are co-covered. The generation of negative examples is an important step in our algorithm. In our case, it means inventing programs, with their occurrences and features. They have to be realistic enough in order to produce learning problems that will generate discriminative enough clauses, and thus relevant co-covers. In order to generate counter-examples, we randomly copy parts of the description of real programs (with a renaming of the constants in order to produce a coherent set of occurrences and features). The hypothesis language, setting the format of acceptable clauses, Algorithm 1 Clustering with ILP 1: input: E total : programs 2: for i in [1 .. N ] do 3:</p><formula xml:id="formula_0">E + i , E OoB i ← Divide(E total ) 4: Generate negative examples E − i 5:</formula><p>Generate randomly the hypothesis language L H i and the ILP parameters θ i</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>6:</head><p>Inferring :</p><formula xml:id="formula_1">H i ← ILP(E + i ,E − i ,L H i ,θ i ) 7:</formula><p>for all clause h l among H i do is also different at each iteration. In practice, every mode of every predicate is given at the initialization of the algorithm, and a subset is randomly chosen at each iteration. All these machine learning problems on fake supervised tasks brings, through their variety, important properties to the obtained similarity: it mixes complex descriptions, implements feature selection, take into account redundancy between descriptions, and is robust to outliers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Experimental setting</head><p>The data use for our experiments are those developed by <ref type="bibr" target="#b10">[NG08]</ref>; it consists of a 22-day recording of the French France2 channel in May 2005. The stream is segmented in programs and the different occurrences of a same program have been identified automatically and manually consolidated <ref type="bibr" target="#b10">[NG08]</ref>. To build the groundtruth needed to evaluate our clustering results, we used the manual annotation of the data proposed by <ref type="bibr" target="#b10">[NG08]</ref> who tagged the programs according to 6 classes: movie/show, series, commercials, sponsoring, branding (short programs displaying the the name or logo of the channel), trailers (short programs announcing what will be broadcast later). This ground-truth tagging of the stream will be used as reference clusters (cf. Figure <ref type="figure">3</ref> for their repartition). The evaluation scores are those commonly used for clustering comparison (the one produced automatically vs. the ground-truth one): Adjusted Purity, Normalized mutual information and Adjusted Rand Index (ARI) [Ran71, HA85, VEB10].</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Results</head><p>Figure <ref type="figure" target="#fig_4">4</ref> presents the results of the relational clustering after 1 000 iterations as well as several baselines relying on a usual propositional representation. In this latter case, the features used are: number of occurrences, average duration, mean, minimal and maximal interval between two occurrences, maximal number of occurrences during a 24h, duration between the first and last occurrences, presence or not of every occurrences in the same day, and average number of other programs occurring before or after the program occurrences. The baseline algorithms are: k-means, EM, CobWeb, such that implemented in weka [HFH + 09]; for each of them, we only report the results of the configurations yielding the best ARI. The ILP algorithm used is aleph <ref type="bibr" target="#b14">[Sri01]</ref>, the data are described as shown in Section 2. We also, report the results of our ILP-based approach exploiting the same representation (i.e. discarding the relational predicates of L H ).</p><p>For any evaluation score, our ILP-based clustering approach perform better than the propositional approaches; it clearly shows the added-value of the ability to handle the multi-relational representation of the data. The generated clusters are nonetheless different in terms of numbers of clusters and in terms and of the content of these clusters. An analysis of the differences between the ILP clusters and ground-truth ones shows that the trailer class is difficult to capture (such programs appear in several ILP clusters). Other problems are caused by programs at the boundaries of our 22-day TV recording or for programs for which the 3 weeks are not enough to capture the recurrence patterns. An analysis of the inferred rule for each iteration also allow an indirect validation of our approach since they exhibit the multi-relational property of our data. This is the case of the following rule which covers programs broadcast at fixed interval: </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusions</head><p>Our clustering approach, relying on ILP, allows us to make the most of the multi-relational aspect of our TV data. It makes it possible to get a notion of distance even in rich description spaces where metrics cannot be defined a priori. Of course, even if there is no explicit definition of the distance, other biases from the user are unavailable, such as the way the data are described, the definition of the modes in the hypothesis language... Several perspectives are foreseen. For our TV application, the use of a larger dataset (recording several months with several channels) would allow us to limit the errors mentioned in the previous section. Adding multimodal features (logo detection, black frames, speech detection...) would also bring useful information about the content of the TV segment. These features should help the clustering process to distinguish between branding and sponsoring, or to better categorize trailers. More generally, the good results obtained by the ILPbased clustering argues in favor of applying this approach to other problems where the multi-relational aspect in important [DL01, MDP + 12].</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Multiple relations of the trailer Clara Sheller. paper, we propose to define a clustering technique suited to our complex data by diverting supervised Inductive Logic Programming (ILP) into a non-supervised technique. ILP makes it possible to easily represent our multirelational data, and a distance between broadcasts is automatically from fake supervised classification problems, in the vein of [SH05, CN13].</figDesc><graphic coords="2,137.70,54.07,340.20,162.70" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>broadcast(A) :-has occ(A,B), duration(B,3), next occ(B,C), next in stream(B,D), next in stream(C,E), has occ(F,D), has occ(F,E).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Excerpt of the example description and background knowledge co-covers hereafter) in a matrix M co-cov . One can give more weight to a clause covering very few examples, and less weight to a clause covering most of the examples (function weight). The last step is simply to use a standard clustering technique on the co-cover matrix, considered as a similarity matrix. In the experiments presented below, we use Markov Clustering [vD00]. Its main advantage compared with k-means/k-medoids is to avoid the need to decide a priori the number of expected clusters.The strategy at the heart of this approach is to vary the learning biases at each iteration. The first bias is the set of examples used. In our experiments we use 1/10 of the programs to be used as positive examples. The inferred rules are then applied on the 9/10 remaining programs to find which one are co-covered. The generation of negative examples is an important step in our algorithm. In our case, it means inventing programs, with their occurrences and features. They have to be realistic enough in order to produce learning problems that will generate discriminative enough clauses, and thus relevant co-covers. In order to generate counter-examples, we randomly copy parts of the description of real programs (with a renaming of the constants in order to produce a coherent set of occurrences and features). The hypothesis language, setting the format of acceptable clauses,</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>8 :Figure 3 :</head><label>83</label><figDesc>Figure 3: Class repartition in the ground-truth.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Results of clustering methods in terms of Adjusted Purity, Normalized mutual information and Adjusted Rand Index.</figDesc><graphic coords="5,186.30,54.07,243.00,185.16" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head></head><label></label><figDesc>broadcast(A) :-has occ(A,B), next occ(B,C), next occ(C,D), interval(B,C,E), interval(C,D,E).</figDesc></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Bagging predictors</title>
		<author>
			<persName><forename type="first">L</forename><surname>Breiman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Machine Learning</title>
				<imprint>
			<date type="published" when="1996">1996</date>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="123" to="140" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Dcouverte de connaissances dans les squences par CRF nonsuperviss</title>
		<author>
			<persName><forename type="first">Vincent</forename><surname>Claveau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Abir</forename><surname>Ncibi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Actes de la confrence TALN 2013</title>
				<meeting>s de la confrence TALN 2013</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m">Relational Data Mining</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Dzerosky</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Lavrac</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Comparing partitions</title>
		<author>
			<persName><forename type="first">Lawrence</forename><surname>Hubert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Phipps</forename><surname>Arabie</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Classification</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="193" to="218" />
			<date type="published" when="1985">1985</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The WEKA data mining software: An update</title>
		<author>
			<persName><forename type="first">Mark</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eibe</forename><surname>Frank</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Geoffrey</forename><surname>Holmes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bernhard</forename><surname>Pfahringer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Peter</forename><surname>Reutemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ian</forename><forename type="middle">H</forename><surname>Witten</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGKDD Explorations</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="10" to="18" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Tv stream structuring</title>
		<author>
			<persName><forename type="first">Zein</forename><surname>Al</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Abidin</forename><surname>Ibrahim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Patrick</forename><surname>Gros</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ISRN Signal Processing</title>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Data clustering: 50 years beyond k-means</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Jain</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition Letters</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="651" to="666" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Automatic tv broadcast structuring</title>
		<author>
			<persName><forename type="first">Gal</forename><surname>Manson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sid-Ahmed</forename><surname>Berrani</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2010">2010</date>
			<publisher>International Journal of Digital Multimedia Broadcasting</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Inductive Logic Programming: Theory and Methods</title>
		<author>
			<persName><forename type="first">Stephen</forename><surname>Muggleton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Luc</forename><surname>De Raedt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Logic Programming</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="page" from="629" to="679" />
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">ILP turns 20 -biography and future challenges</title>
		<author>
			<persName><forename type="first">Stephen</forename><surname>Muggleton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Luc</forename><surname>De Raedt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Poole</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ivan</forename><surname>Bratko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Peter</forename><forename type="middle">A</forename><surname>Flach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Katsumi</forename><surname>Inoue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ashwin</forename><surname>Srinivasan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">86</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="3" to="23" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Detecting repeats for video structuring</title>
		<author>
			<persName><forename type="first">Xavier</forename><surname>Naturel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Patrick</forename><surname>Gros</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Multimedia Tools and Applications</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="233" to="252" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">An automatic television stream structuring system for television archives holders</title>
		<author>
			<persName><forename type="first">Jean-Philippe</forename><surname>Poli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Multimedia Systems</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="255" to="275" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Objective criteria for the evaluation of clustering methods</title>
		<author>
			<persName><forename type="first">William</forename><forename type="middle">M</forename><surname>Rand</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Statistical Association</title>
		<imprint>
			<biblScope unit="volume">66</biblScope>
			<biblScope unit="issue">336</biblScope>
			<biblScope unit="page" from="846" to="850" />
			<date type="published" when="1971">1971</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Unsupervised learning with random forest predictors</title>
		<author>
			<persName><forename type="first">Tao</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Steve</forename><surname>Horvath</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Computational and Graphical Statistics</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="118" to="138" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">The aleph manual. Machine Learning at the Computing Laboratory</title>
		<author>
			<persName><forename type="first">Aswin</forename><surname>Srinivasan</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2001">2001</date>
		</imprint>
		<respStmt>
			<orgName>Oxford University</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Graph Clustering by Flow Simulation</title>
		<author>
			<persName><forename type="first">Stijn</forename><surname>van Dongen</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2000">2000</date>
		</imprint>
		<respStmt>
			<orgName>University of Utrecht</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Thse de doctorat</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Information theoretic measures for clusterings comparison</title>
		<author>
			<persName><forename type="first">Nguyen</forename><forename type="middle">Xuan</forename><surname>Vinh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Julien</forename><surname>Epps</surname></persName>
		</author>
		<author>
			<persName><forename type="first">James</forename><surname>Bailey</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
