<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">LIGON – Link Discovery with Noisy Oracles</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Mohamed</forename><forename type="middle">Ahmed</forename><surname>Sherif</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Data Science Group</orgName>
								<orgName type="institution">Paderborn University</orgName>
								<address>
									<addrLine>Technologiepark 6</addrLine>
									<postCode>33100</postCode>
									<settlement>Paderborn</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Leipzig</orgName>
								<address>
									<postCode>04109</postCode>
									<settlement>Leipzig</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Kevin</forename><surname>Dreßler</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Leipzig</orgName>
								<address>
									<postCode>04109</postCode>
									<settlement>Leipzig</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Axel-Cyrille</forename><surname>Ngonga Ngomo</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Data Science Group</orgName>
								<orgName type="institution">Paderborn University</orgName>
								<address>
									<addrLine>Technologiepark 6</addrLine>
									<postCode>33100</postCode>
									<settlement>Paderborn</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Leipzig</orgName>
								<address>
									<postCode>04109</postCode>
									<settlement>Leipzig</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">LIGON – Link Discovery with Noisy Oracles</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">58DD6AB3DE272FE7064ABAC3E2DFC6B4</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T07:02+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Link discovery plays a key role in the integration and use of data across RDF knowledge graphs. Active learning approaches are a common family of solutions to address the problem of learning how to compute links from users. So far, only active learning from perfect oracles has been considered in the literature. However, real oracles are often far from perfect (e.g., in crowdsourcing). We hence study the problem of learning how to compute links across knowledge graphs from noisy oracles, i.e., oracles that are not guaranteed to return correct classification results. We present a novel approach for link discovery based on a probabilistic model, with which we estimate the joint odds of the oracles' guesses. We combine this approach with an iterative learning approach based on refinements. The resulting method, Ligon, is evaluated on 11 benchmark datasets. Our results suggest that Ligon achieves more than 95% of the F-measure achieved by state-of-the-art algorithms trained with a perfect oracle.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The provision of links between knowledge graphs in RDF is of central importance for numerous tasks on the Semantic Web, including federated queries, question answering and data fusion. While links can be created manually for small knowledge bases, the sheer size and number of knowledge bases commonly used in modern applications (e.g., DBpedia with more than 3 × 10⁶ resources) demand the use of automated link discovery mechanisms. In this work, we focus on active learning for link discovery. State-of-the-art approaches that rely on active learning <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b7">8]</ref> assume that the oracle they rely upon is perfect. Formally, this means that given an oracle ω, the probability of the oracle returning a wrong result (i.e., returning false when an example is to be classified as true) is exactly 0. While these approaches show pertinent results in evaluation scenarios, within which the need for a perfect oracle can be fulfilled, this need is difficult if not impossible to uphold in real-world settings (e.g., when crowdsourcing training data). No previous work has addressed link discovery based on oracles that are not perfect.</p><p>We address this research gap by presenting a novel approach for learning link specifications (LS) from noisy oracles, i.e., oracles that are not guaranteed to return correct classifications. This approach is motivated by the problem of learning LS using crowdsourcing. Previous works have shown that agents in real crowdsourcing scenarios are often not fully reliable (e.g., <ref type="bibr" target="#b18">[19]</ref>). We model these agents as noisy oracles, which provide erroneous answers to questions with a fixed probability. 
We address the problem of learning from such oracles by using a probabilistic model which approximates the odds of the answers of a set of oracles being correct. Our approach, dubbed Ligon, assumes that the underlying oracles are independent, i.e., that the probability distributions underlying the oracles are pairwise independent. Moreover, we assume that the oracles have a static behavior, i.e., that the probability of them generating correct/incorrect answers is constant over time.</p><p>The contributions of this paper are as follows: (1) We present a formalization of the problem of learning LS from noisy oracles and derive a probabilistic model for learning from such oracles. (2) We develop the first learning algorithm dedicated to learning LS from noisy data. The approach combines iterative operators for LS with an entropy-based approach for selecting the most informative training examples. In addition, it uses cumulative evidence to approximate the probability distributions underlying the noisy oracles that provide it with training data. Finally, (3) we present a thorough evaluation of Ligon and show that it is robust against noise, scales well and converges within 10 learning iterations to more than 95% of the average F-measure achieved by Wombat, a state-of-the-art approach for learning LS, when provided with a perfect oracle.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Preliminaries</head><p>Knowledge graphs (also called knowledge bases) in RDF are defined as sets of triples K ⊆ (R ∪ B) × P × (R ∪ B ∪ L), where R is the set of all resources, i.e., of all objects in the domain of discourse (e.g., persons and publications); P ⊆ R is the set of all predicates, i.e., of binary relations (e.g., author); B is the set of all blank nodes, which stand for resources whose existence is known but whose identity is not relevant to the model; and L is the set of all literals, i.e., of values associated with datatypes (e.g., integers). <ref type="foot" target="#foot_0">4</ref> The elements of K are referred to as facts or triples. We call the elements of R entities or resources.</p><p>The link discovery task on RDF knowledge graphs is defined as follows: Let S and T be two sets of resources, i.e., S ⊆ R and T ⊆ R. Moreover, let r ∈ P be a predicate. The aim of link discovery is to compute the set M = {(s, t) ∈ S × T : r(s, t)}. We call M a mapping. In many cases, M cannot be computed directly and is thus approximated by a mapping M′.</p><formula xml:id="formula_0">[[f(m, θ)]]_M = {(s, t) | (s, t) ∈ M ∧ m(s, t) ≥ θ};  [[L1 ⊓ L2]]_M = {(s, t) | (s, t) ∈ [[L1]]_M ∧ (s, t) ∈ [[L2]]_M};  [[L1 ⊔ L2]]_M = {(s, t) | (s, t) ∈ [[L1]]_M ∨ (s, t) ∈ [[L2]]_M};  [[L1 \ L2]]_M = {(s, t) | (s, t) ∈ [[L1]]_M ∧ (s, t) ∉ [[L2]]_M}</formula><p>To find the set M′, declarative link discovery frameworks rely on link specifications (LS), which describe the conditions under which r(s, t) can be assumed to hold for a pair (s, t) ∈ S × T. Several formal models have been used for describing LS in previous works <ref type="bibr" target="#b7">[8]</ref>.</p><p>We adopt a formal approach derived from <ref type="bibr" target="#b16">[17]</ref> and first describe the syntax and then the semantics of LS. 
LS consist of two types of atomic components: similarity measures m, which allow comparing the property values of input resources, and operators op, which can be used to combine LS into more complex LS. Without loss of generality, we define a similarity measure m as a function m : S × P × T × P → [0, 1]. An example of a similarity measure is the edit similarity dubbed edit,<ref type="foot" target="#foot_1">5</ref> which allows computing the similarity of a pair (s, t) ∈ S × T w.r.t. the values of a pair of properties (p_s, p_t) for s resp. t. An atomic LS is a pair (m, θ). A complex LS is the result of combining two LS L1 and L2 through an operator that allows merging the results of L1 and L2. Here, we use the operators ⊓, ⊔ and \, as they are complete w.r.t. the Boolean algebra and frequently used to define LS. An example of a complex LS is given in Figure <ref type="figure" target="#fig_0">1</ref>.</p><p>We define the semantics [[L]]_M of an LS L w.r.t. a mapping M as shown in Table <ref type="table" target="#tab_0">1</ref>.</p></div>
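The semantics of atomic LS and of the three operators can be made concrete with ordinary set operations. The following sketch is purely illustrative: the function names and the dictionary-based similarity are our own, not part of any link discovery framework.

```python
from typing import Callable, Set, Tuple

Pair = Tuple[str, str]          # a link candidate (s, t)
Mapping = Set[Pair]             # a mapping [[L]]_M is a set of pairs

def atomic(m: Callable[[Pair], float], theta: float, M: Mapping) -> Mapping:
    """[[f(m, theta)]]_M = {(s, t) in M : m(s, t) >= theta}."""
    return {pair for pair in M if m(pair) >= theta}

def conj(L1: Mapping, L2: Mapping) -> Mapping:
    """L1 ⊓ L2: pairs contained in both operand mappings."""
    return L1 & L2

def disj(L1: Mapping, L2: Mapping) -> Mapping:
    """L1 ⊔ L2: pairs contained in either operand mapping."""
    return L1 | L2

def diff(L1: Mapping, L2: Mapping) -> Mapping:
    """L1 \\ L2: pairs in L1 but not in L2."""
    return L1 - L2
```

For instance, an atomic LS (m, 0.5) applied to a toy mapping keeps exactly the pairs whose similarity reaches the threshold; complex LS are then evaluated by applying the set operators to the operand mappings.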
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Noisy Oracles</head><p>We model oracles Ω for r as black boxes with a characteristic function ω : S × T → {true, false}. The characteristic function ω_i of the oracle Ω_i returns true iff the oracle Ω_i assumes that r(s, t) holds. Otherwise, it returns false.</p><p>For ease of notation, we define LC = S × T and call the elements of LC link candidates. For l ∈ LC, we write l ≡ ⊤ to signify that r(l) holds, i.e., r(s, t) is true for l = (s, t). Otherwise, we write l ≡ ⊥. We now assume a learning situation typical for crowdsourcing, where n oracles are presented with a link candidate l and asked whether l ≡ ⊤ holds. We can describe each oracle Ω_i by the following four probabilities: (1) p(ω_i(l) = true | l ≡ ⊤), i.e., the probability of the oracle Ω_i generating true positives. This value is exactly 1 for a perfect oracle. (2) p(ω_i(l) = false | l ≡ ⊤), the probability of false negatives (0 for a perfect oracle). (3) p(ω_i(l) = true | l ≡ ⊥), i.e., the probability of false positives (0 for a perfect oracle), and (4) p(ω_i(l) = false | l ≡ ⊥), the probability of true negatives (1 for a perfect oracle). Given that p(A|B) + p(¬A|B) = 1, the sum of the first two and of the last two probabilities is always 1.</p><p>Example 1. A noisy oracle can have the following description:</p><formula xml:id="formula_1">p(ω_i(l) = true | l ≡ ⊤) = 0.7, p(ω_i(l) = true | l ≡ ⊥) = 0.5, p(ω_i(l) = false | l ≡ ⊤) = 0.3, p(ω_i(l) = false | l ≡ ⊥) = 0.5.</formula><p>For compactness, we use the following vector notation in the rest of the formal model: ω = (ω_1, …, ω_n) refers to the vector of characteristic functions over all oracles.</p><p>We write ω(l) = x to signify that the i-th oracle returned the value x_i for the link candidate l. Let us assume that the probabilities underlying all oracles Ω_i are known (we discuss ways to initialize and update these probabilities in the subsequent section). 
Recalling that we assume that our oracles are independent, we can now approximate the probability that l ≡ y (with y ∈ {⊤, ⊥}) for any given link candidate l using the following Bayesian model:</p><formula xml:id="formula_2">p(l ≡ y | ω = x) = (∏_{i=1}^{n} p(ω_i = x_i | l ≡ y) / ∏_{i=1}^{n} p(ω_i = x_i)) · p(l ≡ y)<label>(1)</label></formula><p>Recall that the odds of an event A occurring are defined as odds(A) = p(A)/p(¬A). For example, the odds of any link candidate being a correct link (denoted o⁺) are given by</p><formula xml:id="formula_3">o⁺ = p(l ≡ ⊤) / p(l ≡ ⊥) for any l ∈ LC.<label>(2)</label></formula><p>o⁺ is independent of l and stands for the odds that an element of LC chosen at random would be a link. Given feedback from our oracles, we can approximate the odds of a link candidate l being a correct link by computing the following:</p><formula xml:id="formula_5">odds(l ≡ ⊤ | ω = x) = ∏_{i=1}^{n} [p(ω_i = x_i | l ≡ ⊤) / p(ω_i = x_i | l ≡ ⊥)] · (p(l ≡ ⊤) / p(l ≡ ⊥)) = ∏_{i=1}^{n} [p(ω_i = x_i | l ≡ ⊤) / p(ω_i = x_i | l ≡ ⊥)] · o⁺.<label>(3)</label></formula><p>A key idea behind our model is that a link candidate l can be considered to be a correct link if odds(l ≡ ⊤ | ω = x) ≥ k with k &gt; 1. A link candidate is assumed not to be a link if odds(l ≡ ⊤ | ω = x) ≤ 1/k. All other link candidates remain unclassified.</p><p>Computing the odds for a link now boils down to (1) approximating the four probabilities which characterize our oracles and (2) computing o⁺. As known from previous works on probabilistic models <ref type="bibr" target="#b6">[7]</ref>, o⁺ is hard to compute directly, as it requires knowing the set of links M, which is exactly what we are trying to compute. Several strategies can be used to approximate o⁺. In this work, we consider the following three:</p><p>1. Ignore strategy: We can assume the probabilities p(l ≡ ⊤) and p(l ≡ ⊥) to be equally unknown and hence use o⁺ = 1. 
This reduces Equation <ref type="formula" target="#formula_5">3</ref> to</p><formula xml:id="formula_7">odds(l ≡ ⊤ | ω = x) = ∏_{i=1}^{n} p(ω_i = x_i | l ≡ ⊤) / p(ω_i = x_i | l ≡ ⊥).<label>(4)</label></formula><p>2. Equivalent strategy: We assume that at most min(|S|, |T|) of the |S||T| link candidates are correct links, as is the case for one-to-one relations such as owl:sameAs (cf. <ref type="bibr" target="#b12">[13]</ref>). Hence,</p><formula xml:id="formula_8">o⁺ ≈ min(|S|, |T|) / (|S||T| − min(|S|, |T|)).<label>(5)</label></formula><p>3. Approximate strategy: We approximate o⁺ by using our learning approach. We select the mapping [[L*]] computed using the best specification L* learned by Ligon (see the subsequent section) as our best current approximation of the mapping we are trying to learn. o⁺ is then computed as follows:</p><formula xml:id="formula_9">o⁺ ≈ |[[L*]]| / (|S||T| − |[[L*]]|).<label>(6)</label></formula><p>We quantify the effect of these strategies on our learning algorithm in our experiments.</p></div>
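As a worked illustration of Equations 3-5 and the decision rule above, the following sketch computes the odds of a link candidate being a correct link from the oracles' answers, together with the Equivalent-strategy estimate of o⁺. The function names and the (tp, fp) encoding of each oracle are illustrative choices on our part, not part of the Ligon implementation.

```python
def odds_link(answers, oracle_probs, o_plus=1.0):
    """Equation 3: odds(l ≡ ⊤ | ω = x) = ∏_i p(ω_i = x_i | ⊤) / p(ω_i = x_i | ⊥) · o⁺.

    answers      -- list of bool: oracle i answered true/false for candidate l
    oracle_probs -- list of (tp, fp) pairs per oracle, where
                    tp = p(ω_i = true | l ≡ ⊤) and fp = p(ω_i = true | l ≡ ⊥)
    o_plus       -- prior odds o⁺ (1.0 corresponds to the Ignore strategy)
    """
    odds = o_plus
    for answered_true, (tp, fp) in zip(answers, oracle_probs):
        num = tp if answered_true else 1.0 - tp  # p(ω_i = x_i | l ≡ ⊤)
        den = fp if answered_true else 1.0 - fp  # p(ω_i = x_i | l ≡ ⊥)
        odds *= num / den
    return odds

def equivalent_o_plus(size_s, size_t):
    """Equation 5: o⁺ for an (at most) one-to-one relation over S × T."""
    m = min(size_s, size_t)
    return m / (size_s * size_t - m)

def classify(odds, k):
    """Decision rule: link if odds >= k, non-link if odds <= 1/k, else unclassified."""
    if odds >= k:
        return True
    if odds <= 1.0 / k:
        return False
    return None
```

With the oracle of Example 1 (tp = 0.7, fp = 0.5), two such oracles both answering true yield odds of (0.7/0.5)² = 1.96 under the Ignore strategy, so the candidate remains unclassified for k = 2.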
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">The LIGON approach</head><p>Ligon is an active learning algorithm designed to learn LS from noisy oracles. An overview of the approach is given in Algorithm 1 and explained in the sections below.</p><p>Confusion Matrices. We begin by assuming that we are given an initial set E_0 ⊆ LC of positive and negative examples for links. In the first step, we aim to compute the initial approximations of the conditional probabilities which describe each of the oracles Ω_i. To this end, each oracle is assigned a confusion matrix C_i of dimension 2 × 2 (see lines 2-3 of Algorithm 1). Each entry of the matrix is initialized with 1/2 to account for potential sampling biases due to high disparities between conditional probabilities. The first and second row of each C_i contain counts for links where the oracle returned true resp. false. The first and second column of C_i contain counts for positive resp. negative examples. Hence, c_11 contains counts for positive examples that were rightly classified as ⊤ by the oracle. In each learning iteration, we update the confusion matrix by presenting the oracle with unseen link candidates and incrementing the entries of C_i (see lines 4-8 of Algorithm 1). We discuss the computation of the training examples in the subsequent section. Based on the confusion matrix, we can approximate all conditional probabilities necessary to describe the oracle by computing the 2 × 2 matrix D with d_ij = c_ij / (c_1j + c_2j). For example, d_11 ≈ p(ω_i(l) = true | l ≡ ⊤). We call D the characteristic matrix of Ω.</p><p>Updating the Characteristic Matrices. Updating the probabilities is done via the confusion matrices. In each learning iteration, we present all oracles with the link candidates deemed to be most informative. Based on the answers of the oracles, we compute the odds for each of these link candidates. 
Link candidates l with odds in [0, 1/k] resp. [k, +∞) are considered to be false resp. true. The new classifications are subsequently used to update the counts in the confusion matrices and therewith also the characteristic matrix of each of the oracles.</p><p>Active Learning Approach. So far, we have assumed the existence of an active learning solution for link discovery. Several active learning approaches have been developed over recent years <ref type="bibr" target="#b7">[8]</ref>. Of these approaches, only those based on genetic programming can generate specifications of arbitrary complexity. However, genetic programming approaches are not deterministic and are thus difficult to use in practical applications. Newer approaches based on iterative operators such as Wombat <ref type="bibr" target="#b16">[17]</ref> have been shown to perform well in classical link discovery tasks. Therefore, we implemented a generic interface to apply Ligon to several active learning algorithms, and we use the Wombat algorithm as the default active learning algorithm for Ligon. See our last set of experiments for results of applying Ligon to other state-of-the-art active learning approaches.</p><p>Selecting the Most Informative Examples. Given an active learning algorithm, we denote the set of the m best LS generated in a given iteration i as B_i. The most informative examples are those link candidates l which maximize the decision entropy across the elements of B_i <ref type="bibr" target="#b11">[12]</ref>. Formally, let [[B_i]] be the union of the sets of link candidates generated by all LS b ∈ B_i. Then, the most informative link candidates are the l ∈ [[B_i]] which maximize the entropy function e(l, B_i), which is defined as follows: Let p(l, B_i) be the probability that a link candidate belongs to [[b]] for b ∈ B_i. Then, e(l, B_i) = −p(l, B_i) log₂ p(l, B_i). </p></div>
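The confusion matrix bookkeeping and the entropy-based example selection described above can be sketched as follows. This is a minimal illustration assuming plain nested lists for the 2 × 2 counts; the names characteristic_matrix and entropy are ours, not those of the Ligon codebase.

```python
import math

def characteristic_matrix(C):
    """Column-normalize the 2x2 confusion counts: d_ij = c_ij / (c_1j + c_2j).

    Row 1/2: oracle answered true/false; column 1/2: positive/negative examples.
    """
    return [[C[i][j] / (C[0][j] + C[1][j]) for j in range(2)] for i in range(2)]

def entropy(candidate, mappings):
    """e(l, B_i) = -p * log2(p), where p is the fraction of the best LS in B_i
    whose mapping [[b]] contains the link candidate l."""
    p = sum(1 for m in mappings if candidate in m) / len(mappings)
    return 0.0 if p == 0.0 else -p * math.log2(p)
```

For an oracle shown 5 positive and 5 negative examples and classifying 4 resp. 3 of them correctly, the 1/2-initialized counts give d_11 = 4.5/6 = 0.75; a candidate returned by two of four LS has entropy 0.5, as in Example 3.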
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Experiments and Results</head><p>We aimed to answer six research questions with our experimental evaluation: Q1. Which combination of strategies for computing odds and the threshold k leads to the best performance? Q2. How does Ligon behave when provided with an increasing number of noisy oracles? Q3. How well does Ligon learn from noisy oracles? Q4. How well does Ligon scale? Q5. How well does Ligon perform compared to batch learning approaches trained with a similar number of examples? Q6. How general is Ligon, i.e., can Ligon be applied to problems outside the link discovery domain, and does Ligon depend on the underlying active learning algorithm?</p><p>Experimental Setup. All experiments were carried out on a 64-core 2.3 GHz PC running OpenJDK 64-Bit Server 1.8.0_151 on Ubuntu 16.04.3 LTS. Each experiment was assigned 20 GB RAM. We evaluated Ligon using 8 link discovery benchmark datasets. Five of these benchmarks were real-world datasets <ref type="bibr" target="#b5">[6]</ref> while three were synthetic datasets from the OAEI 2010 benchmark. <ref type="foot" target="#foot_2">6</ref> We used the paradigm proposed by <ref type="bibr" target="#b4">[5]</ref> and measured the performance of algorithms using the best F-measure they achieved. As this measure fails to capture the average behaviour of an algorithm over several iterations, we also report the normalized average area under the F-measure curve, which we denote AUC. We initialized Ligon with 10 positive examples (ergo, |E_0| = 10). We fixed the number of the most informative examples to be labeled by the noisy oracles at each iteration to 10. For labeling the most informative examples, we used n = 2, 4, 8 and 16 noisy oracles, which were all initialized with random confusion matrices. We set the size of B to 10. All experiments were repeated 10 times and we report average values. 
The characteristic matrices of our noisy oracles were generated at random. To this end, we generated the true positive and true negative probabilities using a uniform distribution between 0.5 and 1, i.e., p(ω_i(l) = true | l ≡ ⊤) ∈ [0.5, 1] and p(ω_i(l) = false | l ≡ ⊥) ∈ [0.5, 1]. The other probabilities were set accordingly, as they are complementary to the former two (see Fig. <ref type="figure" target="#fig_1">2</ref> for the average AUC achieved in each setting).</p><p>Parameter Estimation. Our first series of experiments aimed to answer Q1. We ran Ligon with k = 2, 4, 8 and 16. These settings were used in combination with all three strategies for computing o⁺ described above. A first observation is that the AUC achieved by Ligon depends little on the value of k or on the strategy used. This is a highly positive feature of our algorithm, as it suggests that our approach is robust w.r.t. how it is initialized. Interestingly, this experiment already suggests that Ligon achieves more than 95% of the performance of the original Wombat algorithm trained with a perfect oracle.</p><p>We chose to run the remainder of our experiments with the setting k = 16 combined with the Equivalent strategy, as this combination achieved the highest average F-measure of 0.86.</p><p>Comparison with Perfect Oracle. In our second set of experiments, we answered Q2 by measuring how well Ligon performed when provided with an increasing number of oracles. In this series of experiments, we used 2, 4, 8, and 16 oracles which were initialized randomly. k was set to 16 and we used the Equivalent strategy. Once more, the robustness of our approach became evident, as its performance was not majorly perturbed by a variation in the number of oracles. In all settings, Ligon achieved an average AUC close to 0.86 with no statistically significant difference. 
We can hence conclude that the performance of our approach depends mostly on the initial set of examples E_0 being accurate, which leads to our prior (i.e., the initial estimate of the confusion matrices of the oracles) being sufficient. This means that our Bayesian model is able to distinguish between erroneous classifications well enough to select the most informative examples accurately and generalize over them. In other words, even a small balanced set containing 5 positive and 5 negative examples seems sufficient to approximate the confusion matrices of the oracles well enough to detect positive and negative examples consistently in the subsequent steps. This answers Q2. Figures <ref type="figure" target="#fig_3">2 and 3</ref> show the detailed results of running Ligon for 10 iterations on each of our 8 benchmark datasets.</p><p>To answer Q3, we also ran our approach in combination with a perfect oracle (i.e., an oracle which knew and returned the perfect classification for all pairs from (S, T)). The detailed results are provided in Figures <ref type="figure" target="#fig_3">3 and 2</ref>. Combining our approach with a perfect oracle can be regarded as providing an upper bound for our learning algorithm. Over all datasets, Ligon achieved 95% of the AUC achieved with the perfect oracle (min = 88% on DBpedia-LMDB, max = 100% on Restaurants). This answers Q3 and demonstrates that Ligon can learn LS with an accuracy close to that of an approach provided with perfect answers.</p><p>Runtime. In our third set of experiments, we were interested in knowing how well our approach scales. To this end, we measured the runtime of our algorithm while running the experiments carried out to answer Q2 and Q3. In our experiments, Wombat, the machine learning approach used within Ligon, accounts for more than 99% of Ligon's runtime (see Table <ref type="table" target="#tab_4">2</ref> for more detailed results). 
This shows that the Bayesian framework used to re-evaluate the characteristic matrices of the oracles is clearly fast enough to be used in interactive scenarios, which answers Q4. Our approach completes a learning iteration in less than 10 seconds on most datasets, which we consider acceptable even for interactive use.</p><p>Comparison with Batch Learning. While active learning commonly requires a small number of training examples to achieve good F-measures, other techniques such as pessimistic and re-weighted batch learning have also been designed to achieve this goal <ref type="bibr" target="#b4">[5]</ref>. In addition, the positive-only learning algorithm Wombat has also been shown to perform well with a small number of training examples. In our final set of experiments, we compared the best F-measure achieved by Ligon when trained with 16 noisy oracles, k = 16 and the Equivalent strategy with the pessimistic and re-weighted models proposed in <ref type="bibr" target="#b4">[5]</ref> as well as the two versions of the Wombat approach <ref type="bibr" target="#b16">[17]</ref>. All approaches were trained with 2% of the reference data (i.e., with a perfect oracle), as suggested by <ref type="bibr" target="#b4">[5]</ref>. The results of these experiments are shown in Table <ref type="table" target="#tab_5">3</ref>. Note that we did not consider the datasets Persons 1, Persons 2 and Restaurants, because 2% of the training data amounts to fewer than 10 examples, which Ligon requires as its initial training dataset E_0. Our results answer Q5 clearly by showing that Ligon outperforms previous batch learning algorithms even when trained with noisy oracles. On average, Ligon is more than 40% better in F-measure. This clearly demonstrates that our active learning strategy for selecting training examples is superior to batch learning.</p><p>Generalization of Ligon. In our last set of experiments, we implemented a generalization of Ligon for binary classification tasks beyond link discovery. 
We thus used the active learning framework JCLAL <ref type="bibr" target="#b14">[15]</ref> to wrap WEKA <ref type="bibr" target="#b1">[2]</ref> classifiers and implemented Ligon as a custom oracle. We selected three well-known binary classification datasets (i.e., Diabetes, Breast-cancer and Ionosphere) from the WEKA distribution, on which we applied two state-of-the-art classification algorithms, namely GBDT and Random Forests <ref type="bibr" target="#b19">[20]</ref>. Based on our previous experiments, we used 4 noisy oracles, set k to 16 and used the Ignore strategy, since all the other strategies are specific to the link discovery domain. We executed two sets of experiments for noisy oracles with true positive/negative probabilities drawn from uniform distributions over [0.5, 1] and [0.75, 1]. On average, Ligon achieves 75% and 89% of the learning accuracy for noisy oracles drawn from [0.5, 1] and [0.75, 1], respectively. These results indicate that Ligon is not only applicable to problems outside the link discovery domain but is also independent of the underlying active learning algorithm, achieving F-measures close to the ones scored using a perfect oracle. This answers Q6.</p></div>
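The normalized average area under the F-measure curve (AUC) reported throughout the evaluation can be read as a trapezoidal average of the per-iteration F-measures. The sketch below assumes exactly that reading, which is one plausible interpretation rather than the paper's exact definition.

```python
def auc_f_measure(f_scores):
    """Normalized area under the F-measure-per-iteration curve
    (trapezoidal rule, divided by the covered iteration range)."""
    n = len(f_scores)
    if n < 2:
        return f_scores[0] if f_scores else 0.0
    area = sum((f_scores[i] + f_scores[i + 1]) / 2.0 for i in range(n - 1))
    return area / (n - 1)
```

Under this reading, a constant F-measure of 0.8 over all iterations yields an AUC of 0.8, so the measure rewards both fast convergence and a high final F-measure.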
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Related Work</head><p>Link discovery for RDF knowledge graphs has been an active research area for nearly a decade, with the first frameworks for link discovery <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b8">9]</ref> appearing at the beginning of the decade. Raven <ref type="bibr" target="#b9">[10]</ref> was the first active learning approach for link discovery and used perceptron learning to detect accurate LS. Other approaches were subsequently developed to learn LS within the active learning setting <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12]</ref>. Unsupervised learning approaches for monogamous relations <ref type="bibr" target="#b10">[11]</ref><ref type="bibr" target="#b11">[12]</ref><ref type="bibr" target="#b12">[13]</ref> rely on different pseudo-F-measures to detect links without any training data. Positive-only learning algorithms <ref type="bibr" target="#b16">[17]</ref> address the open-world characteristic of the Semantic Web by using generalization algorithms to detect LS. The work presented in <ref type="bibr" target="#b13">[14]</ref> proposes an active learning approach for link prediction in knowledge graphs. Ligon differs from the state of the art in that it does not assume that it deals with perfect oracles. Rather, it uses several noisy oracles to achieve F-measures close to those achieved with perfect oracles. An active learning approach with uncertain labeling knowledge is proposed by <ref type="bibr" target="#b0">[1]</ref>, where the authors used diversity density to characterize the uncertainty of the knowledge. A probabilistic model of active learning with multiple noisy oracles was introduced by <ref type="bibr" target="#b17">[18]</ref> to label the data based on the most reliable oracle. 
For crowdsourcing scenarios, <ref type="bibr" target="#b15">[16]</ref> propose a supervised learning algorithm for multiple annotators (oracles), where the oracles' diverse reliabilities are treated as latent variables.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Conclusions and Future Work</head><p>We presented Ligon, an active learning approach designed to deal with noisy oracles, i.e., oracles that are not guaranteed to return correct classification results. Ligon relies on a probabilistic model to estimate the joint odds of link candidates based on the oracles' guesses. Our experiments showed that Ligon achieves 95% of the learning accuracy of approaches learning with perfect oracles in the link discovery setting. Moreover, we showed that Ligon is (1) not dependent on the underlying active learning algorithm and (2) able to deal with other classification problems. In future work, we will evaluate Ligon within real crowdsourcing scenarios. A limitation of our approach is that it assumes that the confusion matrices of the oracles are static. While this assumption is valid given the small number of iterations necessary for our approach to converge, we will extend our model so as to deal with oracles whose behavior changes dynamically. Furthermore, we will extend Ligon to handle n-ary classification problems and evaluate it on more state-of-the-art approaches from the deep learning domain.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 :</head><label>1</label><figDesc>Fig. 1: Complex LS example. The filter nodes are rectangles while the operator nodes are circles.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Example 2 .</head><label>2</label><figDesc>Imagine an oracle is presented with a set of 5 positive and 5 negative training examples, of which it classifies 4 and 3 correctly, respectively. We get</figDesc></figure>
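The reliability estimates behind Example 2 (an oracle that classifies 4 of 5 positive and 3 of 5 negative examples correctly) amount to simple class-conditional relative frequencies; the helper name below is hypothetical:

```python
def reliability(correct_pos, total_pos, correct_neg, total_neg):
    """Estimate an oracle's class-conditional accuracies from counts of
    correctly classified positive and negative training examples."""
    return correct_pos / total_pos, correct_neg / total_neg

# Example 2's numbers: 4/5 positives and 3/5 negatives classified correctly.
p_pos, p_neg = reliability(4, 5, 3, 5)
```

These two per-class accuracies are exactly the entries of the oracle's estimated confusion matrix that the probabilistic model consumes.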
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Example 3 .</head><label>3</label><figDesc>Let us assume |B i | = 4. A link candidate l returned by two of the LS in B i would have a probability p(l, B i ) = 0.5. Hence, it would have an entropy e(l, B i ) = 0.5. Termination Criterion. Ligon terminates after a set number of iterations has been reached or if a link specification learned by Wombat achieves an F-measure of 1 on the training data.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 3 :</head><label>3</label><figDesc>Fig. 3: F-measure results of Ligon. The x-axes show the iteration number, while the y-axes show the F-measure. Note that the y-axes show different value ranges for better legibility. Gray bars represent the F-measure of Ligon with the perfect oracle, while the F-measures achieved by 2, 4, 8 and 16 noisy oracles are represented by red, blue, orange and green lines, respectively.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Link Specification Syntax and Semantics</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head></head><label></label><figDesc>Average AUC heatmap of Ligon using 2, 4, 8 and 16 noisy oracles and the perfect oracle.</figDesc><table><row><cell>Dataset / # oracles</cell><cell>2</cell><cell>4</cell><cell>8</cell><cell>16</cell><cell>Perfect</cell></row><row><cell>Person 1</cell><cell>0.97</cell><cell>0.93</cell><cell>0.93</cell><cell>0.93</cell><cell>0.99</cell></row><row><cell>Person 2</cell><cell>0.98</cell><cell>0.98</cell><cell>0.98</cell><cell>0.98</cell><cell>0.99</cell></row><row><cell>Restaurants</cell><cell>0.97</cell><cell>0.97</cell><cell>0.97</cell><cell>0.97</cell><cell>0.97</cell></row><row><cell>ABT-Buy</cell><cell>0.89</cell><cell>0.88</cell><cell>0.88</cell><cell>0.88</cell><cell>0.97</cell></row><row><cell>Amazon-GoogleProducts</cell><cell>0.72</cell><cell>0.71</cell><cell>0.72</cell><cell>0.72</cell><cell>0.73</cell></row><row><cell>DBLP-ACM</cell><cell>0.70</cell><cell>0.71</cell><cell>0.70</cell><cell>0.71</cell><cell>0.76</cell></row><row><cell>DBpedia-LinkedMDB</cell><cell>0.89</cell><cell>0.88</cell><cell>0.88</cell><cell>0.88</cell><cell>0.97</cell></row><row><cell>DBLP-GoogleScholar</cell><cell>0.81</cell><cell>0.79</cell><cell>0.78</cell><cell>0.78</cell><cell>0.92</cell></row><row><cell>Average</cell><cell>0.86</cell><cell>0.86</cell><cell>0.85</cell><cell>0.86</cell><cell>0.91</cell></row><row><cell>Standard deviation</cell><cell>0.11</cell><cell>0.11</cell><cell>0.11</cell><cell>0.11</cell><cell>0.11</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 2 :</head><label>2</label><figDesc>Average learning iteration runtime analysis (in seconds).</figDesc><table><row><cell>Datasets</cell><cell cols="2">Ligon Wombat</cell></row><row><cell>Persons 1</cell><cell>2.415</cell><cell>2.412</cell></row><row><cell>Persons 2</cell><cell>0.946</cell><cell>0.942</cell></row><row><cell>Restaurants</cell><cell>0.261</cell><cell>0.258</cell></row><row><cell>ABT-Buy</cell><cell>4.277</cell><cell>4.273</cell></row><row><cell>Amazon-GoogleProd</cell><cell>2.848</cell><cell>2.844</cell></row><row><cell>DBLP-ACM</cell><cell>4.277</cell><cell>4.273</cell></row><row><cell cols="2">DBpedia-LinkedMDB 6.158</cell><cell>6.154</cell></row><row><cell cols="3">DBLP-GoogleScholar 16.072 16.067</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 3 :</head><label>3</label><figDesc>F-measure achieved by Ligon vs. the state of the art from <ref type="bibr" target="#b4">[5]</ref> and <ref type="bibr" target="#b16">[17]</ref>. tive scenarios. The longer runtime on DBLP-GoogleScholar (roughly 16 seconds per iteration on average) is due to the large size of this dataset. Here, a parallel version of the Wombat algorithm would help improve the interaction with the user. The implementation of a parallel version of Wombat goes beyond the work presented here.</figDesc><table><row><cell>Dataset</cell><cell cols="3">Pessimistic Reweighted Simple</cell><cell>Complete</cell><cell>Ligon</cell></row><row><cell>DBLP-ACM</cell><cell>0.93</cell><cell>0.95</cell><cell>0.94</cell><cell>0.94</cell><cell>0.73</cell></row><row><cell cols="2">Amazon-GoogleProduct 0.39</cell><cell>0.43</cell><cell>0.53</cell><cell>0.45</cell><cell>0.71</cell></row><row><cell>ABT-Buy</cell><cell>0.36</cell><cell>0.37</cell><cell>0.37</cell><cell>0.36</cell><cell>0.93</cell></row><row><cell>Average</cell><cell>0.77</cell><cell>0.78</cell><cell>0.77</cell><cell>0.74</cell><cell>0.89</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_0">See https://www.w3.org/RDF/ for more details.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_1">We define the edit similarity of two strings s and t as (1 + lev(s, t)) −1 , where lev is the Levenshtein distance.</note>
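The edit similarity defined in the footnote above, (1 + lev(s, t))^-1, can be sketched in a few lines; the `levenshtein` helper is a standard dynamic-programming implementation, not code taken from the paper:

```python
def levenshtein(s, t):
    """Classic dynamic-programming Levenshtein distance, row by row."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cs != ct)))   # substitution
        prev = curr
    return prev[-1]

def edit_similarity(s, t):
    # (1 + lev(s, t))^-1 as defined in the footnote
    return 1.0 / (1.0 + levenshtein(s, t))
```

Identical strings thus get similarity 1, and the similarity decays hyperbolically with the number of edit operations.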
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_2">http://oaei.ontologymatching.org/2010</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgments. This work has been supported by the EU H2020 project KnowGraphs (GA no. 860801) as well as the BMVI projects LIMBO (GA no. 19F2029C) and OPAL (GA no. 19F2028A).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Active learning with uncertain labeling knowledge</title>
		<author>
			<persName><forename type="first">M</forename><surname>Fang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition Letters</title>
		<imprint>
			<biblScope unit="volume">43</biblScope>
			<biblScope unit="page" from="98" to="108" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note>Supplement C. ICPR2012 Awarded Papers</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">The WEKA data mining software: an update</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Frank</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Holmes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Pfahringer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Reutemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">H</forename><surname>Witten</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGKDD Explorations</title>
				<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Active learning of expressive linkage rules using genetic programming</title>
		<author>
			<persName><forename type="first">R</forename><surname>Isele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Web Sem</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page" from="2" to="15" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Silk -generating RDF links while publishing or consuming linked data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Jentzsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Isele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ISWC Posters &amp; Demos</title>
				<meeting>the ISWC Posters &amp; Demos</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Semi-supervised instance matching using boosted classifiers</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kejriwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Miranker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web. Latest Advances and New Domains</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Evaluation of entity resolution approaches on real-world match problems</title>
		<author>
			<persName><forename type="first">H</forename><surname>Köpcke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Thor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Rahm</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proc. VLDB Endow</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">1-2</biblScope>
			<biblScope unit="page" from="484" to="493" />
			<date type="published" when="2010-09">Sept. 2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Introduction to information retrieval</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Raghavan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schütze</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A survey of current link discovery frameworks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Nentwig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hartung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Ngomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Rahm</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semantic Web</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="419" to="436" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">LIMES -A time-efficient approach for large-scale link discovery on the web of data</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Ngomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Joint Conference on Artificial Intelligence</title>
				<meeting>the International Joint Conference on Artificial Intelligence<address><addrLine>Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">RAVEN -active learning of link specifications</title>
		<author>
			<persName><forename type="first">A.-C</forename><surname>Ngonga Ngomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Höffner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 6th International Workshop on Ontology Matching</title>
				<meeting>the 6th International Workshop on Ontology Matching<address><addrLine>Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Eagle: Efficient active learning of link specifications using genetic programming</title>
		<author>
			<persName><forename type="first">A.-C. Ngonga</forename><surname>Ngomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lyko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Extended Semantic Web Conference</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="149" to="163" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Coala-correlation-aware active learning of link specifications</title>
		<author>
			<persName><forename type="first">A.-C</forename><surname>Ngonga Ngomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lyko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Christen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Extended Semantic Web Conference</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Unsupervised learning of link discovery configuration</title>
		<author>
			<persName><forename type="first">A</forename><surname>Nikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>D'aquin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Motta</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Extended Semantic Web Conference</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Activelink: deep active learning for link prediction in knowledge graphs</title>
		<author>
			<persName><forename type="first">N</forename><surname>Ostapuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cudré-Mauroux</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The World Wide Web Conference</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">JCLAL: A java framework for active learning</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">G R</forename><surname>Pupo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Pérez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Del Carmen Rodríguez-Hernández</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">M</forename><surname>Fardoun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ventura</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Mach. Learn. Res</title>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Learning from multiple annotators: Distinguishing good from random labelers</title>
		<author>
			<persName><forename type="first">F</forename><surname>Rodrigues</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Pereira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ribeiro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition Letters</title>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">WOMBAT -A Generalization Approach for Automatic Link Discovery</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sherif</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A.-C. Ngonga</forename><surname>Ngomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">14th Extended Semantic Web Conference</title>
				<meeting><address><addrLine>Slovenia</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">A probabilistic model of active learning with multiple noisy oracles</title>
		<author>
			<persName><forename type="first">W</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Challenges and opportunities for trust management in crowdsourcing</title>
		<author>
			<persName><forename type="first">H</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Miao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>An</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2012 IEEE/WIC/ACM WI-IAT &apos;12</title>
				<meeting>the 2012 IEEE/WIC/ACM WI-IAT &apos;12<address><addrLine>USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">An up-to-date comparison of state-of-the-art classification algorithms</title>
		<author>
			<persName><forename type="first">C</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Almpanidis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Expert Systems with Applications</title>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
