<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Towards Linked Data Fact Validation through Measuring Consensus</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Shuangyan</forename><surname>Liu</surname></persName>
							<email>shuangyan.liu@open.ac.uk</email>
							<affiliation key="aff0">
								<orgName type="institution">The Open University</orgName>
								<address>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Enrico</forename><surname>Mathieu D'aquin</surname></persName>
							<email>mathieu.daquin@open.ac.uk</email>
							<affiliation key="aff0">
								<orgName type="institution">The Open University</orgName>
								<address>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Enrico</forename><surname>Motta</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">The Open University</orgName>
								<address>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Towards Linked Data Fact Validation through Measuring Consensus</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">C75C99D0E88A69487848E0FF3CE6D7DD</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T23:43+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Linked Open Data</term>
					<term>Data Quality</term>
					<term>Fact Validation</term>
					<term>Semantic Similarity</term>
					<term>DBpedia</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In the context of linked open data, different datasets can be interlinked, thereby providing rich background knowledge for a dataset under examination. We believe that knowledge from interlinked datasets can be used to validate the accuracy of a linked data fact. In this paper, we present a novel approach for linked data fact validation using linked open data published on the Web. The approach uses owl:sameAs links and a novel predicate similarity matching method to retrieve evidence triples, and computes the confidence score of an input fact as a weighted average over the retrieved evidence triples. We also demonstrate the feasibility of our approach using a sample of facts extracted from DBpedia.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Linked datasets created from unstructured sources are likely to contain factual errors <ref type="bibr" target="#b4">[5]</ref> (e.g. a wrong population number for a country). Measuring the semantic accuracy of linked sources is viewed as one of the challenging dimensions for data quality assessment <ref type="bibr" target="#b7">[8]</ref>. Zaveri et al. defined semantic accuracy as "the degree to which data values correctly represent the real world facts." <ref type="bibr" target="#b7">[8]</ref> A simple example to illustrate this would be: when our search engine returns the state where New York City is located as CA, this is viewed as semantically inaccurate since the state CA does not represent the real world state of NYC, i.e. NY.</p><p>Different approaches were discussed in previous studies <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b4">5]</ref> for linked data semantic accuracy measurement. The DeFacto approach <ref type="bibr" target="#b2">[3]</ref> validated facts by retrieving webpages that contain the actual statement phrased in natural language using search engines and fact confirmation method. Paulheim and Bizer presented in <ref type="bibr" target="#b4">[5]</ref> an algorithm for detecting type incompletion based on the statistical distributions of properties and types, and an algorithm for identifying wrong statements by finding large deviation between actual types of the subject and/or objects and apriori probabilities given by the distribution.</p><p>However, no studies have investigated how to validate linked data facts leveraging the very nature of linked data (via collecting matched evidence triples from other linked sources). This paper presents an approach for RDF facts validation by collecting consensus from other linked datasets. Owl:sameAs links are followed to collect triples describing same real-world entities in other datasets. A predicate matching method is described to collect "equivalent" facts and a consensus measure is presented to quantify the agreement among the sources.</p><p>The rest of the paper is structured as follows. Section 2 presents the details of our approach. The method and results of an experiment with sample facts from DBpedia are described in Section 3. Finally, we conclude in Section 4 and provide an outlook for future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Approach</head><p>Subject Links Crawling and Cleaning. The first task addressed in this subsection deals with the process of automatically collecting the resource or subject links equivalent to the subject of the input fact(s). We approach the problem in two steps. Firstly, the values of the property owl:sameAs<ref type="foot" target="#foot_0">1</ref> of the subject of a fact are retrieved. It can be achieved by querying the underlying dataset of the input fact. Secondly, we fetch the equivalent subject links via querying the http://sameas.org service.</p><p>There may be duplicated and non-resolvable subject links in the results obtained via owl:sameAs and the http://sameas.org service. The duplication cases can happen since two separate services are used and the resources that they provide may overlap. It can also be due to the fact that the underlying dataset contains multilingual versions of the same resources and link them together via owl:sameAs. In addition, there are several reasons for non-resolvable subject links. The resources may have been deleted from the underlying dataset while the value of the relevant owl:sameAs property not being updated coordinately. The services of publishing the datasets may be down or have retired.</p><p>The erroneous subject links need to be cleaned before the next task can be performed effectively and efficiently. We follow the following steps for cleaning the errors. First, all subject links are verified by "pinging" the corresponding URIs. If a valid response is received within a given timeout, the subject links are considered as resolvable. Second, duplicated subject links are removed if they have the identical URIs. Finally, multilingual versions of the same resource are removed from the result set.</p><p>In our approach the reliability of the subject links are determined according to the provenance of the subject links, i.e., the methods or services used to retrieve the links, for example, the DBpedia owl:sameAs property and the http: //sameas.org service. Details of how to determine the reliability of the subject links are addressed later. The provenance information of the subject links are retained for calculating the confidence score of an input fact.</p><p>Predicate Links and Objects Retrieving. The next task of fact validation is collecting all triples that use the collected resources as the subject links. This problem cannot be tackled by simply dereferencing the URIs of the collected subject links. <ref type="foot" target="#foot_1">2</ref> There are three reasons. First, not all of the corresponding URIs can be dereferenced such as the URI of the mosquito Aedes vexans. <ref type="foot" target="#foot_2">3</ref> Second, some dereferenceable URIs may not return the real data of the resources since they were redirected to somewhere else, e.g. yago:Borough of Buckingham. <ref type="foot" target="#foot_3">4</ref>Finally, the content types of the representation of the information resources obtained via dereferencing can be different.</p><p>The non-dereferenceable URIs are removed from the set of subject links as a result of performing the subject links cleaning task. For those dereferenceable URIs, a combination of methods are applied to extract the desired predicates and objects, and convert them to a uniform format for performing the subsequent tasks.</p><p>The first method used in our approach is HTTP GET with the resource URI and content negotiation. 
<p>Predicate Links and Objects Retrieving. The next task of fact validation is collecting all triples that use the collected resources as subjects. This problem cannot be tackled by simply dereferencing the URIs of the collected subject links,<ref type="foot" target="#foot_1">2</ref> for three reasons. First, not all of the corresponding URIs can be dereferenced, such as the URI of the mosquito Aedes vexans.<ref type="foot" target="#foot_2">3</ref> Second, some dereferenceable URIs may not return the real data of the resources since they are redirected elsewhere, e.g. yago:Borough of Buckingham.<ref type="foot" target="#foot_3">4</ref> Finally, the content types of the representations obtained via dereferencing can differ.</p><p>The non-dereferenceable URIs are removed from the set of subject links as a result of performing the subject links cleaning task. For the dereferenceable URIs, a combination of methods is applied to extract the desired predicates and objects and convert them to a uniform format for the subsequent tasks.</p><p>The first method used in our approach is HTTP GET with the resource URI and content negotiation. It allows the RDF facts of an information resource to be obtained in most cases, and programming libraries such as the Jena API<ref type="foot" target="#foot_4">5</ref> can be used to extract the desired data from the RDF data. The second method is HTTP GET with a SPARQL query to a dataset endpoint. This method is adopted when the resource URIs cannot return the real data of the resources and there is a SPARQL endpoint associated with the knowledge base. Last but not least, when only dumps of the data are available from a knowledge base, e.g. Wikidata,<ref type="foot" target="#foot_5">6</ref> particular toolkits can be developed to extract the desired data from the dumps.</p>
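<p>As an illustration of the first retrieval method, the sketch below performs an HTTP GET with content negotiation and keeps only the triples describing the resource. It assumes Python with the rdflib library as a stand-in for the Jena API mentioned above; when a URI does not return the real data but the knowledge base exposes a SPARQL endpoint, the same pairs would instead be obtained by querying that endpoint.</p><p><code xml:space="preserve">
# Sketch: collect candidate evidence triples for one equivalent subject link.
from rdflib import Graph, URIRef

def fetch_predicate_objects(subject_uri):
    """HTTP GET with content negotiation (handled by rdflib), keeping
    only the (predicate, object) pairs whose subject is the resource."""
    g = Graph()
    g.parse(subject_uri)  # rdflib negotiates an RDF serialisation
    return list(g.predicate_objects(subject=URIRef(subject_uri)))
</code></p>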
<p>Predicate Similarity Measurement. After completing the aforementioned tasks, a large number of triples whose subjects are equivalent to the subject links of the input facts have been collected. The objective of the next task is selecting the evidence triples whose predicates match the predicates of the input facts.</p><p>We choose to measure predicate similarity based on the semantic similarity between the predicates of the input facts and those of the collected triples. String similarity measures such as the Trigram similarity metric <ref type="bibr" target="#b0">[1]</ref> are not used since they cannot effectively detect predicates that are composed of different words but have the same meaning. For example, the property dbpedia-owl:populationTotal and the property yago:hasNumberOfPeople should be identified as highly related.</p><p>There are a number of semantic relatedness measures available, including Jiang &amp; Conrath <ref type="bibr" target="#b1">[2]</ref>, Resnik <ref type="bibr" target="#b5">[6]</ref>, Lin <ref type="bibr" target="#b3">[4]</ref>, and Wu &amp; Palmer <ref type="bibr" target="#b6">[7]</ref>. They rely heavily on the enormous store of knowledge available in WordNet.<ref type="foot" target="#foot_6">7</ref> The principle of our approach for detecting highly related predicates is to apply a suitable semantic relatedness measure to the predicates of the evidence triples. Our method is based on WS4J,<ref type="foot" target="#foot_7">8</ref> which can generate a matrix of pairwise similarity scores for two input sentences according to a selected semantic relatedness measure; WS4J implements several of the semantic similarity algorithms listed above.</p><p>Many predicates use compound words, such as dbpedia-owl:populationTotal and yago:hasNumberOfPeople. Thus, our method should be able to handle predicates of compound words as well as predicates composed of single words. Our method consists of three parts. First, a compound word splitter is used to transform predicate names into space-separated words (i.e. sentences). Second, a matrix of pairwise similarity scores is generated for the two input sentences by means of WS4J. Finally, formulas are defined to measure the semantic similarity of the input sentences (i.e. the predicates) using the pairwise similarity matrix. Table <ref type="table" target="#tab_0">1</ref> provides an example of the pairwise similarity matrix for the sentences "population Total" and "has Number Of People" (as generated by WS4J). Let r be the number of rows of a similarity matrix and c the number of columns; the scores in the n-th row or column are represented by the sets S_row(n) and S_column(n), respectively. For each word in the shorter sentence (the rows if r ≤ c, the columns if r &gt; c), we choose the maximum score in the row or column where the word lies as the semantic similarity score of that word, denoted W(n). This leads to the following formula:</p><formula xml:id="formula_0">W(n) = \begin{cases} \max(S_{row}(n)) &amp; \text{if } r \leq c \\ \max(S_{column}(n)) &amp; \text{if } r &gt; c \end{cases}<label>(1)</label></formula><p>Moreover, let Φ(W) be the set of similarity scores of the words in the shorter sentence of a similarity matrix, and k the number of values in the set. If any word in the shorter sentence has a similarity value greater than the threshold θ, then the two input sentences may have similar meaning. Thus we define the average of the scores belonging to Φ(W), denoted P, as the semantic similarity score for the two input sentences (i.e. the predicates):</p><formula xml:id="formula_1">P = \frac{\sum_{W \in \Phi(W)} W}{k}, \quad \text{with } \exists W \in \Phi(W) \text{ such that } W &gt; \theta<label>(2)</label></formula><p>If no word in the shorter sentence has a similarity value greater than the threshold θ, then the two input sentences cannot have similar meaning. In this case, the similarity score for the two input sentences is set to zero.</p><p>To obtain the set of matched predicates for the predicate of the input facts, a threshold is applied, e.g., all predicates with P ≥ 0.5 are considered matched predicates.</p>
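<p>The following is a minimal sketch of this predicate matching method, assuming Python with NLTK's WordNet interface in place of WS4J and a simple camel-case splitter in place of the compound word splitter used in our experiment (the function names are illustrative).</p><p><code xml:space="preserve">
# Sketch of the predicate similarity measure (Formulas (1) and (2)).
# Requires: pip install nltk; then nltk.download('wordnet').
import re
from nltk.corpus import wordnet as wn

def split_predicate(name):
    """Naive camel-case splitter: 'populationTotal' -&gt; ['population', 'total']."""
    return [w.lower() for w in re.findall(r"[A-Za-z][a-z]*", name)]

def wup(word1, word2):
    """Best Wu-Palmer similarity over all synset pairs of the two words."""
    scores = [s1.wup_similarity(s2) or 0.0
              for s1 in wn.synsets(word1) for s2 in wn.synsets(word2)]
    return max(scores, default=0.0)

def predicate_similarity(pred1, pred2, theta=0.8):
    """Formulas (1) and (2): average the per-word maxima taken over the
    shorter sentence; return 0 if no word score exceeds theta."""
    a, b = split_predicate(pred1), split_predicate(pred2)
    if len(a) &gt; len(b):          # make `a` the shorter sentence
        a, b = b, a
    matrix = [[wup(w1, w2) for w2 in b] for w1 in a]
    word_scores = [max(row) for row in matrix]       # Formula (1)
    if not any(w &gt; theta for w in word_scores):
        return 0.0
    return sum(word_scores) / len(word_scores)       # Formula (2)
</code></p><p>Applied to dbpedia-owl:populationTotal and yago:hasNumberOfPeople, the shorter sentence is "population Total"; taking the per-word maxima from Table <ref type="table" target="#tab_0">1</ref> gives P = (0.9091 + 1.0)/2 ≈ 0.95, well above the matching threshold of 0.5.</p>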
<p>Confidence Calculation. As mentioned in the first task above, the reliability of the collected subject links is determined according to their provenance (i.e., owl:sameAs and the http://sameas.org service). A weighting factor is assigned to the subject links of the evidence triples to represent their reliability. The value of a weighting factor ranges from 1 to 5; the greater the value, the more reliable the subject link.</p><p>We define a confidence score for the input fact to represent the degree to which the evidence triples agree with the input fact (or triple). The confidence of the input fact is based on the weighted average of the values of the objects of the evidence triples, represented as γ.</p><p>The values of the objects, denoted ν, are considered to be literal values (either numerical or string). If the objects are of type string, the string similarity scores between the objects of the input facts and those of the evidence triples are used as the values of ν; if the objects are numerical, the numerical values of the objects are used directly. The weight ω is the product of the reliability of the subject link and the similarity of the predicate link of an evidence triple. Additionally, let m be the number of evidence triples collected through the abovementioned tasks. Thus, γ is represented as:</p><formula xml:id="formula_2">\gamma = \frac{\sum_{i=1}^{m} \omega_i \cdot \nu_i}{\sum_{j=1}^{m} \omega_j}<label>(3)</label></formula><p>Formula (<ref type="formula" target="#formula_2">3</ref>) is applied to represent the confidence score of an input fact when the values of the objects of the evidence triples are of type string.</p><p>Furthermore, the following formula is applied to represent the confidence score of the input fact, denoted Γ, when the values of the objects are numerical. In Formula (<ref type="formula" target="#formula_3">4</ref>), x represents the numerical value of the object of the input fact, while γ is the weighted average calculated via Formula (<ref type="formula" target="#formula_2">3</ref>).</p><formula xml:id="formula_3">\Gamma = 1 - \frac{(x - \gamma)^2}{\gamma}<label>(4)</label></formula><p>Based on Formula (<ref type="formula" target="#formula_3">4</ref>), a smaller difference between the numerical value of the object of the input fact and the weighted average value leads to a higher confidence score.</p>
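<p>The confidence computation itself is compact. The following minimal sketch implements Formulas (3) and (4) for numerical object values, assuming Python and assuming each evidence triple has already been reduced to a tuple of its object value, subject link reliability, and predicate similarity (these names are illustrative, not part of our implementation).</p><p><code xml:space="preserve">
# Sketch of Formulas (3) and (4). Each evidence triple is represented as
# (nu, reliability, similarity): its object value, the reliability of its
# subject link (1-5), and its predicate similarity score P.
def weighted_average(evidence):
    """Formula (3): gamma, where each weight is reliability * similarity."""
    numerator = sum(rel * sim * nu for (nu, rel, sim) in evidence)
    denominator = sum(rel * sim for (_nu, rel, sim) in evidence)
    return numerator / denominator

def confidence(x, evidence):
    """Formula (4): confidence of an input fact with numerical object x."""
    gamma = weighted_average(evidence)
    return 1 - (x - gamma) ** 2 / gamma
</code></p>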
</div><div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Experiment</head><p>In order to test the feasibility of the approach described in the previous section, we conducted an experiment with a property from DBpedia (dbpedia-owl:populationTotal) and a sample of facts using this property as the predicate. This property was selected since its values are numerical.</p><p>We queried the DBpedia SPARQL endpoint for all towns in Milton Keynes that have a population of more than 10,000. The resulting 18 triples were used as the input facts, and their subjects were used as seeds to crawl equivalent subject links from other knowledge bases.</p><p>The number of subject links retrieved for a single fact ranges from dozens to several hundred. For example, 23 subject links were found for dbpedia:Stantonbury, while 232 were retrieved for dbpedia:Buckingham. After cleaning, the number of subject links is greatly reduced, ranging from a few to several tens.</p><p>We selected a representative resource, dbpedia:Buckingham, to examine the correctness of the subject links cleaning process. A total of 207 noisy subject links were found for this resource, consisting of 172 non-resolvable links and 35 duplicate links. We manually examined the causes of the non-resolvable links and reclassified 56 of the 172 as valid links (Figure <ref type="figure" target="#fig_0">1</ref>). These 56 links had initially been identified as invalid because of a small read timeout set in the tool used for the subject links cleaning process; this allowed us to adjust the timeout to a more suitable value. We also found that the knowledge bases from which the subject links originated provided different data access services. Accordingly, we needed to adopt different methods for retrieving the predicate links and objects from these knowledge bases.</p><p>In addition, the compound word splitter<ref type="foot" target="#foot_8">9</ref> was used in the predicate similarity measurement process to split compound predicate names into sentences. The Wu &amp; Palmer <ref type="bibr" target="#b6">[7]</ref> semantic similarity measure (WUP) was selected since its similarity scores are normalised between 0 and 1. We also tested other measures such as Lin <ref type="bibr" target="#b3">[4]</ref>; the WUP measure demonstrated the highest rate of correctness (threshold θ ≥ 0.8). The distribution of the predicate similarity scores generated is provided in Figure <ref type="figure" target="#fig_1">2</ref>. Furthermore, 45% of the sample facts (i.e. statements about the population of the 18 subjects) were assigned a confidence score and 55% were not (as no evidence triples were found). Figure <ref type="figure" target="#fig_2">3</ref> shows the distribution of the confidence scores generated for the sample facts. 22% of the facts were identified as highly reliable (Γ ≥ 0.9). Two facts were assigned very low confidence scores (0.04 and -68.58).</p><p>We manually examined the causes of the low confidence values and discovered that, for each of these facts, a matched triple had a very large or very small population number, which caused the weighted average of the object values of the evidence triples to be too large or too small. This was due to the subject links of the erroneous triples (retrieved from the sameas.org service) pointing to resources not identical to the subjects of the facts (wrong subject links). We corrected the errors by removing the erroneous triples from the set of evidence triples. This led to a much higher confidence (0.94) for the fact initially scored at 0.04, and to no confidence score being produced for the fact initially scored at -68.58, as no evidence triples remained. Based on this experiment, we plan to extend our approach to detect abnormal evidence triples with "fake" subject links in future work.</p>
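<p>To illustrate this failure mode concretely (with invented numbers, not data from the experiment), the snippet below reuses the confidence sketch from Section 2: a single evidence triple whose wrong subject link contributes a wildly different population value drags the weighted average γ far away from the input value x, and Formula (4) then yields a strongly negative score.</p><p><code xml:space="preserve">
# Hypothetical numbers for illustration only: two agreeing evidence
# triples, then one erroneous triple from a wrong subject link.
good = [(12000, 5, 1.0), (12100, 3, 0.9)]   # (nu, reliability, similarity)
bad = good + [(950000, 2, 0.8)]             # erroneous evidence triple
x = 12050                                   # object value of the input fact
print(confidence(x, good))  # ~0.98: the evidence agrees with the fact
print(confidence(x, bad))   # large negative: one outlier wrecks gamma
</code></p></div>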
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusion and Future Work</head><p>In this paper, we presented an approach for validating linked data facts using RDF triples retrieved from open knowledge bases. Our approach enables the assessment of the accuracy of facts using the vast interlinked RDF resources on the Web. This would become increasingly important due to the fast growth of LOD on the Web.</p><p>The presented work is still at its early stage, the experiment discussed in this paper focused on testing the feasibility of each component of the presented approach. This can help refine our approach before an evaluation of the approach as a whole is carried out. We are planning to demonstrate that the proposed approach can be applied proficiently to arbitrary predicates, and evaluate the predicate similarity matching method with standard evaluation measures (Precision/Recall) on well-known datasets. Moreover, we are also going to define a gold standard and apply the standard for evaluating our method for validating RDF facts.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Correctness of non-resolvable subject links cleaning for dbpedia:Buckingham with analysis of causes (Total=172)</figDesc><graphic coords="6,224.41,426.47,162.39,84.11" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Distribution of predicate similarity by applying the WUP semantic similarity measure and Formulas (1) and (2)</figDesc><graphic coords="7,220.88,122.68,172.84,103.29" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Confidence of the sample of facts collected from DBpedia</figDesc><graphic coords="7,226.98,363.08,173.81,100.06" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Pairwise semantic similarity matrix for two input sentences.</figDesc><table><row><cell cols="3">has Number of</cell><cell>People</cell></row><row><cell>population 0.0</cell><cell>0.4286</cell><cell cols="2">0.0 0.9091</cell></row><row><cell>Total 0.0</cell><cell>1.0</cell><cell cols="2">0.0 0.3636</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">The following namespace conventions are used in this document: owl=http: //www.w3.org/2002/07/owl, dbpedia=http://dbpedia.org/resource/, dbpedia-owl=http://dbpedia.org/ontology/, dbpprop=http://dbpedia.org/ property/, yago=http://yago-knowledge.org/resource/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">According to the W3Cs note on dereferencing HTTP URIs, the act of retrieving a representation of a resource identified by a URI is known as dereferencing that URI, http://www.w3.org/2001/tag/doc/httpRange-14/2007-05-31/HttpRange-14</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">http://lod.geospecies.org/ses/4XSQO</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">http://tinyurl.com/mxdkv4s</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">https://jena.apache.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">http://www.wikidata.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">http://wordnet.princeton.edu/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">https://code.google.com/p/ws4j/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_8">http://www.lina.univ-nantes.fr/?Compound-Splitting-Tool.html</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Automatic spelling correction using a trigram similarity measure</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">C</forename><surname>Angell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">E</forename><surname>Freund</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Willett</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Processing &amp; Management</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="255" to="261" />
			<date type="published" when="1983">1983</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Semantic similarity based on corpus statistics and lexical taxonomy</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">W</forename><surname>Conrath</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of International Conference on Research in Computational Linguistics</title>
				<meeting>International Conference on Research in Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Defacto-deep fact validation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gerber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Morsey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C N</forename><surname>Ngomo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web-ISWC 2012</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="312" to="327" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">An information-theoretic definition of similarity</title>
		<author>
			<persName><forename type="first">D</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ICML</title>
		<imprint>
			<biblScope unit="volume">98</biblScope>
			<biblScope unit="page" from="296" to="304" />
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Improving the quality of linked data using statistical distributions</title>
		<author>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal on Semantic Web and Information Systems (IJSWIS)</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="63" to="86" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Using information content to evaluate semantic similarity in a taxonomy</title>
		<author>
			<persName><forename type="first">P</forename><surname>Resnik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 14th International Joint Conference on Artificial Intelligence</title>
				<meeting>the 14th International Joint Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="1995">1995</date>
			<biblScope unit="page" from="448" to="453" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Verbs semantics and lexical selection</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Palmer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 32nd annual meeting on Association for Computational Linguistics</title>
				<meeting>the 32nd annual meeting on Association for Computational Linguistics</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="1994">1994</date>
			<biblScope unit="page" from="133" to="138" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Quality assessment for linked data: A survey</title>
		<author>
			<persName><forename type="first">A</forename><surname>Zaveri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rula</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Maurino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Pietrobon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Semantic Web journal</title>
				<imprint/>
	</monogr>
	<note>to appear</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
