<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Evaluation of Analogical Inferences Formed from Automatically Generated Representations of Scientific Publications</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Yalemisew</forename><surname>Abgaz</surname></persName>
							<email>abgaz.yalemisew@nuim.ie</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Maynooth University</orgName>
								<address>
									<settlement>Maynooth</settlement>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Diarmuid</forename><surname>O'Donoghue</surname></persName>
							<email>diarmuid.odonoghue@nuim.ie</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Maynooth University</orgName>
								<address>
									<settlement>Maynooth</settlement>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Dmitry</forename><surname>Smorodinnikov</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Maynooth University</orgName>
								<address>
									<settlement>Maynooth</settlement>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Donny</forename><surname>Hurley</surname></persName>
							<email>donny.hurley@nuim.ie</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Maynooth University</orgName>
								<address>
									<settlement>Maynooth</settlement>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Evaluation of Analogical Inferences Formed from Automatically Generated Representations of Scientific Publications</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">7F6897DA073840FDA7AB08B2B4C1B9DD</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T14:13+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Humans regularly exploit analogical reasoning to generate potentially novel and useful inferences. We outline the Dr Inventor model, which identifies analogies between research publications, and describe recent work to evaluate the inferences generated by the system. Its inferences, in the form of subject-verb-object triples, can involve arbitrary combinations of source and target information. We evaluate three approaches to assess the quality of inferences. Firstly, we explore an n-gram based approach (derived from the Dr Inventor corpus). Secondly, we use ConceptNet as a basis for evaluating inferences. Finally, we explore the use of Watson Concept Insights (WCI) to support our inference evaluation process. Dealing with novel inferences arising from an ever-growing corpus is a central concern throughout.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>An analogy is a comparison between two concepts (the source and target), where the comparison itself is somewhat novel and interesting due to differences between the two concepts. Extending such a comparison beyond the perceived similarities is called analogical inference, and such inferences often cast new information onto the target using information obtained from the source. Such comparisons aid our understanding of less well-known concepts by "re-cycling" other information. Analogy requires systematic comparison of the structure of the two concepts involved. Analogical reasoning is used in education <ref type="bibr" target="#b0">[1]</ref> and scientific discovery <ref type="bibr" target="#b1">[2]</ref>, and to explain and discover new knowledge about less-known systems. However, analogical inferences are not always true and can be misleading <ref type="bibr" target="#b3">[3]</ref>.</p><p>Analogical reasoning <ref type="bibr" target="#b4">[4]</ref> focuses on three main processes: 1) Retrieval of a source for a given target, 2) Mapping [5] <ref type="bibr" target="#b5">[6]</ref> of the source to the target by structural alignment, together with inference generation <ref type="bibr" target="#b1">[2]</ref>, and 3) Evaluation, where inferences are judged <ref type="bibr" target="#b3">[3]</ref> and potentially rejected. Elsewhere we <ref type="bibr" target="#b6">[7]</ref> [8] described our analogy model ("Dr Inventor"), which discovers analogies between scientific documents, but validating the resulting inferences is crucial to its successful use. This paper describes an inference evaluation model for use by Dr Inventor that aims to remove invalid inferences while preserving the good ones. 
Thus, we present an n-gram based familiarity analysis method and try to answer the following main questions: 1) How can familiar/good inferences be differentiated from unfamiliar/bad ones? 2) Which knowledge sources can be used, and how do they affect the analysis of the familiarity of inferences? 3) Which metrics can be used, and how can they be tuned, to measure the degree of familiarity of the inferences? We expect good inferences to: 1) resemble statements made in other papers, 2) exhibit strong associations between their subjects, verbs and objects, and 3) be familiar to human evaluators.</p><p>The rest of this paper is organized as follows. Section 2 reviews related work on analogical reasoning and the evaluation of inferences. Section 3 outlines our analogy mapping model using a well-known analogy example, explains how inferences are generated, and then examines our validation model in detail. In Section 4 we present the experiment and evaluation results. Finally, Section 5 presents conclusions and future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>Thinking with analogies is a form of structure-driven reasoning that appears to play a role in many different areas of intelligence. Computational modelling of this cognitive process is enabled through Gentner's <ref type="bibr" target="#b1">[2]</ref> Structure Mapping Theory (SMT). This theory posits that to find the analogical similarity between the source and target, we must identify the largest common sub-graph between the source and target structures. Since its inception, SMT has led to focused work on distinct phases of analogy, particularly the retrieval and mapping phases.</p><p>The key algorithm for generating inferences is CWSG, Copy With Substitution and Generation <ref type="bibr" target="#b8">[9]</ref>. Building upon the inter-graph mapping, CWSG identifies structures from the source that can be transferred to the target. But CWSG is blind to the potential credibility of its inferences; as noted by <ref type="bibr" target="#b9">[10]</ref> and others, analogy is a profligate inference mechanism, which gives rise to our inference evaluation system.</p><p>Several attempts have been made to evaluate analogical inferences. <ref type="bibr" target="#b3">[3]</ref> argue that the strength of analogical inferences depends on the level of similarity between a source and a target. Humans are highly selective analogizers who focus on relational pattern completion, which (they argue) effectively filters out bad inferences. Dr Inventor does not have access to the expertise required to support such filtering, so we adopt a different approach. <ref type="bibr" target="#b10">[11]</ref> used analogical reasoning techniques (inference rules) to generate facts and relied on humans to evaluate the plausibility of the inferences; 35.6% of the inferences expressed new true statements. However, that approach lacks any automatic evaluation method and is thus not applicable to Dr Inventor.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Inferences with Dr Inventor</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Generating Graph Representations of Research Information</head><p>This section outlines the preprocessing and mapping phases of Dr Inventor, which accepts academic publications (such as PDF documents) as input. Text extraction using PDFX resolves complications such as headers, footers, equations, tables and page numbers.</p><p>The identified text is passed to a state-of-the-art natural language processing pipeline to generate dependency trees. The parser includes a classifier that labels sentences with their rhetorical category (abstract, background, approach, outcome, future work). Details of the text processing pipeline are discussed in <ref type="bibr" target="#b11">[12]</ref>.</p><p>Using the output of the text processing pipeline, we convert the information from the dependency tree to a Research Object Skeleton (ROS) graph that efficiently represents the concepts (nouns) and relationships (verbs) of each sentence. A ROS graph captures the content of the input text in the form of subject-verb-object triples constructed from each sentence. Using the co-reference resolution built into the dependency parser, multiple occurrences of the same concept are represented uniquely within each ROS; co-reference resolution greatly improves ROS graph quality, linking words like "it" to their referents. Each ROS represents each concept uniquely across the entire document. Interestingly, this echoes recent work on embodied cognition identifying three reasons for unique representation <ref type="bibr" target="#b12">[13]</ref>.</p><p>All triples for a document form a large interconnected ROS graph, though some disconnected triples also occasionally arise. Subgraphs can be extracted for lexical or rhetorical subsections of papers. We demonstrate ROS generation with the following example. 
The abstract of the source paper ("Gaussian KD-tree for fast high-dimensional filtering" 1 ) is found to be analogous to the target paper ("Linear Combination of transformation" 2 ), and the method applied in the source is analogically applicable to the target paper's problem. The two abstracts are transformed into ROS graphs (Fig. 1). </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Source Paper. We propose a method for accelerating a broad class of non-linear filters that includes the bilateral, non-local means, and other related filters. These filters can all be expressed in a similar way: First, assign each value to be filtered a position in some vector space. Then, replace every value with a weighted linear combination of all values…</head><p>Target Paper. Geometric transformations are most commonly represented as square matrices in computer graphics. Following simple geometric arguments we derive a natural and geometrically meaningful definition of scalar multiples and a commutative addition of transformations based on the matrix representation, given that the matrices have…</p></div>
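<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the ROS construction concrete, the following is a minimal, illustrative sketch (not the actual Dr Inventor implementation) of how subject-verb-object triples can be assembled into a ROS-style graph; the toy triples are loosely adapted from the source abstract above.</p><p>
```python
def build_ros(triples):
    """Build a toy ROS graph: unique noun/verb nodes plus directed edges."""
    nodes = {}   # label mapped to category ("noun" or "verb")
    edges = []   # directed (from_label, to_label) pairs
    for subj, verb, obj in triples:
        nodes.setdefault(subj, "noun")   # each concept is represented once,
        nodes.setdefault(verb, "verb")   # mirroring the unique-representation
        nodes.setdefault(obj, "noun")    # property of a ROS
        edges.append((subj, verb))       # subject connects to verb
        edges.append((verb, obj))        # verb connects to object
    return nodes, edges

triples = [
    ("we", "propose", "method"),
    ("method", "accelerate", "filters"),
    ("filters", "include", "bilateral"),
]
nodes, edges = build_ros(triples)
print(len(nodes), len(edges))  # prints "7 6"
```
</p><p>Because nodes are de-duplicated, triples that share a concept (here "method" and "filters") automatically interconnect into one graph.</p></div>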
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Analogy Mapping</head><p>After constructing the ROS graph, Dr Inventor commences the analogical mapping process, which is based on structure mapping theory <ref type="bibr">[2] [14]</ref>. It uses subgraph isomorphism to find the best alignment between the source and target graphs.</p><p>A ROS is a form of attributed relational graph whose labels identify the conceptual category of each node as "noun" or "verb", and the mapping process only maps nodes that are in the same conceptual category. This constraint reduces the time required by the graph matching process by significantly reducing the search space. To further expedite matching, our algorithm ranks nodes by centrality metrics (degree, node rank) and starts the mapping from the most "central" node. The graph matching is primarily guided by structure (comparing in-degree, out-degree, etc.) and complemented by the WordNet <ref type="bibr" target="#b14">[15]</ref> based Lin semantic similarity metric <ref type="bibr" target="#b15">[16]</ref> when a single target node structurally matches two or more candidate source nodes. Thus, mapping occurs between the two most structurally similar nodes, and when two or more nodes have equal importance, we select the pair with the highest semantic similarity. We customized the (sub-)graph isomorphism algorithm VF2 <ref type="bibr" target="#b16">[17]</ref> to identify analogies between two ROS graphs; VF2 was selected for its efficient search time, as Dr Inventor is expected to explore many mappings in order to find novel and useful comparisons. A snippet of the mapping for the example is given in Table <ref type="table" target="#tab_0">1</ref>. </p></div>
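<div xmlns="http://www.tei-c.org/ns/1.0"><p>The mapping strategy can be sketched roughly as follows. This is a highly simplified stand-in for the customised VF2 search: node degree stands in for structural importance, and a caller-supplied sim function stands in for the WordNet-based Lin measure; both names are illustrative assumptions.</p><p>
```python
def degree(node, edges):
    """Count edges touching a node (a crude centrality measure)."""
    return sum(1 for a, b in edges if node == a or node == b)

def map_nodes(src_nodes, src_edges, tgt_nodes, tgt_edges, sim):
    """Map target nodes to same-category source nodes, most central first.

    src_nodes/tgt_nodes are dicts of label mapped to category
    ("noun" or "verb"); sim(a, b) returns a semantic similarity score.
    """
    mapping, used = {}, set()
    for t in sorted(tgt_nodes, key=lambda n: -degree(n, tgt_edges)):
        candidates = [s for s in src_nodes
                      if src_nodes[s] == tgt_nodes[t] and s not in used]
        if candidates:
            # Prefer the closest structural (degree) match; break ties
            # with semantic similarity, as in the text above.
            best = max(candidates, key=lambda s: (
                -abs(degree(s, src_edges) - degree(t, tgt_edges)),
                sim(s, t)))
            mapping[t] = best
            used.add(best)
    return mapping
```
</p><p>Restricting candidates to the same category ("noun" with "noun", "verb" with "verb") is what shrinks the search space, as the section notes.</p></div>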
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Analogical Inference</head><p>In this section we discuss our proposed analogical transfer and evaluation system. Inference generation uses the mapping pairs, the source ROS and the target ROS to identify candidate inferences, and we defined constraints to identify candidate nodes for transfer (candidate inferences). The Dr Inventor system is designed as a creativity support tool, identifying novel comparisons between publications. This novelty requirement inevitably results in new (previously unseen) inferences that must be evaluated for their likely usefulness. Novel inferences involve novel combinations of subject, verb and object terms originating in the two publications. A typical novel inference might take (say) its verb from the source publication while its other terms come from the target, as exemplified below.</p><p>Source: subject_s, verb_s, object_s. Target: subject_t, verb_t, object_t. Candidate inference: subject_t, verb_s, object_t. Particular challenges for Dr Inventor include: 1) evaluating novel combinations of subject, verb and object, 2) evaluating candidate inferences between arbitrary pairs of publications, and 3) dealing with an ever-changing corpus of documents.</p><p>All candidate inferences may be referred back to mapped pairs, enabling use of the "grounded inference" constraint. This means that, to be considered a candidate for transfer, a node should be linked to one or more of the nodes that are already mapped to the target. We split the general constraint into three simple constraints: a constraint on verbs, a constraint on nouns and a constraint on edges (Fig <ref type="figure">2)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Fig. 2. Analogical transfer from source to target</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Candidate verb inference.</head><p>A verb node is considered a candidate for inference if there is an edge that links it to the source end of a mapping pair or if it forms a link between two transferred noun nodes (Figure <ref type="figure">2</ref>. B or D). Candidate noun inference. A noun node is considered a candidate for inference if there is an edge that links it to the source end of a mapping pair or if it forms a link between two transferred verb nodes (Figure <ref type="figure">2</ref>. A or D). Candidate edge inference. An edge is transferred as a candidate inference if it is linked to a candidate verb node or a candidate noun node, or if it links two mapped nodes in the source (Figure <ref type="figure">2</ref>. C). Only inferences that match these constraints are considered viable, being sufficiently connected to the underlying mapping. Additional constraints are required to determine the number of nodes that should be transferred: if the number of nodes transferred to the target exceeds the number of mapped nodes, the comparison becomes less usable. One example of an inferred triple from the previous analogy is "definition include bilateral", where "include bilateral" is transferred and attached to "definition".</p><p>Familiarity as a Basis for Validating Inferences. The two defining characteristics of creativity are novelty and quality, and in this paper we explore the use of "familiarity" as a basis for their joint evaluation. We start with an n-gram-based technique that deals well with familiar inferences, followed by evaluation of subject-verb, verb-object or subject-object pairs as a partial evaluation of inferences. The n-gram approach is extended by exploring several "smoothing" techniques to estimate the familiarity of unseen triples. 
Finally, we extend these techniques by exploring ConceptNet and Watson Concept Insights to assess the quality of novel inferences. While these approaches estimate quality, it should be pointed out that for any "collection" of inferences to be considered truly creative, we expect that a number of these inferences will not be successfully validated. Any collection of inferences all of which are familiar can be rejected, as it does not offer sufficient novelty! Conversely, if all inferred information is invalidated (and is thus considered very novel), this too could be rejected as a useful comparison, as it could place too high a burden on the user.</p><p>Our concern with creative comparisons and creative inferences is that the resulting creativity should contain an appropriate level of novelty. Work is currently ongoing to assess the optimum balance of familiar and novel information with which to serve users' creative needs.</p></div>
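<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough sketch of the grounded-inference idea at the triple level (a simplification of the node- and edge-level constraints above, not the actual implementation), a source triple can be admitted as a candidate inference only when it touches the existing mapping, with mapped source terms substituted by their target counterparts in CWSG style:</p><p>
```python
def candidate_inferences(src_triples, mapping):
    """mapping: source concept mapped to its target counterpart."""
    inferences = []
    for s, v, o in src_triples:
        if s in mapping or o in mapping:            # grounded in the mapping
            inferences.append((mapping.get(s, s),   # substitute where mapped,
                               v,                   # copy the verb,
                               mapping.get(o, o)))  # generate the rest
    return inferences

# With the mapping of "class" onto "definition" (Table 1), a source triple
# ("class", "include", "bilateral") yields the paper's example inference:
print(candidate_inferences([("class", "include", "bilateral")],
                           {"class": "definition"}))
# prints [('definition', 'include', 'bilateral')]
```
</p><p>Triples with no mapped term are silently dropped, which is the filtering effect the grounded-inference constraint is meant to provide.</p></div>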
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Validation of Inferences with n-grams.</head><p>Inference validation focuses on evaluating the strength of a triple using familiarity analysis. We use n-gram based methods to calculate the familiarity of inferences, and later use these scores to rank triples by familiarity, showing how they can be applied to the novel triples that Dr Inventor aims to generate.</p><p>n-gram model. We employ an n-gram model to evaluate how well a given sequence of words fits together. The probability of a series of words is given by</p><formula xml:id="formula_0">p(w_1, w_2, w_3, ..., w_n) = p(w_1) p(w_2 | w_1) p(w_3 | w_1, w_2) ... p(w_n | w_1, ..., w_{n-1}).<label>(1)</label></formula><p>This formula can be simplified by applying the Markov assumption, which states that the probability of a word in a text depends only on the n-1 preceding words. In our case the sequence of words takes the form "subject-verb-object". For this particular work, unigrams, bigrams and trigrams are of prime interest: the probability of a word depends on one preceding word in the bigram model and on two preceding words in the trigram model, while for unigrams the probability of a word is independent of the preceding words. To estimate p(w_i | w_{i-1}) we need two components: 1) the count of the bigram w_{i-1} w_i, and 2) the count of all possible bigrams where w_{i-1} is the first word. We now apply the n-gram models, as our subject-verb-object inferences fit them naturally. The unigram approach takes each element of the triple and calculates its probability independently of the remaining elements, but this gives us little information on how well the terms "fit together". The bigram approach calculates the probability of each element in relation to its neighbours (in the form of separate bigram probabilities). 
Thus, the probability of a triple is given by</p><formula xml:id="formula_1">p(s, v, o) = p(s | &lt;start&gt;) p(v | s) p(o | v) p(&lt;end&gt; | o),<label>(2)</label></formula><p>where &lt;start&gt; indicates the beginning and &lt;end&gt; the end of a triple. Trigrams are calculated as</p><formula xml:id="formula_2">p(s, v, o) = p(s | &lt;start&gt;) p(v | &lt;start&gt;, s) p(o | s, v) p(&lt;end&gt; | v, o),<label>(3)</label></formula><p>where s is the subject, v the verb and o the object. Using a trigram approach, for example, we can calculate the probability of "we-describe-algorithm" as p("we", "describe", "algorithm") = p("we") p("describe" | "we") p("algorithm" | "we", "describe"). Such an n-gram model allows us to calculate the probability of one word occurring after another in such a sequence.</p><p>However, the n-gram model has an inherent problem: if any of the probabilities is zero, then the whole probability becomes zero, making the familiarity analysis useless. To avoid this problem, different methods have been proposed. First, we apply a synonym substitution method, and then we consider two smoothing approaches called additive smoothing and Good-Turing smoothing.</p><p>Additive Smoothing. We explore additive smoothing to avoid zero probabilities by replacing r occurrences of an n-gram in a corpus with r + δ occurrences, where δ is a small number between 0 and 1. This changes the probability to</p><formula xml:id="formula_3">p_add(w_i | w_{i-n+1}, ..., w_{i-1}) = (δ + c(w_{i-n+1}, ..., w_i)) / (δ|V| + Σ_w c(w_{i-n+1}, ..., w_{i-1}, w)),<label>(4)</label></formula><p>where V is the set of all words considered and c is the count of the corresponding n-gram.</p><p>Good-Turing Smoothing. Good-Turing smoothing uses the count of events we have seen once to predict the count of things we have never seen. This strategy estimates the weight of unseen events by reducing the probability mass of already observed events. 
We introduce the notation N_r, the frequency of frequencies: how many distinct n-grams occurred exactly r times. Assume that some n-gram occurs r times in our database. According to classical Good-Turing, r should be replaced by r*, where</p><formula xml:id="formula_4">r* = (r + 1) N_{r+1} / N_r.<label>(5)</label></formula><p>Then the probability of an n-gram x that occurs r times is calculated as</p><formula xml:id="formula_5">p(x) = r* / N,<label>(6)</label></formula><p>where N is the total count of observed n-grams.</p><p>Evaluation with ConceptNet<ref type="foot" target="#foot_2">3</ref>: ConceptNet (v5.4) is a database of concepts and their inter-relationships, representing common-sense background information. Usefully, ConceptNet provides a numeric measure to estimate the degree of association between concepts. In the following sections we use it to evaluate the strength of inferred triples.</p><p>Evaluation with Watson Concept Insights: Watson Concept Insights (WCI) provides an API <ref type="bibr" target="#b17">[18]</ref> that computes the strength of conceptual associations, which we use to evaluate inferences. The concept graph used by the WCI service has been derived from the English-language Wikipedia. We use WCI as another source of formalized knowledge to evaluate individual inferences; it was selected particularly for its fine-grained confidence score.</p></div>
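<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the n-gram familiarity score, the sketch below treats each triple as the sequence start-s-v-o-end and multiplies additively smoothed bigram probabilities (Eqs. 2 and 4), with the classical Good-Turing adjusted count of Eq. 5 alongside. The toy corpus, δ value and vocabulary size are illustrative assumptions, not the Dr Inventor settings.</p><p>
```python
from collections import Counter

def bigram_counts(corpus):
    """Collect bigram and history counts over start-s-v-o-end sequences."""
    bigrams, unigrams = Counter(), Counter()
    for s, v, o in corpus:
        seq = ("START", s, v, o, "END")
        for i in range(4):
            bigrams[(seq[i], seq[i + 1])] += 1
            unigrams[seq[i]] += 1
    return bigrams, unigrams

def familiarity(triple, bigrams, unigrams, vocab_size, delta=0.5):
    """p(s,v,o) as a product of additively smoothed bigram probabilities."""
    s, v, o = triple
    seq = ("START", s, v, o, "END")
    p = 1.0
    for i in range(4):
        p *= (delta + bigrams[(seq[i], seq[i + 1])]) / (
            delta * vocab_size + unigrams[seq[i]])
    return p

def good_turing_count(r, freq_of_freq):
    """Adjusted count r* = (r + 1) N_{r+1} / N_r of Eq. (5)."""
    return (r + 1) * freq_of_freq.get(r + 1, 0) / freq_of_freq[r]

corpus = [("we", "describe", "algorithm"), ("we", "propose", "method")]
bigrams, unigrams = bigram_counts(corpus)
seen = familiarity(("we", "describe", "algorithm"), bigrams, unigrams, 10)
unseen = familiarity(("mirror", "ask", "butter"), bigrams, unigrams, 10)
assert seen > unseen  # familiar triples score higher; unseen stay nonzero
```
</p><p>Smoothing guarantees that an unseen triple such as "mirror ask butter" receives a small but nonzero score, so it can still be ranked rather than collapsing the whole product to zero.</p></div>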
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiments and Results</head><p>For the experiment we generated different collections of "subject-verb-object" triples as datasets from three different sources. We then evaluated these datasets against their respective knowledge sources. Finally, we included human evaluation of the datasets and compared it with the results obtained from the system. The Dr Inventor dataset contains 572,496 triples extracted from 957 computer graphics papers published between 2001 and 2015 at SIGGRAPH and SIGGRAPH Asia, following the procedure in Section 3.1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Overview of Evaluation Procedure</head><p>Ten human evaluators were recruited to evaluate inferences, all selected from the computer science discipline and including lecturers, post-doctoral researchers and postgraduate students. The respondents were given the triples in a random order and the evaluation was separated into two parts. First, raters evaluated domain-specific (computer graphics) triples: 1000 triples were randomly selected from the Dr Inventor collection and their familiarity scores were calculated using both the Additive and Good-Turing smoothing methods. We then took 20 good inferences from each method (40 triples in total) and 20 bad inferences from each (another 40 triples) and gave them to the 10 evaluators. The expert evaluators rated the triples on a scale of 0 to 5, where 0 denotes unfamiliar, 2-3 medium familiarity and 5 high familiarity.</p><p>Second, raters evaluated domain-independent triples. This evaluation used Random Lists <ref type="foot" target="#foot_3">4</ref> to generate random English nouns and verbs to form (generally) bad triples, and we used the Corpus of Contemporary American English (COCA) to identify good triples. Since COCA contains sentences extracted from fiction, popular magazines, newspapers and academic texts, we verified that the triples extracted from this corpus were meaningful and familiar. We extracted 29 familiar triples and formed 31 bad triples by combining nouns and verbs randomly. Some of the bad triples include "mirror ask butter" and "bridge phone sun".</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Results and Discussion</head><p>Evaluation results from Dr Inventor triples. Table <ref type="table" target="#tab_1">2</ref> shows the threshold values that were determined for the 1000 evaluated triples. The score is computed using the probability distribution of n-grams, and the thresholds are decided based on the distribution of the familiarity score over a large collection. Using this score interpretation, 125 triples scored "high", 455 "medium" and 128 "low" under both methods, giving 70.8% agreement between the two methods. We then had experts evaluate the resulting 40 high-scoring triples and 40 low-scoring triples, to investigate how close our approach is to human evaluators. Table <ref type="table">3</ref>. Human evaluation results of triples rated as high (left) and low (right) by our system. Fig. <ref type="figure">3</ref>. Additive and Good-Turing comparison for "high" (left) and "low" (right).</p><p>Table <ref type="table">3</ref> shows a comparison between our proposed methods and human evaluation. Triples that are evaluated as "High" by the additive smoothing and Good-Turing methods are largely evaluated in the same way by humans. The triples that are evaluated as "Low" by the system, however, include some triples that human evaluators rated as "High". In general, the proposed methods correlate very well with human evaluation, though with a chance of losing some good triples. This gives us confidence, because inferences rated as "High" by the system are usually also rated highly by humans, and a combination of the two approaches should give a strong degree of confidence in the system's evaluation.</p><p>We further compared the two methods to see the consistency of their results (Fig. <ref type="figure">3</ref>). 
Both the Additive and Good-Turing methods identified triples evaluated as "High" consistently, with additive smoothing showing superiority in finding good inferences; Good-Turing smoothing, however, is superior at identifying unfamiliar inferences. One of the main concerns for Dr Inventor here is that rejecting inferences rated as bad may remove some creative but uncommon triples.</p><p>Evaluation results of triples using ConceptNet and WCI. Note that the global maximum association score between two concepts is 7.127 and the global minimum is 0.007. The WCI score between two words lies in the range [0.5, 1]. Neither ConceptNet nor WCI returns 0 values, so smoothing methods are not used. Humans evaluated both the random triples and the familiar triples (Table <ref type="table">5</ref>). The human evaluation agrees (100%) with the familiar triples, as these triples are extracted from publicly available content. Human evaluation further aligns with the unfamiliar triples: 93% are rated as low (according to the threshold defined in Table <ref type="table" target="#tab_3">4</ref>). Fig. <ref type="figure">4</ref>. Human ratings for triples considered as "High" and "Low".</p><p>It is also important to mention (Fig. <ref type="figure">4</ref>) that, for both methods, human scores for triples considered as "High" are overall significantly higher than human scores for triples considered as "Low". This means that both Additive and Good-Turing smoothing are dependable at distinguishing triples that are absolutely familiar to humans from those that are meaningless to them. Some examples of the best triples accepted by the system include "we provide method" and "we show section", and the best triples rejected by the system include "property contain penalization" and "millimeter be numeric". There are also a few very bad triples (e.g. "i, k, set") wrongly accepted by the system. </p></div>
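<div xmlns="http://www.tei-c.org/ns/1.0"><p>The threshold bands above can be applied with a trivial helper like the following; the cut-off arguments are placeholders to be taken from the score distribution over a large collection (as in Table 2), not fixed values of the system.</p><p>
```python
def rate(score, low_cut, high_cut):
    """Map a familiarity score onto the Low/Medium/High bands."""
    if score >= high_cut:
        return "High"
    if score > low_cut:
        return "Medium"
    return "Low"

# Illustrative cut-offs only (placeholder values):
print(rate(0.6, 1.67e-14, 0.499))    # prints "High"
print(rate(1e-20, 1.67e-14, 0.499))  # prints "Low"
```
</p></div>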
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion</head><p>We presented our approach to evaluating the analogical inferences generated by our Dr Inventor analogical reasoning system. The subject-verb-object triples generated from the corpus were used to support an n-gram model to assess the familiarity of the novel inferences (triples) generated by the system, where familiarity was used to estimate inference validity. We further explored ConceptNet and Watson Concept Insights to evaluate these inferences. Our evaluation demonstrated that the n-gram approach is capable of differentiating good inferences from bad ones and produced evaluations similar to the human evaluation. Our experimental results further show the possibility of ranking inferences using scores generated by our methods, to direct the focus of users to the most meaningful inferences. For future work, we will explore a unified measure incorporating all three evaluation ratings to help improve the quality of inferences, and of the analogies that drive them.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. A snippet of ROS Graphs for Target (Left) and Source (Right) Paper.</figDesc><graphic coords="3,124.56,412.56,347.52,157.44" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>Human evaluation of triples using ConceptNet and WCI</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Sample Mappings between abstracts of the two papers</figDesc><table><row><cell>Source</cell><cell cols="3">Target Label Sim Score Source</cell><cell>Target</cell><cell>Label Sim Score</cell></row><row><cell>Position</cell><cell cols="2">Software Noun 0.261</cell><cell>Class</cell><cell cols="2">Definition Noun</cell><cell>0.096</cell></row><row><cell cols="2">Accelerate Drive</cell><cell>Verb 0.553</cell><cell>d</cell><cell cols="2">Argument Noun</cell><cell>0.000</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Threshold values for familiarity of Dr Inventor triples.</figDesc><table><row><cell>Score</cell><cell>Additive smoothing</cell><cell>Good-Turing smoothing</cell></row><row><cell>Low</cell><cell>0 &lt; score ≤ 1.67 × 10⁻¹⁴</cell><cell>score ≤ 2.28 × 10⁻⁵</cell></row><row><cell>Medium</cell><cell>1.67 × 10⁻¹⁴ &lt; score &lt; 4.99 × 10⁻¹</cell><cell>2.28 × 10⁻⁵ &lt; score &lt; 9.91 × 10⁻¹</cell></row><row><cell>High</cell><cell>score ≥ 4.99 × 10⁻¹</cell><cell>score ≥ 9.91 × 10⁻¹</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 .</head><label>3</label><figDesc></figDesc><table><row><cell>Smoothing technique</cell><cell>Score</cell><cell cols="2">Triples evaluated as "high"</cell><cell cols="2">Triples evaluated as "Low"</cell></row><row><cell></cell><cell></cell><cell>No.</cell><cell>Percentage</cell><cell>No.</cell><cell>Percentage</cell></row><row><cell>Additive</cell><cell>High</cell><cell>17</cell><cell>85%</cell><cell>3</cell><cell>15%</cell></row><row><cell></cell><cell>Medium</cell><cell>2</cell><cell>10%</cell><cell>12</cell><cell>60%</cell></row><row><cell></cell><cell>Low</cell><cell>1</cell><cell>5%</cell><cell>5</cell><cell>25%</cell></row><row><cell>Good-Turing</cell><cell>High</cell><cell>15</cell><cell>75%</cell><cell>3</cell><cell>15%</cell></row><row><cell></cell><cell>Medium</cell><cell>4</cell><cell>20%</cell><cell>8</cell><cell>40%</cell></row><row><cell></cell><cell>Low</cell><cell>1</cell><cell>5%</cell><cell>9</cell><cell>45%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 .</head><label>4</label><figDesc>Threshold values for familiarity of COCA triples.</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://dl.acm.org/citation.cfm?doid=1576246.1531327</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://dl.acm.org/citation.cfm?doid=566654.566592</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">http://conceptnet-api-1.media.mit.edu/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://www.randomlists.com/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgement. The research leading to these results has received funding from the European Union Seventh Framework Programme ([FP7/2007-2013]) under grant agreement no 611383.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Analogies in science and science teaching</title>
		<author>
			<persName><forename type="first">S</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Salter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Physiology Education</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="167" to="169" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Structure-mapping: A theoretical framework for analogy</title>
		<author>
			<persName><forename type="first">D</forename><surname>Gentner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Cognitive Science</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="155" to="170" />
			<date type="published" when="1983">1983</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Analogical Reasoning</title>
		<author>
			<persName><forename type="first">D</forename><surname>Gentner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Smith</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Encyclopedia of Human Behavior</title>
				<meeting><address><addrLine>Oxford, UK</addrLine></address></meeting>
		<imprint>
			<publisher>Elsevier</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="130" to="136" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Analogical mapping by constraint satisfaction</title>
		<author>
			<persName><forename type="first">K</forename><surname>Holyoak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Thagard</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Cognitive Science</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page" from="295" to="355" />
			<date type="published" when="1989">1989</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">The Incremental Analogy Machine: A computational Model of Analogy</title>
		<author>
			<persName><forename type="first">M</forename><surname>Keane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brayshaw</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">EWSL</title>
		<imprint>
			<biblScope unit="page" from="53" to="62" />
			<date type="published" when="1988">1988</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Stimulating and Simulating Creativity with Dr Inventor</title>
		<author>
			<persName><forename type="first">D</forename><surname>O'Donoghue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Abgaz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hurley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ronzano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Saggion</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Sixth International Conference on Computational Creativity (ICCC 2015)</title>
				<meeting>the Sixth International Conference on Computational Creativity (ICCC 2015)<address><addrLine>Park City, Utah</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Embedding a Creativity Support Tool within Computer Graphics Research</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Abgaz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>O'Donoghue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hurley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Saggion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ronzano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Smorodinnikov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ECAI 2016, Workshop Modelling and Reasoning in Context (MRC)</title>
				<meeting><address><addrLine>The Hague, Netherlands</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Component processes in analogical transfer: Mapping, pattern completion, and adaptation</title>
		<author>
			<persName><forename type="first">K</forename><surname>Holyoak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Novick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Melz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Analogical connections. Advances in connectionist and neural computation theory 2</title>
				<meeting><address><addrLine>Westport, CT, US</addrLine></address></meeting>
		<imprint>
			<publisher>Ablex Publishing</publisher>
			<date type="published" when="1994">1994</date>
			<biblScope unit="page" from="113" to="180" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Language, truth and logic</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ayer</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
			<publisher>Courier Corporation</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Analogical Inference over a Common Sense Database</title>
		<author>
			<persName><forename type="first">L</forename><surname>Thomas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Eighteenth National Conference on Artificial Intelligence</title>
				<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Knowledge Extraction and Modeling from Scientific Publications</title>
		<author>
			<persName><forename type="first">F</forename><surname>Ronzano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Saggion</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">the Proceedings of the Workshop &quot;Semantics, Analytics, Visualisation: Enhancing Scholarly Data&quot; co-located with the 25th International World Wide Web Conference</title>
				<meeting><address><addrLine>Montreal, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Principles of Representation: Why You Can&apos;t Represent the Same Concept Twice</title>
		<author>
			<persName><forename type="first">L</forename><surname>Connell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lynott</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Topics in Cognitive Science</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="390" to="406" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">The competence of sub-optimal theories of structure mapping on hard analogies</title>
		<author>
			<persName><forename type="first">T</forename><surname>Veale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Keane</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th International Joint Conference on Artificial Intelligence</title>
				<meeting>the 15th International Joint Conference on Artificial Intelligence<address><addrLine>Nagoya, Japan</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1997">1997</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="232" to="237" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">WordNet: A Lexical Database for English</title>
		<author>
			<persName><forename type="first">G</forename><surname>Miller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="39" to="41" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">An Information-Theoretic Definition of Similarity</title>
		<author>
			<persName><forename type="first">D</forename><surname>Lin</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1998">1998</date>
			<pubPlace>San Francisco, CA, USA</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">A (sub)graph isomorphism algorithm for matching large graphs</title>
		<author>
			<persName><forename type="first">L</forename><surname>Cordella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Foggia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sansone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vento</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
				<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="page" from="1367" to="1372" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Watson Concept Insights: A Conceptual Association Framework</title>
		<author>
			<persName><forename type="first">M</forename><surname>Franceschini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Soares</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lastras</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th International Conference Companion on World Wide Web</title>
				<meeting>the 25th International Conference Companion on World Wide Web<address><addrLine>Montréal, Québec, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="179" to="182" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
