<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Supervised and Unsupervised Approaches to the Ontology-Based Disambiguation of JSON Documents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chinmay Choudhary</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthias Nickles</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Colm O'Riordan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Engineering and Informatics, National University of Ireland</institution>
          ,
          <addr-line>Galway</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper proposes and evaluates supervised and unsupervised approaches to Named Entity Disambiguation in JSON documents, which link every ambiguous JSON object to its most appropriate candidate DBpedia ontology class. The approaches exploit the hierarchical structure of the document through two scores, Sibling Relatedness and Parental Relatedness, together with the textual similarity between a class and an object, expressed as the Textual Similarity score.</p>
      </abstract>
      <kwd-group>
        <kwd>JSON disambiguation</kwd>
        <kwd>Linked Data</kwd>
        <kwd>Ontology Mapping</kwd>
        <kwd>Named Entity Disambiguation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        JSON (JavaScript Object Notation) is a lightweight, language-independent
data-interchange format that is widely used for sharing and extracting data on
the internet. JSON syntax presents data in a form that is human-readable yet
easily parsed and generated by computers, by adopting a hierarchy of objects,
each of which has a key given as human-readable text. This paper presents an
approach to the disambiguation of real-world JSON documents: keys whose
ambiguous value-text refers to a real-world entity are linked to the DBpedia
class, out of all candidate classes, to which that entity belongs. Such links
could be used to decode a JSON document that describes popular entities and to
extract that information autonomously, which is useful in the field of web data
mining. Another potential use case is the automated generation of documents in
JSON-LD [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] format (a formal syntax for serializing Linked Data in JSON) from
ambiguous plain JSON documents.
      </p>
      <p>
Data on the Semantic Web is often organized by a framework that describes the
relationships between entities in a specific domain, conceptualized as an
ontology. The data itself is represented as Resource Description Framework
(RDF) triples. RDF is a machine-readable format that makes the data retrievable
through SPARQL queries. Ontology mapping is the process of linking concepts
across two ontologies that represent similar data from two distinct
heterogeneous sources, such that both concepts (each identified by a unique
identifier within its ontology) categorize the same type of real-world entity.
One such ontology is DBpedia, which presents the information available on
Wikipedia in structured form as Linked Data by classifying all Wikipedia
articles into a hierarchy of classes, or concepts, according to the type of
entity that each article describes, with each class having a fixed set of
properties. This structured information includes attributes of each Wikipedia
page such as title, hyperlinks and description, expressed as RDF triples that
are accessible through the DBpedia SPARQL server or as downloadable datasets.
      </p>
      <p>
The hierarchical structure of a JSON document can be informally described as an
ontology-like system in which objects have two types of relation, Parent-child
and Sibling, elaborated in Section 3. The RDF triples describing the structure
of the JSON document given as Example 1 are listed in Table 1.
Example 1:
        <preformat>{"Country": {"Name": "Germany",
             "Capital": "Berlin",
             "Giant-Companies": {"Auto": ["BMW", "Volkswagen", "Mercedes"]}}}</preformat>
      </p>
      <p>
The disambiguation of JSON objects with ambiguous value-texts is therefore
addressed in this paper as an ontology mapping problem: all objects, ambiguous
and non-ambiguous alike, are linked collectively and simultaneously to their
most appropriate candidate DBpedia ontology classes, using a newly proposed
mapping approach that builds on the fundamentals of general Named Entity
Disambiguation (NED) while taking the hierarchical structure of the JSON
document into account. NED is the process of linking ambiguous name-mentions in
a text document to the appropriate real-world entities in a knowledge base. The
most common knowledge base is Wikipedia, in which each article is treated as a
single entity. The NED process comprises three major steps: recognizing
ambiguous name-mentions in a document, identifying candidate entities for each
such name-mention, and disambiguating the name-mentions by linking each one to
the most appropriate of its candidates. Each step is a broad research area in
itself. This paper describes work on JSON documents that implements the final
step of the NED process through a new approach.
Section 3 outlines the research problem. Section 4 describes the proposed
approach, while Sections 5 and 6 describe its testing and evaluation.
        <table-wrap id="tab-1">
          <label>Table 1</label>
          <caption><p>RDF triples describing the hierarchical structure of Example 1</p></caption>
          <table>
            <thead>
              <tr><th>RDF triples</th></tr>
            </thead>
            <tbody>
              <tr><td>[Country parent Name], [Country parent Capital], [Country parent Giant-Companies], [Giant-Companies parent Auto]</td></tr>
              <tr><td>[Name child Country], [Capital child Country], [Giant-Companies child Country], [Auto child Giant-Companies]</td></tr>
              <tr><td>[Name sibling Capital], [Capital sibling Giant-Companies], [Name sibling Giant-Companies], [Capital sibling Name], [Giant-Companies sibling Capital], [Giant-Companies sibling Name]</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </p>
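      <p>The structural triples of Example 1 can be derived mechanically from the parsed document. The following is a minimal sketch, not the authors' implementation: the function name structure_triples is hypothetical, array elements are assumed to inherit the key of the array that holds them, and sibling triples are emitted in both directions as in Table 1.</p>
      <preformat><![CDATA[
```python
import json
from itertools import permutations

EXAMPLE_1 = '''
{"Country": {"Name": "Germany",
             "Capital": "Berlin",
             "Giant-Companies": {"Auto": ["BMW", "Volkswagen", "Mercedes"]}}}
'''


def structure_triples(node, parent_key=None):
    """Return (subject, relation, object) triples over the keys of a JSON tree."""
    triples = []
    if isinstance(node, dict):
        keys = list(node)
        if parent_key is not None:
            # Every key directly below parent_key is its child.
            for key in keys:
                triples.append((parent_key, "parent", key))
                triples.append((key, "child", parent_key))
            # Keys sharing the same immediate superior are siblings.
            for first, second in permutations(keys, 2):
                triples.append((first, "sibling", second))
        for key in keys:
            triples.extend(structure_triples(node[key], key))
    elif isinstance(node, list):
        # Assumption: array items are treated as values of the enclosing key.
        for item in node:
            triples.extend(structure_triples(item, parent_key))
    return triples


triples = structure_triples(json.loads(EXAMPLE_1))
for subject, relation, obj in triples:
    print(f"[{subject} {relation} {obj}]")
```
]]></preformat>
      <p>For Example 1 this yields four parent triples, four child triples and six sibling triples.</p>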
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Named Entity Disambiguation is a significant area of research with a
long history. Early work in this field proposed individual-linking approaches
such as [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which link each name-mention individually, based on the similarity
between its context within the document and the description of the entity.
Modern approaches, on the other hand, belong to the collective-linking
category: they link all name-mentions within a single document simultaneously,
by considering the mutual relationships between the entities referred to in
that document along with the textual similarity between name-mentions and their
respective entities. Collective-linking approaches can be further classified,
by the overall process adopted, into supervised approaches such as [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and unsupervised approaches such as [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ],
[
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
The ontology mapping approaches proposed so far can be classified into three
major categories: similarity-based, reasoning-based and learning-based
approaches. Approaches such as [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] are examples of similarity-based approaches, which perform the
mapping based on the linguistic and contextual similarity of the text
representing the components of the two ontologies, while [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ],
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] are examples of reasoning-based approaches, which treat the problem
as a logical inference problem: given an initial, manually provided set of
mappings, they infer the final set of mappings. Finally, learning-based
approaches use machine learning to compute the final mapping. Popular tools
developed for ontology mapping include COMA++, CODI, FALCON-AO, PRIOR+ and
LILY [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Problem Definition</title>
      <p>The research reported in this paper proposes approaches for collectively linking
the keys of JSON objects whose ambiguous value-text refers to a popular
real-world entity to the class of the DBpedia ontology
(http://mappings.dbpedia.org/server/ontology/classes/) to which that
entity belongs, based on the fundamentals of general collective NED. In this
application domain, keys of JSON objects with ambiguous value-text act as
name-mentions, ontology classes act as entities, and the collection of all
DBpedia classes forms the knowledge base (KB). The proposed approaches
accomplish the final task of NED, which is to identify, for all name-mentions
simultaneously, the most suitable entity among the candidate entities of each.
They can therefore only be applied to real-world JSON documents in which all
ambiguous value-texts have been demarcated and a list of candidate classes has
been identified for each beforehand.</p>
      <p>Owing to the hierarchical structure, any two objects within a single document
have exactly one of three types of relationship, namely Sibling, Parent-child
or Unrelated, which allows the entire structure to be represented as a set of
RDF triples. Two objects have a sibling relationship if both have another,
distinct JSON object as their common immediate superior (their common parent)
within the document's hierarchical structure. For two objects O1 and O2 within
a single document, O1 is the parent of O2 if O1 is the immediate superior of O2
within the overall document hierarchy, in which case O1 and O2 have a
parent-child relationship. The pairs of objects in Example 1 that have sibling
relationships and parent-child relationships are listed in Table 2.</p>
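      <table-wrap id="tab-2">
        <label>Table 2</label>
        <table>
          <thead>
            <tr>
              <th>Pairs of objects (represented as keys) within Example 1 having Sibling Relationship</th>
              <th>Pairs of objects (represented as keys) within Example 1 having Parent-child Relationship</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>Name &amp; Capital</td><td>Country &amp; Name</td></tr>
            <tr><td>Capital &amp; Giant-Companies</td><td>Country &amp; Capital</td></tr>
            <tr><td>Name &amp; Giant-Companies</td><td>Country &amp; Giant-Companies</td></tr>
            <tr><td></td><td>Giant-Companies &amp; Auto</td></tr>
          </tbody>
        </table>
      </table-wrap>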
      <p>In our scenario, we assume that the entire hierarchical structure of a JSON
document is represented as a collection of specific RDF triples. It can also be
depicted as a large connected graph, called the main-graph, comprising two types
of node: object-nodes, which represent JSON objects, and class-nodes, which
represent particular DBpedia ontology classes. Each object-node is connected to
at least one class-node. The graph also contains three kinds of edge, described
as follows.</p>
      <list list-type="order">
        <list-item><p>Sibling edge: connects two class-nodes and indicates that the two represented classes are candidates of two distinct JSON objects that have a sibling relationship within the document hierarchy. It is weighted with the Sibling Relatedness (SR) score.</p></list-item>
        <list-item><p>Parental edge: connects two class-nodes and indicates that the two represented classes are candidates of two distinct JSON objects that have a parent-child relationship. It is weighted with the Parental Relatedness (PR) score.</p></list-item>
        <list-item><p>Candidate edge: connects an object-node and a class-node, indicating that the class is a candidate for the object. It is weighted with the Textual Compatibility (TC) score.</p></list-item>
      </list>
      <p>The graph in Figure 1 depicts the hierarchical structure of the JSON document
given as Example 1. The purpose of the research presented in this paper is to
propose methods for computing all three kinds of weighting score (SR, PR and
TC), as well as for computing the evidence weight of a sub-graph of the desired
structure extracted from the main-graph, such that the sub-graph with the
maximum evidence weight is the appropriate collective link of the document. The
problem can be defined mathematically as follows.
A main-graph M representing the hierarchy of a given JSON document consists of a
set of object-nodes O and a set of class-nodes C. Each member o of O has a set
of sibling nodes S<sub>o</sub>, a set of child nodes P<sub>o</sub> and a set of
candidate class-nodes C<sub>o</sub>.</p>
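      <p>For all object-nodes o, o<sub>i</sub>, o<sub>j</sub> in O and class-nodes c<sub>i</sub> in C of M:
        <disp-formula id="eq-1"><label>(1)</label><tex-math><![CDATA[\mathrm{Candidate}(c_i, o) = \begin{cases} 1 & \text{if } c_i \in C_o \\ 0 & \text{otherwise} \end{cases}]]></tex-math></disp-formula>
        <disp-formula id="eq-2"><label>(2)</label><tex-math><![CDATA[\mathrm{Sibling}(o_i, o_j) = \begin{cases} 1 & \text{if } o_i \in S_{o_j} \text{ and } o_j \in S_{o_i} \\ 0 & \text{otherwise} \end{cases}]]></tex-math></disp-formula>
        <disp-formula id="eq-3"><label>(3)</label><tex-math><![CDATA[\mathrm{Parent}(o_i, o_j) = \begin{cases} 1 & \text{if } o_j \in P_{o_i} \\ 0 & \text{otherwise} \end{cases}]]></tex-math></disp-formula>
      </p>
      <p>For all c<sub>i</sub>, c<sub>j</sub> in C:
        <disp-formula id="eq-4"><label>(4)</label><tex-math><![CDATA[W(c_i, o_j) = \begin{cases} TC(c_i, o_j) & \text{if } \mathrm{Candidate}(c_i, o_j) = 1 \\ 0 & \text{otherwise} \end{cases}]]></tex-math></disp-formula>
        <disp-formula><tex-math><![CDATA[W(c_i, c_j) = \begin{cases} SR(c_i, c_j) & \text{if } \mathrm{Candidate}(c_i, o_i) = 1,\ \mathrm{Candidate}(c_j, o_j) = 1,\ \mathrm{Sibling}(o_i, o_j) = 1 \\ PR(c_i, c_j) & \text{if } \mathrm{Candidate}(c_i, o_i) = 1,\ \mathrm{Candidate}(c_j, o_j) = 1,\ \mathrm{Parent}(o_i, o_j) = 1 \\ 0 & \text{otherwise} \end{cases}]]></tex-math></disp-formula>
      </p>
      <p>The objective of this research is to formulate and test methods for computing
the values of TC, SR and PR that weight the edges defined above, and the
Evidence Weight of a sub-graph. Section 4 proposes methods for computing SR,
PR, TC and Evidence Weights.</p>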
    </sec>
    <sec id="sec-4">
      <title>Proposed Approaches</title>
      <p>This section proposes methods for computing the three kinds of score
(SR, PR and TC), as well as methods for computing the Evidence Weight of a
sub-graph of the main-graph from these scores.</p>
      <p>
        Sibling Relatedness Score between two ontology classes C1 and C2, written
SR(C1,C2) or SR(C2,C1), indicates how likely the two classes are to be linked
to two objects that have a sibling relationship within a given JSON document.
In other words, it indicates how likely instances of the two classes are to
form two distinct pieces of information about a single common entity (the
common parent). Following this logic, the SR value between two classes can be
computed by analysing the properties that share a single ontology class as
common domain and have either of the two classes under consideration as range,
on the intuition that instances of these classes are more likely to represent
individual properties of an instance of the common domain within a JSON
document. For two classes A and B, the value of SR can be computed
independently of any document, as the commonness of their properties measured
by the Normalized Google Distance [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] through Equation 7, where |a| and |b| are the numbers of properties
with classes A and B as range respectively, |a ∩ b| is the number of properties
that have either A or B as range and share a common set of distinct domains,
and T is the total number of classes in the entire DBpedia ontology.
      </p>
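      <p>
        <disp-formula id="eq-7"><label>(7)</label><tex-math><![CDATA[SR(A, B) = 1 - \frac{\log(\max(|a|, |b|)) - \log(|a \cap b|)}{\log T - \log(\min(|a|, |b|))}]]></tex-math></disp-formula>
      </p>
      <p>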
Parental Relatedness score between two ontology classes C1 and C2, with C1
being the parent, written PR(C1,C2), indicates how likely the two classes are
to be linked to JSON objects within the same document such that the object
linked to C1 is the immediate superior (parent) of the object linked to C2
within the overall document hierarchy. In other words, it indicates how likely
an instance of C2 is to form a distinct piece of information about an instance
of C1. The value of PR(C1,C2) can therefore be computed from the number of
properties that have C1 as domain and C2 as range, relative to the total number
of properties that have either C1 as domain or C2 as range. For two ontology
classes A and B, the score is computed by applying Equation 8, where |a| is the
total number of properties with domain A, |b| is the total number of properties
with range B, and |a ∩ b| is the number of properties with domain A and range
B. The intuition is that the more likely an instance of B is to represent a
distinct piece of information about an instance of A, the more likely A is to
be the parent of B within a JSON document. One is added to both numerator and
denominator as Laplace smoothing.</p>
      <p>
        <disp-formula id="eq-8"><label>(8)</label><tex-math><![CDATA[PR(A, B) = \frac{2\,|a \cap b| + 1}{|a| + |b| + 1}]]></tex-math></disp-formula>
      </p>
      <p>It is important to note that the Sibling and Parental Relatedness scores
between any two classes are general and therefore apply to any JSON document
(they are not document-specific), whereas Textual Compatibility is defined
between an object and an ontology class and is therefore document-specific.</p>
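      <p>As an illustration, Equations 7 and 8 can be evaluated directly once the property counts are known. The sketch below assumes the counts have already been extracted from the ontology; the function and parameter names are hypothetical, and returning 0 when no properties are shared is an assumption made only to avoid taking the logarithm of zero.</p>
      <preformat><![CDATA[
```python
import math


def sibling_relatedness(n_a, n_b, n_shared, total_classes):
    """Equation 7: SR(A, B) = 1 - normalised distance between A and B.

    n_a, n_b      -- number of properties with A (resp. B) as range
    n_shared      -- |a ∩ b| as defined in the text
    total_classes -- T, assumed larger than min(n_a, n_b)
    """
    if n_shared == 0:
        # Assumption: classes that share nothing are treated as unrelated.
        return 0.0
    numerator = math.log(max(n_a, n_b)) - math.log(n_shared)
    denominator = math.log(total_classes) - math.log(min(n_a, n_b))
    return 1.0 - numerator / denominator


def parental_relatedness(n_domain_a, n_range_b, n_both):
    """Equation 8 with add-one smoothing in numerator and denominator.

    n_domain_a -- number of properties with A as domain
    n_range_b  -- number of properties with B as range
    n_both     -- number of properties with domain A and range B
    """
    return (2 * n_both + 1) / (n_domain_a + n_range_b + 1)


# Toy counts, chosen only to make the arithmetic easy to follow.
print(sibling_relatedness(8, 4, 2, 64))
print(parental_relatedness(3, 2, 1))
```
]]></preformat>
      <p>Both functions depend only on ontology-level counts, which matches the observation that SR and PR are independent of any particular document.</p>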
      <p>
        Textual Compatibility score between a given JSON object and a DBpedia
ontology class indicates how likely the object is to represent information
related to an instance of that ontology class, based on the fundamentals of
general NED [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. For an object O, its key and the keys of its sub-objects within the
document hierarchy form its context (with the key of O being the
name-mention), whereas for an ontology class C the labels of all its properties
form its entity-description (with C being the candidate entity). The Textual
Compatibility score between C and O is computed through N-gram similarities [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] between the keys within the context and the labels within the
entity-description.
      </p>
      <p>For a given key-text k and a property label p, the N-gram similarity
SS<sub>N</sub> for any value of N is computed by the formula shown in Equation 9.
        <disp-formula id="eq-9"><label>(9)</label><tex-math><![CDATA[SS_N = \frac{2\,|t_k \cap t_p|}{|t_k| + |t_p|}]]></tex-math></disp-formula>
      </p>
      <p>Here |t<sub>k</sub>| is the total number of N-grams extracted from string k,
|t<sub>p</sub>| is the total number of N-grams extracted from string p, and
|t<sub>k</sub> ∩ t<sub>p</sub>| is the total number of N-grams common to both k
and p.</p>
      <p>By applying Equation 9, values of SS<sub>N</sub> can be computed for N ranging
from 3 to n, where n is the length of the shorter of the two strings k and p.
The overall Similarity Score (SS) between key-text k and property label p can
then be computed by applying the heuristic formula stated as Equation 10.</p>
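      <p>
        <disp-formula id="eq-10"><label>(10)</label><tex-math><![CDATA[SS(k, p) = 2^{n} \cdot SS_{n} + 2^{(n-1)} \cdot SS_{n-1} + \dots + 2^{3} \cdot SS_{3}]]></tex-math></disp-formula>
      </p>
      <p>The Similarity Score between key-text k and an entire DBpedia ontology class C
is given by the formula stated as Equation 11.</p>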
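      <p>
        <disp-formula id="eq-11"><label>(11)</label><tex-math><![CDATA[SS(k, C) = \max_{\forall p_i \in P} \left( SS(k, p_i) \right)]]></tex-math></disp-formula>
      </p>
      <p>Here P is the set of strings consisting of the labels of all properties of
class C within the DBpedia mapping.</p>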
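      <p>Equations 9 to 11 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: N-grams are taken over characters, strings are lower-cased before comparison, repeated N-grams are counted once (set semantics), and the function names are hypothetical.</p>
      <preformat><![CDATA[
```python
def ngrams(text, n):
    """Return the character n-grams of text, lower-cased (an assumption)."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ss_n(k, p, n):
    """Equation 9: overlap of the N-gram sets of key-text k and label p."""
    t_k, t_p = set(ngrams(k, n)), set(ngrams(p, n))
    if not t_k and not t_p:
        return 0.0
    return 2 * len(t_k.intersection(t_p)) / (len(t_k) + len(t_p))


def ss(k, p):
    """Equation 10: sum of 2**N * SS_N for N from 3 to the shorter length."""
    shorter = min(len(k), len(p))
    return sum(2 ** n * ss_n(k, p, n) for n in range(3, shorter + 1))


def ss_class(k, property_labels):
    """Equation 11: best score of k against any property label of a class."""
    return max((ss(k, p) for p in property_labels), default=0.0)


# Hypothetical property labels for a candidate class.
print(ss_class("Capital", ["capital", "name"]))
```
]]></preformat>
      <p>Because longer shared N-grams are weighted by larger powers of two, an exact match of a long key dominates any number of short partial matches.</p>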
      <p>Let K be the set of n key-texts k<sub>1</sub>, k<sub>2</sub>, ..., k<sub>n</sub>
comprising the key of JSON object O and the keys of all its sub-objects within
the hierarchy. The Textual Compatibility score between O and ontology class C
is given by Equation 12.
        <disp-formula id="eq-12"><label>(12)</label><tex-math><![CDATA[TC(O, C) = \frac{\sum_{i=1}^{n} SS(k_i, C)}{n}]]></tex-math></disp-formula>
      </p>
      <p>As explained in Section 3, the overall collective linking of a given JSON
document is computed as the sub-graph of the desired structure with maximum
evidence weight, extracted from the main-graph representing the entire
document. The sub-graph must contain every object-node of the main-graph, with
each one linked to exactly one class-node. This paper proposes an unsupervised
and a supervised method to compute this Evidence Weight (EW) from three
attributes: the sum of all Textual Compatibility score values (ΣTC), the sum of
all Sibling Relatedness score values (ΣSR) and the sum of all Parental
Relatedness score values (ΣPR).</p>
      <p>Let a given sub-graph S consist of O, the set of all its object-nodes, and C,
the set of all its class-nodes, with each object-node linked to a single
class-node. The two approaches (unsupervised and supervised) for computing the
Evidence Weight of S, EW(S), are explained in Sections 4.1 and 4.2.</p>
      <sec id="sec-4-1">
        <title>Unsupervised/Direct Method</title>
        <p>This method assumes that all three attribute values are equally important in
determining the suitable DBpedia classes for all JSON objects collectively. The
value of EW is therefore simply the sum of the three attributes. For a given
sub-graph S, the Evidence Weight is determined by Equation 13, for all
c<sub>a</sub>, c<sub>b</sub> in C, c<sub>p</sub>, c<sub>c</sub> in C and all
o<sub>n</sub> in O, c<sub>n</sub> in C:
          <disp-formula id="eq-13"><label>(13)</label><tex-math><![CDATA[EW(S) = \sum_{\forall\, \mathrm{Candidate}(c_n, o_n) = 1} TC(o_n, c_n) + \sum_{\forall\, \mathrm{Sibling}(c_a, c_b) = 1} SR(c_a, c_b) + \sum_{\forall\, \mathrm{Parent}(c_p, c_c) = 1} PR(c_p, c_c)]]></tex-math></disp-formula>
        </p>
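        <p>The unsupervised selection can be sketched as an exhaustive search over all sub-graphs of the desired structure. The sketch assumes the TC, SR and PR scores are precomputed and supplied as dictionaries, that missing SR or PR entries count as zero, and that the candidate lists are small enough to enumerate; all names are hypothetical.</p>
        <preformat><![CDATA[
```python
from itertools import product


def evidence_weight(assignment, sibling_pairs, parent_pairs, tc, sr, pr):
    """Equation 13: unweighted sum of TC, SR and PR over one sub-graph.

    assignment    -- dict mapping each object to its single chosen class
    sibling_pairs -- iterable of (object, object) sibling relationships
    parent_pairs  -- iterable of (parent_object, child_object)
    tc            -- dict keyed by (object, class)
    sr            -- dict keyed by frozenset of two classes (SR is symmetric)
    pr            -- dict keyed by (parent_class, child_class)
    """
    total = sum(tc[(obj, cls)] for obj, cls in assignment.items())
    total += sum(
        sr.get(frozenset((assignment[a], assignment[b])), 0.0)
        for a, b in sibling_pairs
    )
    total += sum(
        pr.get((assignment[parent], assignment[child]), 0.0)
        for parent, child in parent_pairs
    )
    return total


def best_subgraph(candidates, sibling_pairs, parent_pairs, tc, sr, pr):
    """Try every one-class-per-object assignment and keep the heaviest."""
    objects = list(candidates)
    best, best_weight = None, float("-inf")
    for classes in product(*(candidates[obj] for obj in objects)):
        assignment = dict(zip(objects, classes))
        weight = evidence_weight(
            assignment, sibling_pairs, parent_pairs, tc, sr, pr
        )
        if weight > best_weight:
            best, best_weight = assignment, weight
    return best, best_weight
```
]]></preformat>
        <p>The number of sub-graphs grows as the product of the candidate-list sizes, so this brute-force form is only practical for small documents or short candidate lists.</p>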
      </sec>
      <sec id="sec-4-2">
        <title>Supervised Method</title>
        <p>This method expresses the importance of each of the three attribute values
in determining the suitable DBpedia classes for all JSON objects collectively
through parameters. These parameters can be learnt from a set of sub-graphs of
the main-graph, each identified as a positive or negative example, by applying
a machine learning algorithm such as Logistic Regression. A sub-graph is a
positive example if all of its nodes that represent JSON objects with ambiguous
value-text are linked to the correct DBpedia ontology classes, that is, the
classes to which the Wikipedia entities referred to by their value-texts
belong. For a given sub-graph S, the evidence weight is determined by Equation
14, with parameters θ<sub>1</sub>, θ<sub>2</sub> and θ<sub>3</sub> learnt
through Logistic Regression.</p>
        <p>
          <disp-formula id="eq-14"><label>(14)</label><tex-math><![CDATA[EW(S) = \theta_1 \left( \sum TC(o_n, c_n) \right) + \theta_2 \left( \sum SR(c_a, c_b) \right) + \theta_3 \left( \sum PR(c_p, c_c) \right)]]></tex-math></disp-formula>
        </p>
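        <p>A minimal sketch of the supervised variant is given below, using plain batch gradient descent instead of a library implementation of Logistic Regression. The intercept term, the learning rate and the number of epochs are assumptions that do not appear in Equation 14, and the function names are hypothetical.</p>
        <preformat><![CDATA[
```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def evidence_weight(theta, x):
    """Equation 14: weighted sum of the attribute totals [sum TC, sum SR, sum PR]."""
    return sum(t * v for t, v in zip(theta, x))


def train_logistic(features, labels, lr=0.1, epochs=2000):
    """Learn the three parameters (plus an assumed intercept) from examples.

    features -- one [sum_TC, sum_SR, sum_PR] row per sub-graph
    labels   -- 1 for a positive sub-graph, 0 for a negative one
    """
    theta, bias = [0.0, 0.0, 0.0], 0.0
    m = len(features)
    for _ in range(epochs):
        grad, grad_bias = [0.0, 0.0, 0.0], 0.0
        for x, y in zip(features, labels):
            error = sigmoid(bias + evidence_weight(theta, x)) - y
            grad = [g + error * v for g, v in zip(grad, x)]
            grad_bias += error
        theta = [t - lr * g / m for t, g in zip(theta, grad)]
        bias -= lr * grad_bias / m
    return theta, bias


def predict_probability(theta, bias, x):
    """Probability that sub-graph x is a correct collective link."""
    return sigmoid(bias + evidence_weight(theta, x))
```
]]></preformat>
        <p>After training, the learnt parameters play the role of the three weights in Equation 14, and the sigmoid of the weighted sum gives the probability used for thresholding.</p>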
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <p>Testing the proposed approaches on the final task of collectively
disambiguating all ambiguous objects requires a set of JSON documents in which
all ambiguous objects have been demarcated and at least two candidates have
been identified for each. Because no dataset of sufficient size is available,
an indirect method is adopted to estimate how well the proposed hypothesis
holds on real-world data. Both approaches are preliminarily evaluated using a
single large JSON document
(http://www.carqueryapi.com/api/0.3/?callback=?&amp;cmd=getMakes) containing two
relevant pieces of information about 155 automobile manufacturers, namely the
name of the company and its country of origin, and thus 310 ambiguous
value-texts. After manually demarcating all ambiguous JSON objects and
identifying three candidate entities (two incorrect links and one correct link)
for every object, ambiguous or not, a main-graph representing the entire
document is created. From it, all possible sub-graphs of the desired structure
(with each object-node linked to exactly one class-node) are extracted and
labelled as positive if all ambiguous objects are linked to their correct
class-nodes and negative otherwise. Details of the final datasets used for
training and testing both approaches are summarized in Table 3.</p>
      <p>For the unsupervised and supervised methods, two dataset matrices of
dimensions m x 1 and m x 3 are generated respectively, along with a single
Boolean matrix of size m x 1 holding the labels, where m is the total number of
examples. These are used to derive a probability matrix for each method. The
data matrix for the unsupervised method holds the total evidence weight of each
sub-graph, each sub-graph being one example in the dataset. The examples are
grouped so that sub-graphs sharing the same non-ambiguous pairs form one group,
which yields 207 groups. The probability of each sub-graph is computed relative
to the other sub-graphs in the same group through Equation 15.</p>
      <p>Let EW(s) be the total evidence weight of a sub-graph s belonging to a group
of sub-graphs S. The probability of s being selected is then calculated as
        <disp-formula id="eq-15"><label>(15)</label><tex-math><![CDATA[P(s) = \frac{EW(s)}{\sum_{\forall s_i \in S} EW(s_i)}]]></tex-math></disp-formula>
      </p>
      <p>For the supervised method, however, the probability is computed by applying
the sigmoid function after learning the parameters of Equation 14 through
Logistic Regression, so no such grouping is required. The dataset matrix for
the supervised approach is split 60% to 40% into a training dataset and a
testing dataset.</p>
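      <p>The group-wise normalisation of Equation 15 can be sketched as follows. The tuple layout of the input and the handling of a group whose weights sum to zero are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def group_probabilities(examples):
    """Equation 15: P(s) = EW(s) divided by the total EW of the group of s.

    examples -- list of (group_id, subgraph_id, evidence_weight) tuples
    returns  -- dict mapping subgraph_id to its probability within its group
    """
    totals = defaultdict(float)
    for group_id, _, weight in examples:
        totals[group_id] += weight
    return {
        subgraph_id: (weight / totals[group_id] if totals[group_id] else 0.0)
        for group_id, subgraph_id, weight in examples
    }


# Toy evidence weights for two hypothetical groups.
print(group_probabilities([("g1", "a", 2.0), ("g1", "b", 6.0), ("g2", "c", 1.0)]))
```
]]></preformat>
      <p>Within each group the probabilities sum to one, so a threshold applied afterwards compares each sub-graph only against alternatives that share its non-ambiguous pairs.</p>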
      <p>Evaluating the proposed approaches involves predicting a label for every
example from its probability value (held in the probability matrix) under an
assumed threshold, and then comparing the predicted Boolean matrix with the
actual one to compute several indicators of accuracy.</p>
    </sec>
    <sec id="sec-6">
      <title>Results</title>
      <p>The evaluation computes four indicators: Accuracy, Precision, Recall and
F-Score. Both the unsupervised and the supervised method are evaluated for
threshold values distributed uniformly over the range from 0 to 1. Figures 2
and 3 show the Accuracy and F-Score values achieved at different thresholds,
whereas Table 4 compares the maximum values of all four indicators achieved by
the two methods.</p>
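      <p>The threshold sweep can be sketched as below. Treating a probability equal to the threshold as a positive prediction, returning 0 for an undefined precision or recall, and using eleven evenly spaced thresholds are all assumptions; the function names are hypothetical.</p>
      <preformat><![CDATA[
```python
def evaluate(probabilities, labels, threshold):
    """Accuracy, precision, recall and F-score at a single threshold."""
    predicted = [p >= threshold for p in probabilities]
    pairs = list(zip(predicted, labels))
    tp = sum(1 for y_hat, y in pairs if y_hat and y)
    fp = sum(1 for y_hat, y in pairs if y_hat and not y)
    fn = sum(1 for y_hat, y in pairs if not y_hat and y)
    tn = sum(1 for y_hat, y in pairs if not y_hat and not y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


def sweep(probabilities, labels, steps=10):
    """Evaluate at uniformly spaced thresholds between 0 and 1 inclusive."""
    return {
        i / steps: evaluate(probabilities, labels, i / steps)
        for i in range(steps + 1)
    }
```
]]></preformat>
      <p>The maximum of each indicator over the sweep corresponds to the kind of per-method summary reported in Table 4.</p>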
      <p>As explained in Section 4, the supervised method requires training through
Logistic Regression to learn the parameters θ<sub>1</sub>, θ<sub>2</sub> and
θ<sub>3</sub>. The values of these parameters, learnt from the training dataset
(the first 60% of the dataset), are listed in Table 5. The obtained results
suggest that the supervised method performed better than the unsupervised
method on all four indicators, as is evident from Table 4. Although no
state-of-the-art approach is available against which to compare these results,
they can act as a benchmark for future research.</p>
      <table-wrap id="tab-4">
        <label>Table 4</label>
        <caption><p>Comparison of results achieved by both unsupervised and supervised approaches</p></caption>
        <table>
          <thead>
            <tr>
              <th>Method</th>
              <th>Maximum Accuracy</th>
              <th>Maximum Precision</th>
              <th>Maximum Recall</th>
              <th>Maximum F-Score</th>
              <th>Threshold Probability of Maximum F-Score</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>Unsupervised</td><td>0.86</td><td>0.35</td><td>0.9</td><td>0.5</td><td>0.2</td></tr>
            <tr><td>Supervised</td><td>0.88</td><td>0.49</td><td>1.0</td><td>0.66</td><td>0.36</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>This paper introduced the Parental Relatedness and Sibling Relatedness scores
between two ontology classes that are candidates of two distinct JSON objects,
computed through analysis of their shared properties. Using these scores
together with the Textual Compatibility score between an object and its
candidate class, two approaches to the collective disambiguation of real-world
JSON documents have been proposed, one unsupervised and one supervised, both
based on general NED. Results obtained by testing these approaches on a
limited, semi-manually created dataset were used to compare their performance.
Future research involves more exhaustive testing of both approaches on more
elaborate datasets.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <article-title>50 ontology mapping and alignment tools</article-title>
          . http://www.mkbergman.com/1769/50-ontology-mapping-and-alignment-tools/. Accessed: 2017-08-30.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Borgida</surname>
          </string-name>
          and
          <string-name>
            <given-names>Luciano</given-names>
            <surname>Serafini</surname>
          </string-name>
          .
          <article-title>Distributed description logics: Assimilating information from peer sources</article-title>
          .
          <source>J. Data Semantics</source>
          ,
          <volume>1</volume>
          :
          <fpage>153</fpage>
          –
          <lpage>184</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Razvan C</given-names>
            <surname>Bunescu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Marius</given-names>
            <surname>Pasca</surname>
          </string-name>
          .
          <article-title>Using encyclopedic knowledge for named entity disambiguation</article-title>
          .
          <source>In EACL</source>
          , volume
          <volume>6</volume>
          , pages
          <fpage>9</fpage>
          –
          <lpage>16</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Silvana</given-names>
            <surname>Castano</surname>
          </string-name>
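          ,
          <string-name>
            <given-names>Alfio</given-names>
            <surname>Ferrara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Davide</given-names>
            <surname>Lorusso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tobias Henrik</given-names>
            <surname>Näth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ralf</given-names>
            <surname>Möller</surname>
          </string-name>
          .
          <article-title>Mapping validation by probabilistic reasoning</article-title>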
          .
          <source>In European Semantic Web Conference</source>
          , pages
          <fpage>170</fpage>
          –
          <lpage>184</lpage>
          . Springer,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
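            <given-names>Rudi L</given-names>
            <surname>Cilibrasi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Paul MB</given-names>
            <surname>Vitanyi</surname>
          </string-name>
          .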
          <article-title>The google similarity distance</article-title>
          .
          <source>IEEE Transactions on knowledge and data engineering</source>
          ,
          <volume>19</volume>
          (
          <issue>3</issue>
          ),
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Graham</given-names>
            <surname>Cormode</surname>
          </string-name>
          and
          <string-name>
            <given-names>S</given-names>
            <surname>Muthukrishnan</surname>
          </string-name>
          .
          <article-title>The string edit distance matching problem with moves</article-title>
          .
          <source>ACM Transactions on Algorithms (TALG)</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):
          <fpage>2</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Douglas</given-names>
            <surname>Crockford</surname>
          </string-name>
          .
          <article-title>The application/json media type for JavaScript Object Notation (JSON)</article-title>
          .
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Mark</given-names>
            <surname>Dredze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Paul</given-names>
            <surname>McNamee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Delip</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Adam</given-names>
            <surname>Gerber</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Tim</given-names>
            <surname>Finin</surname>
          </string-name>
          .
          <article-title>Entity disambiguation for knowledge base population</article-title>
          .
          <source>In Proceedings of the 23rd International Conference on Computational Linguistics</source>
          , pages
          <fpage>277</fpage>
          –
          <lpage>285</lpage>
          . Association for Computational Linguistics,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Xianpei</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Le</given-names>
            <surname>Sun</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jun</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <article-title>Collective entity linking in web text: a graph-based method</article-title>
          .
          <source>In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval</source>
          , pages
          <fpage>765</fpage>
          –
          <lpage>774</lpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Jian</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lujun</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yang</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hua-Jun</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hua</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Qiang</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Zheng</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Enhancing text clustering by leveraging Wikipedia semantics</article-title>
          .
          <source>In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>179</fpage>
          –
          <lpage>186</lpage>
          . ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Hongzhao</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yunbo</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xiaojiang</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Heng</given-names>
            <surname>Ji</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Chin-Yew</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <article-title>Collective tweet wikification based on semi-supervised graph regularization</article-title>
          .
          <source>In ACL (1)</source>
          , pages
          <fpage>380</fpage>
          –
          <lpage>390</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Glen</given-names>
            <surname>Jeh</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jennifer</given-names>
            <surname>Widom</surname>
          </string-name>
          .
          <article-title>SimRank: a measure of structural-context similarity</article-title>
          .
          <source>In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          , pages
          <fpage>538</fpage>
          –
          <lpage>543</lpage>
          . ACM,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Grzegorz</given-names>
            <surname>Kondrak</surname>
          </string-name>
          .
          <article-title>N-gram similarity and distance</article-title>
          .
          <source>In String processing and information retrieval</source>
          , pages
          <fpage>115</fpage>
          –
          <lpage>126</lpage>
          . Springer,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Sayali</given-names>
            <surname>Kulkarni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Amit</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ganesh</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Soumen</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          .
          <article-title>Collective annotation of Wikipedia entities in web text</article-title>
          .
          <source>In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          , pages
          <fpage>457</fpage>
          –
          <lpage>466</lpage>
          . ACM,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Jayant</given-names>
            <surname>Madhavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Philip A</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>Erhard</given-names>
            <surname>Rahm</surname>
          </string-name>
          .
          <article-title>Generic schema matching with Cupid</article-title>
          .
          <source>In VLDB</source>
          , volume
          <volume>1</volume>
          , pages
          <fpage>49</fpage>
          –
          <lpage>58</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Christian</given-names>
            <surname>Meilicke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Heiner</given-names>
            <surname>Stuckenschmidt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Andrei</given-names>
            <surname>Tamilin</surname>
          </string-name>
          .
          <article-title>Repairing ontology mappings</article-title>
          .
          <source>In AAAI</source>
          , volume
          <volume>3</volume>
          , page 6,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Christian</given-names>
            <surname>Meilicke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Heiner</given-names>
            <surname>Stuckenschmidt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Andrei</given-names>
            <surname>Tamilin</surname>
          </string-name>
          .
          <article-title>Reasoning support for mapping revision</article-title>
          .
          <source>Journal of Logic and Computation</source>
          ,
          <volume>19</volume>
          (
          <issue>5</issue>
          ):
          <fpage>807</fpage>
          –
          <lpage>829</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Rada</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andras</given-names>
            <surname>Csomai</surname>
          </string-name>
          .
          <article-title>Wikify!: linking documents to encyclopedic knowledge</article-title>
          .
          <source>In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management</source>
          , pages
          <fpage>233</fpage>
          –
          <lpage>242</lpage>
          . ACM,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Ali M</given-names>
            <surname>Naderi</surname>
          </string-name>
          .
          <article-title>Unsupervised entity linking using graph-based semantic similarity</article-title>
          .
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>Gonzalo</given-names>
            <surname>Navarro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ricardo A.</given-names>
            <surname>Baeza-Yates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Erkki</given-names>
            <surname>Sutinen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jorma</given-names>
            <surname>Tarhio</surname>
          </string-name>
          .
          <article-title>Indexing methods for approximate string matching</article-title>
          .
          <source>IEEE Data Eng. Bull.</source>
          ,
          <volume>24</volume>
          (
          <issue>4</issue>
          ):
          <fpage>19</fpage>
          –
          <lpage>27</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
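          <string-name>
            <given-names>Xiaoman</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Taylor</given-names>
            <surname>Cassidy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ulf</given-names>
            <surname>Hermjakob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Heng</given-names>
            <surname>Ji</surname>
          </string-name>
          , and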
          <string-name>
            <given-names>Kevin</given-names>
            <surname>Knight</surname>
          </string-name>
          .
          <article-title>Unsupervised entity linking with abstract meaning representation</article-title>
          .
          <source>In HLT-NAACL</source>
          , pages
          <fpage>1130</fpage>
          –
          <lpage>1139</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
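          <string-name>
            <given-names>Lev</given-names>
            <surname>Ratinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dan</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Doug</given-names>
            <surname>Downey</surname>
          </string-name>
          , and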
          <string-name>
            <given-names>Mike</given-names>
            <surname>Anderson</surname>
          </string-name>
          .
          <article-title>Local and global algorithms for disambiguation to Wikipedia</article-title>
          .
          <source>In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1</source>
          , pages
          <fpage>1375</fpage>
          –
          <lpage>1384</lpage>
          . Association for Computational Linguistics,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>Esko</given-names>
            <surname>Ukkonen</surname>
          </string-name>
          .
          <article-title>Approximate string-matching over suffix trees</article-title>
          .
          <source>In Combinatorial Pattern Matching</source>
          , pages
          <fpage>228</fpage>
          –
          <lpage>242</lpage>
          . Springer,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>