<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A graph-based collective linking approach with Group Co-existence Strength</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chinmay Choudhay</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Colm O'Riordan</string-name>
          <email>colm.oriordang@university.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National University of Ireland (NUI)</institution>
          ,
          <addr-line>Galway</addr-line>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>13</lpage>
      <abstract>
        <p>This paper addresses a drawback of many existing graph-based collective entity-linking approaches by introducing the new concept of Group Co-existence Strength (GCS). It proposes an approach to the collective linking of text documents that extends a recent existing approach by taking into account, along with the standard attributes, the GCS of all possible groups of candidate entities. Preliminary experimental results indicate that the proposed approach leads to performance gains on selected real-world data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] are prominent examples of the individual-linking category of approaches, which link each name-mention individually based on the similarity between its context within the document and the description of the entity, commonly referred to as Compatibility (CP), whereas [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] are examples of modern collective-linking approaches that adopt various supervised and unsupervised linking methods. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is a prominent graph-based approach that links all name-mentions within a single document simultaneously. Along with CP, it considers the semantic relationships between pairs of entities, which indicate the chances of both entities being referred to in a single real-world document depending on how closely they are associated with a common topic or field; this is referred to as Semantic Relatedness (SR). The present paper directly extends that approach.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Drawback of contemporary approaches</title>
      <p>Most graph-based collective linking approaches compute the overall linking from the SR between all possible pairs of entities that are candidates of two distinct name-mentions appearing within the document. Thus, for a set of entities associated with the entire document, in which each member is a candidate of a single distinct name-mention, the score indicating the suitability of the entire set as the collective link is computed as a function of the SR scores of all possible pairs that can be extracted from that set. This approach carries an inherent assumption, which can be stated as follows.</p>
      <p>All members of a set of entities have a higher chance of being referred to together in a single real-world document if most of the pairs extracted from the set possess a strong semantic relationship.</p>
      <p>However, this assumption does not always hold, specifically when there is an outlier in the group, and this limits the accuracy of the system. Consider the text document given as Example 1, which contains four name-mentions, namely Donald, Hillary, Fox and America.</p>
      <p>Example 1. Donald will direct the upcoming movie from Fox with Hillary
playing lead role in it. The movie will be released across America by the end of 2017.</p>
      <p>Let there be two candidate collective links for the entire document, namely W1 and W2, listed as follows.</p>
      <p>By common sense and real-world knowledge it is evident that W2 is a more appropriate link than W1, yet modern approaches would still select W1, because most pairs of entities extracted from W1 have a stronger semantic relationship than their counterparts in W2 (for example, the pair {Donald Trump, Hillary Clinton} compared with the pair {Donald Petrie, Hillary Swank}).</p>
      <p>To address this issue, this paper introduces a new concept called Group Co-existence Strength (GCS) in section 4 and proposes an NED approach that takes it into consideration in section 5.</p>
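      <p>The effect of an outlier on a purely pairwise score can be illustrated with a small sketch. It assumes that the set score is simply the sum of pairwise SR values (the text above only says "a function of SR scores"), and all SR numbers and entity labels below are invented for illustration; they are not taken from the paper's data.</p>
      <preformat><![CDATA[
```python
from itertools import combinations


def pairwise_sr_score(entities, sr):
    """Sum SR over every unordered pair of entities in a candidate set."""
    return sum(sr[frozenset(pair)] for pair in combinations(entities, 2))


# Hypothetical SR values, invented purely for illustration.
# Set A: three tightly related entities plus one outlier "a4".
sr_a = {frozenset(p): v for p, v in {
    ("a1", "a2"): 0.9, ("a1", "a3"): 0.9, ("a2", "a3"): 0.9,
    ("a1", "a4"): 0.1, ("a2", "a4"): 0.1, ("a3", "a4"): 0.1,
}.items()}

# Set B: four entities that are all moderately related to each other.
set_b = ["b1", "b2", "b3", "b4"]
sr_b = {frozenset(p): 0.45 for p in combinations(set_b, 2)}

score_a = pairwise_sr_score(["a1", "a2", "a3", "a4"], sr_a)  # 3 * 0.9 + 3 * 0.1
score_b = pairwise_sr_score(set_b, sr_b)                     # 6 * 0.45

# Set A wins on the pairwise sum even though "a4" barely relates to the rest.
print(score_a > score_b)
```
]]></preformat>
      <p>Under these toy numbers the set containing the outlier receives the higher pairwise score, which is the behaviour the assumption above leads to.</p>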
    </sec>
    <sec id="sec-3">
      <title>Group Co-existence Strength</title>
      <p>The GCS of a group of entities indicates the chances of all its members being co-referred within any given real-world document. This strength depends on how symmetrically the entities are distributed with respect to each other in terms of mutual SR scores. One way to demonstrate this distribution is to plot all members of the group on a graph, with the co-ordinates of each member determined by its Semantic Distance (computed as a factor of SR) from pre-decided benchmark members of the same group. Groups whose members are plotted more compactly can be considered semantically stronger. For the candidate collective links W1 and W2 outlined for Example 1, the distribution plots are shown in Figure 2 and Figure 1 respectively. It is evident from the figures that W2 is more compactly distributed than W1, which has an anomaly.</p>
      <p>
        The GCS of any group of entities is indicated by the value of an indicator called the Group Strength Factor (GSF), described in section 4.1. The GSF of a given set of entities of any size is the minimum of the Gaussian values obtained at the positions of all entities within the set, with the peak of the Gaussian at the average position and the standard deviation being a fixed value vector (of size equal to the total number of co-ordinates). For a set of entities S, let R be a set of reference entities such that R ⊆ S. Then, for any entity E<sub>i</sub> ∈ S, its position is defined by as many co-ordinates as the size of R, with the jth co-ordinate, for each index j up to the size of R, computed with respect to the jth member of R by equation 1, where SR is the Semantic Relatedness [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and NF is a pre-determined Normalization Factor. Note that the value of NF has no impact on overall performance, provided it is kept the same throughout the entire linking process. It exists merely to keep the positions of entities, and the GSF values obtained at each, within a manageable range so that they can be conveniently analysed (for instance, when GSF values would otherwise be too low to be easily distinguished, compared or distinctly plotted on a graph).
      </p>
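      <p>
        <disp-formula>
          <label>(1)</label>
          <tex-math><![CDATA[E_{i}\,\mathrm{coordinate}_{j} = \mathrm{NF} \cdot \mathrm{SR}(E_{i}, R_{j})]]></tex-math>
        </disp-formula>
      </p>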
      <p>For the experiments described in this paper NF is set to one. Each co-ordinate of the peak position is the average of the same co-ordinate over all entities belonging to set S, computed through equation 2.</p>
      <p>
        <disp-formula>
          <label>(2)</label>
          <tex-math><![CDATA[\mathrm{PeakCoordinate}_{j} = \frac{\sum_{E \in S} E\,\mathrm{coordinate}_{j}}{n}]]></tex-math>
        </disp-formula>
      </p>
      <p>Here n is the number of entities in S. Given the positions of all entities belonging to S and of the peak (as co-ordinate values), the GSF score of S is determined by applying equation 3, with N representing the Normal distribution of the positions of the entities belonging to S around the Peak (computed through equation 2) with a fixed value of standard deviation (σ).</p>
      <p>
        <disp-formula>
          <label>(3)</label>
          <tex-math><![CDATA[\mathrm{GSF} = \min_{E \in S} \mathcal{N}(E \mid \mathrm{Peak}, \sigma)]]></tex-math>
        </disp-formula>
      </p>
      <p>For the purpose of experimentation, a larger R gives more accurate positioning (with more co-ordinates) and thus more accurate final linking, though at the cost of lower time efficiency. Once the size of R to be used during the entire linking process has been decided, any subset of S of that size used as R would give similar results, because the computation of GSF depends on the symmetry of the mutual distribution of all entities with respect to each other. Within this paper the value of σ is arbitrarily set to 0.1 (the specific value does not matter, provided it is the same for all examples during both training and testing), whereas the size of R is set to 1, so that all entities are plotted on a 1-D axis.</p>
      <p>The basic intuition behind GSF is that the Gaussian value obtained at an outlier is much lower, because the outlier lies at a considerable distance from the peak on the overall plot, and this penalizes the entire group.</p>
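      <p>The GSF computation described in this section can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it takes entity positions as given (so it does not commit to a particular form of the co-ordinate equation), uses a single scalar σ for every co-ordinate, and uses an unnormalised Gaussian whose value at the peak is 1. The function name and the sample positions are hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def gsf(positions, sigma=0.1):
    """Group Strength Factor: minimum Gaussian value over all entity positions.

    positions: list of coordinate tuples, one per entity, all the same length.
    The Gaussian is centred on the mean position (the peak). It is left
    unnormalised (value 1.0 at the peak), which is an assumption of this sketch.
    """
    n = len(positions)
    dims = len(positions[0])
    # Peak coordinate j is the average of coordinate j over all entities.
    peak = [sum(p[j] for p in positions) / n for j in range(dims)]

    def gaussian(p):
        sq_dist = sum((p[j] - peak[j]) ** 2 for j in range(dims))
        return math.exp(-sq_dist / (2 * sigma ** 2))

    return min(gaussian(p) for p in positions)


# Hypothetical 1-D positions (size of R is 1, as in the paper's experiments).
compact = [(0.50,), (0.52,), (0.48,)]
with_outlier = [(0.50,), (0.52,), (0.90,)]

print(gsf(compact), gsf(with_outlier))
```
]]></preformat>
      <p>With these made-up positions the compact group scores close to 1, while the group with one distant member scores far lower, since the minimum is taken at the outlier.</p>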
    </sec>
    <sec id="sec-4">
      <title>Linking Approach</title>
      <p>Collective linking of all name-mentions in a given document (containing N name-mentions) is performed simultaneously, taking into account the sum of the GSF values of all possible groups of entities of a particular size n (referred to as GSF<sub>n</sub>), for every size greater than two that can be extracted from the set of entities forming a candidate collective link. This involves computing a term called the Linking Factor (LF) for every such candidate by applying the heuristic formula stated as equation 4. The LF value of a particular candidate collective link (a set of N entities, one per name-mention) depends on GSF<sub>n</sub> (2 &lt; n ≤ N), on the sum of the Semantic Relatedness scores (ΣSR) between all possible pairs of entities, and on the sum of the Compatibility scores (ΣCP) between all name-mentions and their respective candidate entities (forming the candidate collective link).</p>
      <p>
        <disp-formula>
          <label>(4)</label>
          <tex-math><![CDATA[\mathrm{LF} = \sum_{n=3}^{N} \left(\alpha_{1} n^{3} + \alpha_{2} n^{2} + \alpha_{3} n + \alpha_{4}\right) \mathrm{GSF}_{n} + \varphi_{2} \sum \mathrm{SR} + \varphi_{1} \sum \mathrm{CP}]]></tex-math>
        </disp-formula>
      </p>
      <p>Here N is the total number of name-mentions appearing within the document, while α<sub>1</sub>, α<sub>2</sub>, α<sub>3</sub>, α<sub>4</sub>, φ<sub>1</sub> and φ<sub>2</sub> are parameters that can be learnt from a training dataset. Equation 4 is formulated on the intuition that the impact of GCS on the collective-linking process should not be the same for groups of all sizes; GSF<sub>n</sub> is therefore weighted by a polynomial in n, with degree 3 chosen to avoid both under-fitting and over-fitting. For a given document containing a set of name-mentions, and a group of sets of entities as candidate collective links (each having as many entities as there are name-mentions, with each entity associated with a single distinct name-mention), the LF score of every candidate can be computed and the one with the maximum score identified as the most appropriate.</p>
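      <p>A brute-force reading of the Linking Factor can be sketched as below. It is only an illustration under stated assumptions: the polynomial weights are named alpha here, the first member of each group serves as its single reference entity, an unnormalised Gaussian is used, and <monospace>coordinate</monospace> is a hypothetical caller-supplied function returning an entity's 1-D position relative to a reference entity. The optional <monospace>max_size</monospace> cap is likewise an addition for tractability.</p>
      <preformat><![CDATA[
```python
import math
from itertools import combinations


def group_gsf(group, coordinate, sigma=0.1):
    """GSF of one group, using its first member as the single reference entity."""
    reference = group[0]
    coords = [coordinate(entity, reference) for entity in group]
    peak = sum(coords) / len(coords)
    # Unnormalised Gaussian (1.0 at the peak): an assumption of this sketch.
    return min(math.exp(-((c - peak) ** 2) / (2 * sigma ** 2)) for c in coords)


def linking_factor(entities, coordinate, sum_sr, sum_cp,
                   alpha, phi1, phi2, max_size=None):
    """LF of one candidate collective link.

    alpha: the four polynomial weights (a1, a2, a3, a4).
    sum_sr, sum_cp: precomputed sums of SR and CP scores for the candidate.
    """
    limit = len(entities) if max_size is None else min(max_size, len(entities))
    a1, a2, a3, a4 = alpha
    total = 0.0
    for n in range(3, limit + 1):
        # GSF_n: sum of GSF over every group of exactly n entities.
        gsf_n = sum(group_gsf(g, coordinate) for g in combinations(entities, n))
        total += (a1 * n ** 3 + a2 * n ** 2 + a3 * n + a4) * gsf_n
    return total + phi2 * sum_sr + phi1 * sum_cp


# Toy usage: every entity sits at the same position, so each group has GSF 1.
toy_entities = ["e1", "e2", "e3", "e4"]
toy_lf = linking_factor(toy_entities, lambda e, r: 0.5,
                        sum_sr=1.0, sum_cp=1.0,
                        alpha=(0, 0, 0, 1), phi1=2.0, phi2=3.0)
print(toy_lf)
```
]]></preformat>
      <p>Because the number of groups grows combinatorially with the number of entities, enumerating them like this is only practical for small candidate sets or a small <monospace>max_size</monospace>.</p>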
    </sec>
    <sec id="sec-5">
      <title>Experimentation</title>
      <p>As already explained in section 1, the proposed approach can be applied to rank the respective candidates of all name-mentions appearing in a single document, in order to link all such name-mentions collectively and simultaneously to their most appropriate entities. The dataset used for training and testing the approach should therefore consist of text documents in which all name-mentions are demarcated and the candidates for each are identified beforehand. Section 6.1 describes the structure of the final datasets and the process of generating them, whereas subsequent sections elaborate on the computation, training and testing procedures.</p>
      <sec id="sec-5-1">
        <title>Dataset</title>
        <p>
          The first phase of experimentation involves extracting information from the original IITB helpfulness dataset [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] to formulate three distinct final datasets to be used for the final training and testing of the proposed approach. The IITB dataset comprises a collection of text documents on a varied range of subjects such as sports, science and politics, with the details of all name-mentions within all documents, including the title of the correct Wikipedia link for each, represented as a single large JSON document. As the proposed approach identifies the most appropriate collective link of all name-mentions simultaneously after learning the parameters of equation 4, three distinct final datasets with a suitable structure are created by modifying the original dataset through an elaborate process. The following two sub-sections describe the structure of the final datasets and the process of creating them.
        </p>
        <p>Final datasets. As already explained, final training requires the parameters of equation 4 to be learnt through Logistic Regression, which fundamentally requires a set of positive and negative training examples. The final datasets consist of such examples, each labelled either positive or negative. A single example is a collection of the top 100 name-mentions appearing in a single text document, ranked according to their relevance, with each name-mention paired with one of its candidate entities. Examples in which all name-mentions are paired with their correct links, as per the information provided in the JSON document of the original IITB helpfulness dataset, are labelled positive, while all others are labelled negative. The text documents within the IITB dataset contain varying numbers of name-mentions, with the minimum being 100. Thus, for every document, only the top 100 high-relevance name-mentions are considered and the others are ignored, in order to maintain homogeneity between all examples of the datasets, which is essential for training and testing convenience. The relevance of each name-mention within a specific text document for the purpose of collective linking is indicated within the original IITB dataset as a relevance index. For each text document, all name-mentions appearing within it are sorted according to the relevance index and the top 100 are retained.</p>
        <p>The characteristic feature that mainly distinguishes the three datasets is the degree of overlap among the examples each contains. For a collection of sets of entity-mention pairs forming a single dataset, the overlap of that dataset refers to the percentage of common members belonging to any two candidate sets of entities that can be a collective link of the same text document. Details of all three datasets are summarized in Table 1.</p>
        <p>[Table 1: details of Dataset 1, Dataset 2 and Dataset 3]</p>
        <p>Process of creation of datasets. As the proposed approach identifies the most appropriate candidate entity to be linked to each of the name-mentions within a single document simultaneously, its evaluation requires at least one incorrect and one correct candidate entity for each name-mention within all text documents. The correct link of every name-mention is provided as a Wikipedia title within the original IITB dataset, while the incorrect candidate is extracted from the See Also section of that correct link. All Wikipedia page hyper-links within the See Also section of the Wikipedia article of the correct link are sorted in decreasing order of the similarity between the Bag of Words (BOW) extracted from each of them and the BOW extracted from the contents of the correct link. The Wikipedia page at the top of the list is taken as the second, incorrect candidate of the particular name-mention.</p>
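        <p>The selection of the incorrect candidate from the See Also links might look like the sketch below. The paper does not state which BOW similarity is used, so cosine similarity over raw word counts is assumed here; the function names, the whitespace tokenisation and the sample texts are all hypothetical.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def bow(text):
    """Bag of words as lower-cased, whitespace-separated token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bags of words (assumed similarity measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pick_incorrect_candidate(correct_text, see_also_texts):
    """Return the See Also page title whose text is most similar to the correct page.

    see_also_texts: dict mapping page title -> page text.
    """
    target = bow(correct_text)
    ranked = sorted(see_also_texts,
                    key=lambda title: cosine(target, bow(see_also_texts[title])),
                    reverse=True)
    return ranked[0] if ranked else None


# Made-up page texts for illustration only.
choice = pick_incorrect_candidate(
    "film director comedy",
    {"Page A": "film director drama", "Page B": "river mountain"},
)
print(choice)
```
]]></preformat>
        <p>Choosing the most similar See Also page yields a negative candidate that is hard to distinguish from the correct link, which appears to be the intent of the procedure described above.</p>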
        <p>With two candidates for each name-mention, the final datasets are created by pairing each name-mention with each of its candidates and arranging all pairs whose name-mentions appear in a single text document into a large collection. Each such collection, being one possible collective link of the particular text document, forms a single example, labelled positive if all name-mentions are paired with the correct link and negative otherwise. The three distinct methods adopted to perform this re-arrangement distinguish the three datasets.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Computation</title>
        <p>For a given set of entity-mention pairs forming a possible collective link, equation 4 computes the Linking Factor by taking into account three distinct quantities: the sum of the Semantic Relatedness (SR) scores of all possible pairs of entities that can be extracted from the set, the sum of the Compatibility (CP) scores of all entity-mention pairs forming the set, and the sums of the Group Strength Factors (GSF) of all possible groups of entities of a specific size n (n &gt; 2) that can be extracted from the set, for all possible values of n. The processes adopted for computing these scores are explained as follows.</p>
      </sec>
      <sec id="sec-5-3">
        <title>1. Compatibility (CP):</title>
        <p>CP is computed between the context of a name-mention and the description of an entity. For this experimentation, the context of a name-mention is taken to be the twenty words before and after it within the text-file content, and the entity description is simply the content of the respective Wikipedia article. For a name-mention NM and a Wikipedia entity W, let B<sub>NM</sub> and B<sub>W</sub> be the bags of N-grams extracted from their context and description respectively, with N ranging from 1 to 3.</p>
        <p>Compatibility between NM and W is given by equation 5.</p>
        <p>
          <disp-formula>
            <label>(5)</label>
            <tex-math><![CDATA[\mathrm{CP}(NM, W) = \mathrm{TFIDF}_{NM} \times V_{W/NM}^{T}]]></tex-math>
          </disp-formula>
        </p>
        <p>Here TFIDF<sub>NM</sub> consists of the TF-IDF scores of all N-grams within B<sub>NM</sub> with respect to all the text documents within the original IITB dataset. V<sub>W/NM</sub> is a Boolean vector of the same length as B<sub>NM</sub>, with values obtained from equation 6.</p>
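        <p>
          <disp-formula>
            <label>(6)</label>
            <tex-math><![CDATA[V_{W/NM,\,i} = \begin{cases} 1 & \text{if } B_{NM,\,i} \in B_{W} \\ 0 & \text{otherwise} \end{cases} \qquad \text{for all } i = 1, \dots, \operatorname{length}(V_{W/NM})]]></tex-math>
          </disp-formula>
        </p>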
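        <p>The Compatibility score defined above can be sketched as a dot product between TF-IDF weights and a Boolean membership vector. This is an illustrative sketch only: the TF-IDF scores are assumed to be precomputed and passed in as a dictionary keyed by N-gram tuples, and the function names and toy tokens are hypothetical.</p>
        <preformat><![CDATA[
```python
def ngrams(tokens, max_n=3):
    """Distinct 1..max_n-grams of a token list, in first-seen order."""
    seen = {}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            seen.setdefault(tuple(tokens[i:i + n]), None)
    return list(seen)


def compatibility(context_tokens, description_tokens, tfidf):
    """CP between a name-mention context and an entity description.

    tfidf: dict mapping an N-gram tuple to its (precomputed) TF-IDF score;
    N-grams missing from the dict are given weight 0.0 in this sketch.
    """
    b_nm = ngrams(context_tokens)            # bag of N-grams of the context
    b_w = set(ngrams(description_tokens))    # bag of N-grams of the description
    weights = [tfidf.get(g, 0.0) for g in b_nm]        # TF-IDF vector for NM
    indicator = [1 if g in b_w else 0 for g in b_nm]   # Boolean vector (equation 6)
    return sum(w * v for w, v in zip(weights, indicator))  # equation 5


# Toy example with made-up TF-IDF scores.
toy_tfidf = {("movie",): 2.0, ("fox",): 1.5, ("fox", "movie"): 3.0}
toy_cp = compatibility(["fox", "movie"], ["movie", "studio"], toy_tfidf)
print(toy_cp)
```
]]></preformat>
        <p>Only N-grams that occur in both the context and the description contribute, each weighted by its TF-IDF score in the context.</p>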
      </sec>
      <sec id="sec-5-4">
        <title>2. Semantic Relatedness (SR):</title>
        <p>
          There are numerous approaches to computing Semantic Relatedness between Wikipedia entities, but the most common one is that proposed in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], which uses the intersection and union of the hyper-links shared between two given entities, applying equation 7. Following common practice, the same method is used for this experimentation.
        </p>
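        <p>
          <disp-formula>
            <label>(7)</label>
            <tex-math><![CDATA[\mathrm{SR}(x, y) = 1 - \frac{\log(\max(|X|, |Y|)) - \log(|X \cap Y|)}{\log|W| - \log(\min(|X|, |Y|))}]]></tex-math>
          </disp-formula>
        </p>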
        <p>The components of the formula are described as follows.</p>
        <p>– X ∩ Y: number of hyperlinks shared by entities x and y</p>
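        <p>The relatedness measure above can be sketched as follows, assuming (as in the commonly used form of this measure) that X and Y are the sets of hyperlinks associated with entities x and y and that |W| is the total number of Wikipedia entities. Returning 0 when the two sets share no links is an assumption of this sketch, made because the logarithm of zero is undefined; the function name is hypothetical.</p>
        <preformat><![CDATA[
```python
import math


def semantic_relatedness(links_x, links_y, total_entities):
    """Link-based Semantic Relatedness between two entities.

    links_x, links_y: iterables of hyperlink identifiers for entities x and y.
    total_entities: |W|, assumed to be larger than both link sets.
    """
    x, y = set(links_x), set(links_y)
    shared = len(x & y)
    if shared == 0:
        # log(0) is undefined; treating "no shared links" as zero relatedness
        # is an assumption of this sketch.
        return 0.0
    larger, smaller = max(len(x), len(y)), min(len(x), len(y))
    distance = (math.log(larger) - math.log(shared)) / (
        math.log(total_entities) - math.log(smaller)
    )
    return 1.0 - distance


print(semantic_relatedness({1, 2, 3}, {1, 2, 3}, 100))  # identical link sets
print(semantic_relatedness({1, 2, 3}, {2, 3, 4}, 100))  # partial overlap
```
]]></preformat>
        <p>The value is not clamped here, so very weakly related entities in a small W can yield a negative score; some implementations clip the result at zero.</p>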
      </sec>
      <sec id="sec-5-5">
        <title>3. Group Strength Factor (GSF):</title>
        <p>For any group of entities of size greater than two, GSF is computed by applying equation 3. Ideally, applying equation 4 to compute the Linking Factor (LF) for a given set of entity-mention pairs requires the GSF values of all possible groups of entities of size three or more that can be extracted from the set. Since there are 100 entities within each example, the total number of GSF computations that need to be performed for each example is as follows.</p>
        <p>
          <disp-formula>
            <tex-math><![CDATA[\sum_{n=3}^{100} \binom{100}{n}]]></tex-math>
          </disp-formula>
        </p>
        <p>This makes training and testing extremely slow, so that evaluating the hypothesis within the stipulated time period becomes infeasible. Given this limitation, for the purpose of this experimentation only the GSF of groups of entities of size at most 10 is taken into consideration. The maximum size is set to 10 because it is the largest value for which the experimentation remained feasible within the decided time constraint.</p>
        <p>As explained in section 6.2, a single example of the final dataset is formed by the collection of all name-mentions (the top 100 by relevance in this particular experimentation) appearing in a single text document, each paired with one of its candidate entities. For each such example, all 10 distinct values, namely the sum of SR values, the sum of CP values and the sums of GSF values of groups of sizes ranging from 3 to 10, are represented as a single 1 × 10 vector. Thus an example e is represented as the vector V<sub>e</sub> given by equation 8.</p>
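        <p>
          <disp-formula>
            <label>(8)</label>
            <tex-math><![CDATA[V_{e} = \left[\; \mathrm{GSF}_{10} \;\; \mathrm{GSF}_{9} \;\; \dots \;\; \mathrm{GSF}_{3} \;\; \textstyle\sum \mathrm{SR} \;\; \textstyle\sum \mathrm{CP} \;\right]]]></tex-math>
          </disp-formula>
        </p>
        <p>Thus the entire dataset consisting of m examples can be represented as an m × 10 matrix M<sub>d</sub> given by equation 9, together with an m × 1 Boolean vector holding the labels of all m examples.</p>
        <p>
          <disp-formula>
            <label>(9)</label>
            <tex-math><![CDATA[M_{d} = \left[\; V_{e_{1}} \;\; V_{e_{2}} \;\; \dots \;\; V_{e_{m}} \;\right]^{T}]]></tex-math>
          </disp-formula>
        </p>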
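        <p>The size of the group enumeration discussed in this subsection can be checked directly. The sketch below counts the groups of three or more entities that can be drawn from 100 entities, and the same count when the group size is capped at 10; the variable names are illustrative.</p>
        <preformat><![CDATA[
```python
from math import comb

ENTITIES_PER_EXAMPLE = 100

# Every group of size 3..100 drawn from 100 entities.
all_groups = sum(comb(ENTITIES_PER_EXAMPLE, n) for n in range(3, 101))

# Only groups of size 3..10, the cap used in the experiments.
capped_groups = sum(comb(ENTITIES_PER_EXAMPLE, n) for n in range(3, 11))

# The full count equals 2**100 minus the subsets of size 0, 1 and 2.
assert all_groups == 2 ** 100 - 1 - 100 - comb(100, 2)

print(f"all sizes: {all_groups:.3e}")
print(f"sizes 3-10: {capped_groups:.3e}")
```
]]></preformat>
        <p>The uncapped count is on the order of 10<sup>30</sup>, whereas capping the group size at 10 reduces it to the order of 10<sup>13</sup> groups per example.</p>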
      </sec>
      <sec id="sec-5-6">
        <title>Training and Testing</title>
        <p>The final collective linking is performed by computing the Linking Factor (LF) for each training example using the formula given as equation 4. In the current experimentation, since the maximum size of a group of entities is set to 10, the formula can be written as equation 10.</p>
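        <p>
          <disp-formula>
            <label>(10)</label>
            <tex-math><![CDATA[\mathrm{LF} = (10^{3}\alpha_{1} + 10^{2}\alpha_{2} + 10\,\alpha_{3} + \alpha_{4})\,\mathrm{GSF}_{10} + (9^{3}\alpha_{1} + 9^{2}\alpha_{2} + 9\,\alpha_{3} + \alpha_{4})\,\mathrm{GSF}_{9} + \dots + (3^{3}\alpha_{1} + 3^{2}\alpha_{2} + 3\,\alpha_{3} + \alpha_{4})\,\mathrm{GSF}_{3} + \varphi_{2} \sum \mathrm{SR} + \varphi_{1} \sum \mathrm{CP}]]></tex-math>
          </disp-formula>
        </p>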
        <p>The Linking Factor (LF) values of all examples within a given dataset d can be represented as a single m × 1 matrix called LFMatrix. Rearranging equation 10 shows that the LFMatrix of d can be computed by applying equation 11.</p>
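        <p>
          <disp-formula>
            <label>(11)</label>
            <tex-math><![CDATA[\mathrm{LFMatrix} = (M_{d} \times \mathrm{Multiplier}) \times P]]></tex-math>
          </disp-formula>
        </p>
        <p>Here M<sub>d</sub> is the matrix defined in equation 9, and P and Multiplier are given by equations 12 and 13. P is the parameter matrix that needs to be learnt through Logistic Regression. It is initialized with random values and is subsequently updated after each iteration until optimization.</p>
        <p>
          <disp-formula>
            <label>(12)</label>
            <tex-math><![CDATA[P = \left[\; \alpha_{1} \;\; \alpha_{2} \;\; \alpha_{3} \;\; \alpha_{4} \;\; \varphi_{1} \;\; \varphi_{2} \;\right]^{T}]]></tex-math>
          </disp-formula>
        </p>
        <p>
          <disp-formula>
            <label>(13)</label>
            <tex-math><![CDATA[\mathrm{Multiplier} = \begin{bmatrix} 10^{3} & 10^{2} & 10 & 1 & 0 & 0 \\ 9^{3} & 9^{2} & 9 & 1 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 3^{3} & 3^{2} & 3 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}]]></tex-math>
          </disp-formula>
        </p>
        <p>To evaluate the performance of the proposed approach, each dataset is split in a 60%/40% ratio, with the first 60% of examples used to learn the parameters of equation 4 (represented as the single matrix P in equation 12) through Logistic Regression, and testing performed on the last 40% of the dataset. Training and testing are performed separately on each of the three datasets, and the results obtained on each are discussed in section 7.</p>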
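        <p>The matrix form of the Linking Factor and the thresholded prediction can be sketched in plain Python as below. The layout of the multiplier matrix is derived here from equation 4 and the column order of the feature vector (GSF sums for sizes 10 down to 3, then the SR sum, then the CP sum), with the parameter vector ordered as the four polynomial weights followed by φ1 and φ2. Mapping LF to a probability with the logistic function is the usual Logistic Regression reading and is an assumption of this sketch, as are all function names.</p>
        <preformat><![CDATA[
```python
import math


def build_multiplier(max_size=10):
    """(max_size - 2 + 2) x 6 matrix linking feature columns to parameters."""
    rows = [[n ** 3, n ** 2, n, 1, 0, 0] for n in range(max_size, 2, -1)]
    rows.append([0, 0, 0, 0, 0, 1])  # sum-of-SR column is weighted by phi2
    rows.append([0, 0, 0, 0, 1, 0])  # sum-of-CP column is weighted by phi1
    return rows


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lf_matrix(md, p):
    """LF for every example: (Md x Multiplier) x P, returned as a flat list."""
    column = matmul(matmul(md, build_multiplier()), [[value] for value in p])
    return [row[0] for row in column]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(md, p, threshold=0.5):
    """Boolean prediction per example using a fixed probability threshold."""
    return [sigmoid(lf) >= threshold for lf in lf_matrix(md, p)]


# Toy example: all GSF sums are zero, sum of SR = 1.0, sum of CP = 2.0.
toy_md = [[0.0] * 8 + [1.0, 2.0]]
toy_p = [0.0, 0.0, 0.0, 0.0, 0.5, 0.25]  # a1..a4, phi1, phi2
print(lf_matrix(toy_md, toy_p), predict(toy_md, toy_p))
```
]]></preformat>
        <p>Fitting P itself (for instance by gradient descent on the logistic loss over the first 60% of examples) is omitted; the sketch only shows how a learnt P would be applied.</p>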
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Preliminary Results and Future Work</title>
      <p>
        Given the probability matrix for a test dataset, obtained as described in section 6.2, predictions are made for each example using a fixed threshold of 0.5, yielding a predicted Boolean matrix that is compared with the actual Boolean matrix. Table 2 compares the average results achieved on the three datasets with the results of the Wikification approach [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and the approach of [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which are benchmark individual-linking and collective-linking graph-based approaches respectively. Although the results are yet to be compared with various state-of-the-art approaches, the preliminary results indicate that the proposed approach performed significantly better than both benchmark approaches.
      </p>
      <p>Future work will include much more exhaustive testing and evaluation of the proposed approach on larger datasets.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bunescu</surname>
            ,
            <given-names>R.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasca</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>Using encyclopedic knowledge for named entity disambiguation</article-title>
          .
          <source>In: EACL</source>
          . vol.
          <volume>6</volume>
          , pp.
          <fpage>9</fpage>
          –
          <lpage>16</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dredze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McNamee</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rao</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gerber</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Entity disambiguation for knowledge base population</article-title>
          .
          <source>In: Proceedings of the 23rd International Conference on Computational Linguistics</source>
          . pp.
          <fpage>277</fpage>
          –
          <lpage>285</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ganea</surname>
            ,
            <given-names>O.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ganea</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lucchi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eickhoff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hofmann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Probabilistic bag-of-hyperlinks model for entity linking</article-title>
          .
          <source>In: Proceedings of the 25th International Conference on World Wide Web</source>
          . pp.
          <fpage>927</fpage>
          –
          <lpage>938</lpage>
          .
          <publisher-name>International World Wide Web Conferences Steering Committee</publisher-name>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hachey</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radford</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Curran</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          :
          <article-title>Graph-based named entity linking with wikipedia</article-title>
          .
          <source>In: WISE</source>
          . pp.
          <fpage>213</fpage>
          –
          <lpage>226</lpage>
          .
          <publisher-name>Springer</publisher-name>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Collective entity linking in web text: a graph-based method</article-title>
          .
          <source>In: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval</source>
          . pp.
          <fpage>765</fpage>
          –
          <lpage>774</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hoffart</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yosef</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bordino</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , Furstenau, H.,
          <string-name>
            <surname>Pinkal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spaniol</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taneva</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thater</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Robust disambiguation of named entities in text</article-title>
          .
          <source>In: Proceedings of the Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <fpage>782</fpage>
          –
          <lpage>792</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeng</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Enhancing text clustering by leveraging Wikipedia semantics</article-title>
          . In:
          <source>Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval</source>
          . pp.
          <fpage>179</fpage>
          –
          <lpage>186</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kulkarni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramakrishnan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chakrabarti</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Collective annotation of Wikipedia entities in web text</article-title>
          . In:
          <source>Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          . pp.
          <fpage>457</fpage>
          –
          <lpage>466</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Mihalcea</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Csomai</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Wikify!: linking documents to encyclopedic knowledge</article-title>
          . In:
          <source>Proceedings of the sixteenth ACM conference on Conference on information and knowledge management</source>
          . pp.
          <fpage>233</fpage>
          –
          <lpage>242</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Milne</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          :
          <article-title>Learning to link with Wikipedia</article-title>
          . In:
          <source>Proceedings of the 17th ACM conference on Information and knowledge management</source>
          . pp.
          <fpage>509</fpage>
          –
          <lpage>518</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Moro</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raganato</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Entity linking meets word sense disambiguation: a unified approach</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>2</volume>
          ,
          <fpage>231</fpage>
          –
          <lpage>244</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Naderi</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          :
          <article-title>Unsupervised entity linking using graph-based semantic similarity</article-title>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cassidy</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hermjakob</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knight</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Unsupervised entity linking with abstract meaning representation</article-title>
          . In:
          <source>HLT-NAACL</source>
          . pp.
          <fpage>1130</fpage>
          –
          <lpage>1139</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Pappu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blanco</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehdad</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stent</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thadani</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Lightweight multilingual entity extraction and linking</article-title>
          . In:
          <source>Proceedings of the Tenth ACM International Conference on Web Search and Data Mining</source>
          . pp.
          <fpage>365</fpage>
          –
          <lpage>374</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ratinov</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Downey</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anderson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Local and global algorithms for disambiguation to Wikipedia</article-title>
          . In:
          <source>Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1</source>
          . pp.
          <fpage>1375</fpage>
          –
          <lpage>1384</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Joint inference of entities, relations, and coreference</article-title>
          . In:
          <source>Proceedings of the 2013 workshop on Automated knowledge base construction</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>6</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Yamada</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ito</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Usami</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Takagi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Takeda</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Takefuji</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Evaluating the helpfulness of linked entities to readers</article-title>
          . In:
          <source>Proceedings of the 25th ACM Conference on Hypertext and Social Media</source>
          . pp.
          <fpage>169</fpage>
          –
          <lpage>178</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Learning to link entities with knowledge base</article-title>
          . In:
          <source>Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics</source>
          . pp.
          <fpage>483</fpage>
          –
          <lpage>491</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rouhani-Kalleh</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasile</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaffney</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Resolving surface forms to Wikipedia topics</article-title>
          . In:
          <source>Proceedings of the 23rd International Conference on Computational Linguistics</source>
          . pp.
          <fpage>1335</fpage>
          –
          <lpage>1343</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>