<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ATMC team at M-WePNaD task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Agustín D. Delgado</string-name>
          <email>agustin.delgado@lsi.uned.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Nacional de Educación a Distancia (UNED)</institution>
          ,
          <addr-line>Juan del Rosal, 16, 28040 Madrid</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>128</fpage>
      <lpage>137</lpage>
      <abstract>
        <p>This paper presents our participation in the Multilingual Web Person Name Disambiguation (M-WePNaD) task at the IBEREVAL 2017 workshop. Given a ranking of search results, written in different languages, that a search engine retrieves when looking for a person name, the goal of the task is to group the web pages according to the individual they refer to. We have grouped the search results by means of a clustering algorithm that does not need any kind of prior information. We deal with multilingualism in two different ways. The first one simply uses a machine translation tool. The second one is a method to compare search results written in different languages that gives a special role to those features written the same way in several languages. Both approaches get similar results, but the second one is more efficient because it avoids the additional preprocessing caused by the translation of the search results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The disambiguation of person names on the Web has received attention in recent years
for two main reasons: (i) person names are a particularly ambiguous kind of named entity (NE),
so their disambiguation has been studied in several
scenarios such as Cross-Document Coreference Resolution [
        <xref ref-type="bibr" rid="ref11 ref3">3, 11</xref>
        ], Entity Linking and
wikification [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ], or author name disambiguation [
        <xref ref-type="bibr" rid="ref10 ref16">10, 16</xref>
        ]; and (ii) the Web search
scenario presents several challenges: web pages do not necessarily discuss a
specific topic; search results do not necessarily share a common structure,
as news articles, scientific papers or bibliographic references do; and the proposed methods
must be efficient because users expect quick responses to their queries.
      </p>
      <p>
        Person name disambiguation on the Web has been addressed as a clustering
problem with two phases. The first phase represents the search results by means of
features suitable for identifying and distinguishing different individuals with the
same name. The second phase applies a clustering algorithm to group the search
results according to the individual they refer to. In particular, the best
state-of-the-art systems represent the search results with a rich selection of
features of different nature and group the web pages by means of the Hierarchical
Agglomerative Clustering (HAC) algorithm, using a similarity threshold learned from
training data [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>However, some authors have pointed out that the results obtained by HAC are
very sensitive to small variations of the similarity threshold, so this
methodology is not robust.</p>
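      <p>The threshold-based HAC strategy described above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the implementation used in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]: it assumes single linkage over a precomputed similarity matrix, and the function name <monospace>hac_threshold</monospace> and the toy matrix are hypothetical. It shows how the threshold alone decides whether two pages end up in the same cluster.</p>
      <preformat><![CDATA[
```python
from typing import List, Optional, Tuple

# Illustrative sketch; assumptions: single linkage, precomputed similarities.


def hac_threshold(sim: List[List[float]], threshold: float) -> List[List[int]]:
    """Single-linkage HAC that stops when no pair of clusters is similar enough.

    `sim` is a symmetric similarity matrix between search results; the
    returned clusters hold the indices of the grouped results.
    """
    clusters = [[i] for i in range(len(sim))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Single linkage: similarity of the closest pair across the two clusters.
        return max(sim[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best: float = float("-inf")
        pair: Optional[Tuple[int, int]] = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        # Stop as soon as the best candidate merge does not exceed the threshold.
        if pair is None or best <= threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because y > x
    return clusters


sim = [
    [1.00, 0.62, 0.10],
    [0.62, 1.00, 0.15],
    [0.10, 0.15, 1.00],
]
print(hac_threshold(sim, 0.60))  # [[0, 1], [2]]
print(hac_threshold(sim, 0.65))  # [[0], [1], [2]]
```
]]></preformat>
      <p>With this toy matrix, raising the threshold from 0.60 to 0.65 keeps results 0 and 1 apart, which illustrates the kind of threshold sensitivity mentioned above.</p>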
      <p>
        Moreover, this problem has usually been addressed assuming that all the
search results are written in the same language. However, search engines are
able to retrieve web pages written in several languages, and an increasing number of
web pages are written in different languages due to the popularization of the
Internet in non-English-speaking countries [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Few proposals take
into account the presence of multilingualism in this problem. For instance,
[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] presents a method based on extracting biographical information about the
individuals, such as birth dates and places. For this purpose, the authors propose
learning several patterns for each biographical fact in several languages from
training data. However, this approach needs enough training data for each
biographical fact in each language, which requires a huge human effort. On the
other hand, in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] the authors claim that Latent Dirichlet Allocation (LDA) is
able to deal with the problem for any language. These authors used a
collection of news articles written in English, Spanish, Bulgarian and Romanian
to check the suitability of their approach in several languages. Nevertheless, the
documents associated with each entity are written in the same language, so the
disambiguation process is not multilingual.
      </p>
      <p>
        In this paper, we present several methods to deal with multilingualism in
person name disambiguation on the Web. To this end, we have used the MC4WePS data set [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] provided by the M-WePNaD organizers. First, we detail
our approaches in Section 2. Next, we present the results in Section 3 and
discuss them in Section 4. Finally, Section 5 presents some conclusions and future
lines of work.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <p>This section presents four methods to solve the M-WePNaD task, which
fall into two groups: those that only take into account the original
content of the web pages, and those that employ a machine translation tool. First,
Subsection 2.1 describes the clustering algorithm used to group the web pages
according to the individual they refer to. Next, we detail the translation
process and the preprocessing of the search results in Subsections 2.2 and 2.3,
respectively. Finally, Subsection 2.4 presents several approaches to compare search
results written in different languages.</p>
      <sec id="sec-2-1">
        <title>Clustering Algorithm</title>
        <p>
          We have used the Adaptive Threshold Clustering (ATC) algorithm [
          <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
          ] to group
the search results. ATC consists of three phases, and its grouping strategy is
as follows: the first two phases aim to obtain initial cohesive clusters
with high precision, while the third phase merges them in order to
improve recall. The phases of ATC are briefly described below:
        </p>
        <sec id="sec-2-1-1">
          <title>1 http://nlp.uned.es/web-nlp/resources</title>
          <p>
            – Phase 1 (grouping by links): two search results are grouped if one links to the other
or they share some link, under the assumption that in that case they refer to the same
individual. Therefore, each web page is represented by its URL
and its links. Note that the M-WePNaD organizers provide the URL of all
the search results in their metadata files.
– Phase 2 (UPND algorithm): the search results are grouped by means of
the UPND algorithm [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]. In this phase, the web pages are represented by
their capitalized 3-grams, defined as sequences of three
consecutive words whose first letter is uppercase. This kind of
feature has been shown to be suitable for distinguishing between different individuals [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ].
– Phase 3 (fusion of clusters): the most similar clusters generated in
the previous phases are merged. The clusters are represented as bags of words by means of
their centroids. However, some features are filtered out of the centroids according
to the following properties: (i) they have a low document frequency within
the cluster; and (ii) they appear in most of the clusters. In this phase the
search results are represented by their 1-grams because, unlike the capitalized
3-grams, these features allow as many search results as possible to be represented.
          </p>
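          <p>Phase 1 can be sketched with a union-find structure. This is a minimal sketch under our own assumptions: each search result is given as a mapping from its URL to the set of URLs it links to, URLs are assumed to be already normalized, the function name <monospace>group_by_links</monospace> is hypothetical, and the social media heuristic of ATC is not included.</p>
          <preformat><![CDATA[
```python
from typing import Dict, List, Set

# Illustrative sketch; assumption: URLs are already normalized.


def group_by_links(pages: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group search results that link to each other or share at least one link.

    `pages` maps the URL of each search result to the URLs it links to.
    """
    parent = {url: url for url in pages}

    def find(u: str) -> str:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_owner: Dict[str, str] = {}  # link URL -> first search result containing it
    for url, links in pages.items():
        for link in links:
            if link in pages:  # direct link to another search result
                union(url, link)
            if link in first_owner:  # link shared with an earlier search result
                union(url, first_owner[link])
            else:
                first_owner[link] = url

    groups: Dict[str, Set[str]] = {}
    for url in pages:
        groups.setdefault(find(url), set()).add(url)
    return list(groups.values())


pages = {
    "http://a.example/home": {"http://a.example/cv", "http://social.example/ann"},
    "http://a.example/cv": set(),
    "http://blog.example/post": {"http://social.example/ann"},
    "http://other.example/": {"http://news.example/"},
}
print(sorted(sorted(g) for g in group_by_links(pages)))
```
]]></preformat>
          <p>Union-find makes the grouping transitive: the blog post joins the home page through their shared link, and the CV joins it through a direct link, so all three end up in one group.</p>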
          <p>
            We have used the configuration of ATC described in [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]. Its authors show that the results are not affected by the function used to weight the
capitalized 3-grams, so these are weighted with the binary function, the simplest one.
However, in the case of the 1-grams, the TF-IDF function gets better results.
ATC compares two search results <inline-formula><tex-math>W_i</tex-math></inline-formula> and <inline-formula><tex-math>W_j</tex-math></inline-formula>
by means of their cosine similarity <inline-formula><tex-math>\mathrm{sim}(W_i, W_j)</tex-math></inline-formula> and a function
<inline-formula><tex-math>\theta(W_i, W_j)</tex-math></inline-formula>, called the adaptive threshold, which returns a similarity threshold that
depends on the characteristics of the search results and on their number of shared features.
The web pages are merged if <inline-formula><tex-math>\mathrm{sim}(W_i, W_j) &gt; \theta(W_i, W_j)</tex-math></inline-formula>. In this way, ATC is able
to estimate the number of clusters, so, unlike the HAC or k-means algorithms, it does not need any prior information to
group the search results.
          </p>
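          <p>The weighting and comparison step can be sketched as follows. This is a minimal sketch, assuming that each search result is a sparse dictionary of weighted features; the function names are hypothetical and, since the adaptive threshold is not defined in this paper, <monospace>toy_threshold</monospace> is a purely illustrative placeholder rather than the function of ATC.</p>
          <preformat><![CDATA[
```python
import math
from collections import Counter
from typing import Callable, Dict, List

# Illustrative sketch; toy_threshold is a hypothetical placeholder.
Vector = Dict[str, float]


def binary_vector(features: List[str]) -> Vector:
    # Binary weighting (used for capitalized 3-grams): 1.0 if the feature occurs.
    return {f: 1.0 for f in set(features)}


def tfidf_vector(features: List[str], df: Dict[str, int], n_docs: int) -> Vector:
    # TF-IDF weighting (used for 1-grams): raw frequency times log(N / df).
    tf = Counter(features)
    return {f: count * math.log(n_docs / df[f]) for f, count in tf.items()}


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def should_merge(u: Vector, v: Vector, threshold: Callable[[Vector, Vector], float]) -> bool:
    # ATC merges two search results when their similarity exceeds the threshold.
    return cosine(u, v) > threshold(u, v)


def toy_threshold(u: Vector, v: Vector) -> float:
    # Hypothetical stand-in, NOT the adaptive threshold of ATC: the more
    # features two pages share, the lower the bar they have to clear.
    return 1.0 / (1.0 + len(u.keys() & v.keys()))


w1 = binary_vector(["Real Madrid Club", "Copa Del Rey", "Real Madrid Club"])
w2 = binary_vector(["Real Madrid Club", "Santiago Bernabeu Stadium"])
print(round(cosine(w1, w2), 3))  # 0.5
print(should_merge(w1, w2, toy_threshold))  # False
```
]]></preformat>
          <p>Passing the threshold as a function keeps the merge rule independent of how the threshold is computed, so the placeholder can be swapped for the actual adaptive threshold.</p>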
          <p>
            Moreover, the presence of web pages from social media platforms
could lead to worse performance [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]. Thus, we have applied a heuristic
for social media platforms [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] that does not allow comparisons between
web pages from the same social platform. Furthermore, in Phase 1 this heuristic is extended
to web pages from people search engines, because these pages usually contain links to
profiles, on different social platforms, of several people with the same name; when web
pages are represented by their links, such pages could lead to incorrect merges.
          </p>
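          <p>The heuristic can be sketched as a predicate that is checked before two search results are compared. This is a minimal sketch under our own assumptions: the domain lists are hypothetical examples, a page's platform is identified only by its host name, and excluding people search engine pages from every Phase 1 comparison is one possible reading of the extension described above.</p>
          <preformat><![CDATA[
```python
from typing import Optional, Set
from urllib.parse import urlparse

# Hypothetical domain lists: the platforms actually covered by the heuristic
# are not enumerated in this paper.
SOCIAL_PLATFORMS = {"facebook.com", "twitter.com", "linkedin.com"}
PEOPLE_SEARCH_ENGINES = {"peoplesearch.example"}


def platform_of(url: str, domains: Set[str]) -> Optional[str]:
    """Return the known domain hosting `url`, or None."""
    host = (urlparse(url).hostname or "").lower()
    for domain in domains:
        if host == domain or host.endswith("." + domain):
            return domain
    return None


def can_compare(url_i: str, url_j: str, phase: int) -> bool:
    """Whether two search results may be compared during a given ATC phase."""
    p_i = platform_of(url_i, SOCIAL_PLATFORMS)
    p_j = platform_of(url_j, SOCIAL_PLATFORMS)
    if p_i is not None and p_i == p_j:
        return False  # never compare two pages from the same social platform
    if phase == 1 and (platform_of(url_i, PEOPLE_SEARCH_ENGINES)
                       or platform_of(url_j, PEOPLE_SEARCH_ENGINES)):
        return False  # our reading of the extension to people search engines
    return True


print(can_compare("https://www.facebook.com/ann", "https://facebook.com/ann.smith", 2))  # False
print(can_compare("https://peoplesearch.example/ann", "https://blog.example/", 1))  # False
print(can_compare("https://peoplesearch.example/ann", "https://blog.example/", 3))  # True
```
]]></preformat>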
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Translation Process</title>
        <p>Some of our runs rely on a machine translation tool. This
subsection describes the translation process applied in these runs to the search
results of each person name.</p>
        <p>First, we have to select the anchor language into which the web pages are translated. As
the computational cost must be low in a Web search scenario, we have decided
to translate as few web pages as possible. Thus, we take as the anchor language
the most frequent language among the search results contained in the ranking.
Although the M-WePNaD organizers provide the language of each search result
annotated by experts, we have used a language identification tool available on the
Internet that uses a Naive Bayes classifier over sequences of characters
within the text to detect the language. We have evaluated the language detector
against the language annotations of the experts, obtaining 96.17% accuracy.</p>
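        <p>Selecting the anchor language and measuring the accuracy of the detector can be sketched as follows. This is an illustrative sketch: language identification itself is assumed to have been done already (its output is given as a dictionary), and the function names and toy data are hypothetical. The misdetection of d4 as Catalan mirrors the kind of error discussed in the next paragraph.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from typing import Dict, List, Tuple

# Illustrative sketch; assumption: languages are already detected per search result.


def anchor_language(langs: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return the most frequent language and the search results to translate.

    `langs` maps each search result to its detected language. Ties are broken
    by first occurrence, which is Counter.most_common's documented behaviour.
    """
    anchor, _ = Counter(langs.values()).most_common(1)[0]
    to_translate = [doc for doc, lang in langs.items() if lang != anchor]
    return anchor, to_translate


def accuracy(predicted: Dict[str, str], gold: Dict[str, str]) -> float:
    """Share of search results whose detected language matches the annotation."""
    hits = sum(predicted[doc] == lang for doc, lang in gold.items())
    return hits / len(gold)


detected = {"d1": "es", "d2": "en", "d3": "es", "d4": "ca"}
annotated = {"d1": "es", "d2": "en", "d3": "es", "d4": "es"}
print(anchor_language(detected))  # ('es', ['d2', 'd4'])
print(accuracy(detected, annotated))  # 0.75
```
]]></preformat>
        <p>Choosing the most frequent language minimizes the number of search results that have to be translated, which is the motivation given above.</p>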
        <p>For translation, we have used the service provided by the
Russian technology company Yandex. This tool is able to translate documents
from 94 different languages, including those identified by the language
detector. It uses statistical techniques to translate the documents by means
of several dictionaries, and it models each language with web pages available in
several languages, for instance, by comparing the versions of company web sites
in different languages. The translator may fail when the language detector makes
a mistake. For instance, a Spanish web page that has been detected as Catalan is
not entirely translated, because the translator finds no Catalan words in the text
apart from the vocabulary shared by the two languages.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Preprocessing</title>
        <p>When the machine translation tool is used, the search results of each person name
are preprocessed as follows. We obtain the plain text of the search
results using the parsers provided by the Apache Tika library. This tool is also
able to extract the links of the search results used in Phase 1 of ATC. Next,
we identify the language of each search result by means of the language
detection tool and translate into the anchor language those search results
written in other languages. Then, we split the plain texts into sentences and
delete the stop words of the anchor language, since all the search results are
written in the same language after the translation step. In addition, we delete
the person name: since it is the query, we assume that all the search results
contain it.</p>
        <p>The runs that do not use the machine translation tool apply the
same preprocessing with two differences: (i) we do not translate
any search result; and (ii) for each search result, we delete the stop words of
the language identified by the language detector.</p>
        <p>After the preprocessing phase, we extract from each sentence the textual features
used by ATC: capitalized 3-grams and 1-grams. Finally, we remove those features
that appear in only one search result of the ranking.</p>
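        <p>The feature extraction just described can be sketched as follows. This is a minimal sketch under our own assumptions: sentences are split with a simple punctuation rule, the stop list is a deliberately tiny hypothetical example, the person name is removed token by token, 1-grams are lowercased, and the function names are hypothetical. Because stop words are removed before extraction, "Academy of Music" yields the capitalized 3-gram "Royal Academy Music".</p>
        <preformat><![CDATA[
```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Illustrative sketch; tokenization, sentence splitting and stop list are assumptions.
TOKEN = re.compile(r"\w+")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Hypothetical, deliberately tiny stop list; a real run needs a full list per language.
STOP_WORDS = {"en": {"a", "and", "at", "in", "of", "the"}}


def extract_features(text: str, lang: str, query: str) -> Tuple[List[str], List[str]]:
    """Return the capitalized 3-grams and the 1-grams of one search result."""
    name = {t.lower() for t in TOKEN.findall(query)}
    stop = STOP_WORDS.get(lang, set())
    cap3: List[str] = []
    unigrams: List[str] = []
    for sentence in SENTENCE_END.split(text):
        # Remove stop words and the queried person name before extracting n-grams.
        words = [w for w in TOKEN.findall(sentence)
                 if w.lower() not in stop and w.lower() not in name]
        unigrams.extend(w.lower() for w in words)
        # Capitalized 3-gram: three consecutive words starting with an uppercase letter.
        for k in range(len(words) - 2):
            trigram = words[k:k + 3]
            if all(w[0].isupper() for w in trigram):
                cap3.append(" ".join(trigram))
    return cap3, unigrams


def drop_singletons(docs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Remove features that occur in only one search result of the ranking."""
    df = Counter(f for feats in docs.values() for f in set(feats))
    return {doc: [f for f in feats if df[f] > 1] for doc, feats in docs.items()}


cap3, unigrams = extract_features(
    "John Smith teaches at the Royal Academy of Music. He plays the piano.",
    "en", "John Smith")
print(cap3)  # ['Royal Academy Music']
print(drop_singletons({"d1": ["music", "piano"], "d2": ["music"], "d3": ["violin"]}))
```
]]></preformat>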
        <sec id="sec-2-3-1">
          <title>2 https://code.google.com/p/language-detection/ 3 https://www.yandex.com/ 4 https://tika.apache.org/</title>
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>Approaches</title>
        <p>
          We have conducted several experiments based on ATC that use different
representations of the search results or apply different policies when comparing
them:
– Run 1 (ATC): the search results are represented by their original
textual features, without using any translation resource.
– Run 2 (ATC+TRAD): the search results written in languages other than the anchor
language are translated into it.
– Run 3 (ATC+CENT TRAD): only the 1-grams contained in the centroids used to represent
the clusters in the last phase of the algorithm are translated, one by one, into the
anchor language.
– Run 4 (ATMC): the search results are compared taking into account the
features written the same way in different languages, without using any
translation resource. We have called this run Adaptive Threshold for Multilingual
Clustering (ATMC) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. This method is detailed below.
        </p>
        <p>
          Runs 2 and 3 allow us to study the suitability of applying a machine
translation tool to compare the search results. In particular, Run 3 allows us to study
whether translating individual words separately is preferable to translating the
whole document, as Run 2 does. On the other hand, Runs 1 and 4 do not use
any translation resource, which makes the disambiguation process lighter in terms of
cost because it avoids an additional phase dedicated to translating the web pages.
This is desirable in Web search problems, since users want
quick responses to their queries. Run 1 just applies the ATC algorithm [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] using
the features extracted from the original content of the web pages, while Run 4
compares documents written in different languages, giving more importance
to those features that are written the same way in both languages.
        </p>
        <p>ATMC compares search results written in different languages, giving a special
role to those of their features that are orthographically identical in several languages. This
usually happens with NEs such as organizations or person names. However, it also
happens with other kinds of information that are not usually detected as NEs,
for instance, titles of films, books, TV shows or papers, which could be
useful to identify an individual.</p>
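        <p>Let <inline-formula><tex-math>\mathcal{W} = \{W_1, W_2, \ldots, W_N\}</tex-math></inline-formula> be the set of search results returned by a search engine when looking for a person name. We denote by <inline-formula><tex-math>F_i</tex-math></inline-formula> the set of features of the search result <inline-formula><tex-math>W_i</tex-math></inline-formula>, which is written in the language <inline-formula><tex-math>l_i</tex-math></inline-formula>, and by <inline-formula><tex-math>\mathcal{F} = \bigcup_{i=1}^{N} F_i</tex-math></inline-formula> the set of all features. In addition, <inline-formula><tex-math>L(\mathcal{W}) = \bigcup_{i=1}^{N} \{l_i\}</tex-math></inline-formula> denotes the set of languages of the search results contained in <inline-formula><tex-math>\mathcal{W}</tex-math></inline-formula>. We can tag each feature <inline-formula><tex-math>f \in \mathcal{F}</tex-math></inline-formula> with the set of languages of the search results in which it appears by computing <inline-formula><tex-math>L(f) = \{l_i \in L(\mathcal{W}) \mid f \in F_i\} \subseteq L(\mathcal{W})</tex-math></inline-formula>. Then, any feature <inline-formula><tex-math>f \in F_i</tex-math></inline-formula> satisfies <inline-formula><tex-math>l_i \in L(f)</tex-math></inline-formula>. Two features <inline-formula><tex-math>f, f' \in \mathcal{F}</tex-math></inline-formula> are comparable features if <inline-formula><tex-math>L(f) \cap L(f') \neq \emptyset</tex-math></inline-formula>. The sets of comparable features of <inline-formula><tex-math>W_i</tex-math></inline-formula> and <inline-formula><tex-math>W_j</tex-math></inline-formula> are defined as follows:</p>
        <disp-formula id="eq1">
          <label>(1)</label>
          <tex-math>F_{i,j} = \{f_i \in F_i \mid l_j \in L(f_i)\} \subseteq F_i</tex-math>
        </disp-formula>
        <disp-formula id="eq2">
          <label>(2)</label>
          <tex-math>F_{j,i} = \{f_j \in F_j \mid l_i \in L(f_j)\} \subseteq F_j</tex-math>
        </disp-formula>
        <p>This definition can easily be generalized to clusters of search results by taking into account the set of languages of the web pages contained in each cluster. In addition, note that <inline-formula><tex-math>l_i = l_j</tex-math></inline-formula> implies <inline-formula><tex-math>F_i = F_{i,j}</tex-math></inline-formula> and <inline-formula><tex-math>F_j = F_{j,i}</tex-math></inline-formula>, which guarantees that comparing web pages by means of <inline-formula><tex-math>F_{i,j}</tex-math></inline-formula> and <inline-formula><tex-math>F_{j,i}</tex-math></inline-formula> has no effect in the monolingual scenario. On the other hand, <inline-formula><tex-math>l_i \neq l_j</tex-math></inline-formula> implies that we do not compare the search results using features that, according to the language detection, we already know are not shared by both web pages, so the two pages are more likely to be grouped than if all their features were used. Therefore, if we only used comparable features to compare the search results, we would favor comparisons between web pages written in different languages. In order to mitigate this effect, we propose to balance the comparison of the search results using all their features and using their comparable features, by means of the following formulas:</p>
        <disp-formula id="eq3">
          <label>(3)</label>
          <tex-math>\mathrm{sim}_{ML}(W_i, W_j) = \alpha_{i,j}\, \mathrm{sim}(F_i, F_j) + (1 - \alpha_{i,j})\, \mathrm{sim}(F_{i,j}, F_{j,i})</tex-math>
        </disp-formula>
        <disp-formula id="eq4">
          <label>(4)</label>
          <tex-math>\theta_{ML}(W_i, W_j) = \alpha_{i,j}\, \theta(F_i, F_j) + (1 - \alpha_{i,j})\, \theta(F_{i,j}, F_{j,i})</tex-math>
        </disp-formula>
        <p>where <inline-formula><tex-math>\theta</tex-math></inline-formula> is the adaptive threshold of ATC and <inline-formula><tex-math>\alpha_{i,j} = \frac{|F_{i,j}| + |F_{j,i}|}{|F_i| + |F_j|}</tex-math></inline-formula> is the proportion of comparable features with respect to all the features of the compared search results. A high value of <inline-formula><tex-math>\alpha_{i,j}</tex-math></inline-formula> means that most features are comparable, so the similarity and adaptive threshold values are close to those that ATC would compute in a monolingual scenario. On the other hand, if <inline-formula><tex-math>\alpha_{i,j}</tex-math></inline-formula> has a low value, few features are comparable, so the comparable features receive more weight when comparing search results written in different languages.</p>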
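        <p>The computation of comparable features and of the combined similarity and threshold can be sketched as follows. This is a minimal sketch under our own assumptions: features are plain sets compared with binary cosine similarity, <monospace>alpha</monospace> stands for the proportion of comparable features, <monospace>flat_threshold</monospace> is a hypothetical constant placeholder rather than the adaptive threshold of ATC, merging when the combined similarity exceeds the combined threshold is assumed by analogy with ATC, and all function names and toy data are hypothetical.</p>
        <preformat><![CDATA[
```python
import math
from typing import Callable, Dict, Set, Tuple

# Illustrative sketch; binary cosine and flat_threshold are assumptions.
Features = Set[str]
PairFn = Callable[[Features, Features], float]


def feature_languages(docs: Dict[str, Features], lang: Dict[str, str]) -> Dict[str, Set[str]]:
    """L(f): the set of languages of the search results in which f appears."""
    L: Dict[str, Set[str]] = {}
    for doc, feats in docs.items():
        for f in feats:
            L.setdefault(f, set()).add(lang[doc])
    return L


def comparable(F_i: Features, l_j: str, L: Dict[str, Set[str]]) -> Features:
    """F_{i,j} = {f in F_i | l_j in L(f)}; swapping i and j gives F_{j,i}."""
    return {f for f in F_i if l_j in L[f]}


def binary_cosine(a: Features, b: Features) -> float:
    # Cosine similarity between binary feature vectors.
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0


def flat_threshold(a: Features, b: Features) -> float:
    # Hypothetical constant threshold, NOT the adaptive threshold of ATC.
    return 0.3


def atmc(F_i: Features, F_j: Features, l_i: str, l_j: str, L: Dict[str, Set[str]],
         sim: PairFn, theta: PairFn) -> Tuple[float, float, float]:
    """Return alpha, the multilingual similarity and the multilingual threshold."""
    F_ij, F_ji = comparable(F_i, l_j, L), comparable(F_j, l_i, L)
    alpha = (len(F_ij) + len(F_ji)) / (len(F_i) + len(F_j))
    sim_ml = alpha * sim(F_i, F_j) + (1 - alpha) * sim(F_ij, F_ji)
    theta_ml = alpha * theta(F_i, F_j) + (1 - alpha) * theta(F_ij, F_ji)
    return alpha, sim_ml, theta_ml


docs = {
    "w1": {"madrid", "bernabeu", "futbolista", "gol"},
    "w2": {"madrid", "bernabeu", "footballer"},
    "w3": {"footballer", "stadium"},
}
lang = {"w1": "es", "w2": "en", "w3": "en"}
L = feature_languages(docs, lang)
alpha, sim_ml, theta_ml = atmc(docs["w1"], docs["w2"], "es", "en", L,
                               binary_cosine, flat_threshold)
# Assumed merge rule, by analogy with ATC: merge when sim_ml > theta_ml.
print(round(alpha, 3), round(sim_ml, 3), sim_ml > theta_ml)  # 0.571 0.758 True
```
]]></preformat>
        <p>For two pages in the same language (for example w2 and w3), alpha is 1 and the combined similarity reduces to the plain similarity, which matches the monolingual property stated above.</p>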
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>The ALL IN ONE baseline outperforms ONE IN ONE, which means
that most individuals in the collection are associated with several web pages. The
tables also show that the proposed approaches improve on the results of both
baselines. However, the results of the four approaches are close. This could
be explained by two reasons: several person names in the collection are
monolingual (9 of 35), and we translate as few web pages as possible, which
changes the representation of only a few web pages. In particular, the results of ATC
are slightly worse than those obtained by the runs that use the
machine translation tool (ATC+TRAD and ATC+CENT TRAD) and by ATMC.
This suggests that the translation process has a positive
impact. In particular, ATC+CENT TRAD is more suitable because it translates
a smaller amount of text. This run obtains a lower reliability score than
ATC+TRAD but a higher sensitivity. This is explained
by the fact that, when the words are translated separately (ATC+CENT TRAD), the
translator always returns the same output, whereas when we translate whole
texts (ATC+TRAD), the translation of each word may differ
depending on the context; hence the documents share more vocabulary in the case of
ATC+CENT TRAD, which leads to a higher number of groupings. Finally, ATMC
slightly improves on ATC although both approaches use the original
features. This suggests that the proposed method to compare web pages written
in different languages is suitable. In particular, ATC and ATMC obtain the same
reliability score, but ATMC obtains a higher sensitivity, which means that ATMC
correctly groups a higher number of search results without loss of
precision. In addition, ATC+TRAD and ATC+CENT TRAD do not improve on the
results of ATMC although they use a machine translation tool. Therefore, ATMC
is a better choice, because it does not need an additional preprocessing step for
translating the search results, which necessarily increases the processing time of
the disambiguation process. Avoiding this step is desirable in a Web search scenario,
since users expect a response as soon as possible.</p>
      <p>Comparing both tables, the ALL IN ONE baseline is the only
run whose results improve when all the web pages are considered,
including the unrelated search results. These web pages have been identified by the
annotators according to several criteria; for instance, they do not mention any
individual with the person name given as query, or they refer to other categories
of NEs that are not person names, as happens with John Fitzgerald Kennedy
International Airport instead of the former president of the United States. These
unrelated search results are grouped by the annotators in the same cluster for
each person name although they could refer to different people, so this situation
benefits ALL IN ONE but has a negative impact on ONE IN ONE. On the other
hand, the proposed approaches do not identify and group the unrelated search
results, so they get worse results when all the web pages are considered.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>This paper has described our participation in the M-WePNaD task at the IBEREVAL
2017 workshop, whose goal is to address person name disambiguation on the
Web in a multilingual scenario. We have proposed four approaches to deal with
multilingualism in this problem. Two of them are based on the use of a
machine translation tool, while the other two do not use any translation resource,
in order to avoid additional preprocessing steps. On the one hand, the use of a
translator slightly improves the results obtained using the original features. On
the other hand, we have seen that comparable features are useful to
compare web pages written in different languages without the need for translation
resources. As future work, we want to explore how to enrich the representation
based on comparable features. For instance, this representation could be
extended by identifying features written similarly in different languages, in addition
to those that are orthographically identical. Such features could be detected
by means of NE alignment techniques and cognate identification methods. In
addition, a future line of work is to detect unrelated search results, because they
have a negative impact on our methods when we consider the whole ranking of
web pages. This kind of search result could be identified by checking whether
it mentions the person name given as query and whether those mentions refer to
a person rather than to other categories of NEs.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgment</title>
      <p>This work has been partially funded by the Spanish Ministry of Science and
Innovation (MAMTRA-MED Project, TIN2016-77820-C3-2-R, and MED-RECORD
Project, TIN2013-46616-C2-2-R).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. <string-name><given-names>Enrique</given-names> <surname>Amigó</surname></string-name> &amp; <string-name><given-names>Julio</given-names> <surname>Gonzalo</surname></string-name> &amp; <string-name><given-names>Felisa</given-names> <surname>Verdejo</surname></string-name>: <article-title>A General Evaluation Measure for Document Organization Tasks</article-title>. <source>Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>, pp. <fpage>643</fpage>–<lpage>652</lpage>, <year>2013</year>. http://doi.acm.org/10.1145/2484028.2484081.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. <string-name><given-names>Javier</given-names> <surname>Artiles</surname></string-name>: <source>Web People Search</source>. PhD Thesis, E.T.S. Ingeniería Informática, UNED, <year>2009</year>. http://e-spacio.uned.es/fez/eserv/tesisuned:IngInfJartiles/Documento.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. <string-name><given-names>Amit</given-names> <surname>Bagga</surname></string-name> &amp; <string-name><given-names>Breck</given-names> <surname>Baldwin</surname></string-name>: <article-title>Entity-based Cross-document Coreferencing Using the Vector Space Model</article-title>. <source>Proceedings of the 17th International Conference on Computational Linguistics - vol. 1</source>, pp. <fpage>79</fpage>–<lpage>85</lpage>, <year>1998</year>. http://dx.doi.org/10.3115/980451.980859.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. <string-name><given-names>Richard</given-names> <surname>Berendsen</surname></string-name>: <source>Finding People, Papers, and Posts: Vertical Search Algorithms and Evaluation</source>. PhD Thesis, University of Amsterdam, <year>2015</year>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. <string-name><given-names>Agustín D.</given-names> <surname>Delgado</surname></string-name> &amp; <string-name><given-names>Raquel</given-names> <surname>Martínez</surname></string-name> &amp; <string-name><given-names>Víctor</given-names> <surname>Fresno</surname></string-name> &amp; <string-name><given-names>Soto</given-names> <surname>Montalvo</surname></string-name>: <article-title>A Data Driven Approach for Person Name Disambiguation in Web Search Results</article-title>. <source>Proceedings of the 25th International Conference on Computational Linguistics</source>, pp. <fpage>301</fpage>–<lpage>310</lpage>, <year>2014</year>. http://aclweb.org/anthology/C/C14/C14-1030.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. <string-name><given-names>Agustín D.</given-names> <surname>Delgado</surname></string-name> &amp; <string-name><given-names>Raquel</given-names> <surname>Martínez</surname></string-name> &amp; <string-name><given-names>Soto</given-names> <surname>Montalvo</surname></string-name> &amp; <string-name><given-names>Víctor</given-names> <surname>Fresno</surname></string-name>: <article-title>An Unsupervised Algorithm for Person Name Disambiguation in the Web</article-title>. <source>Procesamiento del Lenguaje Natural</source>, <volume>53</volume>:<fpage>51</fpage>–<lpage>58</lpage>, <year>2014</year>. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/5042.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. <string-name><given-names>Agustín D.</given-names> <surname>Delgado</surname></string-name> &amp; <string-name><given-names>Raquel</given-names> <surname>Martínez</surname></string-name> &amp; <string-name><given-names>Soto</given-names> <surname>Montalvo</surname></string-name> &amp; <string-name><given-names>Víctor</given-names> <surname>Fresno</surname></string-name>: <article-title>Tratamiento de redes sociales en desambiguación de nombres de persona en la web</article-title>. <source>Procesamiento del Lenguaje Natural</source>, <volume>57</volume>:<fpage>117</fpage>–<lpage>124</lpage>, <year>2016</year>. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/5344.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. <string-name><given-names>Agustín D.</given-names> <surname>Delgado</surname></string-name> &amp; <string-name><given-names>Raquel</given-names> <surname>Martínez</surname></string-name> &amp; <string-name><given-names>Soto</given-names> <surname>Montalvo</surname></string-name> &amp; <string-name><given-names>Víctor</given-names> <surname>Fresno</surname></string-name>: <article-title>Person Name Disambiguation in the Web Using Adaptive Threshold Clustering</article-title>. <source>Journal of the Association for Information Science and Technology</source>, <year>2017</year>. https://doi.org/10.1002/asi.23810.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. <string-name><given-names>Agustín D.</given-names> <surname>Delgado</surname></string-name>: <source>Desambiguación de nombres de persona en la Web en un contexto multilingüe</source>. PhD Thesis, E.T.S. Ingeniería Informática, UNED, <year>2017</year>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. <string-name><given-names>Johanna</given-names> <surname>Geiß</surname></string-name> &amp; <string-name><given-names>Michael</given-names> <surname>Gertz</surname></string-name>: <article-title>With a Little Help from My Neighbors: Person Name Linking Using the Wikipedia Social Network</article-title>. <source>Proceedings of the 25th International Conference Companion on World Wide Web</source>, pp. <fpage>985</fpage>–<lpage>990</lpage>, <year>2016</year>. http://dx.doi.org/10.1145/2872518.2891109.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. <string-name><given-names>Chung Heong</given-names> <surname>Gooi</surname></string-name> &amp; <string-name><given-names>James</given-names> <surname>Allan</surname></string-name>: <article-title>Cross-Document Coreference on a Large Scale Corpus</article-title>. <source>Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics</source>, pp. <fpage>9</fpage>–<lpage>16</lpage>, <year>2004</year>. http://aclweb.org/anthology/N/N04/N04-1002.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Toni</given-names>
            <surname>Grütze</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>Gjergji</given-names>
            <surname>Kasneci</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>Zhe</given-names>
            <surname>Zuo</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>Felix</given-names>
            <surname>Naumann</surname>
          </string-name>
          :
          <article-title>Bootstrapping Wikipedia to answer ambiguous person name queries</article-title>
          .
          <source>Workshops Proceedings of the 30th International Conference on Data Engineering Workshops</source>
          , pp.
          <fpage>56</fpage>
          –
          <lpage>61</lpage>
          ,
          <year>2014</year>
          . http://dx.doi.org/10.1109/ICDEW.2014.6818303.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Zhengyan</given-names>
            <surname>He</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>Houfeng</given-names>
            <surname>Wang</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>Sujian</given-names>
            <surname>Li</surname>
          </string-name>
          :
          <article-title>The Task 2 of CIPS-SIGHAN 2012 Named Entity Recognition and Disambiguation in Chinese Bakeoff</article-title>
          .
          <source>Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing</source>
          , pp.
          <fpage>108</fpage>
          –
          <lpage>114</lpage>
          ,
          <year>2012</year>
          . http://www.aclweb.org/anthology/W12-6321.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Zornitsa</given-names>
            <surname>Kozareva</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>Sujith</given-names>
            <surname>Ravi</surname>
          </string-name>
          :
          <article-title>Unsupervised Name Ambiguity Resolution Using a Generative Model</article-title>
          .
          <source>Proceedings of the First Workshop on Unsupervised Learning in NLP</source>
          , pp.
          <fpage>105</fpage>
          –
          <lpage>112</lpage>
          ,
          <year>2011</year>
          . http://dl.acm.org/citation.cfm?id=2140458.2140471.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Gideon S.</given-names>
            <surname>Mann</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>David</given-names>
            <surname>Yarowsky</surname>
          </string-name>
          :
          <article-title>Unsupervised Personal Name Disambiguation</article-title>
          .
          <source>Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - Volume 4</source>
          , pp.
          <fpage>33</fpage>
          –
          <lpage>40</lpage>
          ,
          <year>2003</year>
          . http://dx.doi.org/10.3115/1119176.1119181.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Fakhri</given-names>
            <surname>Momeni</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Mayr</surname>
          </string-name>
          :
          <article-title>Using Co-authorship Networks for Author Name Disambiguation</article-title>
          .
          <source>Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries (JCDL 2016)</source>
          , pp.
          <fpage>261</fpage>
          –
          <lpage>262</lpage>
          ,
          <year>2016</year>
          . http://doi.acm.org/10.1145/2910896.2925461.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Soto</given-names>
            <surname>Montalvo</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>Raquel</given-names>
            <surname>Martínez</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>Leonardo</given-names>
            <surname>Campillos</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>Agustín D.</given-names>
            <surname>Delgado</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>Víctor</given-names>
            <surname>Fresno</surname>
          </string-name>
          &amp;
          <string-name>
            <given-names>Felisa</given-names>
            <surname>Verdejo</surname>
          </string-name>
          :
          <article-title>MC4WePS: a multilingual corpus for web people search disambiguation</article-title>
          .
          <source>Language Resources and Evaluation</source>
          (
          <year>2016</year>
          ). http://dx.doi.org/10.1007/s10579-016-9365-4.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18. Daniel Pimienta &amp; Daniel Prado &amp; Alvaro Blanco:
          <article-title>Twelve years of measuring linguistic diversity in the Internet: balance and perspectives</article-title>
          .
          <source>UNESCO publications for the World Summit on the Information Society</source>
          (
          <year>2009</year>
          ). http://unesdoc.unesco.org/images/0018/001870/187016e.pdf.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>