<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the M-WePNaD Task: Multilingual Web Person Name Disambiguation at IberEval 2017</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Soto Montalvo</string-name>
          <email>soto.montalvo@urjc.es</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raquel Martínez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Víctor Fresno</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Agustín D. Delgado</string-name>
          <email>agustin.delgadog@lsi.uned.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arkaitz Zubiaga</string-name>
          <email>a.zubiaga@warwick.ac.uk</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Richard Berendsen</string-name>
          <email>richard.berendsen@luminis.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Luminis Amsterdam</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>NLP&amp;IR Group</institution>, <addr-line>UNED</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Warwick</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>113</fpage>
      <lpage>127</lpage>
      <abstract>
        <p>Multilingual Web Person Name Disambiguation is a new shared task proposed for the first time at the IberEval 2017 evaluation campaign. For a set of web search results associated with a person name, the task deals with grouping the results according to the particular individual they refer to. Unlike previous work dealing with monolingual search results, this task further considers the challenge posed by search results written in different languages, allowing the performance of participating systems to be evaluated in a multilingual scenario. This overview summarizes a total of 18 runs received from four participating teams. We present the datasets used and the methodology defined for the task and its evaluation, along with an analysis of the results and the submitted systems.</p>
      </abstract>
      <kwd-group>
        <kwd>person name disambiguation on the web</kwd>
        <kwd>document clustering</kwd>
        <kwd>multilingualism</kwd>
        <kwd>web search</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>It is increasingly common for people to turn to Internet search engines to look for
information about other people. According to Google Trends, three of the top 10
Google searches in 2016 were linked to person names5. However, person names
tend to be ambiguous, and hence a search for a particular name is likely to include
results for different individuals. In these cases, a list of the individuals included
in the results, along with a breakdown of the results per individual, would come in
handy for a user looking for a particular individual. This task was first
introduced in the WePS (Web People Search) campaigns6 and attracted
substantial interest in the scientific community, as manifested in a number of shared
tasks that tackled it, particularly the WePS-1, WePS-2, and WePS-3 campaigns [1-3].
These campaigns provided several annotated corpora that became a reference for
this problem and allowed a comparative study of the performance of different
systems. However, all those campaigns presented a monolingual scenario where
the query results were written in only one language.
5 https://trends.google.com/trends/topcharts#geo&amp;date=2016
6 http://nlp.uned.es/weps/</p>
      <p>Despite the multilingual nature of the Web7, existing work on person name
disambiguation had not yet considered search results written in multiple
languages. The M-WePNaD task centers on a multilingual
scenario where the results for a query, as well as those for each individual, can be
written in different languages.</p>
      <p>The remainder of this paper is organized as follows. Section 2 presents the
task. Section 3 describes the datasets we released for training and testing,
and briefly discusses the differences between the two sets. Section 4 briefly
describes the evaluation measures. Section 5 summarizes the approaches
proposed by the participants. Section 6 presents and discusses the results. Finally,
conclusions are presented in Section 7.</p>
    </sec>
    <sec id="sec-2">
      <title>Task Description</title>
      <p>The M-WePNaD task is a person name disambiguation task on the Web, focused
on distinguishing the different individuals referred to within the search
results for a person name query. The person name disambiguation task can
be defined as a clustering problem, where the input is a ranked list of n search
results, and the output must provide both the number of different individuals
identified within those results and the set of pages associated with each
of those individuals.</p>
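The clustering formulation above can be sketched as follows; the query, the cluster labels, and the rank assignments are purely illustrative and not data from the task.

```python
# Hypothetical system output for one person-name query: each cluster maps an
# (illustrative) individual label to the set of search-result ranks that were
# attributed to that individual.
search_result_ranks = set(range(1, 11))  # a ranked list of n = 10 results

clustering = {
    "individual_1": {1, 2, 5, 9},
    "individual_2": {3, 4, 10},
    "individual_3": {6, 7, 8},
}

# The output provides both the number of individuals and the page sets.
num_individuals = len(clustering)
covered = set().union(*clustering.values())
assert covered == search_result_ranks
print(num_individuals)  # 3
```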
      <p>
        The heterogeneous nature of web results increases the difficulty of this task.
For instance, some web pages related to a certain individual may be professional
sites (e.g. corporate web pages), while others may contain personal information
(e.g. blogs and social profiles), and the two kinds of web pages may have very little
vocabulary in common. In particular, [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] concluded that the inclusion of content
from social networking platforms increases the difficulty of the task.
      </p>
      <p>While previous evaluation campaigns had been limited to monolingual
scenarios, the M-WePNaD task was assessed in a multilingual setting, considering
the realistic scenario where a search engine returns results in different languages
for a person name query. For instance, web pages with professional information
for an individual who is not a native English speaker may be written in English,
while other personal web pages may be written in their native language.
Celebrities who are known internationally are also likely to have web pages in different
languages.</p>
      <p>
        We compiled an evaluation corpus called MC4WePS [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which was manually
annotated by three experts. This corpus was used to evaluate the performance of
multilingual disambiguation systems, also enabling evaluation over different
document genres, as the corpus includes not only web pages but also social media
posts. The corpus was split into two parts, one for training and one for testing.
Participants had nearly two months to develop their systems using the
training corpus. Afterwards, the test corpus was released, whereupon
participants ran their systems and sent the results back to the task organizers. The
organizers also provided performance scores for different baseline approaches.
Participants were restricted to submitting up to five different result sets.
In this overview we present the evaluation of these submissions, which we list in
two different rankings.
7 The most used language on the Web is English, followed by Chinese and Spanish.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Data Sets</title>
      <p>The MC4WePS corpus was collected in 2014 by issuing numerous search queries
and storing the results of those that met the requirements of ambiguity and multilingualism.</p>
      <p>Each query includes a first name and a last name, with no quotes, and searches
were issued in both Google and Yahoo. The criteria for choosing the queries took
into account:
- Ambiguity: non-ambiguous, ambiguous, or highly ambiguous names. A
person's name is considered highly ambiguous when it has results for more than
10 individuals. Cases with 2 to 9 individuals were considered ambiguous,
while those with a single individual were deemed non-ambiguous.
- Language: results can be monolingual, where all pages are written in the
same language, or multilingual, where pages are written in more than one
language. Additionally, for each cluster of pages belonging to the same
individual, we considered whether the results were monolingual or multilingual:
even when the results for a person name query are multilingual, the clusters
for each individual may be monolingual or multilingual.</p>
      <p>
        The MC4WePS dataset contains the search results of 100 person names, with
between 90 and 110 search results each. It is worth noting that
different person names in the corpus have different degrees of ambiguity; in
addition, a web page can itself be multilingual, and not all the documents in the corpus
are regular HTML web pages: other kinds of documents are also included,
such as social media posts or PDF documents. A detailed description of the corpus
can be found in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>There can be overlaps between clusters, as a search result may refer to two or
more different individuals with the same name, for instance social profile pages
listing different individuals with the same name. When a search result
does not belong to any individual, or the individual cannot be inferred, it
is annotated as "Not Related" (NR). For each query, these search results are
grouped as a single cluster of NR results in the gold standard annotations.</p>
      <p>The MC4WePS corpus was randomly divided into two parts: training set
(65%) and test set (35%).</p>
      <sec id="sec-3-1">
        <title>Training Set</title>
        <p>We provided participants with a single training set, which includes 65
different person names randomly sampled from the entire dataset. The list of names
and their characteristics can be seen in Table 1; the last row of the second part
of the table shows the average values for the whole training set.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Test Set</title>
        <p>The test corpus consists of 35 different person names, whose characteristics can
be seen in Table 2. The last row shows the average values for the whole test set.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Comparing Training and Test Sets</title>
        <p>As can be seen in Table 1 and Table 2, the training and test sets have a
comparable average composition with regard to the percentages of NR search results and
social web pages. The two sets are also similar in terms of the distribution of the
most common language for a given person name. In the test set, the percentages
are 28.57% ES, 68.57% EN, and 2.86% FR; whereas in the training set they are
30.76% ES, 67.69% EN, and 1.55% FR.</p>
        <p>
          The main difference between the two sets lies in the degree of ambiguity of the
person names. Based on the threshold defined by Montalvo et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], whereby person names whose search results pertain to more than 10 individuals
are considered very ambiguous, the training set contains 54% ambiguous and 46%
very ambiguous person names, whereas the test set contains 40% ambiguous and
60% very ambiguous names. The test set is thus less balanced than the training
set with respect to name ambiguity, containing a higher proportion of very
ambiguous names.
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>Format and Distribution</title>
        <p>The datasets are structured in directories. Each directory corresponds to a
specific search query matching the pattern "name-lastname", and includes the
search results associated with that person name. Each search result is in turn
stored in a separate directory, named after the rank of that particular result in
the entire list of search results. A directory with a search result contains the
following files:
- The web page linked by the search result. Note that not all search results
point to HTML web pages; other document formats also appear: PDF,
DOC, etc.
- A metadata.xml file with the following information:</p>
        <p>URL of search result.</p>
        <p>ISO 639-1 codes for the languages the web page is written in, given as a
comma-separated list when several languages were found.</p>
        <p>Download date.</p>
        <p>Name of the annotator.
- A file with the plain text of the search result, which was extracted using
Apache Tika (https://tika.apache.org/).</p>
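Assuming the directory layout just described, one could iterate over the results of a single query roughly as follows; the helper name and the metadata tag names are illustrative assumptions, since only the metadata.xml file name is fixed by the description above.

```python
# Sketch of reading one query directory of the corpus; only the layout
# (query/rank/metadata.xml) comes from the task description, the rest
# is a hypothetical illustration.
from pathlib import Path
import xml.etree.ElementTree as ET

def load_query_results(query_dir):
    """Return (rank, metadata dict) for each search result of one query,
    where each result lives in a sub-directory named after its rank."""
    results = []
    for result_dir in sorted(Path(query_dir).iterdir(),
                             key=lambda p: int(p.name)):
        root = ET.parse(result_dir / "metadata.xml").getroot()
        results.append((int(result_dir.name),
                        {child.tag: child.text for child in root}))
    return results
```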
        <p>Figure 1 shows an example of the metadata file for a search result for the
person name query Julio Iglesias.</p>
        <p>
          Access to the training and test sets was restricted to registered participants.
The blind version of the test dataset did not include the ground-truth files.8
8 The MC4WePS corpus is available at http://nlp.uned.es/web-nlp/resources.
        </p>
      </sec>
    </sec>
    <sec id="sec-eval">
      <title>Evaluation Measures</title>
      <p>
        We use three metrics for the evaluation: Reliability (R), Sensitivity (S), and
their harmonic mean F0.5 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. These metrics generalize the B-Cubed metrics [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] to the case of overlapping clusters, as is the case with the MC4WePS corpus.
In particular, Reliability extends B-Cubed Precision and Sensitivity extends
B-Cubed Recall.
      </p>
    </sec>
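As a point of reference, the non-overlapping B-Cubed Precision and Recall that Reliability and Sensitivity generalize can be sketched as follows; this is a minimal illustration with invented documents, not the official scorer.

```python
# Minimal sketch of B-Cubed Precision/Recall for NON-overlapping clusterings;
# Reliability and Sensitivity generalize these to overlapping clusters.
from statistics import mean

def bcubed(system, gold):
    """`system` and `gold` map each item to its cluster label."""
    def members(mapping, label):
        return {e for e in mapping if mapping[e] == label}
    prec, rec = [], []
    for e in gold:
        sys_c = members(system, system[e])
        gold_c = members(gold, gold[e])
        correct = len(sys_c & gold_c)
        prec.append(correct / len(sys_c))
        rec.append(correct / len(gold_c))
    p, r = mean(prec), mean(rec)
    f = 2 * p * r / (p + r)  # harmonic mean of the two scores
    return p, r, f

gold = {"d1": "a", "d2": "a", "d3": "b"}
all_in_one = {"d1": "x", "d2": "x", "d3": "x"}  # ALL-IN-ONE-style baseline
p, r, f = bcubed(all_in_one, gold)
print(round(p, 2), round(r, 2))  # 0.56 1.0
```

Grouping everything into one cluster maximizes recall at the cost of precision, which is why the ALL-IN-ONE and ONE-IN-ONE baselines bracket the trade-off.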
    <sec id="sec-4">
      <title>Overview of the Submitted Approaches</title>
      <p>Thirteen teams signed up for the M-WePNaD task, although only four of them
managed to participate in the task on time, submitting a total of 18 runs.</p>
      <p>In what follows, we analyze their approaches from two perspectives: search
result representation (including whether or not translation resources were used),
and the clustering algorithms.</p>
      <p>
        - The ATMC UNED team [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] presented four runs that have in common the use
of clustering algorithms able to estimate the number of clusters with no need
of information from training data. Three of the four runs use the ATC
algorithm [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], an algorithm that works in two phases: a phase of cluster creation
followed by a phase of cluster fusion. Run 4 uses the ATCM algorithm [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
which identifies features written the same way in several languages
(called comparable features) and gives them a special role when comparing
search results written in different languages, without the need for translation
resources. The author explores four different representation approaches: the
textual features of the document with no translation (Run 1), a translated
version of the document (Run 2), a halfway approach that uses the original
document's textual features in the cluster creation phase and a translation
tool to translate the centroid features in the cluster fusion phase (Run
3), and an approach that combines the original document's textual features
with a representation based only on the comparable features of web pages
written in different languages (Run 4). On the other hand, none of the
four approaches identifies and groups the Not Related search results, so all of
them get worse results when all the web pages are considered. Regarding the
treatment of overlapping clusters, none of the four approaches deals with
them, so a web page can only be in one cluster. Finally, the author applies
a heuristic described in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to treat the web pages from
social media platforms and people search engines in a special way.
- The LSI UNED team's approach [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is mainly based on the application of
word embeddings to represent the documents retrieved by the search engine
as vectors. These vectors are then used in a clustering algorithm (developed
by the authors and adapted to the characteristics of the corpus), where a
similarity threshold determines when grouping is carried out. To
obtain the word embeddings, they first removed stopwords and
extracted the named entities, using pre-trained word embeddings for
their representation. The tools they used were the Stanford Named Entity Recognizer9
and ConVec10, a publicly available collection of word vectors generated from
Wikipedia concepts. To obtain the document representation, the authors
calculated the average of all the vectors corresponding to the words
within the document. They calculated the similarity between each document
and the rest of the documents related to the same person name by means
of cosine similarity. The similarity weight associated with each document was
the average of the similarities between that document and the rest of the
documents related to the same person name. Finally, the authors considered
that all the documents with a similarity weight above a specific threshold
should be gathered in the same initial cluster. This team initially submitted
four runs corresponding to different values of the similarity threshold
(0.70, 0.75, 0.80, and 0.85). Finally, a fifth run was also evaluated using
a different configuration of the system, in which all the words within the
documents (except stop words) were considered in order to represent them,
rather than only named entities as in the previous runs. None of their submitted
runs deals with the multilingual nature of the task or the overlap between
clusters.
- The Loz Team [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] submitted five runs that experimented with different
settings of a hierarchical agglomerative clustering (HAC) algorithm, using the
Euclidean distance as a distance measure. They tested three different ways
of representing the content: (1) a binary representation capturing the presence
or absence of each word, (2) a weighted representation capturing the number of
occurrences of each word, and (3) a TF-IDF weighting. They also tested two
different stopping criteria, namely k = 5 and k = 15 clusters. With these
settings, the authors tested the following five combinations: (1) weighted
representation + k = 5, (2) binary representation + k = 15, (3) TF-IDF +
k = 15, (4) binary representation + k = 5, and (5) TF-IDF + k = 5. As
with the previous team, none of the five runs developed by this team tackled
the challenges posed by the multilingual nature of the dataset or the overlap
between clusters.
9 https://nlp.stanford.edu/software/CRF-NER.shtml
10 https://github.com/ehsansherkat/ConVec
- The PanMorCresp Team [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] submitted four runs based on the HAC
algorithm. All runs use the files with the plain text version of the search results.
For each query, vector representations of the text are generated
independently. The text is split into tokens on blank characters, and the tokens
are lowercased. Some runs use additional token normalization. Next, words
that occur only once in the collection are removed. After creating the
vocabulary, binary document vectors are created, indicating the presence or absence
of words in a document. The cosine similarity is used to compute
similarities between document vectors. None of the runs use any translation or other
language-specific decisions. None of the runs try to detect Not Related search
results. The four runs then investigate the effect of other typical choices one
has to make when employing HAC. First, token normalization: Run 3 and
Run 4 eliminate punctuation. Second, which words to use in the vocabulary;
besides effectiveness, computational efficiency plays a role here. Run 1 and
Run 2 include the 4,000 most frequent terms, while Run 3 and Run 4 remove stop
words and include the 7,500 most frequent remaining terms. Third, how to
compute cluster similarities: Run 1 uses complete linkage, Run 2 uses the
average similarity between documents in both clusters, and Run 3 and Run 4
use single linkage. Fourth, how to define the stopping criterion. Run 1 makes
the stopping criterion depend on the query: it computes the average
similarity between documents and divides it by a factor n; on the training set,
this parameter was tuned to n = 2. Run 2, Run 3, and Run 4 use a global
stopping criterion. Run 2 and Run 3 tune a minimal similarity threshold on
the training corpus; for Run 3 the resulting threshold was 0.65, while for Run 2 it
is not given. Run 4 uses an exact number of clusters as its stopping criterion.
      </p>
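As an illustration of the HAC recipe shared by several of the runs above (binary bag-of-words vectors, cosine similarity, average linkage, and a global similarity-threshold stopping criterion), consider this minimal sketch; it mirrors the general approach rather than any team's exact implementation, and the documents are invented.

```python
# Illustrative HAC over binary bag-of-words vectors with cosine similarity,
# average linkage, and a global similarity-threshold stopping criterion.
# This mirrors the general recipe of the HAC-based runs, not any team's code.
import math

def vectorize(doc, vocab):
    words = set(doc.split())
    return [1 if w in words else 0 for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hac(docs, threshold=0.5):
    vocab = sorted({w for d in docs for w in d.split()})
    vecs = [vectorize(d, vocab) for d in docs]
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # average linkage: mean pairwise similarity of the two clusters
                sims = [cosine(vecs[a], vecs[b])
                        for a in clusters[i] for b in clusters[j]]
                sim = sum(sims) / len(sims)
                if sim > best:
                    best, pair = sim, (i, j)
        if best < threshold:  # global stopping criterion
            break
        i, j = pair
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters

docs = ["julio iglesias singer album",
        "julio iglesias singer tour",
        "julio iglesias lawyer madrid office"]
print(hac(docs, threshold=0.5))  # [[0, 1], [2]]
```

Raising the threshold stops merging earlier and yields more, smaller clusters, which is the Reliability/Sensitivity trade-off the threshold-based runs tune on the training set.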
    </sec>
    <sec id="sec-5">
      <title>Results and Discussion</title>
      <p>We produced two different rankings of the participants after evaluating all the
submissions:
- Evaluation results without the Not Related results: all results of this kind,
and the corresponding cluster, were excluded from the evaluation.
- Evaluation results considering all web results: all the results and all the
clusters were taken into account.
The results obtained by the four ATMC UNED runs are quite similar. However, Run 1,
which uses features from the original content of the web pages, gets worse results
than Run 2 and Run 3, which use a machine translation tool. Run 4 compares web
pages written in different languages through their comparable features and obtains
results similar to Run 2 and Run 3 without the need for translation resources. The
main advantage of this last approach is that it avoids additional preprocessing
steps dedicated to translating the web pages, which is desirable for problems that
must be solved in real time. The ATMC UNED team did not propose any method to
group Not Related web pages, so their Sensitivity and F-measure results are worse
when these are considered in the evaluation.</p>
      <p>The results obtained by the LSI UNED team improve on the results obtained
by the baselines but are lower than those obtained by the ATMC UNED team.
In detail, the run using all the words in the documents (Run 5) performs below
the run that considers only named entities and uses the same threshold (Run 2).
This implies that adding all the possible words in the documents introduces
more noise than valuable information. On the other hand, in general, when using
named entities, increasing the threshold value increases Reliability while
decreasing Sensitivity. Finally, considering all web pages can be seen as a
more difficult task, but the number of unrelated web pages in the corpus is small,
and hence the results are quite similar between these two settings, which use a
threshold-based clustering approach.</p>
      <p>The PanMorCresp Team Run 1 and Run 2 perform about equally well
regardless of whether or not related pages are taken into account in the evaluation.
Run 1 achieves good Reliability, which fits well with the fact that complete
linkage was used. This comes at the cost of a low Sensitivity. For Run 2, the picture
is reversed. Run 3 and Run 4 obtain a higher score than Run 1 and Run 2.
Punctuation removal, stop word removal and the larger vocabulary may play a
role in this. In addition, HAC single linkage was used in both of these runs. Run
4 is the best of the PanMorCresp Team runs; the only difference with regard to
Run 3 is that it uses a fixed number of clusters (9) as a stopping criterion. The
score of Run 4 beats both baselines and is on par with scores obtained with the
other approaches save the scores obtained by the ATMC UNED runs.</p>
      <p>Of the five runs submitted by the Loz Team, only Run 1 managed to
outperform the ALL-IN-ONE baseline. The rest of the runs only managed to
outperform the ONE-IN-ONE baseline, performing worse than the ALL-IN-ONE
baseline. One of the main reasons these approaches did not perform as well
may be that multilingualism and the overlaps between clusters
were not considered, posing a significant limitation for this task. Their best
performing approach (Run 1) uses a weighted representation of words, which
shows that considering the frequency of words in documents leads to better
performance than the sole use of a binary representation capturing the presence
or absence of words, as well as than TF-IDF. They also found that using five
clusters as the stopping criterion, instead of 15, leads to an increased Sensitivity
score, albeit at the expense of a small drop in Reliability.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>The M-WePNaD shared task on Multilingual Web Person Name Disambiguation
took place as part of the IberEval 2017 evaluation campaign. This shared task
was the first to consider multilingualism in the person name disambiguation
problem, following a series of WePS shared tasks where the corpora were limited
to documents in English. The M-WePNaD shared task provided the opportunity
for researchers to test their systems on a benchmark dataset and shared task,
enabling comparison with one another.</p>
      <p>Although a larger number of teams initially registered for the task, only four of
them managed to submit results on time, amounting to 18 different submissions.
Only two of the four participants, namely the champions and the runners-up,
made use of more sophisticated clustering algorithms, whereas the other two
relied on the Hierarchical Agglomerative Clustering (HAC) algorithm. Only one
of the teams, the one that qualified in the top position, presented an approach
that does not require any prior knowledge to fix thresholds.
We argue that this is a desirable characteristic for web page clustering, owing
to the heterogeneous nature of the Web, which poses an additional challenge for
learning generalizable patterns.</p>
      <p>With respect to the approaches used for web page representation, most of the
teams relied on traditional techniques based on bag-of-words and vector space
models, with the exception of the runners-up, who used word embeddings.</p>
      <p>While the novel aspect proposed in this shared task was the multilingual
nature of the dataset, only one team, ATMC UNED, proposed approaches that
explicitly tackle multilingualism, exploring three such approaches. The results
obtained by these approaches slightly outperform the one that does not consider
multilingualism. On the other hand, the dataset also
included web pages from social media, unlike in previous shared tasks. However,
only one of the teams, ATMC UNED, took this into account when
developing their system. None of the systems dealt with unrelated results or
overlapping clusters.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work has been part-funded by the Spanish Ministry of Science and
Innovation (MAMTRA-MED Project, TIN2016-77820-C3-2-R and MED-RECORD
Project, TIN2013-46616-C2-2-R).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>J.</given-names>
            <surname>Artiles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          and
          <string-name>
            <surname>S. Sekine.</surname>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>The SemEval-2007 WePS Evaluation: Establishing a Benchmark for the Web People Search Task</article-title>
          .
          <source>In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)</source>
          , pages
          <fpage>64</fpage>
          {
          <fpage>69</fpage>
          , Prague, Czech Republic,
          <year>June 2007</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>J.</given-names>
            <surname>Artiles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          and
          <string-name>
            <surname>S. Sekine.</surname>
          </string-name>
          (
          <year>2009</year>
          )
          <article-title>Weps 2 Evaluation Campaign: Overview of the Web People Search Clustering Task</article-title>
          .
          <source>In 2nd Web People Search Evaluation Workshop (WePS</source>
          <year>2009</year>
          ),
          <source>18th WWW Conference</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>J.</given-names>
            <surname>Artiles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Borthwick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sekine</surname>
          </string-name>
          and
          <string-name>
            <surname>E. Amigó.</surname>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>WePS-3 Evaluation Campaign: Overview of the Web People Search Clustering and Attribute Extraction Tasks</article-title>
          .
          <source>In Third Web People Search Evaluation Forum (WePS-3)</source>
          ,
          <string-name>
            <surname>CLEF</surname>
          </string-name>
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>R.</given-names>
            <surname>Berendsen</surname>
          </string-name>
          ,
          <article-title>Finding people, papers, and posts: Vertical search algorithms and evaluation</article-title>
          ,
          <source>Ph.D. thesis</source>
          , Informatics Institute, University of Amsterdam (
          <year>2015</year>
          ). URL: http://dare.uva.nl/document/2/165379
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>S.</given-names>
            <surname>Montalvo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Martínez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Campillos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Delgado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Fresno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Verdejo</surname>
          </string-name>
          .
          <article-title>MC4WePS: a multilingual corpus for web people search disambiguation, Language Resources and Evaluation (</article-title>
          <year>2016</year>
          ). URL: http://dx.doi.org/10.1007/s10579- 016-9365-4.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>E.</given-names>
            <surname>Amigó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Verdejo</surname>
          </string-name>
          .
          <article-title>A General Evaluation Measure for Document Organization Tasks</article-title>
          .
          <source>In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2013)</source>
          ), pp.
          <fpage>643</fpage>
          -
          <lpage>652</lpage>
          . Dublin, Ireland,
          <year>2013</year>
          . URL: http://doi.acm.org/10.1145/2484028.2484081.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>A.</given-names>
            <surname>Bagga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Baldwin</surname>
          </string-name>
          ,
          <article-title>Entity-based cross-document coreferencing using the vector space model</article-title>
          .
          <source>In Proceedings of the 17th International Conference on Computational Linguistics - Volume</source>
          <volume>1</volume>
          , COLING'98, Association for Computational Linguistics, Stroudsburg, PA, USA,
          <year>1998</year>
          , pp.
          <fpage>79</fpage>
          -
          <lpage>85</lpage>
          . URL http://dx.doi.org/10.3115/980451.980859.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>A.</given-names>
            <surname>Delgado</surname>
          </string-name>
          .
          <article-title>ATMC team at M-WePNaD task</article-title>
          .
          <source>In: Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017)</source>
          , Murcia, Spain, September 19, CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Delgado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Martínez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Montalvo</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Fresno</surname>
          </string-name>
          .
          <article-title>Person Name Disambiguation in the Web Using Adaptive Threshold Clustering</article-title>
          .
          <source>Journal of the Association for Information Science and Technology</source>
          ,
          <year>2017</year>
          . URL: https://doi.org/10.1002/asi.23810.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Delgado</surname>
          </string-name>
          .
          <article-title>Desambiguación de nombres de persona en la Web en un contexto multilingüe</article-title>
          .
          <source>PhD Thesis</source>
          , E.T.S. Ingeniería Informática, UNED,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Delgado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Martínez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Montalvo</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Fresno</surname>
          </string-name>
          .
          <article-title>Tratamiento de redes sociales en desambiguación de nombres de persona en la web</article-title>
          .
          <source>Procesamiento del Lenguaje Natural</source>
          ,
          <volume>57</volume>
          :
          <fpage>117</fpage>
          -
          <lpage>124</lpage>
          ,
          <year>2016</year>
          . URL: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/5344.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>A.</given-names>
            <surname>Duque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Araujo</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Martínez-Romo</surname>
          </string-name>
          .
          <article-title>LSI UNED at M-WePNaD: Embeddings for Person Name Disambiguation</article-title>
          .
          <source>In: Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017)</source>
          , Murcia, Spain, September 19, CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>L.</given-names>
            <surname>Lozano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jorge</given-names>
            <surname>Carrillo-de-Albornoz</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Amigó</surname>
          </string-name>
          .
          <article-title>UNED Loz Team at M-WePNaD</article-title>
          .
          <source>In: Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017)</source>
          , Murcia, Spain, September 19, CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>P.</given-names>
            <surname>Panero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Moreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Crespo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jorge</given-names>
            <surname>Carrillo-de-Albornoz</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Amigó</surname>
          </string-name>
          .
          <article-title>UNED PanMorCrepsTeam at M-WePNaD</article-title>
          .
          <source>In: Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017)</source>
          , Murcia, Spain, September 19, CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>