<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic Tag Recommendation for the UN Humanitarian Data Exchange</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ghadeer Abuoda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chad Hendrix</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stuart Campo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Science and Engineering, Hamad Bin Khalifa University</institution>
          ,
          <country country="QA">Qatar</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>United Nations Office for the Coordination of Humanitarian Affairs (OCHA), Centre for Humanitarian Data</institution>
          ,
          <country country="NL">Netherlands</country>
        </aff>
      </contrib-group>
      <fpage>4</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>We have recently seen a rapid growth of data portals and dataset repositories being made available on the Web. While these repositories have been critical for advancing research, much work remains to make it easier to find appropriate datasets and relevant sources. Search engines, the primary tools for dataset discovery, are mainly keyword-based over the published metadata of the datasets, whether within dataset repositories or over the Web. However, in most cases, the available metadata may not encompass the essential information the user needs to decide whether the dataset fits a given task. Therefore, data publishers should annotate their datasets with informative metadata when they add them to a dataset repository. Tags are a particular form of metadata that the data publisher uses to describe their view of how the dataset should be categorized. An interesting problem is how to automate the process of recommending tags to data publishers when they add new data to a dataset repository. In this paper, we develop an approach for automatic tag recommendation for dataset repositories. We investigate how to exploit the features of the dataset and the tagging history in the repository to build an effective tag recommendation model. We further demonstrate the integration of the model in the Humanitarian Data Exchange, a real-world dataset repository in the social and humanitarian domains.</p>
      </abstract>
      <kwd-group>
        <kwd>Dataset Repository</kwd>
        <kwd>Dataset Tagging</kwd>
        <kwd>Keyword Search</kwd>
        <kwd>Tag Recommendation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Nowadays, many dataset repositories and data portals are created by different organizations to
facilitate sharing and distribution of datasets. Online platforms like CKAN,<sup>1</sup> Quandl, Kaggle,<sup>2</sup>
and Microsoft Azure Marketplace<sup>3</sup> are examples of dataset repositories that host datasets for
data-driven research in a wide range of domains. The data in these repositories is usually
tabular (e.g., CSV files), and the goal of the repositories is to enable data scientists to find, access,
integrate, and analyze combinations of datasets based on their needs. The first step in this
process is to find the datasets relevant to a task, which requires information retrieval. Currently,
dataset repositories use search engines that were mainly developed for unstructured textual
documents. To improve retrieval quality, dataset repositories typically allow data publishers
to add metadata with their datasets, i.e., structured information about the data [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The search
engines rely on this metadata in addition to the content of the datasets to guide the users toward
relevant datasets. Thus, high-quality metadata plays an important role in enabling users to find
datasets relevant to their needs [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
      </p>
      <p>
        One type of metadata that data publishers often use to annotate and label their datasets
is tags [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In particular, publishers can assign freely chosen keywords to datasets with the
purpose of referencing these datasets later on with the help of these assigned tags. A dataset
publisher can define their tags to describe a dataset as a whole or emphasize a certain topic
that is only relevant to the dataset. A fundamental issue that underlies the effectiveness of
user-defined tags is the quality and the relevance of these tags [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. On the one hand, these tags
represent a more flexible way of describing content than a fixed taxonomy with a controlled
vocabulary, which means that tags should be freely chosen by data publishers. On the other
hand, tags should be correctly formed and spelled, relevant to the content and its terms, and not
repetitive or ambiguous. To balance these conflicting goals, a tag recommendation method can
assist data publishers in the tagging process to improve the quality of the available metadata
about their datasets [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Good tag recommendation can benefit not only search, but also other
services that rely on tags such as content recommendation and categorization.
      </p>
      <p>In this paper, we present a tag recommendation model and show how we applied it
effectively to improve information retrieval for datasets in the Humanitarian Data Exchange (HDX)
platform, in service of the data scientists who use this platform. The main idea of our model is
to analyze the metadata and tagging history associated with existing datasets to find candidate
tags. We propose a way of integrating the model in the dataset upload page, which
encourages data publishers to attach informative tags to their datasets when they first upload them.
Automatic tag recommendation raises user confidence when interacting with the platform: (i)
dataset publishers feel more confident that they are not guessing how they can tag; the HDX
platform makes them feel supported, and (ii) users who come to the HDX platform looking for
information have more confidence because they have a more accurate picture of the datasets.</p>
      <p>As an example of our model recommendations on the HDX platform, consider the dataset
Nigeria: 2018 Education Secondary Data Review (SDR)<sup>4</sup> published by the Nigeria Education
in Emergencies Working Group. The dataset contains assessment reports for humanitarian
missions in the education domain. The dataset is currently tagged with only two tags,
“EDUCATIONNEEDS” and “ASSESSMENT”. Our model recommends additional tags and ranks them
based on similarity to the dataset, with the top three tags being “Nigeria complex emergency”,
“education”, and “education cluster”. This gives the dataset publisher meaningful tagging options
and higher confidence when tagging their dataset.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        In dataset repositories, tags form the source for enriching taxonomies in evolving and
dynamic content published in these repositories [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Moreover, many metadata standards were
developed to aid researchers in sharing research (data, code, publications) that rely on tagging
techniques [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Additionally, dataset-centered search engines rely extensively on metadata
generally and tags specifically for dataset discovery and retrieval. For instance, the Google Dataset
Search engine [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] crawls the Web for all datasets and collects the associated metadata. These
standards and tools are effective only if the metadata and tags are mostly correct and well maintained.
However, in practice, most datasets have incomplete or non-existent metadata [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Therefore,
there is a need for work like ours to automate the creation of metadata.
      </p>
      <p><sup>4</sup> https://data.humdata.org/dataset/nigeria-2018-education-secondary-data-review-sdr</p>
      <p>
        Tag recommendation services have a direct benefit to IR services such as search [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and
query expansion [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. There are many well-studied approaches for tag recommendation, such
as content-based methods, collaborative filtering methods, and hybrid methods [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Regardless
of the type of tag recommendation method, the challenge in tag recommendation is always in
finding the appropriate set of tags that best describe a given resource.
      </p>
      <p>
        Text analysis has long been recognized as a useful technique for extracting informative tags
for web resources. In this approach, each resource (in our case a dataset) is represented as
a document through a vector of all word occurrences weighted by term frequency-inverse
document frequency (TF-IDF) or statistical topic modeling techniques. Various tag
recommendation techniques have been proposed relying on different representations of the resources
and computing the similarity between different resources in addition to mining the historical
occurrence of tags [
        <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14">11, 12, 13, 14</xref>
        ]. Next, we present how we use text analysis techniques to
recommend tags in HDX.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Tagging on the Humanitarian Data Exchange (HDX)</title>
      <p>The Humanitarian Data Exchange (HDX)<sup>5</sup> is an open platform for sharing data across crises
and organizations. The HDX platform is managed by the Centre for Humanitarian Data of the
United Nations Office for the Coordination of Humanitarian Affairs (OCHA). The platform hosts
more than 17,000 datasets shared by hundreds of organizations covering humanitarian crises
around the world. The goal of the HDX platform is to make humanitarian data easy to find
and use for analysis. The HDX platform has a search-engine interface that allows users to search
datasets via keywords or a faceted search on features such as location, organization, and license.
The returned datasets are presented in a structure-aware fashion, exposing attributes of the
datasets (number of downloads, tags, dataset owner, format, etc.) and enabling users to explore
different quick charts of the datasets or develop their own visualizations. A keyword search
relying on user-generated metadata is the most common way to find a specific dataset on the
platform. One crucial factor in defining the quality of the search results on the HDX platform is
the quality and richness of the metadata, mainly the tags provided by dataset publishers.</p>
      <p>On the HDX platform, a tag usually refers to a concept (e.g., health, education, camps), a
specific crisis (e.g., Syria, Darfur), the type of the crisis (e.g., earthquake, hurricane), and/or the
organization that collected the dataset (e.g., UNICEF, Education Above All). At the time of this
work, there was no defined list of tags, and data publishers could use free text to tag datasets.</p>
      <p>The HDX technical team reported that, in a particular sample of 19,171 search queries, only
8,114 resulted in actual downloads of HDX datasets. One possible interpretation of this gap
between search requests and dataset downloads is that users may not be satisfied with the
search results or could not find the information they expected. Since tags play an important
role in search quality, we propose a method for improving the tagging process on the HDX
platform with the goal of improving search quality and user engagement.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Our Proposed Tag Recommender</title>
      <p>Our model takes as input the set of tagged datasets in the repository and a target dataset
<italic>d</italic>. The model should provide a list of the top <italic>k</italic> candidate tags, sorted according to their relevance
to dataset <italic>d</italic>. In this work, we investigate recommending tags that are relevant to target dataset
<italic>d</italic> by utilizing various types of information: (i) previously assigned tags in the repository, (ii)
terms extracted from textual features of the datasets in the repository (e.g., title and description),
and (iii) terms extracted from the target dataset.</p>
      <p>Developing our model on the HDX platform required us to address several challenges. First,
the amount of metadata available varies widely between datasets. In some datasets, the metadata
can contain around 1,000 different terms, while other datasets barely reach 40 terms.
Thus, our approach needs to enrich the metadata of datasets that have few terms by choosing
the important words in the datasets’ content. Second, data publishers use numbers, special
characters, and hyperlinks in the description of their dataset. This content affects the ability to
match with predefined tags and to define similarity in any approach (e.g., “Syria crisis-2011” is
different from “Syria crisis”). Third, data publishers sometimes provide the description of their
datasets in PDF files, not free text. In some cases, the attached PDF file reflects the project in
which this dataset was collected, not a description of the dataset itself. Fourth, the tags used for
HDX may refer to the same concept with different terms (e.g., education vs. learning; sex/age
rate vs. demographics; displaced people location vs. displacement and shelter). Moreover, the
valid list of tags contains more specific concepts (e.g., education in emergencies, education
facilities). Fifth, data publishers use acronyms as tags. They use different acronyms to refer
to the same concept (e.g., using both ‘3W’ and ‘3Ws’ to refer to a ‘who-is-doing-what-where’
dataset). Alternatively, they may use acronyms in a way that changes the meaning and
makes finding a match in the valid tag list even harder (e.g., using ‘pin’ to mean ‘people in need’).
Finally, the tags can be variations on the same term (e.g., refugee vs. refugees).</p>
      <p>To address these challenges, we developed a model that analyzes the metadata of a dataset
through different phases using off-the-shelf text processing techniques. The main phases of our
recommendation model are depicted in Fig. 1, and are summarized in the rest of this section.</p>
      <p>Acquiring Metadata. Using the HDX API,<sup>6</sup> we extract the metadata collection associated
with a group of datasets of interest. For example, we may be interested in the education
domain and thus focus on datasets annotated with the “education” tag. The metadata elements
extracted for each dataset are: the title of the dataset, the tags assigned to the dataset by the
data publisher, the organization that provided the dataset, the source of the dataset if it is different
from the organization, the URL, which enables us to crawl the content of public datasets to
enrich the metadata with information from the dataset header, the countries mentioned in the
metadata object, whether the dataset has geodata, and the free-form note describing the dataset.
The output of this phase is a record of terms extracted from the HDX metadata for each dataset.</p>
      <p>Preprocessing and Cleaning. This phase includes tokenization, stemming (e.g., “refugees”
and “refugee” become the same token), and removal of numbers, special characters, links,
stopwords, and non-English terms.</p>
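      <p>As a minimal sketch of the preprocessing and cleaning phase, the following Python fragment lowercases a description, strips links, keeps only alphabetic tokens, drops stopwords, and applies a very crude stemmer. The stopword list, the ASCII-only filter used as a stand-in for detecting non-English terms, and the function names are assumptions made for illustration; the paper does not state which tokenizer or stemmer was used.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

# Tiny illustrative stopword list (assumption; a real run would use a fuller list).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "on", "is", "are", "by", "with"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Runs of letters only, so digits and special characters never become tokens.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def crude_stem(token):
    """Strip a trailing plural 's'; a rough stand-in for a real stemmer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(text):
    """Return cleaned, stemmed tokens for one metadata record."""
    text = URL_RE.sub(" ", text.lower())
    tokens = TOKEN_RE.findall(text)
    # Non-ASCII tokens are dropped as a crude proxy for non-English terms.
    return [crude_stem(t) for t in tokens if t.isascii() and t not in STOPWORDS]
```
]]></preformat>
      <p>With this sketch, “refugees” and “refugee” map to the same token, and “Syria crisis-2011” and “Syria crisis” yield the same token sequence, which is the behavior the phase is meant to achieve.</p>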
      <p>Candidate Tag Extraction. The set of terms extracted from the metadata of the dataset
collection is our vocabulary. Each candidate tag extracted from this vocabulary is either an
individual term or a pair of co-occurring terms. An important step in our work was to
evaluate different methods for defining candidate tags (results in the next section). We evaluated:
(1) scoring each vocabulary term based on Term Frequency (TF) and using the terms with the
top TF scores as candidate tags, (2) combining TF with Inverse Document Frequency and using
the top TF-IDF terms, and (3) extracting frequent co-occurring terms from the vocabulary using
N-grams to help decide which N terms can be chunked together to form a single tag.</p>
      <p>
        Tag Expansion. Our metadata acquisition step extracts the set of previously used tags in the
repository. In the tag expansion phase, we enrich these tags by adding related terms. We expand
by adding related terms from WordNet<sup>7</sup> (e.g., “teaching”, “pedagogy”, and “didactics” are added
to the “education” tag). We also consider enriching the tags using similar terms based on the
word2vec model [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
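      <p>The three candidate extraction strategies can be illustrated with a short Python sketch over already tokenized metadata records. The exact weighting used in the paper is not specified, so the corpus-level TF-IDF formula below (total term count times the logarithm of the inverse document frequency), the minimum bigram count, and all function names are assumptions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def term_frequency(docs):
    """Total count of each term across all tokenized documents."""
    return Counter(t for doc in docs for t in doc)


def tf_idf(docs):
    """Corpus-level TF-IDF: total term count times log(N / document frequency)."""
    tf = term_frequency(docs)
    df = Counter(t for doc in docs for t in set(doc))
    n = len(docs)
    return {t: tf[t] * math.log(n / df[t]) for t in tf}


def frequent_bigrams(docs, min_count=2):
    """Adjacent term pairs (N=2) that occur at least min_count times."""
    counts = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    return {" ".join(p): c for p, c in counts.items() if c >= min_count}


def top_terms(scores, k):
    """The k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]
```
]]></preformat>
      <p>On a toy collection where “education” appears in every record, TF ranks it first while TF-IDF scores it zero, which shows why the two strategies can produce quite different candidate lists. Bigram extraction would surface a pair such as “education cluster” as a single candidate tag.</p>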
      <p>Computing Similarity. The model identifies a set of candidate tags from the vocabulary. It
also uses a similar process to identify a set of candidate tags for the target dataset <italic>d</italic>. We need
a similarity measure between the tags in the two sets. We explore different representation
techniques such as vector encoding, TF, TF-IDF, and distributed representations (i.e., word2vec).
We compute the cosine similarity between the representation of the candidate tags from the
vocabulary and the target dataset.</p>
      <p>Tag Recommendation. We now reach the key step in our approach: recommending tags for
the target dataset. Our model ranks the candidate tags by their similarity to the target dataset <italic>d</italic>
and recommends the top <italic>k</italic> candidate tags.</p>
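      <p>The similarity and ranking steps can be sketched in Python as follows, assuming a simple term-count representation. Here each candidate tag is represented by a bag of terms (for example, the tag itself plus its expansion terms); this representation, the dictionary named tag_profiles, and the function names are hypothetical and only one of the representations the text mentions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(target_terms, tag_profiles, k=3):
    """Rank candidate tags by similarity to the target dataset and return the top k."""
    target_vec = Counter(target_terms)
    scored = [
        (cosine(target_vec, Counter(terms)), tag)
        for tag, terms in tag_profiles.items()
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda st: (-st[0], st[1]))
    return [tag for _, tag in scored[:k]]
```
]]></preformat>
      <p>Swapping the term counts for TF-IDF weights or averaged word2vec vectors would change only how the vectors are built; the cosine ranking stays the same.</p>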
      <p>Setting Thresholds. We need to compute TF and TF-IDF thresholds to guide the creation of the
vocabulary. These thresholds define the cut-off points that determine which terms to eliminate
from the vocabulary. In order to determine the thresholds in our model, we test different cut-off
values and observe their effect on vocabulary size. There is typically a cut-off value where going
higher leads to a significant reduction in vocabulary size, and this is the cut-off value we use.</p>
      <p>[Figure 2: Recall of different methods of building the vocabulary. Methods shown: tfidf, tf, tf+ngram, tfidf+ngram, tf+word2vec, tfidf+word2vec, tfidf+ngram+word2vec. Legible values: 62.2, 58.0.]</p>
      <p><sup>6</sup> https://github.com/OCHA-DAP/hdx-python-api</p>
      <p><sup>7</sup> https://wordnet.princeton.edu/</p>
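      <p>The cut-off selection described in the Setting Thresholds step can be approximated by sweeping candidate TF cut-offs and recording the vocabulary size at each. The sketch below is one possible reading of that procedure: it picks the cut-off just before the largest drop in vocabulary size. The function names and the drop criterion are assumptions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def vocabulary_size_by_cutoff(docs, cutoffs):
    """For each TF cut-off, count the terms whose total frequency reaches it."""
    tf = Counter(t for doc in docs for t in doc)
    return {c: sum(1 for count in tf.values() if count >= c) for c in cutoffs}


def cutoff_before_largest_drop(sizes):
    """Return the cut-off after which the vocabulary shrinks the most.

    Expects at least two cut-offs; ties resolve to the smallest cut-off.
    """
    ordered = sorted(sizes)
    drops = {
        prev: sizes[prev] - sizes[cur]
        for prev, cur in zip(ordered, ordered[1:])
    }
    return max(drops, key=drops.get)
```
]]></preformat>
      <p>The same sweep applies to a TF-IDF or document-frequency threshold by replacing the counter with the corresponding score per term.</p>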
    </sec>
    <sec id="sec-5">
      <title>5. Experimental Evaluation</title>
      <p>Datasets and Tag Selection. We used a subset of HDX datasets to develop and evaluate our
model. There were around 3,000 private and public datasets annotated with the tag
“education”. We sampled 80% of these datasets to build the vocabulary of the model, while the
remaining 20% were used for evaluating the recommended tags.</p>
      <p>
        Evaluation. Our model recommends <italic>k</italic> tags for each dataset (we set <italic>k</italic> in the range 3-5). Our
evaluation metric is the percentage of these tags that are already used in tagging the dataset.
This is a recall metric [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The vocabulary consists of around 1,800 terms. Term frequency
and document frequency vary widely, and thresholds TF=20 and DF=30% worked best. Fig. 2
shows the recall of different methods of building the vocabulary. Using frequency to identify
candidate tags achieves around 30% recall. Adding N-grams (N=2) boosts recall by around 20
percentage points. Using word2vec is not effective, even when combined with N-grams. Thus,
we recommend using TF-IDF and N-grams, but not the more complex word2vec.
      </p>
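      <p>The evaluation metric as worded above can be computed with a few lines of Python. The sketch follows the description literally: for each held-out dataset, it takes the fraction of the recommended tags that the publisher already assigned, then averages over datasets. The function names are hypothetical, and averaging per dataset is an assumption, since the aggregation is not stated.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def hit_rate(recommended, actual):
    """Fraction of recommended tags that are already assigned to the dataset."""
    if not recommended:
        return 0.0
    actual_set = set(actual)
    hits = sum(1 for tag in recommended if tag in actual_set)
    return hits / len(recommended)


def mean_hit_rate(pairs):
    """Average the per-dataset score over (recommended, actual) pairs."""
    scores = [hit_rate(rec, act) for rec, act in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```
]]></preformat>
      <p>Multiplying the result by 100 gives a percentage comparable to the values reported in Fig. 2.</p>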
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>We presented an approach to automatically recommend tags for datasets on the HDX platform.
The effectiveness of our model lies in using existing metadata in the dataset repository in
addition to the textual features of a dataset to recommend informative tags. Our goal is for
better tags to lead to better search results and user engagement on the HDX platform.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Khalsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cotroneo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>A survey of current practice of data search services</article-title>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chapman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Simperl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Koesten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Konstantinidis</surname>
          </string-name>
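          ,
          <string-name>
            <given-names>L.-D.</given-names>
            <surname>Ibáñez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kacprzak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>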
          ,
          <article-title>Dataset search: a survey</article-title>
          ,
          <source>The VLDB Journal</source>
          <volume>29</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Noy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Burgess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brickley</surname>
          </string-name>
          ,
          <article-title>Google dataset search: Building a search engine for datasets in an open web ecosystem</article-title>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rafferty</surname>
          </string-name>
          ,
          <article-title>Tagging</article-title>
          ,
          <source>Knowledge Organization</source>
          <volume>45</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Belém</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Gonçalves</surname>
          </string-name>
          ,
          <article-title>A survey on tag recommendation methods</article-title>
          ,
          <source>Journal of the Association for Information Science and Technology</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Nargesian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Q.</given-names>
            <surname>Pu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ghadiri Bashardoost</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Organizing data lakes for navigation</article-title>
          ,
          <source>in: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rocca-Serra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gonzalez-Beltran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ohno-Machado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Alter</surname>
          </string-name>
          ,
          <article-title>The data tags suite (dats) model for discovering data access and use requirements</article-title>
          ,
          <source>GigaScience</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Tygel</surname>
          </string-name>
          ,
          <article-title>Semantic tags for open data portals: Metadata enhancements for searchable open data</article-title>
          , Federal University of Rio de Janeiro (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.-H.</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Efficient and effective prediction of social tags to enhance web search</article-title>
          ,
          <source>Journal of the American Society for Information Science and Technology</source>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gomes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Belém</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Brandao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ziviani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gonçalves</surname>
          </string-name>
          ,
          <article-title>Automatic query expansion based on tag recommendation</article-title>
          ,
          <source>in: Proceedings of the 21st ACM international conference on Information and knowledge management</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. H.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>An efficient tag recommendation method using topic modeling approaches</article-title>
          ,
          <source>in: Proceedings of the International Conference on Research in Adaptive and Convergent Systems</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Krestel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fankhauser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Nejdl</surname>
          </string-name>
          ,
          <article-title>Latent dirichlet allocation for tag recommendation</article-title>
          ,
          <source>in: Proceedings of the third ACM conference on Recommender systems</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>W.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kataria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Caragea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Giles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rokach</surname>
          </string-name>
          ,
          <article-title>Recommending citations: translating papers into references</article-title>
          ,
          <source>in: Proceedings of the 21st ACM international conference on Information and knowledge management</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>B.</given-names>
            <surname>Sigurbjörnsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Van Zwol</surname>
          </string-name>
          ,
          <article-title>Flickr tag recommendation based on collective knowledge</article-title>
          ,
          <source>in: Proceedings of the 17th international conference on World Wide Web</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Corrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          ,
          <source>in: Advances in neural information processing systems</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Ting</surname>
          </string-name>
          , Precision and Recall,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>