<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Assessing the Semantic Difficulty of Queries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Discussion Paper</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guglielmo Faggioli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Marchesin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Padova</institution>
          ,
          <addr-line>Padova</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>Traditional Information Retrieval (IR) models, also known as lexical models, are hindered by the semantic gap, which refers to the mismatch between different representations of the same underlying concept. To address this gap, semantic models have been developed. Semantic and lexical models exploit complementary signals that are best suited for different types of queries. For this reason, these model categories should not be used interchangeably, but should rather be alternated depending on the query. Therefore, it is important to identify queries where the semantic gap is prominent and thus semantic models prove effective. In this work, we quantify the impact of using semantic or lexical models on different queries, and we show that the interaction between queries and model categories is large. Then, we propose a labeling strategy to classify queries as semantically hard or easy, and we deploy a prototype classifier to discriminate between them.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The semantic gap is a long-standing problem in Information Retrieval (IR) that refers to the
difference between the machine-level description of document and query contents and the
human-level interpretation of their meanings [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In other words, it represents the mismatch
between users’ queries and the way retrieval models understand such queries [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        The semantic gap affects any domain, but it is especially prominent in medical search [
        <xref ref-type="bibr" rid="ref2 ref4 ref5">4, 5, 2</xref>
        ]. For
instance, a query containing the word “tumor” might not be effectively answered if the retrieval
model does not identify the synonymy relationship between “tumor” and, for example,
“neoplasm”. Conversely, given a query containing the term “cold”, a retrieval model might retrieve
irrelevant documents if it does not distinguish between the different meanings the term “cold”
assumes depending on the context. Such queries are known as semantically hard queries [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Traditional IR models, which are known as lexical models, fail to effectively address
semantically hard queries. Semantic models were thus introduced to bridge the semantic gap [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and
to overcome the limitations of lexical models. However, semantic models have been shown
to provide signals complementary to those of lexical models: they prove effective for semantically hard
queries, but less so for other queries [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Thus, it becomes necessary to identify which category of
models, lexical or semantic, best suits a user query given the document collection
at hand. In other words, we need to understand which inherent features of queries and
documents make lexical or semantic models more effective. To this end, we address the
following research questions. RQ1: How and to what extent does the semantic gap impact
query performance? RQ2: What features determine the prominence of the semantic gap within
queries? For RQ1, we investigate and compare the impact of lexical and semantic models on
different topics: how large is the interaction between topics and model categories, and to what
extent is this interaction reflected in the different topic formulations (i.e., queries)?
For RQ2, we explore a set of well-known features that relate to lexical and semantic models.
In particular, we seek to understand whether pre-retrieval features can be used to categorize
queries as semantically easy or hard.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Experimental Analysis</title>
      <p>
        We consider two collections in the following analyses: OHSUMED [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and TREC-COVID (Round
1) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Regarding lexical models, we consider TF-IDF [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], BM25 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], QLM [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], DFR [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], and
DFI [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. As for semantic ones, we consider W2V [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], NVSM [17], and the three variants of
SAFIR [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We evaluate models using AP. Table 1 reports the performance on both collections.
      </p>
      <sec id="sec-2-1">
        <title>2.1. RQ1: Topic and Category Interaction</title>
        <p>
          Several works have shown that queries strongly interact with retrieval models in
determining their performance [18, 19, 20]. This means that two models might have similar average
performance on a set of queries but, when examined at the query level, their performance might
vary greatly. This consideration also applies to lexical and semantic models: some queries
are best suited to semantic models, while others are best suited to lexical ones [
          <xref ref-type="bibr" rid="ref6 ref8">8, 6</xref>
          ]. We are thus interested
in quantifying the interaction between queries and model categories. To determine whether
the model category (that is, lexical or semantic) has a significant effect on performance,
we conduct an ANOVA on the runs obtained with the considered retrieval models. ANOVA is
a well-known statistical technique for identifying statistically significant differences
among experimental conditions. Several works in IR applied ANOVA to determine the effect of
different factors on the overall performance of an IR system [18, 21, 19, 22]. ANOVA models
the explained variable, which in our case is Average Precision (AP), as a linear combination of
the effects of the factors in the experimental setup, plus an error component. The error term
accounts for the variance in the data left unexplained by the model. From the ANOVA on our data,
we observe that the effect of the model category alone is not significant (p-value &gt; 0.05), which
means that the lexical and semantic categories are not statistically significantly different: we cannot
say that either lexical or semantic models perform best in absolute terms. The topic-category
interaction, in contrast, is significant, and its strength of association (ω<sup>2</sup> = 34.7%) indicates a
large effect. This means that the category significantly affects how good the results on a
specific topic will be. Such a finding suggests that the semantic gap is an inherent property of the
topics, related less to the specific retrieval models and more to their category. In further support of
this intuition, the interaction between the topic and the category is larger than the effect of the
model alone. Thus, if we can determine whether a topic is lexical or semantic, we can achieve large
performance improvements. As for TREC-COVID, each topic is represented by four different
formulations: query, description, narrative, and the concatenation of query and description. Each
formulation of a topic can only be used in relation to that topic; thus, formulations have to be
treated as a factor nested inside the topic. From the results on TREC-COVID, we observe that
both the topic and its formulations have a large effect. The importance of the formulation factor
indicates that, with an appropriate topic formulation, the performance on the topic can change
greatly. ANOVA shows that the interaction between the topic and the model category is large
(ω<sup>2</sup> = 39%), larger than the effects of both the category alone (2.1%) and the model (30.4%). The
interaction between the topic formulation and the model category is also large (ω<sup>2</sup> = 19.7%),
although not as large as the one between topic and category. This suggests that the semantic
gap relates more to the underlying information need than to the different topic formulations.
        </p>
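        <p>As an illustration of the analysis above, the following minimal Python sketch runs a two-way ANOVA with interaction on a balanced design and reports an effect-size estimate for each factor. It is a simplification under stated assumptions, not the setup used in the paper: the models of a category are treated as replicates rather than as a separate factor, the toy AP values are invented, and the estimator omega2 = df (F - 1) / (df (F - 1) + N), with N the total number of observations, is assumed because the text does not state which formula was used. The function name <monospace>two_way_anova</monospace> is hypothetical.</p>
        <preformat>
```python
from itertools import product
from statistics import mean


def two_way_anova(ap):
    """Two-way ANOVA with interaction on a balanced design.

    `ap[(topic, category)]` is a list of AP scores, one per model of that
    category (models act as replicates in this simplified sketch).
    Returns, for each effect, its F statistic and omega-squared estimate.
    """
    topics = sorted({t for t, _ in ap})
    cats = sorted({c for _, c in ap})
    n = len(next(iter(ap.values())))           # replicates per cell
    assert all(len(v) == n for v in ap.values()), "design must be balanced"
    assert n > 1, "replicates are needed to estimate the error term"

    total = len(topics) * len(cats) * n        # total number of observations
    grand = mean(x for v in ap.values() for x in v)
    topic_mean = {t: mean(x for c in cats for x in ap[t, c]) for t in topics}
    cat_mean = {c: mean(x for t in topics for x in ap[t, c]) for c in cats}
    cell_mean = {k: mean(v) for k, v in ap.items()}

    # Sums of squares for the two main effects and their interaction.
    ss = {
        "topic": len(cats) * n * sum((topic_mean[t] - grand) ** 2 for t in topics),
        "category": len(topics) * n * sum((cat_mean[c] - grand) ** 2 for c in cats),
        "topic*category": n * sum(
            (cell_mean[t, c] - topic_mean[t] - cat_mean[c] + grand) ** 2
            for t, c in product(topics, cats)
        ),
    }
    # Error term: variance left unexplained inside each (topic, category) cell.
    ss_error = sum((x - cell_mean[k]) ** 2 for k, v in ap.items() for x in v)

    df = {
        "topic": len(topics) - 1,
        "category": len(cats) - 1,
        "topic*category": (len(topics) - 1) * (len(cats) - 1),
    }
    df_error = len(topics) * len(cats) * (n - 1)
    ms_error = ss_error / df_error

    table = {}
    for effect in ss:
        f_stat = (ss[effect] / df[effect]) / ms_error
        # Assumed estimator: omega2 = df (F - 1) / (df (F - 1) + N).
        omega2 = df[effect] * (f_stat - 1) / (df[effect] * (f_stat - 1) + total)
        table[effect] = {"F": f_stat, "omega2": omega2}
    return table


# Invented AP scores: two topics, two categories, two models per category.
toy = {
    ("t1", "lexical"): [0.60, 0.62], ("t1", "semantic"): [0.20, 0.22],
    ("t2", "lexical"): [0.10, 0.12], ("t2", "semantic"): [0.50, 0.52],
}
result = two_way_anova(toy)
```
        </preformat>
        <p>In the toy data, the two categories have the same mean AP, so the category main effect is essentially zero, while the topic-category interaction dominates. This mirrors, on a very small scale, the pattern described above. A negative estimate, as obtained here for the category, is commonly read as a negligible effect.</p>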
        <p>We hypothesize that the relation between topics and model categories, highlighted by ANOVA,
links to the semantic gap and to the association of a topic with its relevant documents. For
instance, if a topic has many relevant documents containing synonyms of the query terms,
then a semantic model might be best suited. In fact, in this case, most of the topic formulations
do not contain all the possible query synonyms and will thus be affected by the semantic gap.
Conversely, topics that can be easily represented by few keywords – likely present in relevant
documents – have less ambiguous formulations, which are best suited to lexical models.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. RQ2: Features Importance for the Semantic Gap</title>
        <p>Section 2.1 showed the impact of choosing the proper category depending on the target query.
If we could classify queries as semantically hard or easy, we might also adopt an IR model from
the right category. To train a classifier for doing that, we need i) to label queries as “semantic”
or “lexical”, and ii) to find a set of features that correlate with such aspects of the queries.</p>
        <p>
          The first aspect we address is the labeling of queries as “semantic” or “lexical”. The absence
of a rigorous definition of what makes a query semantically hard or easy prevents us from manually
labeling queries as “semantic” or “lexical”. Therefore, we propose to label queries according to
how the two model categories perform on them. To this end, we first compute the average
performance of each model. Then, for each query, we perform the following three steps. First,
we compute for each model the relative improvement over its average performance. Second,
we determine whether the relative improvement is, on average, greater for lexical or semantic
models. Finally, we label the considered query as “semantic” if the improvement over the
average model performance is greater for semantic models than for lexical ones; otherwise,
we label the query as “lexical”. Note that we do not consider absolute performance to label
queries, since even a poorly performing lexical method like TF-IDF (cf. Table 1) might prove
effective when the query is semantically easy. Thus, we focus on relative improvements, which
provide signals that are more robust to performance outliers. To address the second aspect of RQ2,
we explore two different sets of pre-retrieval features: lexical- and semantic-oriented features.
Lexical-oriented features are based on query and corpus statistics and depend on the distribution
of terms within the collection. Regarding semantic-oriented features, we first perform semantic
indexing on the OHSUMED and TREC-COVID collections as in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Then, we adopt features similar
to those proposed by Mothe and Tanguy [23], but, instead of considering only query-based
features, we take into account both query- and corpus-based features. The considered features
are reported and described in the original paper [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. We then employ three
well-known classification models to assess the effectiveness of the considered pre-retrieval
features when used to classify queries into lexical and semantic categories. The adopted models
are: Decision Tree (DTr), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP). To
perform the experiments, we label queries using the process described above. For each classifier,
we perform grid search with cross-validation to obtain the best hyper-parameters. We adopt
5-fold cross-validation for TREC-COVID, whereas we use 3-fold cross-validation for OHSUMED
to avoid obtaining single-class folds due to the low number of samples. The results of the
different classifiers are reported in Table 2, as mean and standard deviation over
the different folds. To determine the significance of the results (marked as †), we apply a randomization
test with Bonferroni correction for multiple comparisons [24].
        </p>
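        <p>The three labeling steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name <monospace>label_queries</monospace>, the input layout, and the AP values are hypothetical, every model is assumed to have a non-zero average AP, and ties are assigned to “lexical”, which is an arbitrary choice that the text does not specify.</p>
        <preformat>
```python
from statistics import mean


def label_queries(ap, category):
    """Label each query as "semantic" or "lexical".

    `ap[model][query]` holds the AP of `model` on `query`;
    `category[model]` is either "lexical" or "semantic".
    """
    # Average performance of each model over all queries.
    model_avg = {m: mean(scores.values()) for m, scores in ap.items()}
    queries = list(next(iter(ap.values())))

    labels = {}
    for q in queries:
        # Step 1: relative improvement of each model over its own average.
        rel = {m: (ap[m][q] - model_avg[m]) / model_avg[m] for m in ap}
        # Step 2: mean relative improvement within each model category.
        by_cat = {
            c: mean(r for m, r in rel.items() if category[m] == c)
            for c in ("lexical", "semantic")
        }
        # Step 3: the category with the larger mean improvement gives the label.
        # Ties go to "lexical" (assumption, not stated in the text).
        if by_cat["semantic"] > by_cat["lexical"]:
            labels[q] = "semantic"
        else:
            labels[q] = "lexical"
    return labels


# Invented AP scores for two lexical and two semantic models on two queries.
ap = {
    "BM25": {"q1": 0.50, "q2": 0.10},
    "TF-IDF": {"q1": 0.30, "q2": 0.10},
    "W2V": {"q1": 0.10, "q2": 0.30},
    "NVSM": {"q1": 0.20, "q2": 0.40},
}
category = {
    "BM25": "lexical", "TF-IDF": "lexical",
    "W2V": "semantic", "NVSM": "semantic",
}
labels = label_queries(ap, category)
```
        </preformat>
        <p>Because each model is compared against its own average, a weak model that does unusually well on a query contributes as much to the label as a strong one, which is the motivation given above for using relative rather than absolute performance.</p>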
        <p>The preliminary, yet promising, results highlight that the considered lexical- and
semantic-oriented features relate to the model categories. Therefore, they can be used as a starting
point to investigate the presence of the semantic gap within test collections and to build better
approaches for category selection.</p>
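        <p>The significance testing applied to the classification results, a randomization test with Bonferroni correction, can be sketched as follows. The text does not say which variant of the test was used or which scores were compared, so this sketch assumes a two-sided paired sign-flip test on the mean difference, estimated by random sampling. The function names, the number of trials, the seed, and the scores are all hypothetical.</p>
        <preformat>
```python
import random


def randomization_test(a, b, trials=10000, seed=0):
    """Two-sided paired randomization (sign-flip) test on the mean difference."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(trials):
        # Under the null hypothesis, swapping the two scores of a pair is
        # equally likely, which flips the sign of that pair's difference.
        flipped = [d if rng.random() >= 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (trials + 1)


def bonferroni(p_values, alpha=0.05):
    """Return which comparisons remain significant after Bonferroni correction."""
    threshold = alpha / len(p_values)    # each test is run at level alpha / m
    return {name: threshold >= p for name, p in p_values.items()}


# Invented paired scores for a reference system and two alternatives.
reference = [0.42, 0.35, 0.50, 0.38, 0.44, 0.41, 0.47, 0.36, 0.39, 0.45, 0.40, 0.43]
better = [s + 0.05 + 0.01 * i for i, s in enumerate(reference)]
same = list(reference)

p_values = {
    "better": randomization_test(better, reference),
    "same": randomization_test(same, reference),
}
significant = bonferroni(p_values)
```
        </preformat>
        <p>With two comparisons, each test is run at level 0.05 / 2 = 0.025. The correction becomes stricter as the number of comparisons grows, which limits the chance of reporting a spurious difference when several classifiers are compared.</p>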
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Conclusion</title>
      <p>We investigated the impact of the semantic gap on query performance, which features can be
used to determine this gap, and whether we can exploit them to classify queries as semantically
easy (“lexical”) or hard (“semantic”). Using ANOVA, we studied the interaction between IR
models and information needs, observing that the semantic gap relates more to the underlying
information need than to the different topic formulations. Then, we proposed a labeling strategy,
based on relative improvements, to annotate queries as “semantic” or “lexical”. Finally, we
explored two different sets of pre-retrieval features and we deployed a prototype classifier
to assess the effectiveness of such features when used to classify queries. We obtained
promising results, which suggest a link between the used features and the model categories.</p>
      <p>[16] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient Estimation of Word Representations in
Vector Space, in: Proc. of the 1st International Conference on Learning Representations,
ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, 2013.</p>
      <p>[17] C. Van Gysel, M. De Rijke, E. Kanoulas, Neural vector spaces for unsupervised information
retrieval, ACM Trans. Inf. Syst. 36 (2018) 1–25.</p>
      <p>[18] D. Banks, P. Over, N.-F. Zhang, Blind Men and Elephants: Six Approaches to TREC data,
Information Retrieval 1 (1999) 7–34.</p>
      <p>[19] N. Ferro, G. Silvello, Toward an Anatomy of IR System Component Performances, J. Assoc.
Inf. Sci. Technol. 69 (2018) 187–200.</p>
      <p>[20] J. S. Culpepper, G. Faggioli, N. Ferro, O. Kurland, Topic difficulty: Collection and query
formulation effects, ACM Trans. Inf. Syst. 40 (2021).</p>
      <p>[21] E. Voorhees, D. Samarov, I. Soboroff, Using Replicates in Information Retrieval Evaluation,
ACM Trans. Inf. Syst. 36 (2017) 12:1–12:21.</p>
      <p>[22] G. Faggioli, N. Ferro, System effect estimation by sharding: A comparison between ANOVA
approaches to detect significant differences, in: Proc. of the 43rd European Conference on
IR Research, ECIR 2021, Virtual Event, March 28 - April 1, 2021, Springer International
Publishing, Cham, 2021, pp. 33–46.</p>
      <p>[23] J. Mothe, L. Tanguy, Linguistic Features to Predict Query Difficulty, in: Proc. of the
Predicting Query Difficulty - Methods and Applications workshop, co-located with the ACM
Conference on Research and Development in Information Retrieval, SIGIR 2005, 2005, pp.
7–10.</p>
      <p>[24] P. Sedgwick, Multiple significance tests: the Bonferroni correction, BMJ 344 (2012).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          ,
          <article-title>What makes a query semantically hard?</article-title>
          ,
          <source>in: Proc. of the Second International Conference on Design of Experimental Search &amp; Information REtrieval Systems</source>
          , Padova, Italy,
          <source>September 15-18</source>
          ,
          <year>2021</year>
          , volume
          <volume>2950</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>61</fpage>
          -
          <lpage>69</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Koopman</surname>
          </string-name>
          , G. Zuccon,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bruza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sitbon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lawley</surname>
          </string-name>
          ,
          <article-title>Information retrieval as semantic inference: a Graph Inference model applied to medical search</article-title>
          ,
          <source>Inf. Retr. Journal</source>
          <volume>19</volume>
          (
          <year>2016</year>
          )
          <fpage>6</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. I. Grosky</surname>
          </string-name>
          ,
          <article-title>Narrowing the semantic gap - improved text-based web document retrieval using visual features</article-title>
          ,
          <source>IEEE Trans. Multimedia</source>
          <volume>4</volume>
          (
          <year>2002</year>
          )
          <fpage>189</fpage>
          -
          <lpage>200</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Edinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bedrick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. H.</given-names>
            <surname>Ambert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. R.</given-names>
            <surname>Hersh</surname>
          </string-name>
          ,
          <article-title>Barriers to Retrieving Patient Information from Electronic Health Record Data: Failure Analysis from the TREC Medical Records Track</article-title>
          ,
          <source>in: AMIA 2012, American Medical Informatics Association Annual Symposium</source>
          , AMIA,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Koopman</surname>
          </string-name>
          , G. Zuccon,
          <article-title>Why Assessing Relevance in Medical IR is Demanding</article-title>
          ,
          <source>in: Proc. of the Medical Information Retrieval Workshop at SIGIR co-located with the 37th annual international ACM SIGIR conference (ACM SIGIR 2014)</source>
          , volume
          <volume>1276</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>16</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Agosti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Silvello</surname>
          </string-name>
          ,
          <article-title>Learning Unsupervised Knowledge-Enhanced Representations to Reduce the Semantic Gap in Information Retrieval</article-title>
          ,
          <source>ACM Trans. Inf. Syst</source>
          .
          <volume>38</volume>
          (
          <year>2020</year>
          )
          <fpage>38:1</fpage>
          -
          <lpage>38:48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Semantic Matching in Search</article-title>
          ,
          <source>Found. Trends Inf. Retr.</source>
          <volume>7</volume>
          (
          <year>2014</year>
          )
          <fpage>343</fpage>
          -
          <lpage>469</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Purpura</surname>
          </string-name>
          , G. Silvello,
          <article-title>Focal elements of neural information retrieval models. An outlook through a reproducibility study</article-title>
          ,
          <source>Inf. Process. Manag</source>
          .
          <volume>57</volume>
          (
          <year>2020</year>
          )
          <fpage>102109</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>W.</given-names>
            <surname>Hersh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Buckley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Leone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hickam</surname>
          </string-name>
          ,
          <article-title>OHSUMED: An interactive retrieval evaluation and new large test collection for research</article-title>
          ,
          <source>in: Proc. of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval</source>
          , Dublin, Ireland, 3-6 July 1994
          , Springer London, London,
          <year>1994</year>
          , pp.
          <fpage>192</fpage>
          -
          <lpage>201</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bedrick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. R.</given-names>
            <surname>Hersh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Soboroff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>TREC-COVID: Constructing a Pandemic Information Retrieval Test Collection</article-title>
          ,
          <source>SIGIR Forum 54</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Metzler</surname>
          </string-name>
          , T. Strohman, Search Engines: Information Retrieval in Practice, Addison-Wesley, Reading (MA), USA,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          , H. Zaragoza,
          <article-title>The Probabilistic Relevance Framework: BM25 and Beyond</article-title>
          ,
          <source>Found. Trends Inf. Retr.</source>
          <volume>3</volume>
          (
          <year>2009</year>
          )
          <fpage>333</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <article-title>Statistical Language Models for Information Retrieval. A Critical Review</article-title>
          ,
          <source>Found. Trnd. Inf. Retr</source>
          .
          <volume>2</volume>
          (
          <year>2008</year>
          )
          <fpage>137</fpage>
          -
          <lpage>213</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Amati</surname>
          </string-name>
          ,
          <string-name>
            <surname>C. J. van Rijsbergen</surname>
          </string-name>
          ,
          <article-title>Probabilistic Models of Information Retrieval based on measuring the Divergence From Randomness</article-title>
          ,
          <source>ACM Trans. Inf. Syst</source>
          <volume>20</volume>
          (
          <year>2002</year>
          )
          <fpage>357</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15] İ. Kocabaş, B. T. Dinçer, B. Karaoğlan,
          <article-title>A nonparametric term weighting method for information retrieval based on measuring the divergence from independence</article-title>
          ,
          <source>Information Retrieval</source>
          <volume>17</volume>
          (
          <year>2014</year>
          )
          <fpage>153</fpage>
          -
          <lpage>176</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          , G. Corrado,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          , Efficient Estimation of Word Representations in
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>