<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SIGIR 2017 Workshop on eCommerce (eCom), Tokyo, Japan, August 2017</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Query Segmentation via RNNs Encoder-Decoder Framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yiu-Chang Lin</string-name>
          <email>yiuchang.lin@rakuten.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Di Fabbrizio</string-name>
          <email>difabbrizio@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ankur Datta</string-name>
          <email>ankur.datta@rakuten.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Rakuten Institute of Technology</institution>
          ,
          <addr-line>Boston, Massachusetts - USA 02110</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>5</volume>
      <abstract>
        <p>Query segmentation is the task of segmenting a Web query into adjacent phrases, typically keywords that form noun phrases or concepts that are relevant to the search task. In this paper, we describe a research study and some preliminary experiment results for query segmentation via a Recurrent Neural Network encoder-decoder framework on a public benchmark dataset (Webis-QSeC-10). The resulting segmented queries can be used for several downstream tasks such as improving the performance of relevance ranking in search, better understanding of the query intent, and suggesting queries for auto-completion.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Query segmentation aims to detect semantically consistent phrases
that identify entities and concepts in Web search queries, e.g., "[air
conditioner][remote control]", "[compact][microwave oven]",
and "[iphone 7][cover]". Such phrases are the semantic structural
units of a search task and can be exploited by search engines as
indivisible units in order to improve retrieval precision or to
reformulate queries at the phrase level. Short texts such as Web
queries often do not follow grammar rules, hence traditional methods
based on well-formed English are not applicable.</p>
      <p>Query segmentation is one of the most important tasks in
query understanding, a key component of modern search engines
for precisely inferring the users’ intent from queries, since query
segments can be further refined into named entities and semantic
relations linking head phrases with modifiers.</p>
      <p>
        Both supervised and unsupervised learning techniques have
been used to solve the query segmentation task in the past. In the
supervised learning category, a Support Vector Machines ranker [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
was used to learn a structured classifier that makes a segmentation
decision (yes or no) between each pair of contiguous tokens [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
Another well-known model that has been successfully applied to
a variety of sequence labeling tasks is Conditional Random Fields
(CRFs) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. CRFs model the conditional probability distribution
over a label sequence given an input query, where each token in the
query is assigned a label from a pre-defined
label set [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. However, such supervised methods require a huge
amount of human segmentation labels, which are usually expensive
to obtain; furthermore, careful feature engineering plays an
important role in achieving high segmentation accuracy.
      </p>
      <p>
        On the other hand, in the unsupervised learning family, several
methods have been proposed to either automatically collect
segmented queries or train segmentation models from query log data.
For example, in the e-commerce domain, query terms are aligned
to product attribute terms via users’ click data, and the ambiguities
are resolved using frequency and similarity statistics [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
Statistical methods based on point-wise mutual information (PMI) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
n-gram frequency [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], or Multi-Word Expression probability [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]
are also popular. One unsupervised approach using generative
language models and Wikipedia as an external resource has been reported
to have competitive performance [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Another unsupervised
probabilistic model was proposed to exploit user click-throughs for query
segmentation, and the model parameters were estimated by an efficient
expectation-maximization (EM) algorithm [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        Recently, Deep Neural Network (DNN) models have shown their
powerful capability to achieve excellent performance on various
difficult Natural Language Processing tasks. Especially
in end-to-end sequence learning tasks, the Encoder-Decoder
network [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], which makes minimal assumptions about the sequence
structure, is widely used in machine translation [
        <xref ref-type="bibr" rid="ref1 ref4 ref5">1, 4, 5</xref>
        ]. In this paper,
we propose to treat query segmentation as a machine translation
task and apply the Encoder-Decoder framework to generate query
segments. Preliminary results on the Webis-QSeC-10 dataset are
reported.
      </p>
    </sec>
    <sec id="sec-2">
      <title>DATA</title>
      <p>
        The Webis Query Segmentation Corpus (Webis-QSeC-10) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
consists of 53,437 web queries, and each query has at least 10
segmentations provided by 10 different annotators crowdsourced via
Amazon's Mechanical Turk (AMT). A sample of 4,850 queries is
published as the training set and the remaining 48,587 queries serve as
the testing set, roughly a 1:10 train/test split. The Webis-QSeC-10
is sampled from the subset of the AOL query log [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] that consists
of only queries with lengths from 3 to 10 words. Since 1-word queries
cannot be segmented further and 2-word queries are typically
handled well by proximity features, queries with just 1 or 2 words
are excluded. The sampling maintains the query length distribution
and the query frequency distribution of the entire AOL query log.
The corpus is available at https://www.uni-weimar.de/de/medien/professuren/medieninformatik/webis/corpora/webis-qsec-10/.
An example query with its segmentations from the training set is
shown below, where 1004073900 is the unique query id followed
by a list of (vote, segmentation) pairs indicating the 10 different
decisions the AMT workers made for that query.
      </p>
      <p>1004073900
(5, 'graffiti fonts|alphabet'),
(3, 'graffiti|fonts|alphabet'),
(2, 'graffiti fonts alphabet')</p>
      <p>
        Since each query is segmented by at least 10 annotators, and not
all of them always agree with each other, to select the reference
annotation we apply the break fusion strategy described in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
The underlying idea is that annotators should at least agree on
specific important segments even if there is no absolute majority
on the entire query segmentation. Break fusion simply follows the
majority of annotators at each single break position of a query. A
break is inserted in case of a tie vote. Consider the following
example annotation:
5 graffiti fonts|alphabet
3 graffiti|fonts|alphabet
2 graffiti fonts alphabet
At the first break position (between graffiti and fonts), 7 (5+2)
annotators agree on no break. Similarly, 8 (5+3) annotators agree on
inserting a break at the second break position (between fonts and
alphabet). Therefore the final reference is
      </p>
      <p>graffiti fonts|alphabet</p>
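      <p>The break fusion procedure above can be sketched in a few lines of Python. This is a minimal illustration under our own naming conventions, not code from [7]; it takes the (vote, segmentation) pairs shown above, takes the majority at each break position, and inserts a break on ties.</p>
      <preformat>
```python
def break_positions(seg):
    """Break positions of a segmented query: position i means a break
    between word i and word i+1 (0-based)."""
    breaks, count = set(), 0
    for segment in seg.split('|')[:-1]:  # the last segment ends the query
        count += len(segment.split())
        breaks.add(count - 1)
    return breaks

def break_fusion(annotations):
    """Fuse (votes, segmentation) pairs by majority vote at each break
    position; a tie vote inserts a break."""
    total = sum(v for v, _ in annotations)
    words = annotations[0][1].replace('|', ' ').split()
    fused = set()
    for pos in range(len(words) - 1):
        yes = sum(v for v, seg in annotations if pos in break_positions(seg))
        if 2 * yes >= total:  # strict majority, or a tie, inserts a break
            fused.add(pos)
    parts, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        if i in fused:
            parts.append(' '.join(current))
            current = []
    parts.append(' '.join(current))
    return '|'.join(parts)
```
      </preformat>
      <p>On the example above, the fused reference is 'graffiti fonts|alphabet', matching the majority decisions at both break positions.</p>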
    </sec>
    <sec id="sec-3">
      <title>METHODS</title>
      <p>
        In this section, we describe one baseline method [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and two models,
Conditional Random Fields (CRFs) and the Recurrent Neural Network
encoder-decoder framework, which are used in this paper for the
query segmentation experiments.
      </p>
      <sec id="sec-3-1">
        <title>3.1 Wikipedia Titles and Strict Noun Phrases Baseline</title>
        <p>
          This baseline method simply treats only Wikipedia titles and
strict noun phrases as query segments. If the query contains more
than one overlapping Wikipedia title, the decision rule proposed
in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] is used, which assigns each title a score based on
its frequency in the Google n-gram corpus multiplied by its
length. For strict noun phrases, similarly, the product of their
Web frequencies and length is assigned as the score. Finally, the
segmentation with the highest score is chosen.
        </p>
      </sec>
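      <p>The frequency-times-length scoring can be sketched as follows. This is a simplified illustration with a hypothetical frequency table, not the exact decision rule of [8], which also handles title overlap resolution:</p>
      <preformat>
```python
def best_segmentation(candidates, ngram_freq):
    """Score each segment by its corpus frequency multiplied by its
    length in words, sum over the segmentation, and pick the best.
    `candidates` is a list of segmentations (lists of phrase strings);
    `ngram_freq` maps phrases to frequency counts."""
    def score(segmentation):
        return sum(ngram_freq.get(seg, 0) * len(seg.split())
                   for seg in segmentation)
    return max(candidates, key=score)

# Toy frequencies (hypothetical numbers for illustration only).
freq = {'air conditioner': 1000, 'remote control': 800, 'conditioner remote': 5}
candidates = [['air conditioner', 'remote control'],
              ['air', 'conditioner remote', 'control']]
best = best_segmentation(candidates, freq)
```
      </preformat>
      <p>Here the first candidate wins (score 1000*2 + 800*2 = 3600 versus 5*2 = 10), illustrating why frequent multi-word concepts are kept as indivisible segments.</p>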
    </sec>
    <sec id="sec-4">
      <title>Conditional Random Fields</title>
      <p>Conditional Random Fields have been widely used in NLP
structured prediction tasks, especially sequence labeling such as
part-of-speech (POS) tagging and named-entity recognition (NER).
Formally, given an input sequence x = (x_1, x_2, ..., x_n) and a label sequence
y = (y_1, y_2, ..., y_n), we want to model the conditional distribution
P(y|x) so that the optimal label sequence can be predicted by
solving y* = argmax_y P(y|x). The probabilistic model for sequence CRFs
defines a family of conditional probabilities P(y|x; λ) over all possible
label sequences y given x with the following form:</p>
      <p>P(y|x; λ) = exp( Σ_{i=1..n} Σ_j λ_j f_j(y_{i-1}, y_i, x, i) ) / Z(x)</p>
      <p>Z(x) = Σ_{y ∈ Y} exp( Σ_{i=1..n} Σ_j λ_j f_j(y_{i-1}, y_i, x, i) )</p>
      <p>where λ denotes the model parameters, f_j is the j-th feature function, and the
numerator of P(y|x; λ) is composed of potential functions. λ can
be obtained by maximizing the log-likelihood of the
training data with an L1 or L2 regularization term:</p>
      <p>L(λ) = Σ_i log P(y^(i)|x^(i); λ)</p>
      <p>In order to apply CRFs to the query segmentation task, we
introduce the standard Begin, Inside, Outside (BIO) tagging schema
to map a segmented query to a sequence of tags. Table 1 shows
some example queries from the Webis-QSeC-10 training set with their
corresponding BIO tags.</p>
      <table-wrap id="tbl1">
        <label>Table 1</label>
        <caption>
          <p>Example queries from the Webis-QSeC-10 training set and their corresponding BIO tags.</p>
        </caption>
        <table>
          <thead>
            <tr><th>segmented query</th><th>BIO tagging</th></tr>
          </thead>
          <tbody>
            <tr><td>graffiti fonts | alphabet</td><td>graffiti (B) fonts (I) alphabet (B)</td></tr>
            <tr><td>stainless steel | chest freezers</td><td>stainless (B) steel (I) chest (B) freezers (I)</td></tr>
            <tr><td>rutgers | online | graduate classes</td><td>rutgers (B) online (B) graduate (B) classes (I)</td></tr>
            <tr><td>review | on | breezes</td><td>review (B) on (B) breezes (B)</td></tr>
          </tbody>
        </table>
      </table-wrap>
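      <p>The mapping from a segmented query to BIO tags is mechanical and can be written as a short helper (our own illustration, not code from the paper):</p>
      <preformat>
```python
def to_bio(segmented_query):
    """Map a segmented query (breaks marked with '|') to BIO tags.
    Each segment-initial word is tagged B and the remaining words I;
    the O tag is unused since every query word belongs to a segment."""
    tags = []
    for segment in segmented_query.split('|'):
        words = segment.strip().split()
        tags.extend(['B'] + ['I'] * (len(words) - 1))
    return tags
```
      </preformat>
      <p>For example, to_bio('stainless steel|chest freezers') yields ['B', 'I', 'B', 'I'], matching the second row of Table 1.</p>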
    </sec>
    <sec id="sec-5">
      <title>Recurrent Neural Networks</title>
      <p>
        The fundamental idea of Recurrent Neural Networks is that the
network contains a feedback connection, as shown in the left part
of Figure 1, so that it can make use of sequential information. RNNs
perform the same task for every element in a sequence x, with the
output o being dependent on the computations from the previous
state s. This characteristic enables the networks to do sequence
processing and learn sequential structure information. Theoretically,
RNNs are capable of capturing arbitrarily long-distance
dependencies, but in practice they are limited to looking back only a few
steps, a limitation known as the vanishing/exploding gradient problem [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>The right part of Figure 1 shows a typical RNN and its forward
computation structure after being unfolded into a full network
within the sequence window t-1, t, and t+1. Assume that the
input sequence x is a sentence consisting of n words, x_1, x_2, ..., x_n.
x_t is the input token at position t, and it can be represented as a
typical one-hot vector or a word embedding of dimension d. s_t, the
corresponding hidden state or "memory", is calculated based on
the previous hidden state and the input at the current step. In this
case, we would like to predict the next word given x_1, x_2, ..., x_{t-1},
so o_t would be a vector of probabilities across the vocabulary. The
following equations explicitly describe the computation of RNNs:
s_t = f(U x_t + W s_{t-1})
o_t = softmax(V s_t)
where the function f is a nonlinear mapping such as tanh or
ReLU. U, V, and W are matrices (model parameters) and can be
optimized through backpropagation. Usually s_0, which is required
to calculate the first hidden state, is initialized to a zero vector.</p>
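      <p>The forward computation above can be sketched in a few lines of NumPy. This is a toy illustration with random weights and dimensions of our choosing, not the model used in the experiments:</p>
      <preformat>
```python
import numpy as np

def rnn_forward(x_seq, U, W, V):
    """Vanilla RNN forward pass: s_t = tanh(U x_t + W s_{t-1}),
    o_t = softmax(V s_t); s_0 is initialized to a zero vector."""
    s = np.zeros(W.shape[0])
    outputs = []
    for x_t in x_seq:
        s = np.tanh(U @ x_t + W @ s)
        logits = V @ s
        exp = np.exp(logits - logits.max())  # numerically stable softmax
        outputs.append(exp / exp.sum())
    return outputs

# Toy setup: vocabulary of 4 words, hidden size 3, one-hot inputs.
rng = np.random.default_rng(0)
U = rng.normal(size=(3, 4))
W = rng.normal(size=(3, 3))
V = rng.normal(size=(4, 3))
x_seq = [np.eye(4)[i] for i in (0, 2, 1)]
probs = rnn_forward(x_seq, U, W, V)  # one probability vector per step
```
      </preformat>
      <p>Each output is a valid probability distribution over the vocabulary, which is what allows the decoder described next to sample or pick the most likely next token.</p>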
      <p>Figure 2 shows a typical encoder-decoder framework, a model
consisting of two separate RNNs called the encoder and the decoder.
The encoder reads an input sequence one item at a time and
outputs a vector at each step (ignored in Figure 2). The final output of
the encoder serves as the context vector, and the decoder uses this
context vector to generate a sequence of outputs. In the context of
machine translation, the encoder first processes a variable-length
input word sequence from the source language and builds a
fixed-length vector representation (context vector). Conditioned on this
encoded representation, the decoder produces a variable-length
word sequence in the target language. In an ideal case, the context
vector can be considered the meaning of the sequence in a latent
semantic space, and this idea can be extended beyond sequences.
For example, in image captioning tasks, the encoder-decoder
framework takes an image as input and produces a text description as
output. In the reverse direction, image generation tasks take a text
description as input and output a generated image.</p>
      <p>To fit the query segmentation task into the encoder-decoder
framework, we treat the original query as an input sequence from one
language and the segmented query as an output sequence from the
other language. The vocabulary size is therefore the same for both
languages, except that the target language has one additional break
token, i.e.,</p>
      <p>Vocab_target = Vocab_source ∪ {"|"}</p>
      <p>In practice, the queries and their segmentations combined are
treated as a parallel corpus for training. In the testing phase, the encoder
first calculates the context vector and the decoder then generates output tokens
one at a time from Vocab_target.</p>
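      <p>Concretely, a source/target training pair can be built from an annotated query as follows (a minimal sketch; the helper name is ours):</p>
      <preformat>
```python
BREAK = '|'

def make_parallel_pair(segmented_query):
    """Build a (source, target) token pair for encoder-decoder training:
    the source is the raw query; the target marks segment boundaries
    with the extra break token."""
    source = segmented_query.replace(BREAK, ' ').split()
    target = segmented_query.replace(BREAK, f' {BREAK} ').split()
    return source, target

src, tgt = make_parallel_pair('graffiti fonts|alphabet')
# The target vocabulary is the source vocabulary plus the break token:
vocab_target = set(src) | {BREAK}
```
      </preformat>
      <p>For the example query, the source is ['graffiti', 'fonts', 'alphabet'] and the target is ['graffiti', 'fonts', '|', 'alphabet'], mirroring the Vocab_target = Vocab_source ∪ {"|"} relation above.</p>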
    </sec>
    <sec id="sec-6">
      <title>EXPERIMENTS</title>
      <p>
        The Webis-QSeC-10 corpus [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] comprises 53,437 web queries,
each of which has at least 10 segmentations. The reference
segmentation is obtained as described in Section 2. There are 4,850 queries
in the training set and 48,587 queries in the testing set.
      </p>
      <p>
        To quantify the segmentation results of different algorithms, we adopt query-level
and break-level accuracy [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] as the evaluation metrics. At
the query level, given a query q, its reference segmentation S, and the
output segmentation S' from the model, the query accuracy is 1 if
S' = S and 0 otherwise. At the break level, a decision whether a break
needs to be inserted is made for every two consecutive words in the
query. The break accuracy is defined as the ratio of correct decisions
over all break positions in q with respect to S'. Theoretically, there
exist 2^(k-1) valid segmentations for a query q of k words, and (k² − k)/2 potential
segments that contain at least two keywords from q.
      </p>
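      <p>Under these definitions, the two metrics can be sketched as follows (function names are ours; both inputs are assumed to contain the same words):</p>
      <preformat>
```python
def break_set(seg):
    """Break positions: i denotes a break between word i and word i+1."""
    out, count = set(), 0
    for segment in seg.split('|')[:-1]:
        count += len(segment.split())
        out.add(count - 1)
    return out

def query_accuracy(reference, prediction):
    """1 if the predicted segmentation matches the reference exactly."""
    return int(reference == prediction)

def break_accuracy(reference, prediction):
    """Fraction of correct break/no-break decisions over all positions
    between consecutive words of the query."""
    n_positions = len(reference.replace('|', ' ').split()) - 1
    ref, pred = break_set(reference), break_set(prediction)
    correct = sum((i in ref) == (i in pred) for i in range(n_positions))
    return correct / n_positions
```
      </preformat>
      <p>For instance, predicting 'graffiti|fonts|alphabet' against the reference 'graffiti fonts|alphabet' scores 0 at the query level but 0.5 at the break level, since one of the two break decisions is correct.</p>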
    </sec>
    <sec id="sec-7">
      <title>Model Parameters</title>
      <p>
        In our experiments, we use CRFsuite [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] (http://www.chokkan.org/software/crfsuite/) to optimize the CRF
model parameters, with the following set of word uni-gram and
bi-gram features:
      </p>
      <p>uni-gram: x_{-2}, x_{-1}, x, x_{+1}, x_{+2}
bi-gram: x_{-1}x, x x_{+1}</p>
      <p>For the RNN encoder-decoder, the following loss function, optimizer,
and parameters are used:</p>
      <p>Word representation: one-hot vector
RNN hidden layer size: 1024
RNN number of layers: 2
RNN activation function: tanh
Loss function: negative log-likelihood loss
Optimizer: Adam optimizer
Learning rate: 0.0001
Dropout rate: 0.05
Epochs: 50,000</p>
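      <p>The uni-gram and bi-gram window can be made concrete with a small feature extractor. This is a sketch under our own naming (the full CRFsuite feature template is not given in the paper); out-of-range positions are padded with BOS/EOS markers:</p>
      <preformat>
```python
def token_features(words, i):
    """Uni-gram and bi-gram word features for position i over the
    window x_{-2}..x_{+2}; BOS/EOS pad out-of-range positions."""
    def w(j):
        if j < 0:
            return 'BOS'
        if j >= len(words):
            return 'EOS'
        return words[j]
    return {
        'w[-2]': w(i - 2), 'w[-1]': w(i - 1), 'w[0]': w(i),
        'w[+1]': w(i + 1), 'w[+2]': w(i + 2),
        'w[-1]_w[0]': w(i - 1) + '_' + w(i),   # bi-gram x_{-1}x
        'w[0]_w[+1]': w(i) + '_' + w(i + 1),   # bi-gram x x_{+1}
    }

feats = token_features(['graffiti', 'fonts', 'alphabet'], 0)
```
      </preformat>
      <p>One such feature dictionary per token, paired with the BIO tags of Section 3.2, forms the training instances fed to the CRF.</p>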
    </sec>
    <sec id="sec-8">
      <title>RNNs Encoder-Decoder Loss</title>
      <p>
        Parameter optimization is performed with the Adam optimizer using
negative log-likelihood as the loss function. The Adam optimizer (Adaptive
Moment Estimation) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is an algorithm for first-order
gradient-based optimization of stochastic objective functions that
computes adaptive learning rates for each parameter. Adam keeps an
exponentially decaying average of both past gradients and squared
gradients. The loss function value on the training set is recorded
every 200 epochs, and it shows that the training loss decreases steadily
with the number of epochs and eventually converges at the end
(Figure 3).
      </p>
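      <p>A single Adam update can be sketched as follows. This is a NumPy illustration of the update rule in [11] using the learning rate from Section 4.1 as default, not the actual training code:</p>
      <preformat>
```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponentially decaying averages of past
    gradients (m) and squared gradients (v), with bias correction."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)   # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimizing f(theta) = theta^2, whose gradient is 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 101):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
```
      </preformat>
      <p>Because the step size is scaled by the ratio of the two decaying averages, each update moves by roughly the learning rate regardless of the raw gradient magnitude, which is what makes Adam robust for stochastic objectives.</p>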
    </sec>
    <sec id="sec-9">
      <title>4.3 Results</title>
      <p>Table 2 shows the query-level and break-level accuracy of Wikipedia
titles (WT), Wikipedia titles + strict noun phrases (WT+SNP),
Conditional Random Fields, and the RNN encoder-decoder. WT+SNP has
the best accuracy among the four methods at both levels. CRF
performs better than the WT baseline in terms of both query-level and
break-level accuracy. The RNN encoder-decoder framework, however,
does not perform as well in this case as it does in other tasks such
as machine translation and image captioning.</p>
    </sec>
    <sec id="sec-10">
      <title>5 DISCUSSION</title>
      <p>The first two methods (WT and WT+SNP) in Table 2 are
unsupervised but require external knowledge resources, e.g., Wikipedia titles,
Google n-gram frequencies, and Web n-gram frequencies. On the
other hand, both CRFs and the RNN encoder-decoder are supervised
machine learning methods relying on human annotation. Since
the training set only consists of 4,850 annotated queries, about
one tenth the size of the testing set in Webis-QSeC-10, supervised methods
cannot benefit from a large amount of training data. In addition
to the small size of the training set, short query length is another
key factor that limits the power of the RNN encoder-decoder in query
segmentation. Web queries are typically short and less structured
compared to standard sentences in machine translation corpora.
Therefore, RNNs’ remarkable capacity for capturing long-distance
dependencies is not as effective in this task. Although CRFs
outperform the RNN encoder-decoder, one disadvantage of CRFs is that
they require human-designed features, as opposed to RNNs, which
require no feature engineering.</p>
    </sec>
    <sec id="sec-conclusion">
      <title>6 CONCLUSION AND FUTURE WORK</title>
      <p>Query segmentation is crucial for a search engine to better
understand query intent and return higher quality search results. This
paper provides a study on fitting the query segmentation task into an
RNN encoder-decoder framework and describes preliminary
experimental results compared with other baselines. The RNN does not
perform as expected due to the lack of training data and the short
nature of query length. However, three feasible future directions
might be helpful for improving the RNN encoder-decoder framework
on query segmentation.</p>
      <p>
        The first direction is to automatically collect a large amount of
segmented queries via users' implicit feedback from query logs, as
proposed in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This would address the challenge of limited training
data mentioned in Section 5. Another direction is to replace the
RNN units in the encoder-decoder framework with GRUs [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] or
LSTMs [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and add an attention mechanism [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] at the encoder,
giving the decoder a way to "pay attention" to different parts of
the input while decoding. Finally, substituting pre-trained word
embeddings for the current one-hot word vectors will both reduce
the input dimension and provide the network with a richer word
representation.
      </p>
    </sec>
    <sec id="sec-11">
      <title>ACKNOWLEDGMENTS</title>
      <p>The authors would like to thank Dr. Martin Potthast from
Bauhaus-Universität Weimar for kindly providing the full Webis Query
Segmentation Corpus used for our modeling experiments. The authors
would also like to thank the anonymous reviewers for their feedback
and helpful advice.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Dzmitry</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          , Kyunghyun Cho, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Neural machine translation by jointly learning to align and translate</article-title>
          .
          <source>arXiv preprint arXiv:1409.0473</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          , Patrice Simard, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Frasconi</surname>
          </string-name>
          .
          <year>1994</year>
          .
          <article-title>Learning long-term dependencies with gradient descent is difficult</article-title>
          .
          <source>IEEE transactions on neural networks 5</source>
          ,
          <issue>2</issue>
          (
          <year>1994</year>
          ),
          <fpage>157</fpage>
          -
          <lpage>166</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Shane</given-names>
            <surname>Bergsma</surname>
          </string-name>
          and Qin Iris Wang.
          <year>2007</year>
          .
          <article-title>Learning Noun Phrase Query Segmentation.</article-title>
          .
          <source>In EMNLP-CoNLL</source>
          , Vol.
          <volume>7</volume>
          .
          <fpage>819</fpage>
          -
          <lpage>826</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Kyunghyun</given-names>
            <surname>Cho</surname>
          </string-name>
          , Bart Van Merriënboer,
          <string-name>
            <surname>Dzmitry Bahdanau</surname>
            , and
            <given-names>Yoshua</given-names>
          </string-name>
          <string-name>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>On the properties of neural machine translation: Encoder-decoder approaches</article-title>
          .
          <source>arXiv preprint arXiv:1409.1259</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Kyunghyun</given-names>
            <surname>Cho</surname>
          </string-name>
          , Bart Van Merriënboer,
          <string-name>
            <surname>Caglar Gulcehre</surname>
            , Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and
            <given-names>Yoshua</given-names>
          </string-name>
          <string-name>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Learning phrase representations using RNN encoder-decoder for statistical machine translation</article-title>
          .
          <source>arXiv preprint arXiv:1406.1078</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Junyoung</given-names>
            <surname>Chung</surname>
          </string-name>
          , Caglar Gulcehre, KyungHyun Cho, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Empirical evaluation of gated recurrent neural networks on sequence modeling</article-title>
          .
          <source>arXiv preprint arXiv:1412.3555</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Matthias</given-names>
            <surname>Hagen</surname>
          </string-name>
          , Martin Potthast, Anna Beyer, and
          <string-name>
            <given-names>Benno</given-names>
            <surname>Stein</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Towards optimum query segmentation: in doubt without</article-title>
          .
          <source>In Proceedings of the 21st ACM international conference on Information and knowledge management. ACM</source>
          ,
          <volume>1015</volume>
          -
          <fpage>1024</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Matthias</given-names>
            <surname>Hagen</surname>
          </string-name>
          , Martin Potthast, Benno Stein, and
          <string-name>
            <given-names>Christof</given-names>
            <surname>Bräutigam</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Query segmentation revisited</article-title>
          .
          <source>In Proceedings of the 20th international conference on World wide web. ACM</source>
          ,
          <volume>97</volume>
          -
          <fpage>106</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Sepp</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jürgen</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural computation 9</source>
          ,
          <issue>8</issue>
          (
          <year>1997</year>
          ),
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Thorsten</given-names>
            <surname>Joachims</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Optimizing search engines using clickthrough data</article-title>
          .
          <source>In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM</source>
          ,
          <volume>133</volume>
          -
          <fpage>142</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Diederik</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jimmy</given-names>
            <surname>Ba</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>arXiv preprint arXiv:1412.6980</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Julia</surname>
            <given-names>Kiseleva</given-names>
          </string-name>
          , Qi Guo, Eugene Agichtein, Daniel Billsus, and
          <string-name>
            <given-names>Wei</given-names>
            <surname>Chai</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Unsupervised query segmentation using click data: preliminary results</article-title>
          .
          <source>In Proceedings of the 19th international conference on World wide web. ACM</source>
          ,
          <volume>1131</volume>
          -
          <fpage>1132</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Giridhar</given-names>
            <surname>Kumaran and Vitor R Carvalho</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Reducing long queries using query quality predictors</article-title>
          .
          <source>In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval. ACM</source>
          ,
          <volume>564</volume>
          -
          <fpage>571</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] John Lafferty,
          <string-name>
            <surname>Andrew</surname>
            <given-names>McCallum</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Fernando</given-names>
            <surname>Pereira</surname>
          </string-name>
          ,
          <article-title>and others. Conditional random fields: Probabilistic models for segmenting and labeling sequence data</article-title>
          .
          <source>In Proceedings of the eighteenth international conference on machine learning, ICML</source>
          , Vol.
          <volume>1</volume>
          .
          <fpage>282</fpage>
          -
          <lpage>289</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Yanen</given-names>
            <surname>Li</surname>
          </string-name>
          , Bo-June (Paul) Hsu, ChengXiang Zhai, and
          <string-name>
            <given-names>Kuansan</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Unsupervised query segmentation using clickthrough for information retrieval</article-title>
          .
          <source>In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval. ACM</source>
          ,
          <fpage>285</fpage>
          -
          <lpage>294</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Nikita</given-names>
            <surname>Mishra</surname>
          </string-name>
          , Rishiraj Saha Roy, Niloy Ganguly, Srivatsan Laxman, and
          <string-name>
            <given-names>Monojit</given-names>
            <surname>Choudhury</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Unsupervised query segmentation using only query logs</article-title>
          .
          <source>In Proceedings of the 20th international conference companion on World wide web. ACM</source>
          ,
          <fpage>91</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Naoaki</given-names>
            <surname>Okazaki</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>CRFsuite: a fast implementation of Conditional Random Fields (CRFs)</article-title>
          . http://www.chokkan.org/software/crfsuite/
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Nish</given-names>
            <surname>Parikh</surname>
          </string-name>
          , Prasad Sriram, and Mohammad Al Hasan.
          <year>2013</year>
          .
          <article-title>On segmentation of ecommerce queries</article-title>
          .
          <source>In Proceedings of the 22nd ACM international conference on Conference on information &amp; knowledge management. ACM</source>
          ,
          <fpage>1137</fpage>
          -
          <lpage>1146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Greg</given-names>
            <surname>Pass</surname>
          </string-name>
          , Abdur Chowdhury, and
          <string-name>
            <given-names>Cayley</given-names>
            <surname>Torgeson</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>A picture of search</article-title>
          .
          <source>In Proceedings of the 1st International Conference on Scalable Information Systems (InfoScale). ACM</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , Oriol Vinyals, and
          <string-name>
            <given-names>Quoc V.</given-names>
            <surname>Le</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Sequence to sequence learning with neural networks</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          .
          <fpage>3104</fpage>
          -
          <lpage>3112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Bin</given-names>
            <surname>Tan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Fuchun</given-names>
            <surname>Peng</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Unsupervised query segmentation using generative language models and Wikipedia</article-title>
          .
          <source>In Proceedings of the 17th international conference on World Wide Web. ACM</source>
          ,
          <fpage>347</fpage>
          -
          <lpage>356</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Xiaohui</given-names>
            <surname>Yu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Huxia</given-names>
            <surname>Shi</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Query segmentation using conditional random fields</article-title>
          .
          <source>In Proceedings of the First International Workshop on Keyword Search on Structured Data. ACM</source>
          ,
          <fpage>21</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>