<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>USTB at INEX2014: Social Book Search Track</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bo-Wen Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xu-Cheng Yin</string-name>
          <email>xuchengyin@ustb.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiao-Ping Cui</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiao Qu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bin Geng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fang Zhou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hong-Wei Hao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Technology, University of Science and Technology Beijing (USTB)</institution>
          ,
          <addr-line>Beijing 100083</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <fpage>536</fpage>
      <lpage>542</lpage>
      <abstract>
        <p>In this paper, we describe our participation in the INEX 2014 Social Book Search(SBS) Track Suggestion Task. We investigate the contribution of user-generated data and construct a social book search system based on several important techniques. We perform re-ranking on Galago searching results on enriched XML index by 11 di erent strategies and combine the results with learning to rank. We nd that 1) enriched index improves the e ectiveness, 2) tag is the best-performed social feature on re-ranking and 3) Random Forest shows the best performance on combining in this case.</p>
      </abstract>
      <kwd-group>
        <kwd>XML retrieval</kwd>
        <kwd>social re-ranking</kwd>
        <kwd>semantic search</kwd>
        <kwd>learning to rank</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>2.1</p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <p>As we can referred to [2], there are several elds in documents XML shown
meaningful numeric information which cannot be understood by searching
engine, such as &lt;tag count="3"&gt; ction&lt;/tag&gt; and &lt;dewey&gt;519&lt;/dewey&gt;.
According to the method from Bogers, we expand and enrich the documents XML
with replacing the numeric information with textual information.</p>
      <sec id="sec-2-1">
        <title>2.2 Indexing and Searching</title>
        <p>
          Galago 1 is an open-source search engine. The probability of the query content
produced by language models are used to rank the documents. Based on the
assumption that the priori probabilities of documents are the same, documents
are ranked according to P (QjD), which is calculated by Equation (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ), where
fqi;D means the amount of times the word/phrase qi appear in document D.
With Dirichlet Smoothing, the probability estimate is calculated by Equation
(
          <xref ref-type="bibr" rid="ref2">2</xref>
          ). In this way, documents are scored by Equation (
          <xref ref-type="bibr" rid="ref3">3</xref>
          ).
        </p>
        <p>P (QjD) = Y p(qijD) = Yn fqi;D
n
i=1 i=1 jDj
P (QjD) = Y p(qijD) = Yn fqi;D +
n
i=1 i=1 jDj +
cqi
jCj
n n
log P (QjD) = log Y p(qijD) = X log
i=1 i=1
fqi;D +
jDj +
cqi
jCj
2.3</p>
      </sec>
      <sec id="sec-2-2">
        <title>Re-ranking and Combining</title>
        <p>These methods are inspired by Social Feature Re-ranking Method proposed by
Toine Bogers in 2012 [3]. In order to improve the initial ranking, we perform
re-ranking by 11 di erent strategies after analyzing the structure of XML:
TagRerank (T ), Item-Rerank (I), Deep-Rerank (D), Node-Rerank (N ),
RatingBayesRerank (B), RatingReview-Rerank (R), Tag-Node-Rerank (T N ), Item-Tag-Rerank
(IT ), Deep-Tag-Rerank (DT ), Item-Tag-Node-Rerank (IT N ),
Deep-Tag-NodeRerank (DT N ). These methods includes the following stages:</p>
        <p>
          1)Similarity Calculation. Features like I, D focus on the eld &lt;tag&gt; and
&lt;BrowseNode&gt;. For example , the tag vector of document i and j are !ti =
[1; 0; 0] and !tj = [0; 0; 2]. That means 1 user tag document i with tag 1 while 2
user tag document j with tag 3. In this way, we can build a feature matrix for
features like T ,N . The feature matrix of T N is the connection of two matrices.
Equation (
          <xref ref-type="bibr" rid="ref4">4</xref>
          ) is used to calculate the T , N , T N similarities of two documents.
        </p>
        <p>
          Features like I, D focus on the eld &lt;similar-product&gt;, the similarities of
two documents based on the feature I is calculated by the Equation (
          <xref ref-type="bibr" rid="ref5">5</xref>
          ).
! !
simij(f ) = cos &lt; !fi ; !fj &gt;= !fi !fj
jfi jjfj j
1 http://www.galagosearch.org/
(
          <xref ref-type="bibr" rid="ref1">1</xref>
          )
(
          <xref ref-type="bibr" rid="ref2">2</xref>
          )
(
          <xref ref-type="bibr" rid="ref3">3</xref>
          )
(
          <xref ref-type="bibr" rid="ref4">4</xref>
          )
simij (I) =
8&gt;1; i is j's similar product or
&lt;
        </p>
        <p>j is i's similar product
&gt;:0; else
Considering the asymmetry, the method D concerns similar products of similar
products. So the values of elements in similarity matrix is calculated by the
Equation 6[1].</p>
        <p>simij (D) =
81; simij (I) = 1 or
&gt;
&gt;
&gt;
&lt;</p>
        <p>9 k 6= i; k 6= j;
&gt; s:t: simik(I) = simjk(I) = 1:
&gt;
&gt;:0; else</p>
        <p>
          As we know similarity matrices SIM (I) and SIM (D) are sparse, so we use
the multi-feature like IT , DT , IT N , DT N to ll-in. For example, the similarity
based on feature IT is calculated by Equation (
          <xref ref-type="bibr" rid="ref7">7</xref>
          ). The other similarities are
calculated in the same way[1].
        </p>
        <p>simij (IT ) =
(1;</p>
        <p>
          simij (I) = 1:
simij (T ); else
2) Re-ranking. We re-rank the top 1000 list of initial ranking for the
abovementioned features by Equation (8). For feature R, we use Equation (9) [7] and
for B, we use Equation (10).
(
          <xref ref-type="bibr" rid="ref5">5</xref>
          )
(
          <xref ref-type="bibr" rid="ref6">6</xref>
          )
(
          <xref ref-type="bibr" rid="ref7">7</xref>
          )
(8)
score0(i) =
score(i) + (
          <xref ref-type="bibr" rid="ref1">1</xref>
          )
score(i)
        </p>
        <p>(10)
1 + BA(i)
1 + BAmax
where BA(i) is the Bayesian average rating of document i, which can be referred
to [6].</p>
        <p>3) Combining. We take Ranklib 2 as toolkit and use Coordinate Ascent,
Random Forest and Rank Net as training models to train the models to combine
features.
2 http://people.cs.umass.edu/~vdang/ranklib.html
where Ri is the set of all ratings given by users for the document i, and jreviews(i)j
is the number of reviews.</p>
        <p>Experiments on Re-ranking Models
In order to choose the most e ective feature and select the optimized parameter
, in the rst round, we train our re-ranking model on SBS2011-2012 and test
on SBS2013. The results are shown in Table 1.</p>
        <p>As we can see from the table, the feature T shows the best performance with
an improvement of 5.4%. All features shows improvements of di erent degree. So
we use all features to combine the results. The results of three chosen training
models are shown in Table 2. As can be seen from the table, Random Forest is</p>
        <sec id="sec-2-2-1">
          <title>Method</title>
        </sec>
        <sec id="sec-2-2-2">
          <title>Initial</title>
          <p>Tag
Item
Deep</p>
          <p>Node
RatingBayes
RatingReview</p>
          <p>Tag-Node
Item-Tag</p>
          <p>Deep-Tag
Item-Tag-Node
Deep-Tag-Node
the initial ranking result. Another one was the tag re-ranking result. Another
three were the di erent combining results based on three di erent
learning-torank training model. The other one was Similar Query Re-ranking method, which
is described in Run 6. Since the Random Forest was well-performed among all,
the results of Random Forest was used to re-rank in Similar Query Re-ranking
method. All runs used enriched new documents XML.</p>
          <p>Run 1(newXml.feedback)This run took Galago as toolkit and used pseudo
feedback to search.</p>
          <p>Run 2(newXml.rerank T)This run applied Re-ranking Model based on the
eld &lt;tag&gt;.</p>
          <p>Run 3(newXml.rerank all.L2R Coordinate)This run applied all Re-ranking
strategies and combining them by Coordinate Ascent method.</p>
          <p>Run 4(newXml.rerank all.L2R RandomForest)This run applied all Re-ranking
strategies and combining them by Random Forest method.</p>
          <p>Run 5(newXml.rerank all.L2R RankNet)This run applied all Re-ranking
strategies and combining them by Rank Net method.</p>
          <p>Run 6(SimQuery.rerank all.L2R RandomForest)This run rstly applied all
Re-ranking strategies and combining them by Random Forest method. As we
know, sometimes users search topics similar to topics which used to appear.
We bravely make a assumption that for two similar queries i,j, if document A
is useful to query i, it's useful for query j too. A weight is multiplied to the
normalization score of document D if 1) document A appears in the result list of
query i; 2) query i and previous query j are similar to each other according to the
calculation above; and 3) document D is useful for previous query j. The weight
w is calculated by Equation w = sim(qi; qj ) ssccoorree((qqij;;DD))) , where score(qj ; D) and
score(qj ; D) are the normalization scores of document D with query i and j.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>The runs submitted to the INEX 2014 Social Book Search track were evaluated
using graded relevance judgments. The relevance value were labeled manually
according to the behaviours of topic creators, for example, if creator adds book to
catalogue after it's suggested, the book is treated as highly relevant. A decision
tree was built to help the labeling 3. All runs were evaluated using NDCG@10,
MRR, MAP, R@1000 with NDCG@10 as the main metric. Table 4 shows the
o cial evaluation results.</p>
      <p>We see that, apart from Similar Query method, the best-performing run on
all 680 topics was run 4 with an NCDG@10 of 0.142. Run 4 used all re-ranking
models and combined them with Random Forest. Again we see that re-ranking
model does improve over the initial results by searching engine. Run 4, improves
over the initial ranking by about 10%. Run 6, from Similar Query method, have
a best-performance just because there are many similar query topics in SBS 2014
with previous years.
6</p>
    </sec>
    <sec id="sec-4">
      <title>Discussion &amp; Conclusion</title>
      <p>On both training and the testing set the best results are from combining all
reranking results in Random Forest. This shows a good use of social information
can improve the results of Social Book Search. The high evaluation value of
Similar Query method indicates the amount of similar topics not the e ectiveness
of the model. We fail to make use of the catalog of topic creators to improve the
results. It is worth discussing whether the information is useful or not.
3 https://inex.mmci.uni-saarland.de/tracks/books/INEX14_SBS_results.jsp#mapping</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bo-Wen</surname>
            <given-names>Zhang</given-names>
          </string-name>
          , Xu-Cheng Yin,
          <string-name>
            <surname>Xiao-Ping</surname>
            <given-names>Cui</given-names>
          </string-name>
          , Bin Geng, Jiao Qu,
          <string-name>
            <surname>Fang Zhou</surname>
            ,
            <given-names>Li</given-names>
          </string-name>
          <string-name>
            <surname>Song</surname>
          </string-name>
          and
          <string-name>
            <surname>Hong-Wei Hao</surname>
          </string-name>
          .
          <article-title>Social Book Search Reranking with Generalized ContentBased Filtering</article-title>
          . Submitted to CIKM'
          <volume>14</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>T.</given-names>
            <surname>Bogers</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Larsen</surname>
          </string-name>
          . Rslis at inex 2013:
          <article-title>Social book search track</article-title>
          .
          <source>In INEX'13 Workshop</source>
          Pre-proceedings. Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>T.</given-names>
            <surname>Bogers</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Larsen</surname>
          </string-name>
          . Rslis at inex 2012:
          <article-title>Social book search track</article-title>
          .
          <source>In INEX'12 Workshop</source>
          Pre-proceedings, pages
          <fpage>97</fpage>
          -
          <lpage>108</lpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kazai</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koolen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kamps</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doucet</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Landoni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Overview of the INEX 2011 Book and Social Search Track</article-title>
          . In: INEX 2011 Workshop pre-proceedings.
          <source>INEX Working Notes Series</source>
          (
          <year>2011</year>
          )
          <fpage>1136</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>M.</given-names>
            <surname>Koolen</surname>
          </string-name>
          , G. Kazai,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Doucet</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Landoni</surname>
          </string-name>
          .
          <article-title>Overview of the inex 2012 books and social search track</article-title>
          .
          <source>In Focused Retrieval of Content and Structure</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>29</lpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Marijn</given-names>
            <surname>Koolen</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          .
          <article-title>Comparing topic representations for social book search</article-title>
          .
          <source>In INEX'13 Workshop</source>
          Pre-proceedings. Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>R. D. Ludovic</surname>
            Bonnefoy and
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Bellot</surname>
          </string-name>
          .
          <article-title>Do social information help book search</article-title>
          ? In INEX'12 Workshop Pre-proceedings, pages
          <fpage>109</fpage>
          -
          <lpage>113</lpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>