<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Integrating Social Features and Query Type Recognition in the Suggestion Track of CLEF 2015 Social Book Search Lab</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shih-Hung Wu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yi-Hsiang Hsieh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liang-Pu Chen</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tsun Ku</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chaoyang University of Technology</institution>
          ,
          <country country="TW">Taiwan, R.O.C</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>(Contact author)</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute for Information Industry</institution>
          ,
          <country country="TW">Taiwan, R.O.C</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Social Book Search (SBS) Lab is part of the CLEF 2015 lab series. This is the third time that the CYUT CSIE team has participated in the SBS track. On top of a full-text search engine, we build a social feature re-ranking system and add knowledge for understanding the queries. We define a set of rules to filter unnecessary books out of the recommendation list. The official run results show that the system performance has improved over our previous system.</p>
      </abstract>
      <kwd-group>
        <kwd>Query type recognition</kwd>
        <kwd>social features</kwd>
        <kwd>social book search</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This paper reports our system for the Suggestion Track of the CLEF 2015 Social Book
Search (SBS) Lab [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. This is our third participation in the SBS track since INEX 2013 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Building on our social feature re-ranking system [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we improve the system
by adding knowledge for understanding the queries.
      </p>
      <p>We believe that the results of traditional information retrieval technology are not
enough for users who need more personal recommendations in the SBS task.
Recommendations from other users are more appealing: they may carry more personal
feelings and cover subtler reasons that a traditional information retrieval system
cannot capture. Our system integrates social features into traditional information
retrieval technology to give better book recommendations. In this task,
user-generated metadata is used as the social feature.</p>
      <p>From our observation of the topics in the previous INEX SBS Track, we
found that queries can be separated into different types. Simply treating the keywords
in a topic as search terms does not give good results. Some queries require a higher
level of knowledge to handle: the system needs to understand the information need
behind the keywords, for example, knowledge about types of literature. We
analyzed the topics and found several types among them. Due to time limitations, we only
implemented a module that recognizes one special type of topic and a filtering module that
modifies the recommendation result.</p>
      <p>The structure of this paper is as follows. Section 2 describes the data set,
Section 3 presents our architecture and the details of our method, Section 4 reports the
experimental results, and the final section gives conclusions and future work.</p>
      <p>
        The document collection in this task is provided by the CLEF 2015 Social Book
Suggestion track. The documents are XML-format metadata for about 2.8 million
books, and the data size is 25.9 GB. These documents were collected from Amazon.com
and LibraryThing [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The XML tags used in the data set are listed in Table 1.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Dataset</title>
      <p>Table 1. XML tags used in the document collection: book, dimensions, reviews, editorialreviews, images, creators, blurbers, dedications, epigraphs, firstwords, lastwords, quotations, series, awards, browseNodes, characters, places, subjects.</p>
      <sec id="sec-2-1">
        <title>Test Topic</title>
        <p>The topics provided by the CLEF 2015 Social Book Suggestion track are collected from
LibraryThing. A topic describes the information need of a user. Figure 1 and Figure 2
give a partial view of an example. The XML tags used are: &lt;topic id&gt;, &lt;title&gt;,
&lt;mediated_query&gt;, &lt;group&gt;, &lt;narrative&gt;, &lt;catalog&gt;, &lt;book&gt;, &lt;LT_id&gt;, &lt;entry_date&gt;, and
&lt;rating&gt;. Here, title is the title of a post on the LibraryThing forum and narrative is
the content of the post; mediated_query is added as an interpretation of the
query, and group is the forum user group of the user who posted the query.</p>
        <preformat>&lt;topics&gt;
  &lt;topic id="1196"&gt;
    &lt;title&gt;The Best Peace Corps Novel&lt;/title&gt;
    &lt;mediated_query&gt;books about work for Peace Corps &lt;/mediated_query&gt;
    &lt;group&gt;Returned Peace Corps Volunteer Readers&lt;/group&gt;
    &lt;narrative&gt; I'm looking for people's concept of what is the best novel for the Peace Corps Volunteer - pre, during, or post service. This could be a novel that typifies life in the country of service. It could be a novel that typifies the work volunteers do. It could be a novel that makes for the perfect reading while in service. Anything will do, just give reasons. It might lead other PCVs/RPCVs to interesting reading. Let's try novels, and then head into non-fiction later... I'll start: I could not have survived my 2 years of service if I had not read Chingiz Aitmitov's The Day Lasts More than A Hundred Years and Bulgakov's The Master and Margarita . They really made most of my concerns about my own sanity living in the crumbling remnants of Soviet Central Asia vanish into vapor, as I was able to learn that not only was surreality the norm for this part of the world but also my own preconceptions about the concrete, rational world that I thought I knew might be questionable. &lt;/narrative&gt;</preformat>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>CYUT CSIE System Methodology</title>
      <sec id="sec-3-1">
        <title>System Architecture</title>
        <p>The index and search engine in use is Lucene [<xref ref-type="bibr" rid="ref6">6</xref>], an open source full-text
search engine provided by the Apache Software Foundation. Lucene is written in
Java and can be called easily from Java programs to build various applications.</p>
        <p>
          Table 1 shows all the tags of the book metadata. According to Bogers and Larsen
[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], 19 tags are more useful for social book search: &lt;isbn&gt;, &lt;title&gt;,
&lt;publisher&gt;, &lt;editorial&gt;, &lt;creator&gt;, &lt;series&gt;, &lt;award&gt;, &lt;character&gt;, &lt;place&gt;,
&lt;blurber&gt;, &lt;epigraph&gt;, &lt;firstwords&gt;, &lt;lastwords&gt;, &lt;quotation&gt;, &lt;dewey&gt;,
&lt;subject&gt;, &lt;browseNode&gt;, &lt;review&gt;, and &lt;tag&gt;. Our system focuses on the same 19
tags.
        </p>
        <p>
          In the pre-processing step, the numeric code in the &lt;dewey&gt; tag is replaced by its textual description
according to the 2003 list of Dewey category descriptions [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] to make string matching
easier. For example, &lt;dewey&gt;004&lt;/dewey&gt; becomes &lt;dewey&gt;Data
processing Computer science&lt;/dewey&gt;. The content of &lt;tag&gt; is also repeated according
to its count attribute to emphasize its importance. For example, &lt;tag
count="3"&gt;fantasy&lt;/tag&gt; is expanded to &lt;tag&gt;fantasy fantasy fantasy&lt;/tag&gt;. In
addition to the 19 tags, our system also indexes the content of &lt;review&gt; as
a separate index, named reviews.
        </p>
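        <p>The two pre-processing rewrites above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of our system: the function name preprocess_book, the one-entry DEWEY_DESCRIPTIONS table, and the &lt;tags&gt; wrapper in the sample record are hypothetical.</p>
        <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical one-entry excerpt of the Dewey description list; a real
# table would hold one description per Dewey class.
DEWEY_DESCRIPTIONS = {"004": "Data processing Computer science"}


def preprocess_book(xml_text, dewey_table=DEWEY_DESCRIPTIONS):
    """Return the parsed book record with <dewey> and <tag> rewritten."""
    root = ET.fromstring(xml_text)
    # Replace each Dewey code by its description; unknown codes are kept.
    for node in root.iter("dewey"):
        code = (node.text or "").strip()
        node.text = dewey_table.get(code, code)
    # Repeat each tag word `count` times and drop the count attribute.
    for node in root.iter("tag"):
        count = int(node.attrib.pop("count", "1"))
        word = (node.text or "").strip()
        node.text = " ".join([word] * max(count, 1))
    return root


# Assumed record layout, for illustration only.
SAMPLE = '<book><dewey>004</dewey><tags><tag count="3">fantasy</tag></tags></book>'
book = preprocess_book(SAMPLE)
print(ET.tostring(book, encoding="unicode"))
```
]]></preformat>
        <p>Repeating a tag word raises its term frequency in the index, which is how the count is turned into extra weight for a term-frequency-based ranker.</p>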
        <p>
          Figures 1 and 2 show the XML tags of the query topics. According to Koolen et al.
[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], an Indri [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] based system that uses all the contents of &lt;Title&gt;, &lt;Query&gt;, &lt;Group&gt;,
and &lt;Narrative&gt; as query terms gives better results. We also use the contents of these
four tags as the input queries of our system.
        </p>
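        <p>As a rough illustration of how the four topic fields can be turned into query terms, consider the Python sketch below. It rests on several assumptions: that &lt;Query&gt; corresponds to the &lt;mediated_query&gt; element of the topic file, that a tiny hypothetical stop-word list is enough for the example, and that stemming is left out. The name build_query_terms is ours and not part of any library.</p>
        <preformat><![CDATA[
```python
import re
import xml.etree.ElementTree as ET

# Assumption: <Query> in the text maps to <mediated_query> in the topic file.
QUERY_FIELDS = ("title", "mediated_query", "group", "narrative")
# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = frozenset({"a", "an", "and", "for", "in", "is", "of", "the", "to"})


def build_query_terms(topic_xml, fields=QUERY_FIELDS, stop_words=STOP_WORDS):
    """Concatenate the chosen topic fields into a list of query terms."""
    topic = ET.fromstring(topic_xml)
    terms = []
    for field in fields:
        # A missing field simply contributes no terms.
        text = topic.findtext(field, default="") or ""
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            if token not in stop_words:
                terms.append(token)
    return terms


# Shortened version of the example topic (narrative omitted).
TOPIC = (
    '<topic id="1196">'
    "<title>The Best Peace Corps Novel</title>"
    "<mediated_query>books about work for Peace Corps</mediated_query>"
    "<group>Returned Peace Corps Volunteer Readers</group>"
    "</topic>"
)
terms = build_query_terms(TOPIC)
print(" ".join(terms))
```
]]></preformat>
        <p>Terms that occur in several fields, such as "peace" and "corps" here, are kept once per occurrence, so they weigh more in the resulting query.</p>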
        <p>[Figure: system architecture, with the components Document Collection, Indexing, Search Engine, Query, Stop words filtering, Stemming, Content-based Retrieval, Type2 (Yes/No), Filtering, Re-Ranking, and Results.]</p>
        <sec id="sec-3-1-10">
          <title>Search Engine</title>
          <p>
            From our observation of the topics in the INEX 2012 SBS Track, we found that
some queries differ from the others; we call them Type2 queries
[
            <xref ref-type="bibr" rid="ref11">11</xref>
            ]. Type2 queries contain the names of books for which the
original user wants to find similar ones. Therefore, the books named in the topic should not be
part of the recommendation. Because the book names are given explicitly, our original system
would return exactly those books as the top recommendations. To recognize
Type2 queries, we define a list of phrases that identify such queries, and we filter the
books named in the queries out of the recommendation lists. The phrases are listed in the
appendix of our previous paper [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ]. Figure 4 gives an example of a Type2 query taken
from the INEX 2013 SBS topics, which contains the key phrase “I’m reading”. We found
that 174 queries in the INEX 2013 SBS track can be classified as Type2
queries. Therefore, we added a module to our system that identifies Type2 queries and
filters out the books mentioned in the topics.
          </p>
          <p>
The re-ranking part is similar to that in our previous work [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ]. We integrate
user-generated metadata into the traditional content-based search result by re-ranking the
results. The social features are provided by Amazon users, and our system uses
them to give more weight to certain books. Three numbers are available:
            <list list-type="bullet">
              <list-item><p>User rating: users may rate a book from 1 to 5; the higher the better.</p></list-item>
              <list-item><p>Helpful vote: other users may endorse a comment by voting it as helpful.</p></list-item>
              <list-item><p>Total vote: the total number of votes, helpful or not.</p></list-item>
            </list>
          </p>
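          <p>The Type2 handling described above (phrase-based recognition, then removal of the books named in the topic) might look roughly like the following Python sketch. Only the phrase "I'm reading" is taken from the text; the complete phrase list is in [11]. Matching a candidate by checking whether its title occurs in the topic text is an assumption made for this example, and the function names are hypothetical.</p>
          <preformat><![CDATA[
```python
# Only "i'm reading" comes from the paper; the full phrase list is in [11].
TYPE2_PHRASES = ("i'm reading",)


def normalize(text):
    """Lower-case and unify apostrophes so phrases match either quote style."""
    return text.lower().replace("\u2019", "'")


def is_type2(topic_text, phrases=TYPE2_PHRASES):
    """A topic is Type2 if it contains at least one key phrase."""
    text = normalize(topic_text)
    return any(phrase in text for phrase in phrases)


def filter_mentioned_books(ranked_books, topic_text):
    """Drop candidates whose title occurs in a Type2 topic.

    ranked_books is a list of (book_id, title, score) tuples, best first.
    Substring matching on titles is an assumption of this sketch.
    """
    if not is_type2(topic_text):
        return list(ranked_books)
    text = normalize(topic_text)
    return [book for book in ranked_books if normalize(book[1]) not in text]


# Made-up candidate list and topic text, for illustration only.
RANKED = [
    ("b1", "The Master and Margarita", 9.1),
    ("b2", "Heart of a Dog", 7.4),
]
NARRATIVE = "I\u2019m reading The Master and Margarita and want something similar."
filtered = filter_mentioned_books(RANKED, NARRATIVE)
print(filtered)
```
]]></preformat>
          <p>For a topic without a key phrase the candidate list is returned unchanged, so the filter only affects queries recognized as Type2.</p>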
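          <p>We designed three different ways to use these social features in re-ranking.</p>
          <p>1) User rating method</p>
          <p>Increase the weight of the content-based retrieval result by adding the sum of
the user ratings, as shown in formula (1):
            <disp-formula><label>(1)</label><tex-math>\mathrm{Score}_{\text{re-ranked}}(i) = \alpha \cdot \mathrm{Score}_{\text{org}}(i) + (1 - \alpha) \cdot \mathrm{Score}_{\text{user rating}}(i)</tex-math></disp-formula>
          </p>
          <p>2) Average User rating method</p>
          <p>Increase the weight of the content-based retrieval result by adding the average of
the user ratings, as shown in formula (2):
            <disp-formula><label>(2)</label><tex-math>\mathrm{Score}_{\text{re-ranked}}(i) = \mathrm{Score}_{\text{org}}(i) + \mathrm{Score}_{\text{average user rating}}(i)</tex-math></disp-formula>
          </p>
          <p>3) Weights User rating method</p>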
          <p>Increase the weight of the content-based retrieval result for books that
receive more helpful votes, as shown in formulas (3) and (4).</p>
          <p>Since there is no theoretical reference on how to set the α value, the value used in our official runs
was selected through a series of experiments conducted on the 2013 dataset.
Table 2 shows the results; the system gets the best result when α is 0.95.</p>
          <p>In the official evaluation, we submitted four runs. We use four fields of the topics as query
terms, and we filter out some book candidates for all Type2 queries. The
configuration of each run is as follows.
            <list list-type="bullet">
              <list-item><p>Run 1, CSIE - 0.95AverageType2QTGN: re-ranking with Average User Rating.</p></list-item>
              <list-item><p>Run 2, CSIE - Type2QTGN: without re-ranking.</p></list-item>
              <list-item><p>Run 3, CSIE - 0.95RatingType2QTGN: re-ranking with User Rating.</p></list-item>
              <list-item><p>Run 4, CSIE - 0.95WRType2QTGN: re-ranking with Weights User Rating.</p></list-item>
            </list>
          </p>
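          <p>To make formulas (1) and (2) concrete, the Python sketch below applies them to a small made-up candidate list. It is a minimal illustration, not our implementation: the function name rerank and the sample scores are hypothetical, the ratings are used without any normalization (the text does not specify one), and formula (2) is applied exactly as written, without α.</p>
          <preformat><![CDATA[
```python
def rerank(candidates, method="rating", alpha=0.95):
    """Re-rank (book_id, org_score, ratings) triples, best first.

    method="rating"  follows formula (1): alpha-weighted sum of ratings.
    method="average" follows formula (2): original score plus mean rating.
    """
    rescored = []
    for book_id, org_score, ratings in candidates:
        if method == "rating":
            social = float(sum(ratings))
            score = alpha * org_score + (1 - alpha) * social
        elif method == "average":
            # Books without ratings get no social bonus (an assumption).
            social = sum(ratings) / len(ratings) if ratings else 0.0
            score = org_score + social
        else:
            raise ValueError("unknown method: " + method)
        rescored.append((book_id, score))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Made-up content-based scores and 1-to-5 user ratings.
CANDIDATES = [("a", 10.0, [5, 5, 4]), ("b", 10.5, [2]), ("c", 9.0, [])]
by_rating = rerank(CANDIDATES, method="rating")
by_average = rerank(CANDIDATES, method="average")
print(by_rating)
print(by_average)
```
]]></preformat>
          <p>In this toy example book b has the highest content-based score, but book a overtakes it under both methods because of its better ratings.</p>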
          <p>According to Table 2, the parameter α is set to 0.95, which gives the best result in the runs with
re-ranking.</p>
          <p>
            Table 3 shows the official evaluation results of our four runs. Among them, the
CSIE - 0.95AverageType2QTGN run gives the best nDCG@10 [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] result, while the
CSIE - Type2QTGN run gives a similar nDCG@10 but better
MAP and R@1000. The other two runs give poorer results, possibly due to technical
errors. Compared to the INEX 2013 SBS results in Table 5, our system performance
improved significantly. However, compared to the INEX 2014 SBS results in Table
4, our system performance decreased.
          </p>
          <p>
This paper reports our system and results in the CLEF 2015 Social Book Suggestion track.
We submitted four runs, and the formal run results are listed in Table 3. Of the four runs, the
CSIE - 0.95AverageType2QTGN run gives the best nDCG@10; it searches with a
content-based search engine, applies a set of filtering rules based on a list of key
phrases, and re-ranks with Average User Rating. In the future, we will implement
more modules with literature knowledge about writers, book genres, geographic
categories of publishers, and temporal categories of authors, to deal with
the special cases in the topics.
          </p>
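          <p>For readers unfamiliar with nDCG@10, the measure reported above, the following Python sketch shows one common formulation, in which the gain at rank r is discounted by log2(r + 1). This is an assumption for illustration; it is not necessarily the exact variant computed by the official evaluation script, and the relevance values are made up.</p>
          <preformat><![CDATA[
```python
import math


def dcg(gains):
    """Discounted cumulated gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_ids, relevance, k=10):
    """nDCG@k of a ranked list, given a dict of graded relevance values."""
    gains = [relevance.get(book_id, 0.0) for book_id in ranked_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Made-up graded relevance judgements for one topic.
RELEVANCE = {"a": 2.0, "b": 1.0}
perfect = ndcg_at_k(["a", "b", "c"], RELEVANCE)
swapped = ndcg_at_k(["b", "a", "c"], RELEVANCE)
print(perfect, swapped)
```
]]></preformat>
          <p>A ranking that places the most relevant books first scores 1.0; swapping the two relevant books lowers the score because the higher gain is discounted more.</p>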
          <p>Starting this year, user profiles are available, which can be used to give better
recommendations. A system might use the user profiles to expand the queries or to suggest
books that a user has read to other, similar users. Outside resources might
also be used to expand the queries. For example, a system might check Wikipedia to
find more authors of books in the same genre and make better recommendations.
Books that have won awards might also make a good list for recommendation.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgement</title>
      <p>This study is conducted under the "Online and Offline integrated Smart Commerce
Platform (2/4)" of the Institute for Information Industry, which is subsidized by the
Ministry of Economic Affairs of the Republic of China.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><given-names>Wei-Lun</given-names> <surname>Xiao</surname></string-name>,
          <string-name><given-names>Shih-Hung</given-names> <surname>Wu</surname></string-name>,
          <string-name><given-names>Liang-Pu</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>Hung-Sheng</given-names> <surname>Chiu</surname></string-name>, and
          <string-name><given-names>Ren-Dar</given-names> <surname>Yang</surname></string-name>,
          “<article-title>Social Feature Re-ranking in INEX 2013 Social Book Search Track</article-title>”,
          <source>CLEF 2013 Evaluation Labs and Workshop Online Working Notes</source>,
          <conf-date>23-26 September</conf-date>,
          <conf-loc>Valencia, Spain</conf-loc>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Marijn</given-names>
            <surname>Koolen</surname>
          </string-name>
          , Gabriella Kazai, Jaap Kamps, Michael Preminger, Antoine Doucet, and Monica Landoni, “
          <article-title>Overview of the INEX 2012 Social Book Search Track</article-title>
          ”,
          <source>INEX'12 Workshop Pre-proceedings</source>
          , pp.
          <fpage>77</fpage>
          -
          <lpage>96</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Toine</given-names>
            <surname>Bogers</surname>
          </string-name>
          and Birger Larsen, “
          <article-title>RSLIS at INEX 2012: Social Book Search Track</article-title>
          ”,
          <source>INEX'12 Workshop Pre-proceedings</source>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>108</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Marijn</given-names>
            <surname>Koolen</surname>
          </string-name>
          , Hugo Huurdeman, and Jaap Kamps, “
          <article-title>Comparing Topic Representations for Social Book Search</article-title>
          ”,
          <source>CLEF 2013 Evaluation Labs and Workshop Online Working Notes</source>
          ,
          <conf-date>23-26 September</conf-date>
          ,
          <conf-loc>Valencia, Spain</conf-loc>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>T.</given-names>
            <surname>Strohman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Metzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Turtle</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          , “
          <article-title>Indri: a language-model based search engine for complex queries”</article-title>
          ,
          <source>In Proceedings of the International Conference on Intelligent Analysis</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Lucene, https://lucene.apache.org</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Marijn</given-names>
            <surname>Koolen</surname>
          </string-name>
          , Gabriella Kazai,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Preminger</surname>
          </string-name>
          , and Antoine Doucet, “
          <article-title>Overview of the INEX 2013 Social Book Search Track</article-title>
          ”,
          <source>CLEF 2013 Evaluation Labs and Workshop Online Working Notes</source>
          ,
          <conf-date>23-26 September</conf-date>
          ,
          <conf-loc>Valencia, Spain</conf-loc>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><surname>Järvelin</surname>, <given-names>K.</given-names></string-name>,
          <string-name><surname>Kekäläinen</surname>, <given-names>J.</given-names></string-name>:
          “<article-title>Cumulated Gain-based Evaluation of IR Techniques</article-title>”,
          <source>ACM Transactions on Information Systems</source>
          <volume>20</volume>
          (
          <issue>4</issue>
          ) (
          <year>2002</year>
          )
          <fpage>422</fpage>
          -
          <lpage>446</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <article-title>2003 list of Dewey category descriptions</article-title>
          , https://www.library.illininois.edu/ugl/about/dewey.html
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <article-title>CLEF 2015 Social Book Search Track</article-title>
          , http://social-booksearch.humanities.uva.nl/#/suggestion
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name><given-names>Shih-Hung</given-names> <surname>Wu</surname></string-name>,
          <string-name><given-names>Pei-Kai</given-names> <surname>Liao</surname></string-name>,
          <string-name><given-names>Hua-Wei</given-names> <surname>Lin</surname></string-name>,
          <string-name><given-names>Li-Jen</given-names> <surname>Hsu</surname></string-name>,
          <string-name><given-names>Wei-Lun</given-names> <surname>Xiao</surname></string-name>,
          <string-name><given-names>Liang-Pu</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>Tsun</given-names> <surname>Ku</surname></string-name>, and
          <string-name><given-names>Gwo-Dong</given-names> <surname>Chen</surname></string-name>,
          “<article-title>Query Type Recognition and Result Filtering in INEX 2014 Social Book Search Track</article-title>”.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name><given-names>Marijn</given-names> <surname>Koolen</surname></string-name>,
          Toine Bogers, Gabriella Kazai, Jaap Kamps, and Michael Preminger,
          “<article-title>Overview of the INEX 2014 Social Book Search Track</article-title>”.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>