<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the INEX 2012 Social Book Search Track</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marijn Koolen</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriella Kazai</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaap Kamps</string-name>
          <email>kampsg@uva.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Preminger</string-name>
          <email>michaelp@hioa.no</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antoine Doucet</string-name>
          <email>doucet@info.unicaen.fr</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Monica Landoni</string-name>
          <email>monica.landoni@unisi.ch</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Microsoft Research</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Oslo and Akershus University College of Applied Sciences</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Amsterdam</institution>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Caen</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Lugano</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>The goal of the INEX 2012 Social Book Search Track is to evaluate approaches for supporting users in reading, searching, and navigating book metadata and full texts of digitised books as well as associated user-generated content. The investigation is focused around two tasks: 1) the Social Book Search task investigates the complex nature of relevance in book search and the role of user information and traditional and user-generated book metadata for retrieval, 2) the Prove It task evaluates focused retrieval approaches for searching pages in books that support or refute a given factual claim. There are two additional tasks that did not run this year. The Structure Extraction task tests automatic techniques for deriving structure from OCR and layout information, and the Active Reading Task aims to explore suitable user interfaces for eBooks enabling reading, annotation, review, and summary across multiple books. We report on the setup and the results of the two search tasks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Prompted by the availability of large collections of digitised books, e.g., the
Million Book project6 and the Google Books Library project,7 the Social Book
Search Track8 was launched in 2007 with the aim to promote research into
techniques for supporting users in searching, navigating and reading book metadata
and full texts of digitised books. Toward this goal, the track provides
opportunities to explore research questions around five areas:
6 http://www.ulib.org/
7 http://books.google.com/
8 Previously known as the Book Track (2007-2010) and the Books and Social Search
Track (2011).
- Evaluation methodologies for book search tasks that combine aspects of
retrieval and recommendation,
- Information retrieval techniques for dealing with professional and user-generated
metadata,
- Information retrieval techniques for searching collections of digitised books,
- Mechanisms to increase accessibility to the contents of digitised books, and
- Users' interactions with eBooks and collections of digitised books.</p>
      <p>Based around these main themes, the following four tasks were defined:
1. The Social Book Search (SBS) task, framed within the scenario of a user
searching a large online book catalogue for a given topic of interest, aims
at exploring techniques to deal with both complex information needs of
searchers (which go beyond topical relevance and can include aspects such
as genre, recency, engagement, interestingness, quality and how well written
a book is) and complex information sources including user profiles and personal
catalogues, and book descriptions containing both professional metadata and
user-generated content.
2. The Prove It (PI) task aims to test focused retrieval approaches on
collections of books, where users expect to be pointed directly at relevant book
parts that may help to confirm or refute a factual claim;
3. The Structure Extraction (SE) task aims at evaluating automatic techniques
for deriving structure from OCR and building hyperlinked tables of contents;
4. The Active Reading task (ART) aims to explore suitable user interfaces to
read, annotate, review, and summarize multiple books.</p>
      <p>In this paper, we report on the setup and the results of each of the two search
tasks, SBS and PI, at INEX 2012. First, in Section 2, we give a brief summary of
the participating organisations. The SBS task is described in detail in Section 3,
and the PI task in Section 4. We close in Section 5 with a summary and plans
for INEX 2013.</p>
    </sec>
    <sec id="sec-2">
      <title>Participating Organisations</title>
      <p>A total of 55 organisations registered for the track (compared with 47 in 2011,
82 in 2010, 84 in 2009, 54 in 2008, and 27 in 2007). At the time of writing, we
counted 5 active groups (compared with 10 in 2011 and 2010, 16 in 2009, 15 in
2008, and 9 in 2007), see Table 1.9
9 SE is biennial and will occur again in 2013.</p>
    </sec>
    <sec id="sec-3">
      <title>The Social Book Search Task</title>
      <p>
        The goal of the Social Book Search (SBS) task is to evaluate the value of
professional metadata and user-generated content for book search on the web.
Social media have extended book descriptions far beyond what is traditionally
stored in professional catalogues. Not only are books described in the users' own
vocabulary, but they are also reviewed and discussed online, and added to the personal
catalogues of individual readers. This additional information is subjective and
personal, and allows users to search for books in different ways. Traditional
descriptions have formal and subject access points for identification, known-item
search and subject search. Yet readers use many more aspects of books to help
them decide which book to read next [
        <xref ref-type="bibr" rid="ref5">5</xref>
], such as how engaging, fun, educational
or well-written a book is. This results in a search task that requires a different
model than traditional ad hoc search [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>The SBS task investigates book requests and suggestions from the
LibraryThing discussion forums as a way to model book search in a social environment.
The discussions in these forums show that readers frequently turn to others to
get recommendations and tap into the collective knowledge of a group of readers
interested in the same topics.</p>
      <p>
As a source of book descriptions, the INEX Amazon/LibraryThing collection [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
is used, which contains 2.8 million book descriptions from Amazon, enriched with
content from LibraryThing. This collection contains both professional metadata
and user-generated content. An additional goal of the SBS task is to evaluate the
relative value of controlled book metadata, such as classification labels, subject
headings and controlled keywords, versus user-generated or social metadata, such
as tags, ratings and reviews, for retrieving the most relevant books for a given
user request.
      </p>
      <p>The SBS task aims to address the following research questions:
- Can we build reliable and reusable test collections for social book search
based on book requests and suggestions from the LibraryThing discussion
forums?
- Can we simulate book suggestions with judgements from Mechanical Turk?
- Can user-dependent evidence improve retrieval performance for social book
search?
- Can personal, affective aspects of book search relevance be captured by
systems that incorporate user-generated content and user profiles?
- What is the relative value of social and controlled book metadata for book
search?</p>
      <sec id="sec-3-1">
        <title>Scenario</title>
        <p>The scenario is that of a user turning to Amazon Books and LibraryThing to
search for books they want to read, buy or add to their personal catalogue. Both
services host large collaborative book catalogues that may be used to locate
books of interest.</p>
        <p>On LibraryThing, users can catalogue the books they read, manually index
them by assigning tags, and write reviews for others to read. Users can also post
messages on a discussion forum asking for help in finding new, fun, interesting,
or relevant books to read. The forums allow users to tap into the collective
bibliographic knowledge of hundreds of thousands of book enthusiasts. On Amazon,
users can read and write book reviews and browse to similar books based on
links such as "customers who bought this book also bought...".</p>
        <p>Users can search online book collections with different intentions. They can
search for specific books of which they know all the relevant details with the
intention to obtain them (buy, download, print). In other cases, they search for
a specific book of which they do not know those details, with the intention of
identifying that book and finding certain information about it. Another possibility
is that they are not looking for a specific book, but hope to discover one or more
books meeting some criteria. These criteria can be related to subject, author,
genre, edition, work, series or some other aspect, but the search can also be more
serendipitous, for books that merely look interesting or fun to read.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Task description</title>
        <p>Although book metadata can often be used for browsing, this task assumes a
user issues a query to a retrieval system, which returns a (ranked) list of book
records as results. This query can be a number of keywords, but also one or more
book records as positive or negative examples.</p>
        <p>We assume the user inspects the results list starting from the top and works
her way down until she has either satisfied her information need or gives up. The
retrieval system is expected to order results by relevance to the user's information
need.</p>
        <p>The SBS task is to reply to a user's request that has been posted on the
LibraryThing forums (see Section 3.5) by returning a list of recommended books.
The books must be selected from a corpus that consists a collection of book
metadata extracted from Amazon Books and LibraryThing, extended with
associated records from library catalogues of the Library of Congress and the
British Library (see the next section). The collection includes both curated and
social metadata. User requests vary from asking for books on a particular genre,
looking for books on a particular topic or period or books by a given author.
The level of detail also varies, from a brief statement to detailed descriptions
of what the user is looking for. Some requests include examples of the kinds of
books that are sought by the user, asking for similar books. Other requests list
examples of known books that are related to the topic but are speci cally of no
interest. The challenge is to develop a retrieval method that can cope with such
diverse requests. Participants of the SB task are provided with a set of book
search requests and are asked to submit the results returned by their systems as
ranked lists.
3.3</p>
      </sec>
      <sec id="sec-3-3">
        <title>Submissions</title>
        <p>We want to evaluate the book ranking of retrieval systems, specifically the top
ranks. We adopt the submission format of TREC, with a separate line for each
retrieval result, consisting of six columns:
1. topic id: the topic number, which is based on the LibraryThing forum thread
number.
2. Q0: the query number. Unused, so should always be Q0.
3. isbn: the ISBN of the book, which corresponds to the file name of the book
description.
4. rank: the rank at which the document is retrieved.
5. rsv: retrieval status value, in the form of a score. For evaluation, results are
ordered by descending score.
6. run id: a code to identify the participating group and the run.</p>
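        <p>For illustration, a single result line in this format might look as follows (the topic id is taken from the example topic in Section 3.5; the ISBN, score and run id are invented):
99309 Q0 0123456789 1 12.3456 p99.exampleRun</p>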
        <p>Participants are allowed to submit up to six runs, of which at least one should
use only the title field of the topic statements (the topic format is described in
Section 3.5). For the other five runs, participants could use any field in the topic
statement.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Data</title>
        <p>
          To study the relative value of social and controlled metadata for book search, we
need a large collection of book records that contains controlled subject headings
and classification codes as well as social descriptions such as tags and reviews,
for a set of books that is representative of what readers are searching for. We use
the Amazon/LibraryThing corpus crawled by the University of Duisburg-Essen
for the INEX Interactive Track [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>The collection consists of 2.8 million book records from Amazon, extended
with social metadata from LibraryThing. This set represents the books available
through Amazon. These records contain title information as well as a Dewey
Decimal Classification (DDC) code and category and subject information supplied
by Amazon. From a sample of Amazon records we noticed the subject descriptors
to be noisy, with many inappropriately assigned descriptors that seem unrelated
to the books to which they have been assigned.</p>
        <p>The Amazon/LibraryThing collection has a limited amount of professional
metadata. Only 61% of the books have a DDC code and the Amazon subjects
are noisy with many seemingly unrelated subject headings assigned to books. To
make sure there is enough high-quality metadata from traditional library
catalogues, we extended the data set with library catalogue records from the Library
of Congress and the British Library. We only use library records of ISBNs that
are already in the collection. These records contain formal metadata such as
classification codes (mainly DDC and LCC) and rich subject headings based on
the Library of Congress Subject Headings (LCSH).10 Both the LoC records and
the BL records are in MARCXML11 format. We obtained MARCXML records
for 1.76 million books in the collection. There are 1,248,816 records from the
Library of Congress and 1,158,070 records in MARC format from the British
Library. Combined, there are 2,406,886 records covering 1,823,998 of the ISBNs in
the Amazon/LibraryThing collection (66%). Although there is no single library
catalogue that covers all books available on Amazon, we think these combined
library catalogues can improve both the quality and quantity of professional
book metadata.
10 For more information see: http://www.loc.gov/aba/cataloging/subject/
11 MARCXML is an XML version of the well-known MARC format. See: http://www.loc.gov/standards/marcxml/</p>
        <p>Each book is identified by ISBN. Since different editions of the same work
have different ISBNs, there can be multiple records for a single intellectual
work. The corpus consists of a collection of 2.8 million records from Amazon
Books and LibraryThing.com. See
https://inex.mmci.uni-saarland.de/data/ndagreements.jsp for information on how to get access to this collection. Each book
record is an XML file with fields like &lt;isbn&gt;, &lt;title&gt;, &lt;author&gt;, &lt;publisher&gt;,
&lt;dimensions&gt;, &lt;numberofpage&gt; and &lt;publicationdate&gt;. Curated metadata comes in
the form of a Dewey Decimal Classification in the &lt;dewey&gt; field, Amazon subject
headings are stored in the &lt;subject&gt; field, and Amazon category labels can be
found in the &lt;browseNode&gt; fields. The social metadata from Amazon and
LibraryThing is stored in the &lt;tag&gt;, &lt;rating&gt;, and &lt;review&gt; fields. The full list of
fields is shown in Table 2.</p>
        <p>Table 2. Fields in the Amazon/LibraryThing book records:
book             similarproducts  title            imagecategory
dimensions       tags             edition          name
reviews          isbn             dewey            role
editorialreviews ean              creator          blurber
images           binding          review           dedication
creators         label            rating           epigraph
blurbers         listprice        authorid         firstwordsitem
dedications      manufacturer     totalvotes       lastwordsitem
epigraphs        numberofpages    helpfulvotes     quotation
firstwords       publisher        date             seriesitem
lastwords        height           summary          award
quotations       width            editorialreview  browseNode
series           length           content          character
awards           weight           source           place
browseNodes      readinglevel     image            subject
characters       releasedate      imageCategories  similarproduct
places           publicationdate  url              tag
subjects         studio           data</p>
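        <p>As an illustration, a heavily abridged and purely hypothetical record using a few of these fields might look as follows (the values are invented, and the exact nesting in the real collection may differ):
&lt;book&gt;
&lt;isbn&gt;0123456789&lt;/isbn&gt;
&lt;title&gt;An Example Book&lt;/title&gt;
&lt;dewey&gt;320&lt;/dewey&gt;
&lt;subject&gt;Political science&lt;/subject&gt;
&lt;tag&gt;multiculturalism&lt;/tag&gt;
&lt;rating&gt;4&lt;/rating&gt;
&lt;review&gt;An engaging introduction to the topic.&lt;/review&gt;
&lt;/book&gt;</p>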
        <p>How many of the book records have curated metadata? There is a DDC code
for 61% of the descriptions and 57% of the collection has at least one subject
heading. The classification codes and subject headings cover the majority of
records in the collection.</p>
        <p>More than 1.2 million descriptions (43%) have at least one review and 82%
of the collection has at least one LibraryThing tag.</p>
      </sec>
      <sec id="sec-3-5">
        <title>Information needs</title>
        <p>LibraryThing users discuss their books in the discussion forums. Many of the
topic threads are started with a request from a member for interesting, fun
new books to read. They describe what they are looking for, give examples of
what they like and do not like, indicate which books they already know and ask
other members for recommendations. Other members often reply with links to
works catalogued on LibraryThing, which have direct links to the corresponding
records on Amazon. These requests for recommendation are natural expressions
10 For more information see: http://www.loc.gov/aba/cataloging/subject/
11 MARCXML is an XML version of the well-known MARC format. See: http://www.
loc.gov/standards/marcxml/
book similarproducts title imagecategory
dimensions tags edition name
reviews isbn dewey role
editorialreviews ean creator blurber
images binding review dedication
creators label rating epigraph
blurbers listprice authorid rstwordsitem
dedications manufacturer totalvotes lastwordsitem
epigraphs numberofpages helpfulvotes quotation</p>
        <p>rstwords publisher date seriesitem
lastwords height summary award
quotations width editorialreview browseNode
series length content character
awards weight source place
browseNodes readinglevel image subject
characters releasedate imageCategories similarproduct
places publicationdate url tag
subjects studio data
of information needs for a large collection of online book records. We use a
selection of these forum topics to evaluate systems participating in the SBS
task.</p>
        <p>
          Each topic has a title and is associated with a group on the discussion
forums. For instance, topic 99309 in Figure 1 has title Politics of Multiculturalism
Recommendations? and was posted in the group Political Philosophy. The books
suggested by members in the thread are collected in a list on the side of the topic
thread (see Figure 1). A technique called Touchstone can be used by members
to easily identify books they mention in the topic thread, giving other readers
of the thread direct access to a book record on LibraryThing, with associated
ISBNs and links to Amazon. We use these suggested books as initial relevance
judgements for evaluation. In the rest of this paper, we use the term suggestion
for books identified in the Touchstone lists in forum topics. Since all suggestions
are made by forum members, we assume they are valuable judgements for the
relevance of books. We first describe the topic selection procedure and then how
we used LibraryThing user profiles to assign relevance values to the suggestions.
Topic selection. Topic selection was done in the same way as last year [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
We crawled close to 60,000 topic threads and selected threads where at least
one book is suggested and the first message contains a book request. First, we
identified topics where the topic title reflects the information need expressed in
it. For this we used the topic titles as queries and ran them against a full-text
index of the A/LT collection. We consider a title to be a decent reflection of
the information need if the full-text index found at least one suggestion in the
top 1000 results. This left 6510 topics. Next, we used short regular expressions
to select messages containing any of a list of phrases like looking for, suggest,
recommend. From this set we randomly selected topics and manually selected
those topics where the initial message contains an actual request for book
recommendations, until we had 89 new topics. We also labeled each selected topic
with topic type (requests for books related to subject, author, genre, edition etc.)
and genre information (Fiction, Non-fiction or both).
        </p>
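        <p>A minimal sketch (in Python) of the kind of phrase filter described above; the phrase list here is illustrative, not the exact list used for topic selection:
import re

# Illustrative phrase list; the actual selection used a similar set of request phrases.
REQUEST_PHRASES = re.compile(r"\b(looking for|suggest|recommend)", re.IGNORECASE)

def is_candidate_request(first_message: str) -> bool:
    """Keep a topic only if its first message contains a request-like phrase."""
    return REQUEST_PHRASES.search(first_message) is not None</p>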
        <p>We included the 211 topics from the 2011 Social Search for Best Books task
and adjusted them to the simpler topic format of this year. All genre labels were
changed to either Fiction or Non-fiction. The label Literature was changed to
Fiction and all other labels were changed to Non-fiction. Specificity labels and
examples were removed.</p>
        <p>To illustrate how we marked up the topics, we show topic 99309 from Figure 1
as an example:
&lt;topic id="99309"&gt;
&lt;title&gt;Politics of Multiculturalism&lt;/title&gt;
&lt;group&gt;Political Philosophy&lt;/group&gt;
&lt;narrative&gt;I'm new, and would appreciate any recommended reading on the
politics of multiculturalism. Parekh's Rethinking Multiculturalism:
Cultural Diversity and Political Theory (which I just finished) in the
end left me unconvinced, though I did find much of value I thought he
depended way too much on being able to talk out the details later. It
may be that I found his writing style really irritating so adopted a
defiant skepticism, but still... Anyway, I've read Sen, Rawls,
Habermas, and Nussbaum, still don't feel like I've wrapped my little
brain around the issue very well and would appreciate any suggestions
for further anyone might offer.
&lt;/narrative&gt;
&lt;type&gt;subject&lt;/type&gt;
&lt;genre&gt;non-fiction&lt;/genre&gt;
&lt;/topic&gt;</p>
        <p>We think this set represents a broad range of book information needs. We
note that the titles and messages of the topic threads may be different from what
these users would submit as queries to a book search system such as Amazon,
LibraryThing, the Library of Congress or the British Library. Our topic selection
method is an attempt to identify topics where the topic title describes the
information need. Like last year, we ask the participants to generate queries from the
title and initial message of each topic. In the future, we could approach the topic
creators on LibraryThing and ask them to supply queries or set up a
crowdsourcing task where participants have to search the Amazon/LibraryThing collection
for relevant books based on the topic narrative, and we pool the queries they
type, and provide the most common query to INEX participants.
User profiles and personal catalogues. We can distinguish different relevance
signals in these suggestions if we compare them against the books that the
topic creator added to her personal catalogue before (pre-catalogued) or after
(post-catalogued) starting the topic. We obtained user profiles for each of the
topic creators of the topics selected for evaluation and distributed these to the
participants. Each profile contains a list of all the books a user has in her personal
catalogue, with per book the date on which it was added, and the tags the user
assigned to the book. The profiles were crawled at least 4 months after the topic
threads were crawled. We assume that within this time frame all topic creators
had enough time to decide which suggestions to catalogue.</p>
        <p>Catalogued suggestions. The list of books suggested for a topic can be split
into three subsets: the subset of books that the topic creator had already
catalogued before starting the topic (Pre-catalogued suggestions, or Pre-CSs), the
subset of books that the topic creator catalogued after starting the topic
(Post-catalogued suggestions, or Post-CSs) and the subset that the topic creator had
not catalogued at the time of crawling the profiles (Non-catalogued suggestions,
or Non-CSs).</p>
        <p>
          Members sometimes suggest books that the topic creator already has in her
catalogue. In this case, the suggestion is less valuable for the topic creator, but
it is still a sign that the suggestion makes sense for the topic creator. Similarly, if
a topic creator does not catalogue a suggestion, before or after creating the topic,
we consider this a signal that the topic creator found the suggestion not valuable
enough. In both cases, the suggestion is still a valuable relevance judgement in
itself that goes beyond mere topical relevance [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. In contrast, when the topic
creator adds a suggestion to her catalogue after starting the topic (topic creation
is the first signal that she has that particular information need), we assume the
suggestion is of great value to the topic creator.
        </p>
        <p>Self-supplied suggestions. Some of the books in the Touchstone list are
suggestions by the topic creator herself. One reason for these suggestions could be
that the creator wants to let others know which books she already knows or
has read. Another reason could be that she discovered these books but
considered them not good enough for whatever reason. A third reason could be that
she discovered these books and wants the opinions of others to help her decide
whether it is good enough or not. Because it is hard to identify the reason for a
self-supplied suggestion, we consider these suggestions as not relevant, except for
the self-supplied suggestions the topic creator later added to her personal
catalogue. In this case, the post-cataloguing action is a signal that the creator eventually
considered it good enough.</p>
      </sec>
      <sec id="sec-3-6">
        <title>Touchstone Suggestions as Judgements</title>
        <p>This year we used a topic set
of 300 topics, including the 211 topics from last year and the 89 new topics.
We also provided user profiles of the topic creators as context for generating
recommendations. These profiles contain information on which books the user
has catalogued and the date on which each book was added.</p>
        <p>Because we want to focus on suggestions that the topic creator is most
interested in, we filtered the 300 topics and retained only those topics where the
creator added at least one of the suggested books to her personal catalogue on
or after the date she created the forum topic. This resulted in a subset of 96
topics, which is used for evaluation (Section 3.7). The next section describes our
method for generating relevance judgements.
</p>
      </sec>
      <sec id="sec-3-7">
        <title>From Suggestions to Relevance Judgements</title>
        <p>
A system presenting a user with books suggested on the forum fulfils the library
objective of helping users find or locate relevant items, whereas a system
presenting the user with books she will add to her catalogue fulfils, we argue, the
library objective of helping her choose which of the relevant items
to obtain [
          <xref ref-type="bibr" rid="ref6">6</xref>
]. Based on this correspondence to the library cataloguing objectives, we
assign higher relevance values to books that are post-catalogued than to other
suggestions.
        </p>
        <p>We use the following terminology:
Creator The topic creator, who has the information need and formulated the
request.</p>
        <p>Suggestion Suggestions are books mentioned in the messages of the topic
thread, and that are identified via Touchstone.</p>
        <p>Suggestor The forum member who first suggests a book. The thread is parsed
from first to last message, and the first member to mention the book is
considered the suggestor. Note that this can be the topic creator. Perhaps
she suggests books because she wants others to comment on them or because
she wants to show she already knows about these books.</p>
        <p>Pre-catalogued Suggestion a suggestion that the creator catalogues before
starting the topic.</p>
        <p>Post-catalogued Suggestion a suggestion that the creator catalogues after
having started the topic.</p>
        <p>Non-catalogued Suggestion a suggestion that the creator did not catalogue
before or after having started the topic.</p>
        <p>To operationalise the suggestions as relevance judgements, we use different
relevance values (rv):
Highly relevant (rv=4) Post-catalogued Suggestions are considered the best
suggestions, regardless of who the suggestor is.</p>
        <p>Relevant (rv=1) Pre- and Non-catalogued suggestions where the suggestor
is not the creator. Suggestions from others that the creator already has are
good suggestions in general (perhaps not useful for the creator, but still
relevant to the request).</p>
        <p>Non-relevant (rv=0) Pre- and Non-catalogued suggestions that the creator
suggested herself, i.e., the suggestor is the creator. These are either books
the creator already has (pre-catalogued) or may be negative examples (I'm
not looking for books like this), or are mentioned for some other reason. The
creator already knows about these books.</p>
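        <p>A minimal sketch (in Python) of how these relevance values could be assigned from the catalogue data described above; the function and field names are ours, purely for illustration:
from datetime import date
from typing import Optional

def relevance_value(suggestor: str, creator: str,
                    catalogued_on: Optional[date],
                    topic_created_on: date) -> int:
    """Map a suggestion to a graded relevance value (rv)."""
    if catalogued_on is not None and catalogued_on >= topic_created_on:
        return 4  # post-catalogued suggestion, regardless of the suggestor
    if suggestor != creator:
        return 1  # pre- or non-catalogued suggestion made by another member
    return 0      # pre- or non-catalogued suggestion made by the creator herself</p>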
        <p>We use the recommended books for a topic as relevance judgements for
evaluation. Each book in the Touchstone list is considered relevant. How many books
are recommended to LT members requesting recommendations in the discussion
groups? Are other members compiling exhaustive lists of possibly interesting
books or do they only suggest a small number of the best available books?
Statistics on the number of books recommended for the Full set of 300 topics
and the PCS subset of 96 topics with post-catalogued suggestions are given in
Table 3.</p>
        <p>We first compare the suggestions for the full topic set with those of 2011.
The two sets of suggestions are similar in terms of minimum, median and mean
number of suggestions per topic. The maximum has increased somewhat this
year. Split over genres we see that the Fiction topics tend to get more suggestions
than Non-Fiction topics. Topics where creators explicitly mention that both fiction
and non-fiction recommendations are welcome (denoted Mix) are more similar
to Non-Fiction topics in terms of maximum and median number of suggestions,
but closer to Fiction topics in terms of mean number of suggestions.</p>
        <p>If we zoom in on the PCS topics, we see they have a larger number of
suggestions per topic than the Full set, with a mean (median) of 16.2 (9). Most of the
suggestions are not catalogued by the topic creator and are made by others (RV1). In
most topics there is at least one book first mentioned by the topic creator (RV0
has a median of 1), and only a small number of suggestions are post-catalogued by
the creator (RV4 has a mean of 1.7 and a median of 1). What does it mean that
the PCS topics get more suggestions than the other topics in the Full set?</p>
        <p>Table 3. Statistics on the number of suggestions per topic:
Set             # topics  # suggestions  min.  max.  mdn.
2011 topics        211        2377         1    79     7
2012 Full          300        3533         1   101     7
  Fiction          135        2143         1   101     9
  Non-Fiction      146        1098         1    56     5
  Mix               19         292         1    59     7
2012 PCS            96        1558         1    98     9
  PCS, RV0          96         194         0    21     1
  PCS, RV1          96        1200         0    80     7
  PCS, RV4          96         164         1    14     1</p>
        <p>One
reason might be that with more suggestions, there is a larger a priori
probability that the topic creator will catalogue at least one of them. Another, related,
reason is that a larger number of suggestions means the list of relevant books
is more complete, which could make the topic creator more confident that she
can make an informed choice. Yet another reason may be that PCS topics are
dominated by Fiction topics, which have more suggestions than the Non-Fiction
topics.</p>
        <p>In Table 4 we show the number of Fiction, Non-Fiction and Mix topics in the
Full and PCS topics sets. In the Full set, there are a few more Non-Fiction topics
(146, or 49%) than Fiction topics (135 or 45%), with only 19 (6%) Mix topics.
In the PCS set, this is the other way around, with 49 Fiction topics (51%), 36
Non-Fiction topics (38%) and 9 Mix topics (9%). This partly explains why the
PCS topics have more suggestions. Post-cataloguing tends to happen more often
in topic threads related to fiction.</p>
        <p>Judgements from Mechanical Turk. To get a better understanding of the
nature of book suggestions and book selection, we plan to gather rich
relevance judgements from Mechanical Turk that cover different aspects of relevance.
Workers will judge the relevance of books based on the book descriptions in the
collection and the topic statement from the LT forum. Instead of asking them
to judge the overall relevance of books, we plan to ask them to identify different
relevance aspects of the information need and to judge the books on each of
these aspects separately. Additionally, we ask them to identify which part of the
description (title, subject headings, reviews or tags) is useful to determine the
relevance of the book for each relevance aspect in the request. Of course, workers
are not able to judge books on the user-dependent (personal, affective) relevance
aspects of the topic creator. For these aspects we would need judgements from
the topic creator herself. One possibility is to approach topic creators on the
forums or via private messages to their LT profiles.</p>
        <p>We are currently in the process of setting up the Mechanical Turk experiment
and hope to have results for the final report in the official proceedings.
ISBNs and intellectual works. Each record in the collection corresponds
to an ISBN, and each ISBN corresponds to a particular intellectual work. An
intellectual work can have different editions, each with their own ISBN. The
ISBN-to-work relation is a many-to-one relation. In many cases, we assume the
user is not interested in all the different editions, but in different intellectual
works. For evaluation we collapse multiple ISBNs to a single work. The highest
ranked ISBN is evaluated and all lower ranked ISBNs of the same work are ignored.
Although some of the topics on LibraryThing are requests to recommend a
particular edition of a work (in which case the distinction between different
ISBNs of the same work is important), we ignore these distinctions to
make evaluation easier. This turns edition-related topics into known-item topics.</p>
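        <p>A sketch (in Python) of this collapsing step, assuming a simplified one-to-one mapping from ISBN to LibraryThing work ID; the names are illustrative:
def collapse_to_works(ranked_isbns, isbn_to_work):
    """Keep only the highest-ranked ISBN per intellectual work."""
    seen_works = set()
    collapsed = []
    for isbn in ranked_isbns:                 # ranked_isbns: best result first
        work = isbn_to_work.get(isbn, isbn)   # unmapped ISBNs count as their own work
        if work not in seen_works:
            seen_works.add(work)
            collapsed.append(isbn)
    return collapsed</p>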
        <p>However, one problem remains. Mapping ISBNs of different editions to a
single work is not trivial. Different editions may have different titles and even
have different authors (some editions have a foreword by another author, or a
translator, while others have not), so detecting which ISBNs actually represent
the same work is a challenge. We solve this problem by using mappings made
by the collective work of LibraryThing members. LT members can indicate that
two books with different ISBNs are actually different manifestations of the same
intellectual work. Each intellectual work on LibraryThing has a unique work ID,
and the mappings from ISBNs to work IDs is made available by LibraryThing.12</p>
        <p>The mappings are not complete and might contain errors. Furthermore, the
mappings form a many-to-many relationship, as two people with the same edition
of a book might independently create a new book page, each with a unique work
ID. It takes time for members to discover such cases and merge the two work
IDs, which means that at any time, some ISBNs map to multiple work IDs even
though they represent the same intellectual work. LibraryThing can detect such
cases but, to avoid making mistakes, leaves it to members to merge them. The
fraction of works with multiple ISBNs is small, so we expect this problem to have
a negligible impact on evaluation.
12 See: http://www.librarything.com/feeds/thingISBN.xml.gz</p>
      </sec>
      <sec id="sec-3-8">
        <title>Evaluation</title>
        <p>This year four teams together submitted 17 runs. The Oslo and Akershus
University College of Applied Sciences (OAUCAS) submitted 4 runs, the Royal School
of Library and Information Science (RSLIS) submitted 6 runs, the University of
Amsterdam (UAms) submitted 2 runs and the LIA group of the University of
Avignon (LIA) submitted 5 runs.</p>
        <p>The official evaluation measure for this task is nDCG@10. It takes graded
relevance values into account and concentrates on the top retrieved results. The
set of PCS topics and corresponding suggestions form the official topics and
relevance judgements for this year's evaluation. The results are shown in Table 5.</p>
        <p>Table 5. Results of the submitted runs, ordered by nDCG@10:
Run                                         MRR     nDCG@10  P@10    MAP     R@1000
p54.run2.all-topic-fields.all-doc-fields    0.3069  0.1492   0.1198  0.1527  0.5736
p54.run3.all-topic-fields.QIT.alpha0.99     0.3066  0.1488   0.1198  0.1527  0.5736
p4.inex2012SBS.xml social.fb.10.50          0.3616  0.1437   0.1219  0.1494  0.5775
p62.B IT30 30                               0.3410  0.1339   0.1260  0.1659  0.5130
p4.inex2012SBS.xml social                   0.3256  0.1297   0.1135  0.1476  0.5588
p62.mrf-booklike                            0.3584  0.1295   0.1250  0.1514  0.5242
p54.run5.title.II.alpha0.94                 0.2558  0.1173   0.1073  0.1289  0.4891
p62.IOT30                                   0.2933  0.1141   0.1240  0.1503  0.5864
p62.IT30                                    0.2999  0.1082   0.1187  0.1426  0.5864
p54.run6.title.II.alpha0.97                 0.2392  0.0958   0.0823  0.0941  0.4891
p62.lcm-2                                   0.2149  0.0901   0.0667  0.1026  0.5054
p100.sb g0                                  0.2394  0.0884   0.0844  0.1145  0.5524
p54.run4.title.QIT.alpha0.65                0.1762  0.0875   0.0719  0.0949  0.4891
p100.sb g ttl nar0                          0.1581  0.0740   0.0594  0.0939  0.4634
p54.run1.title.all-doc-fields               0.1341  0.0678   0.0583  0.0729  0.4891
p100.sb 2xsh ttl nar0                       0.0157  0.0057   0.0021  0.0022  0.0393
p100.sb 2xsh0                               0.0199  0.0042   0.0021  0.0020  0.0647</p>
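        <p>For reference, a sketch (in Python) of nDCG@10 under a common formulation with linear gains and a log2 discount; the exact gain and discount functions used by the evaluation tool may differ:
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the top k graded gains (here 4, 1 or 0)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_gains, all_judged_gains, k=10):
    """Normalise by the DCG of an ideal ordering of all judged gains."""
    ideal = dcg_at_k(sorted(all_judged_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0</p>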
        <p>The best performing run is p54.run2.all-topic-fields.all-doc-fields by RSLIS,
which used all topic fields combined against an index containing all available
document fields.</p>
        <p>The best run by UAms is p4.inex2012SBS.xml social.fb.10.50, which uses
only the topic titles and ran against an index containing the title information
fields (title, author, edition, publisher, year) and the user-generated content fields
(tags, reviews and awards). Blind relevance feedback was applied using the top
50 terms from the top 10 initial retrieval results.</p>
        <p>The best run by LIA is p62.B IT30 30.</p>
        <p>The best run by OAUCAS is p100.sb g0.</p>
        <p>We note that the best run does not use any information from the user
profiles. The best performing run that incorporates user profile information is the
second best run, p54.run3.all-topic-fields.QIT.alpha0.99 by RSLIS. Like the best
performing run, it uses all topic fields against all document fields, but re-ranks
the results list based on the LT profile of the topic creator. Retrieved books that
share a lot of tags with books already present in the user's catalogue are
regarded as a more appropriate match. The final retrieval score is a linear
combination of the original content-based score and the cosine similarity between a
tag vector containing the tag counts from a user's personal catalogue and the tag
vectors of the retrieved books.</p>
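        <p>A simplified sketch (in Python) of such a linear combination; the weighting and the tag-vector construction are our illustration of the idea, not the exact RSLIS implementation:
import math

def cosine(u, v):
    """Cosine similarity between two sparse tag-count vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0) for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def rerank_score(content_score, user_tags, book_tags, alpha=0.99):
    """Linear combination of the original retrieval score and the tag-based similarity."""
    return alpha * content_score + (1 - alpha) * cosine(user_tags, book_tags)</p>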
        <p>The run p4.inex2012SBS.xml social.fb.10.50 achieves the highest MRR score
(0.3616), which means that on average, it retrieves the first relevant book at or
above rank 3. The nine best systems achieve a P@10 score just above 0.1, which
means on average they have one suggestion in the top 10 results. Most systems
are able to retrieve an average of around 50% of the suggestions in the top 1000
results.</p>
        <p>Note that the three highest scores for P@10 (0.1260, 0.1250 and 0.1240)
correspond with the 4th, 6th and 8th highest scores for nDCG@10. The highest
nDCG@10 score corresponds to the 5th highest P@10 score. This could mean
that the top performing system is not better than the other systems at retrieving
suggestions in general, but that it is better at retrieving PCSs, which are the
most important suggestions. The top two runs have similar nDCG@10 scores
and the same P@10 scores and retrieve more PCSs in the top 10 (36 over all 96
topics) than the other runs, the best of which retrieves only 26 PCSs in the top
10, over all 96 topics. The full topic statement is a more effective description of
the books that the topic creator will catalogue than the topic title alone.</p>
        <p>In sum, systems that incorporate user profile information have so far not
been able to improve upon a plain text retrieval baseline. The best systems for
retrieving PCSs use the full topic statement.</p>
        <p>Recall that the evaluation is done on a subset of the Full set of 300 topics.
In Section 3.5 we found that the PCS topics have more suggestions per topic
than the rest of the topics in the Full set, and that the fraction of Fiction topics
is also higher in the PCS set. To what extent does this difference in genre and
number of suggestions result in differences in evaluation?</p>
        <p>We compare the system rankings of the official relevance judgements (PCS
topics with differentiated relevance values, denoted PCS(RV0+1+4)) with two
alternative sets: one based on the same topics but with all suggestions mapped to
relevance value rv = 1 (denoted PCS(RVflat)), and the other the set of
judgements for the Full set of 300 topics, where all suggestions were also mapped
to relevance value rv = 1 (denoted Full(RVflat)). The PCS(RVflat) set allows us
to see whether the differentiation between suggestions affects the ranking. The
comparison between PCS(RVflat) and Full(RVflat) can show whether the different
topic selection criteria lead to different system rankings.</p>
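        <p>For reference, the standard Kendall's Tau between two system rankings can be computed as below (in Python with SciPy); TauAP is not part of SciPy and would need a separate implementation. The scores here are invented:
from scipy.stats import kendalltau

# Hypothetical nDCG@10 scores of the same five systems under two judgement sets.
scores_full = [0.15, 0.14, 0.12, 0.10, 0.08]
scores_pcs = [0.14, 0.15, 0.11, 0.10, 0.07]

tau, p_value = kendalltau(scores_full, scores_pcs)
print(f"Kendall tau = {tau:.3f}")</p>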
        <p>
Table 6 shows the Kendall's Tau (column 2) and TauAP (column 3) ranking
correlations over the 18 official submissions for nDCG@10. The TauAP ranking
correlation puts more weight on the top-ranked systems [
          <xref ref-type="bibr" rid="ref7">7</xref>
], emphasising how
well the evaluations agree on ranking the best systems. The standard Kendall
Tau correlation is very strong (&gt; 0.9) between Full(RVflat) and PCS(RVflat),
suggesting the topic selection plays little role. The correlation between PCS(RVflat)
and PCS(RV0+1+4) is also very high, further suggesting that the differentiation
between suggestions has no impact on the ranking. However, the TauAP
correlations show that the disagreement between Full(RVflat) and PCS(RVflat) is
bigger among the top-ranked systems than among the lower-scoring systems. The two
PCS sets have very strongly correlated system rankings. From this we conclude
that the differentiation between suggestions in terms of relevance value has little
impact, but that the PCS topics are somewhat different in nature than the other
topics.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>The Prove It (PI) Task</title>
      <p>The goal of this task was to investigate the application of focused retrieval
approaches to a collection of digitised books. The scenario underlying this task
is that of a user searching for specific information in a library of books that
can provide evidence to confirm or reject a given factual statement. Users are
assumed to view the ranked list of book parts, moving from the top of the list
down, examining each result. No browsing is considered (only the returned book
parts are viewed by users).</p>
      <p>Participants could submit up to 10 runs. Each run could contain, for each
of the 83 topics (see Section 4.2), a maximum of 1,000 book pages estimated
relevant to the given aspect, ordered by decreasing value of relevance.</p>
      <p>A total of 18 runs were submitted by 2 groups (6 runs by UMass Amherst
(ID=50) and 12 runs by Oslo University College (ID=100)), see Table 1.</p>
      <sec id="sec-4-1">
        <title>The Digitized Book Corpus</title>
        <p>The track builds on a collection of 50,239 out-of-copyright books13, digitised
by Microsoft. The corpus is made up of books of different genres, including
history books, biographies, literary studies, religious texts and teachings, reference
works, encyclopaedias, essays, proceedings, novels, and poetry. 50,099 of the
books also come with an associated MAchine-Readable Cataloging (MARC)
record, which contains publication (author, title, etc.) and classification
information. Each book in the corpus is identified by a 16 character long bookID, the
name of the directory that contains the book's OCR file, e.g., A1CD363253B0F403.
13 Also available from the Internet Archive (although in a different XML format).</p>
        <p>The OCR text of the books has been converted from the original DjVu
format to an XML format referred to as BookML, developed by Microsoft
Development Center Serbia. BookML provides additional structure information,
including markup for table of contents entries. The basic XML structure of a
typical book in BookML is a sequence of pages containing nested structures
of regions, sections, lines, and words, most of them with associated coordinate
information, defining the position of a bounding rectangle ([coords]):
&lt;document&gt;
&lt;page pageNumber="1" label="PT CHAPTER" [coords] key="0" id="0"&gt;
&lt;region regionType="Text" [coords] key="0" id="0"&gt;
&lt;section label="SEC BODY" key="408" id="0"&gt;
&lt;line [coords] key="0" id="0"&gt;
&lt;word [coords] key="0" id="0" val="Moby"/&gt;
&lt;word [coords] key="1" id="1" val="Dick"/&gt;
&lt;/line&gt;
&lt;line [...]&gt;&lt;word [...] val="Melville"/&gt;[...]&lt;/line&gt;[...]
&lt;/section&gt; [...]
&lt;/region&gt; [...]
&lt;/page&gt; [...]
&lt;/document&gt;</p>
        <p>BookML provides a set of labels (as attributes) indicating structure
information in the full text of a book and additional marker elements for more complex
structures, such as a table of contents. For example, the first label attribute
in the XML extract above signals the start of a new chapter on page 1
(label="PT CHAPTER"). Other semantic units include headers (SEC HEADER),
footers (SEC FOOTER), back-of-book index (SEC INDEX), table of contents
(SEC TOC). Marker elements provide detailed markup, e.g., for table of
contents, indicating entry titles (TOC TITLE), and page numbers (TOC CH PN),
etc.</p>
        <p>The full corpus, totaling around 400GB, was made available on USB HDDs.
In addition, a reduced version (50GB, or 13GB compressed) was made available
for download. The reduced version was generated by removing the word tags
and propagating the values of the val attributes as text content into the parent
(i.e., line) elements.
</p>
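        <p>A sketch (in Python, using lxml) of that reduction step for a single book file; element and attribute names follow the BookML extract above, and this is our illustration rather than the tool actually used:
from lxml import etree

def strip_words(book_xml: bytes) -> bytes:
    """Replace each line's word children by the concatenation of their val attributes."""
    root = etree.fromstring(book_xml)
    for line in root.iter("line"):
        text = " ".join(w.get("val", "") for w in line.findall("word"))
        for w in line.findall("word"):
            line.remove(w)
        line.text = text
    return etree.tostring(root)</p>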
      </sec>
      <sec id="sec-4-2">
        <title>Topics</title>
        <p>
In recent years we have built up a base of 83 topics, for 21 of which we have
collected relevance judgments using crowdsourcing through the Amazon
Mechanical Turk infrastructure [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>The ambition this year has been two-fold:
- To increase the number of topics
- To further develop the relevance judgment method, so as to combat the effect
of the statement complexity on the assessment consistency.</p>
        <p>For the second point above, we have been attempting to divide each topic
into its primitive aspects (a process we refer to as "aspectization"). To this
end we developed a simple web application with a database back-end, to allow
anyone to aspectize topics. This resulted in 30 aspectized topics.</p>
        <p>For each page being assessed for confirmation or refutation of a topic, the
assessor is presented with a user interface similar to Figure 2.</p>
        <p>This means that we go from a discrete (confirms / refutes / none) assessment
to a graded assessment, where a page may, for example, be assessed by an assessor as 33
percent confirming a topic if one of three aspects is judged as confirmed by
him/her for that page.</p>
        <p>For the current assessment we have prepared 30 topics, for which the number
of aspects ranges from 1 (very simple statements) to 6 per topic, with an average
of 2.83 aspects per topic.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Collected Relevance Assessments</title>
        <p>At the time of writing, this year's relevance assessments have not yet been collected.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Evaluation Measures and Results</title>
        <p>Result publication is awaiting the conclusion of the relevance assessment process.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and plans</title>
      <p>This paper presents an overview of the INEX 2012 Social Book Search Track.
This year, the track ran two tasks: the Social Book Search task, and the Prove
It task.</p>
      <p>The Social Book Search (SBS) task changed focus from the relative value
of professional and user-generated metadata, to the complexity of book search
information needs.</p>
      <p>We extended our investigation into the nature of book requests and
suggestions from the LibraryThing forums as statements of information needs and
relevance judgements. By differentiating between who the suggestor is and whether
the topic creator subsequently adds a suggestion to her catalogue or not
(post-catalogued suggestions), we want to focus even more on the personal, affective
aspects of relevance judgement in social book search. We operationalised this
by differentiating in relevance values, giving higher values for post-catalogued
suggestions than for other suggestions.</p>
      <p>Our choice to focus on topics with post-catalogued suggestions (PCS topics)
resulted in a topic set that is slightly different from the topics we used last year,
where we ignored the personal catalogue of the topic creator and considered all
topics that have a book request, a descriptive title and at least one suggestion.
The PCS topics have more suggestions on average than other topics, and a larger
fraction of them is focused on fiction books. This results in a difference in system
ranking, which is mainly due to the different nature of the topics, and not to the
differentiation of the relevance values.</p>
      <p>In addition to the topic statements extracted from the forum discussions,
we extracted user profiles of the topic creators, which contain full catalogue
information on which books they have in their personal catalogues, when each
book was added to the catalogue and which tags the user assigned to each
book. These profiles were distributed along with the topic statements, to allow
participants to build systems that incorporate both the topical description of
the information need and personal behaviour, preferences and interests of the
topic creators.</p>
      <p>The evaluation has shown that the most effective systems incorporate the
full topic statement, which includes the title of the topic thread, the name of
the discussion group, and the full first message that elaborates on the request.
However, the best system did not use any user profile information. So far, the
best system is a plain full-text retrieval system.</p>
      <p>Next year, we continue with the task to further investigate the role of user
information. We also plan to enrich the relevance judgements with further
judgements on the relevance of books to specific relevance aspects of the information
need. For this, we plan to either use Mechanical Turk or approach the topic
creators on LibraryThing to obtain more specific judgements directly from the
person with the actual information need.</p>
      <p>This year the Prove It task has undergone some changes when it comes
to assessments. The number of participants for the PI task is still low, which
also puts some limitations on what we are able to do collaboratively, but based
on the changes introduced this year, which will hopefully give us more useful
assessments, we hope to increase the number of participants and further vitalize
the task.</p>
      <p>Acknowledgments. We are very grateful to Justin van Wees for providing us with
the user profiles of the topic creators for this year's evaluation. This research
was supported by the Netherlands Organization for Scientific Research (NWO
projects # 612.066.513, 639.072.601, and 640.005.001) and by the European
Community's Seventh Framework Program (FP7 2007/2013, Grant Agreement
270404).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Beckers</surname>
          </string-name>
          , Norbert Fuhr, Nils Pharo, Ragnar Nordlie, and
          <article-title>Khairun Nisa Fachry. Overview and results of the INEX 2009 Interactive Track</article-title>
          . In Mounia Lalmas, Joemon M. Jose, Andreas Rauber, Fabrizio Sebastiani, and Ingo Frommholz, editors,
          <source>ECDL</source>
          , volume
          <volume>6273</volume>
          of Lecture Notes in Computer Science, pages
          <volume>409</volume>
          -
          <fpage>412</fpage>
          . Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Gabriella</given-names>
            <surname>Kazai</surname>
          </string-name>
          , Jaap Kamps, Marijn Koolen, and
          <string-name>
            <surname>Natasa</surname>
          </string-name>
          Milic-Frayling.
          <article-title>Crowdsourcing for book search evaluation: Impact of hit design on comparative system ranking</article-title>
          .
          <source>In Proceedings of the 34th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <volume>205</volume>
          -
          <fpage>214</fpage>
          . ACM Press, New York NY,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Marijn</given-names>
            <surname>Koolen</surname>
          </string-name>
          , Jaap Kamps, and
          <string-name>
            <given-names>Gabriella</given-names>
            <surname>Kazai</surname>
          </string-name>
          .
          <article-title>Social Book Search: The Impact of Professional and User-Generated Content on Book Suggestions</article-title>
          .
          <source>In Proceedings of the International Conference on Information and Knowledge Management (CIKM</source>
          <year>2012</year>
          ). ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Marijn</given-names>
            <surname>Koolen</surname>
          </string-name>
          , Gabriella Kazai, Jaap Kamps, Antoine Doucet, and
          <string-name>
            <given-names>Monica</given-names>
            <surname>Landoni</surname>
          </string-name>
          .
          <article-title>Overview of the INEX 2011 books and social search track</article-title>
          . In Shlomo Geva, Jaap Kamps, and Ralf Schenkel, editors,
          <source>Focused Retrieval of Content and Structure: 10th International</source>
          <article-title>Workshop of the Initiative for the Evaluation of XML Retrieval (INEX</article-title>
          <year>2011</year>
          ), volume
          <volume>7424</volume>
          <source>of LNCS</source>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Kara</given-names>
            <surname>Reuter</surname>
          </string-name>
          .
          <article-title>Assessing aesthetic relevance: Children's book selection in a digital library</article-title>
          .
          <source>JASIST</source>
          ,
          <volume>58</volume>
          (
          <issue>12</issue>
          ):
          <volume>1745</volume>
          -
          <fpage>1763</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Elaine</given-names>
            <surname>Svenonius</surname>
          </string-name>
          .
          <source>The Intellectual Foundation of Information Organization</source>
          . MIT Press,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Emine</given-names>
            <surname>Yilmaz</surname>
          </string-name>
          , Javed A.
          <string-name>
            <surname>Aslam</surname>
            , and
            <given-names>Stephen</given-names>
          </string-name>
          <string-name>
            <surname>Robertson</surname>
          </string-name>
          .
          <article-title>A new rank correlation coefficient for information retrieval</article-title>
          . In
          <string-name>
            <surname>Sung-Hyon</surname>
            <given-names>Myaeng</given-names>
          </string-name>
          , Douglas W. Oard, Fabrizio Sebastiani,
          <string-name>
            <surname>Tat-Seng Chua</surname>
          </string-name>
          , and Mun-Kew Leong, editors,
          <source>SIGIR</source>
          , pages
          <volume>587</volume>
          -
          <fpage>594</fpage>
          . ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>