<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the INEX 2012 Linked Data Track</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Qiuyue Wang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaap Kamps</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgina Ramírez Camps</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maarten Marx</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anne Schuth</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Theobald</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sairam Gurajada</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arunav Mishra</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Max Planck Institute for Informatics</institution>
          ,
          <addr-line>Saarbrucken</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Renmin University of China</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universitat Pompeu Fabra</institution>
          ,
          <addr-line>Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Amsterdam</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper provides an overview of the Linked Data Track that was newly introduced to the set of INEX tracks in 2012. The goal of the new Linked Data Track was to investigate retrieval techniques over a combination of textual and highly structured data, where rich textual contents from Wikipedia articles serve as the basis for retrieval and ranking, while additional RDF properties carry key information about semantic relations among entities that cannot be captured by keywords alone. Our intention in organizing this new track thus follows one of the key themes of INEX, namely to explore and investigate if and how structural information could be exploited to improve the effectiveness of ad-hoc retrieval. In particular, we were interested in how this combination of data could be used together with structured queries to help users navigate or explore large sets of results (a task that is well-known from faceted search systems), or to address Jeopardy-style natural-language clues and questions (known, for example, from recent question answering settings over linked data collections; see, for example, [6]). The Linked Data Track thus aims to close the gap between IR-style keyword search and semantic-web-style reasoning techniques, with the goal to bring together different communities and to foster research at the intersection of Information Retrieval, Databases, and the Semantic Web. As its core collection, the Linked Data Track employs a fusion of XML-ified Wikipedia articles with RDF properties from both DBpedia [4] and YAGO2 [5], the latter of which contain the article entity as either their subject (first argument) or object (second argument). The core data collection was based on the popular MediaWiki format (http://dumps.wikimedia.org/enwiki/20110722/), where we additionally replaced all Wiki-markup by syntactically valid XML tags, attributes, and CDATA sections.
In addition, all internal Wikipedia links (including the article entity itself) have been enriched with links to both their corresponding DBpedia and YAGO2 entities (as far as available). Furthermore, participants were explicitly encouraged to make use of</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>more RDF facts available from DBpedia and YAGO2, in particular for
processing the reasoning-related faceted search and Jeopardy topics. For INEX 2012,
we explored three different retrieval tasks:
<list list-type="bullet">
<list-item><p>The classic Ad-hoc Retrieval Task investigates informational queries to
be answered mainly by the textual contents of the Wikipedia articles.</p></list-item>
<list-item><p>The Faceted Search Task employs a hand-crafted hierarchy of facets and
facet-values obtained from DBpedia that aim to guide the searcher toward
relevant information.</p></list-item>
<list-item><p>The new Jeopardy Task employs natural-language Jeopardy clues which
are manually translated into a semi-structured query format based on SPARQL
with keyword filter conditions.</p></list-item>
</list></p>
    </sec>
    <sec id="sec-2">
      <title>Data Collection</title>
      <p>The new Wikipedia-LOD (v1.1) collection is hosted by the Max Planck Institute
for Informatics and was made available for download in May 2012 at the
following link: http://www.mpi-inf.mpg.de/inex-lod/wikipedia-lod-2012/</p>
      <p>The collection consists of 3 compressed tar.gz files and contains a total
of 3.1 million individual XML articles. The uncompressed size of the
collection is 61 GB. A detailed DTD file that describes the structure of the XML
collection is also available from the above URL. Each Wikipedia-LOD article
consists of a mixture of XML tags, attributes, and CDATA sections, containing
infobox attributes, free-text contents describing the entity or category that the
article captures, and a section with both DBpedia and YAGO2 properties that
are related to the article's entity. All sections contain links to other Wikipedia
articles (including links to the corresponding DBpedia and YAGO2 resources),
Wikipedia categories, and external Web pages.</p>
      <p>Figure 1 shows an example of an XML-ified Wikipedia article about the
entity Albert Einstein by depicting the two main sections of the article:
i) the Wikipedia section, containing an XML-ified infobox, enhanced links
pointing to DBpedia and YAGO2, and Wikipedia text contents with more
XML markup, and
ii) the Linked Data section with RDF triples imported from both DBpedia and
YAGO2 that contain the entity Albert Einstein as either their subject or
object.</p>
      <p>
        Wikipedia To WikiXML Parser. For converting the raw Wikipedia articles
into our XML format, we used a parser derived from the wiki2xml parser [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
provided by MediaWiki [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The parser generates an XML file from the raw
Wikipedia article (originally in Wiki markup) by transforming infobox
information into a proper XML representation, enriching links with their DBpedia and
YAGO2 entities, and finally annotating each article with a list of RDF properties
from the DBpedia and YAGO2 knowledge sources.
      </p>
      <p>Collection Statistics. The Wikipedia-LOD collection currently contains 3.1
million XML documents in 3 compressed tar.gz files, amounting to 61 GB
in uncompressed form. Table 1 provides more detailed numbers about different
properties of the collection.</p>
      <p>Linked Data Sources. In addition to the new core collection, which is based
on XML-ified Wikipedia articles, the Linked Data Track explicitly encourages
(but does not require) the use of current Linked Open Data dumps for DBpedia
(v3.7) and YAGO2, which are available from the following URLs:
<list list-type="bullet">
<list-item><p>DBpedia v3.7 (created in July 2011):
http://downloads.dbpedia.org/3.7/en/</p></list-item>
<list-item><p>YAGO2 core and full dumps (created on 2012-01-09):
http://www.mpi-inf.mpg.de/YAGO2-naga/YAGO2/</p></list-item>
</list></p>
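      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Property</th></tr>
          </thead>
          <tbody>
            <tr><td>XML Documents</td></tr>
            <tr><td>XML Elements</td></tr>
            <tr><td>Wikipedia Category Articles</td></tr>
            <tr><td>Wikipedia Entity Articles</td></tr>
            <tr><td>Wikipedia Entity Articles with Infoboxes</td></tr>
            <tr><td>Other Wikipedia Articles</td></tr>
            <tr><td>Resolved DBpedia Links</td></tr>
            <tr><td>Resolved YAGO2 Links</td></tr>
            <tr><td>Intra-Wiki Links</td></tr>
            <tr><td>External Web Links</td></tr>
            <tr><td>Imported DBpedia Properties</td></tr>
            <tr><td>Imported YAGO2 Properties</td></tr>
          </tbody>
        </table>
      </table-wrap>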
      <p>DBpedia and YAGO2 are two comprehensive, common-sense knowledge bases
providing structured information that has been semi-automatically extracted
mostly from Wikipedia infoboxes and categories. Both knowledge bases focus
on extracting attribute-value pairs from Wikipedia infoboxes and category lists,
which serve as a basis for applying various information extraction techniques. They
also contain geo-coordinates, links between Wikipedia pages, redirection and
disambiguation pages, external links, and much more. Each Wikipedia page
corresponds to a resource in DBpedia and YAGO2. The connection between the data
sets is given in the "wikipedia_links_en.nt" file from DBpedia. The following
entry, for example,
<preformat>&lt;http://dbpedia.org/resource/AccessibleComputing&gt;
&lt;http://xmlns.com/foaf/0.1/page&gt;
&lt;http://en.wikipedia.org/wiki/AccessibleComputing&gt;</preformat>
connects the DBpedia entity with the URI
http://dbpedia.org/resource/AccessibleComputing with the Wikipedia page that is available under the URI
http://en.wikipedia.org/wiki/AccessibleComputing.</p>
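      <p>As a minimal sketch of how such link entries could be turned into a lookup table from DBpedia resources to Wikipedia pages, the following Python fragment extracts the three IRIs of each entry and keeps those whose predicate is foaf:page. The function names and the one-entry-per-line input are assumptions made for illustration; they are not part of the track's tooling.</p>
      <preformat><![CDATA[
```python
import re

# Matches one IRI term written as <...>
IRI = re.compile(r"<([^<>]*)>")
FOAF_PAGE = "http://xmlns.com/foaf/0.1/page"


def parse_link_line(line):
    """Return (subject, predicate, object) if the line holds exactly three IRIs, else None."""
    terms = IRI.findall(line)
    return tuple(terms) if len(terms) == 3 else None


def build_page_index(lines):
    """Map each DBpedia resource URI to its Wikipedia page URI (hypothetical helper)."""
    index = {}
    for line in lines:
        triple = parse_link_line(line)
        if triple is not None and triple[1] == FOAF_PAGE:
            index[triple[0]] = triple[2]
    return index


# The sample entry from the text, written on a single line.
sample_lines = [
    "<http://dbpedia.org/resource/AccessibleComputing> "
    "<http://xmlns.com/foaf/0.1/page> "
    "<http://en.wikipedia.org/wiki/AccessibleComputing> .",
]
page_index = build_page_index(sample_lines)
```
]]></preformat>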
      <p>The Linked Data Track was explicitly intended to be an "open track" and
thus invited participants to include more Linked Data sources (see, for
example, http://linkeddata.org) or other sources that go beyond "just" DBpedia
and YAGO2. Any inclusion of further data sources was welcome; however,
workshop submissions and follow-up research papers should explicitly mention these
sources when describing their approaches.</p>
    </sec>
    <sec id="sec-3">
      <title>Retrieval Tasks and Topics</title>
      <sec id="sec-3-1">
        <title>Ad-hoc and Faceted Search Tasks</title>
        <p>The Ad-hoc Task is to return a ranked list of results (Wikipedia pages) estimated
relevant to the user's information need, which is typically formulated as a
keyword query. Given an exploratory or broad query, the search system may
return a large number of results. Faceted search is a way to help users navigate
through the large set of results to quickly identify the results of interest. It
presents the user with a list of facet-values to refine the query. After the user chooses
from the suggested facet-values, the result list is narrowed down and the
system may present a new list of facet-values for the user to further refine the
query. The interactive process continues until the user finds the items of interest.
One of the key issues in faceted search systems is to recommend appropriate
facet-values to help the user quickly identify what he or she really wants in the large
set of results. The task aims to investigate different techniques of recommending
facet-values.</p>
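        <p>
          This year, we did not ask participants to submit ad-hoc or faceted search
topics. We generated and collected the topics from the following three sources.
Firstly, we built a three-level hierarchy of topics as described in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. For example:
        </p>
        <list list-type="simple">
          <list-item><p>Vietnam</p>
            <list list-type="simple">
              <list-item><p>Vietnam war</p>
                <list list-type="simple">
                  <list-item><p>Vietnam war movies</p></list-item>
                  <list-item><p>Vietnam war facts</p></list-item>
                </list>
              </list-item>
              <list-item><p>Vietnam food</p>
                <list list-type="simple">
                  <list-item><p>Vietnam food recipes</p></list-item>
                  <list-item><p>Vietnam food blog</p></list-item>
                </list>
              </list-item>
              <list-item><p>Vietnam travel</p>
                <list list-type="simple">
                  <list-item><p>Vietnam travel national park</p></list-item>
                  <list-item><p>Vietnam travel airports</p></list-item>
                </list>
              </list-item>
            </list>
          </list-item>
        </list>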
        <p>The topics on the top level are general topics, e.g., "Vietnam". We
randomly created 5 general topics, i.e., "Vietnam", "guitar", "tango", "bicycle",
and "music". We typed each general topic into Google and chose 3 subtopics from
Google's online suggestions. For example, when you type in
"Vietnam", Google may suggest "Vietnam war", "Vietnam food" or "Vietnam travel",
and so on, which can be viewed as subtopics of "Vietnam". Furthermore, for
each subtopic, we selected 2 sub-subtopics using Google Suggest again. Thus
we formed a three-level hierarchy of topics, with 5 general topics, 15 subtopics
and 30 sub-subtopics. Since the relevant answers for a topic can be treated as
the union of the relevant answers of all its subtopics, only the leaf-level topics,
i.e., the 30 sub-subtopics, need to be assessed. So we assigned the 30 sub-subtopics to
the Ad-hoc Task and the 20 non-leaf topics to the Faceted Search Task. The
relevance results for the ad-hoc topics serve as the relevance results for their
corresponding faceted search topics.</p>
        <p>Secondly, we selected 20 topics from the INEX 2009 and 2010 Ad-hoc Tracks to
compare the performance of different data collections. Since we wanted to select
challenging topics, we took the 40 worst-performing topics (with the lowest average
precision) from the INEX 2009 Ad-hoc Track and the 30 worst-performing topics from
the INEX 2010 Ad-hoc Track, and then randomly selected 10 topics from each
set. In this process, we also found some natural general topics, "Normandy",
"museum" and "social network", which have multiple subtopics among the 20
topics that we collected. So we added these 3 topics to the set of faceted search
topics.</p>
        <p>
          Thirdly, to compare the performance of the structured queries used in the
Jeopardy Task with that of unstructured queries, we added all 90 keyword titles of the
Jeopardy topics to the set of ad-hoc topics. In total, we collected 140 ad-hoc
topics and 23 faceted search topics, which are in the same format as in
previous years [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
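        <p>The topic counts above can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text; the variable names are illustrative.</p>
        <preformat><![CDATA[
```python
# Numbers taken from the text; names are illustrative only.
general_topics = ["Vietnam", "guitar", "tango", "bicycle", "music"]
subtopics_per_general = 3
subsubtopics_per_subtopic = 2

n_general = len(general_topics)                        # 5 general topics
n_subtopics = n_general * subtopics_per_general        # 15 subtopics
n_leaf = n_subtopics * subsubtopics_per_subtopic       # 30 sub-subtopics

old_inex_topics = 10 + 10          # 10 each from INEX 2009 and INEX 2010
jeopardy_keyword_titles = 90
extra_general_topics = 3           # "Normandy", "museum", "social network"

adhoc_total = n_leaf + old_inex_topics + jeopardy_keyword_titles
faceted_total = (n_general + n_subtopics) + extra_general_topics
```
]]></preformat>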
      </sec>
      <sec id="sec-3-2">
        <title>Jeopardy Task</title>
        <p>The new Jeopardy Task investigated retrieval techniques over a set of 90
natural-language Jeopardy-style clues and questions, which have been manually
translated into SPARQL query patterns that were enhanced with keyword-based
filter conditions. Specifically, we investigated a data model where every entity (in
DBpedia or YAGO2) is associated with the Wikipedia article (contained in the
Wikipedia-LOD v1.1 collection) that describes this entity. An XML file with 90
Jeopardy-style topics was made available for download in June 2012
under the following URL:
http://www.mpi-inf.mpg.de/inex-lod/LDT-2012-jeopardy-topics.xml</p>
        <p>For example, topic no. 2012301 from the current set of Jeopardy topics looks
as follows:</p>
        <preformat>&lt;topic id="2012301" category="LAKES"&gt;
  &lt;jeopardy_clue&gt;Niagara Falls has its source of origin
    from this lake.&lt;/jeopardy_clue&gt;
  &lt;keyword_title&gt;Niagara Falls source lake&lt;/keyword_title&gt;
  &lt;sparql_ft&gt;
    Select ?q Where {
      &lt;http://dbpedia.org/resource/Niagara_Falls&gt;
        &lt;http://dbpedia.org/property/watercourse&gt; ?o .
      ?o &lt;http://dbpedia.org/ontology/origin&gt; ?q .
      Filter FTContains(?o, "river water course niagara") .
      Filter FTContains(?q, "lake origin of")}
  &lt;/sparql_ft&gt;
&lt;/topic&gt;</preformat>
        <p>The &lt;jeopardy_clue&gt; element contains the original Jeopardy clue as a
natural-language sentence; the &lt;keyword_title&gt; element contains a set of keywords that
have been manually extracted from this clue and will be reused as part of the
Ad-hoc Retrieval Task; and the &lt;sparql_ft&gt; element contains a formulation
of the natural-language sentence as a corresponding SPARQL pattern. The
category attribute of the &lt;topic&gt; element may be used as an additional hint
for disambiguating the query.</p>
        <p>In the above query, the DBpedia entity
http://dbpedia.org/resource/Niagara_Falls has been marked as the subject of the first triple pattern, while both
the object of the first triple pattern and the subject and object of the second
triple pattern are unknown. The two FTContains filter conditions, however,
restrict both these subjects and objects to entities that should be associated with
the keywords "river water course niagara" and "lake origin of" via the content of
their corresponding Wikipedia articles, respectively. The result of this query is
exactly one target entity, namely the DBpedia resource
http://dbpedia.org/resource/Lake_Erie.</p>
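        <p>To make the intended evaluation of such a query more concrete, the following Python sketch joins the two triple patterns over a tiny in-memory triple list and applies the keyword filters to toy article texts. Everything here is an assumption for illustration: the triples and text snippets are invented, and FTContains is modelled as "the article text shares at least one keyword with the filter", which may differ from the semantics that participating systems actually implemented.</p>
        <preformat><![CDATA[
```python
DBR = "http://dbpedia.org/resource/"
WATERCOURSE = "http://dbpedia.org/property/watercourse"
ORIGIN = "http://dbpedia.org/ontology/origin"

# Toy data: illustrative assumptions, not excerpts from the Wikipedia-LOD collection.
TRIPLES = [
    (DBR + "Niagara_Falls", WATERCOURSE, DBR + "Niagara_River"),
    (DBR + "Niagara_River", ORIGIN, DBR + "Lake_Erie"),
    (DBR + "Niagara_River", ORIGIN, DBR + "Example_Town"),
]
ARTICLE_TEXT = {
    DBR + "Niagara_River": "The Niagara River is a river that flows north",
    DBR + "Lake_Erie": "Lake Erie is the fourth largest lake of the Great Lakes",
    DBR + "Example_Town": "A small town with a harbour",
}


def ft_contains(entity, keywords):
    """Assumed semantics: the entity's article text contains at least one keyword."""
    words = set(ARTICLE_TEXT.get(entity, "").lower().split())
    return any(keyword in words for keyword in keywords.lower().split())


def niagara_source():
    """Evaluate the two triple patterns of topic 2012301 with their keyword filters."""
    answers = []
    for s1, p1, o in TRIPLES:
        # First pattern: <Niagara_Falls> <watercourse> ?o
        if (s1, p1) != (DBR + "Niagara_Falls", WATERCOURSE):
            continue
        if not ft_contains(o, "river water course niagara"):
            continue
        # Second pattern: ?o <origin> ?q, joined on ?o
        for s2, p2, q in TRIPLES:
            if s2 == o and p2 == ORIGIN and ft_contains(q, "lake origin of"):
                answers.append(q)
    return answers
```
]]></preformat>
        <p>On this toy data, the filter on ?q removes the unrelated binding and leaves only the Lake_Erie resource. A real system would replace the nested loops with index lookups over both the RDF and the text.</p>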
        <p>
          Since this particular variant of processing SPARQL queries with full-text
filter conditions is not a default functionality of current SPARQL engines (and
queries should not be run against a standard RDF collection such as DBpedia
or YAGO2 alone), participants were encouraged to develop individual solutions
to index both the RDF and textual contents of the Wikipedia-LOD collection in
order to process these queries. Adding full-text search to SPARQL queries is an
ongoing research issue. While initial implementations and syntax proposals exist
(see for example [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]), we are not aware of any SPARQL engine that currently
allows for associating and indexing entire text documents along with RDF
resources. We also remark that this particular LOD data model differs from most
current SPARQL full-text approaches, as we impose keyword conditions over
individual entities (resources) rather than entire facts (triples).
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Run Submissions</title>
      <p>All run submissions were to be uploaded through the INEX website at
https://inex.mmci.uni-saarland.de/. The due date for the submission of all
LOD runs was July 14, 2012.</p>
      <sec id="sec-4-1">
        <title>Ad-Hoc and Jeopardy Tasks</title>
        <p>For the Ad-hoc and Jeopardy Tasks, each run may contain at most 1,000
results per topic, ordered by decreasing relevance. For the Ad-hoc Task,
each result is a Wikipedia article uniquely identified by its page ID. For the
Jeopardy Task, however, each query result could be a set of entities (identified by
their corresponding Wikipedia page IDs) in case the select clause contains
more than one query variable. For relevance assessment and evaluation of the
results, we require submission files to be in the familiar TREC format, with
each row representing a single query result. In case the select clause contains
more than one query variable, as in a Jeopardy topic, the row should consist of
a comma- or semicolon-separated list of target entity IDs. This list of entities
must reflect the order of query variables as specified by the select clause of the
Jeopardy topic.</p>
        <preformat>&lt;qid&gt; Q0 &lt;page_id_list&gt; &lt;rank&gt; &lt;rsv&gt; &lt;run_id&gt;</preformat>
        <p>Where:
<list list-type="bullet">
<list-item><p>The first column is the topic number.</p></list-item>
<list-item><p>The second column is the query number within that topic. This is currently
unused and should always be Q0.</p></list-item>
<list-item><p>The third column is a comma- or semicolon-separated list of the IDs of the
resulting Wikipedia page(s).</p></list-item>
<list-item><p>The fourth column is the rank of the result.</p></list-item>
<list-item><p>The fifth column shows the score (integer or floating point) that generated
the ranking.</p></list-item>
<list-item><p>The sixth column is called the "run tag" and should be a unique identifier
for your group AND for the method used. Run tags must contain 12 or fewer
letters and numbers, with NO punctuation, to facilitate labeling graphs with
the tags.</p></list-item>
</list></p>
        <p>An example submission thus may look as follows:</p>
        <preformat>2012301 Q0 12 1 0.9999 2012UniXRun1
2012301 Q0 997 2 0.9998 2012UniXRun1
2012301 Q0 9989 3 0.9997 2012UniXRun1</preformat>
        <p>Here we have three results for topic "2012301". The first result is the entity
(i.e., Wikipedia page) with ID "12". The second result is the entity with ID
"997", and the third result is the entity with ID "9989".</p>
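        <p>A submission line in this format can be checked with a short parser. The following Python sketch is one possible validation under the rules stated above (six whitespace-separated columns, Q0 in the second column, a comma- or semicolon-separated ID list, and a run tag of at most 12 letters and digits); the function name and the returned dictionary layout are illustrative assumptions, not an official INEX tool.</p>
        <preformat><![CDATA[
```python
import re

# Run tags: 12 or fewer letters and digits, no punctuation.
RUN_TAG = re.compile(r"[A-Za-z0-9]{1,12}")


def parse_run_line(line):
    """Parse one '<qid> Q0 <page_id_list> <rank> <rsv> <run_id>' row (hypothetical helper)."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 columns, got %d" % len(fields))
    qid, q0, page_id_list, rank, rsv, run_id = fields
    if q0 != "Q0":
        raise ValueError("second column must be Q0")
    if not RUN_TAG.fullmatch(run_id):
        raise ValueError("run tag must be 1-12 letters or digits")
    # One ID for ad-hoc results; several IDs when a Jeopardy query selects several variables.
    page_ids = [pid for pid in re.split(r"[,;]", page_id_list) if pid]
    return {
        "qid": qid,
        "page_ids": page_ids,
        "rank": int(rank),
        "rsv": float(rsv),
        "run_id": run_id,
    }


first = parse_run_line("2012301 Q0 12 1 0.9999 2012UniXRun1")
```
]]></preformat>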
      </sec>
      <sec id="sec-4-2">
        <title>Faceted Search Task</title>
        <p>For the Faceted Search Task, the organizers will provide a result file, which
contains a result list of at most 2000 results for each general topic. Based on
the reference result file, a run submitted by a participant should be an XML file
conforming to the following DTD. The file contains a hierarchy of recommended
facet-values for each topic, in which each node represents a facet-value and all
of its children constitute the newly recommended facet-value list when the user
selects this facet-value to refine the query. The maximum fan-out of each node
in the hierarchy is restricted to 20.</p>
        <preformat>&lt;!ELEMENT run (topic+)&gt;
&lt;!ATTLIST run rid ID #REQUIRED&gt;
&lt;!ELEMENT topic (fv+)&gt;
&lt;!ATTLIST topic tid ID #REQUIRED&gt;
&lt;!ELEMENT fv (fv*)&gt;
&lt;!ATTLIST fv f CDATA #REQUIRED
             v CDATA #REQUIRED&gt;</preformat>
        <p>Where:
<list list-type="bullet">
<list-item><p>The root element is &lt;run&gt;, which has an ID type attribute, rid, representing
the unique identifier of the run.</p></list-item>
<list-item><p>The &lt;run&gt; element contains one or more &lt;topic&gt; elements. The ID type attribute, tid, in
each &lt;topic&gt; gives the topic number.</p></list-item>
<list-item><p>Each &lt;topic&gt; has a hierarchy of &lt;fv&gt; elements. Each &lt;fv&gt; represents a facet-value pair,
with the f attribute being the facet and the v attribute being the value. All
possible facet-value pairs are taken from the triples in DBpedia or YAGO2.</p></list-item>
<list-item><p>The &lt;fv&gt; elements can be nested to form a hierarchy of facet-values.</p></list-item>
</list>
An example submission is:</p>
        <p>Here, for the topic "2012001", the faceted search system first recommends
the facet-value condition "dbpedia-owl:date = 1955-11-01" among other
facet-value conditions, which are its siblings. If the user selects this condition to
refine the query, the system will recommend a new list of facet-value
conditions, which are "dbpedia-owl:place = dbpedia:South_Vietnam" and
"dbpedia-owl:place = dbpedia:North_Vietnam". If the user then selects "dbpedia-owl:place
= dbpedia:North_Vietnam", the system will recommend the facet-value
condition "rdbprob:capital = dbpedia:Ho_Chi_Minh_City". Note that no facet-value
condition may occur twice on a path in the hierarchy.</p>
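        <p>The hierarchy described in this paragraph can be assembled and checked programmatically. The Python sketch below builds a run document with the standard xml.etree.ElementTree module and verifies the two stated constraints: a fan-out of at most 20 per node and no facet-value condition repeated on a path. The run identifier is hypothetical, the facet and value strings are copied from the description above, and the helper names are assumptions for illustration only.</p>
        <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

MAX_FAN_OUT = 20


def add_fv(parent, facet, value):
    """Append an <fv f="..." v="..."/> child and return it."""
    return ET.SubElement(parent, "fv", {"f": facet, "v": value})


def hierarchy_is_valid(node, path=()):
    """Check fan-out and that no facet-value pair repeats along any root-to-leaf path."""
    children = node.findall("fv")
    if len(children) > MAX_FAN_OUT:
        return False
    for child in children:
        pair = (child.get("f"), child.get("v"))
        if pair in path or not hierarchy_is_valid(child, path + (pair,)):
            return False
    return True


run = ET.Element("run", {"rid": "2012UniXRun1"})  # hypothetical run identifier
topic = ET.SubElement(run, "topic", {"tid": "2012001"})
date = add_fv(topic, "dbpedia-owl:date", "1955-11-01")
add_fv(date, "dbpedia-owl:place", "dbpedia:South_Vietnam")
north = add_fv(date, "dbpedia-owl:place", "dbpedia:North_Vietnam")
# Facet name copied verbatim from the description in the text.
add_fv(north, "rdbprob:capital", "dbpedia:Ho_Chi_Minh_City")

run_is_valid = all(hierarchy_is_valid(t) for t in run.findall("topic"))
xml_text = ET.tostring(run, encoding="unicode")
```
]]></preformat>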
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Relevance Assessments and Evaluation Metrics</title>
      <p>In total, 20 ad-hoc search runs were submitted by 7 participants, i.e., Ecole des
Mines de Saint-Etienne (EMSE), Kasetsart University, Renmin University of
China, University of Otago, Oslo University College, University of Amsterdam,
and Norwegian University of Science and Technology (NTNU). In addition, 5 valid Jeopardy
runs were submitted by 2 participants, i.e., Kasetsart University and Max-Planck
Institute for Informatics (MPI).</p>
      <p>Assessment was done using Amazon Mechanical Turk. We did not assess
the 20 topics from the INEX 2009 and 2010 Ad-hoc Tracks, as we could reuse the
assessments from previous years. We assessed the 30 sub-subtopics and
50 Jeopardy topics randomly selected from the 90 available ones. For each sub-subtopic,
we pooled all the submitted runs in a round-robin manner and then picked
the top 200 results to be assessed. For each selected Jeopardy topic, we pooled
the results in the same way and picked the top 100 results to be assessed, since
the Jeopardy Task can in general be viewed as a known-item search.</p>
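      <p>One common reading of round-robin pooling is sketched below in Python: the rank-1 results of all runs are taken in turn, then the rank-2 results, and so on, skipping documents already in the pool, until the pool reaches the requested depth. The order in which runs are visited and the handling of duplicates are assumptions; the text does not specify them.</p>
      <preformat><![CDATA[
```python
def round_robin_pool(runs, depth):
    """Merge ranked lists rank by rank into a duplicate-free pool of at most `depth` items.

    runs: list of ranked lists of document IDs for a single topic.
    """
    pool = []
    seen = set()
    if depth <= 0:
        return pool
    longest = max((len(run) for run in runs), default=0)
    for rank in range(longest):
        for run in runs:
            if rank < len(run) and run[rank] not in seen:
                seen.add(run[rank])
                pool.append(run[rank])
                if len(pool) == depth:
                    return pool
    return pool


# Toy runs for one topic; document IDs are made up.
example_pool = round_robin_pool([["a", "b", "c"], ["b", "d"], ["e"]], depth=4)
```
]]></preformat>
      <p>With the depths used in the track, this would be called with depth=200 for a sub-subtopic and depth=100 for a Jeopardy topic.</p>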
      <p>The TREC MAP metric, as well as P@5, P@10, P@20 and so on, were used
to measure the performance of all ad-hoc and Jeopardy runs. For the Faceted
Search Task, we use the same metrics as last year [?] to evaluate
the runs.</p>
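      <p>For reference, the textbook definitions of these measures can be written down in a few lines of Python. This is a sketch of the standard formulas for a single topic with binary relevance and a duplicate-free ranking; it is assumed, but not confirmed by the text, that the official evaluation followed the same definitions. MAP is the mean of the average precision over all topics, and the mean reciprocal rank is the mean of 1/rank of the first relevant result.</p>
      <preformat><![CDATA[
```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant (P@k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks of the relevant results."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant result, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean(values):
    """Arithmetic mean, used to turn per-topic scores into MAP or MRR."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Toy ranking for one topic; document IDs are made up.
ranked = ["a", "b", "c", "d"]
relevant = {"b", "d"}
```
]]></preformat>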
    </sec>
    <sec id="sec-6">
      <title>Results</title>
      <sec id="sec-6-1">
        <title>Ad-hoc and Jeopardy Task Results</title>
        <p>As mentioned above, 140 ad-hoc topics were collected from three different sources:
sub-subtopics, old topics from INEX 2009 and 2010, and keyword titles of
Jeopardy topics. Among them, the 30 sub-subtopics, 20 old topics and 50 Jeopardy
topics have assessment results. In this section, we will first present the evaluation
results over the whole set of ad-hoc topics for all the submitted runs, and then
analyze the effectiveness of the runs for each of the three sets of topics.</p>
        <p>In total, 20 runs were submitted to the Ad-hoc Task by 7 participating groups.
For each group, we selected its best performing run in terms of MAP, since MAP
averages reasonably well over all topic types. Table 2 shows an overview of the 7
best performing runs from different groups. Over all topics, the best scoring run
is from the Renmin University of China, with a MAP of 0.2776 and also the highest
1/rank, P@5, P@10, P@20 and P@30. The second-best scoring team is University
of Otago (0.2721). The third-best scoring team is Ecole des Mines de Saint-Etienne
(0.2609). Interpolated precision against recall is plotted in Fig. 2, which shows
only small differences among the 3-4 best performing runs.</p>
        <p>Table 3 shows the results over the 30 sub-subtopics. Since University of
Amsterdam did not submit any results on sub-subtopics, there are only 6 instead of 7
runs in the table. We see that Renmin University of China (0.33365), University
of Otago (0.3081), and Ecole des Mines de Saint-Etienne (0.2991) are still the 3
best performing groups.</p>
        <p>Table 4 shows the results over the 20 old topics from the INEX 2009 and 2010
Ad-hoc Tracks, now again evaluated by MAP. There are only 6 runs in the table
since Oslo University College did not submit any results on this set of topics. We
see that Renmin University of China still performs best in terms of MAP
(0.0936), and University of Amsterdam ranks second, with the best 1/rank
and P@5. The MAP values are generally very low for this set of topics. This is no
surprise since these are "hard" topics from previous years.</p>
        <p>Table 5 shows the results over only the Jeopardy topics, now evaluated by
the mean reciprocal rank (1/rank). Seven groups submitted results for the
Jeopardy topics, even though some of them submitted their runs to the
Jeopardy Task rather than to the Ad-hoc Task. We observe that Renmin University of China
(0.7655) ranks first in terms of the mean reciprocal rank (1/rank), but
University of Otago (0.741) has the best MAP. The second-best scoring team in
terms of 1/rank is University of Amsterdam.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusions and Future Work</title>
      <p>The Linked Data Track, which was a new track in INEX 2012, was organized
towards our goal to close the gap between IR-style keyword search and
semantic-web-style reasoning techniques. The track thus continues one of the earliest
guiding themes of INEX, namely to investigate whether structure may help to
improve the results of ad-hoc keyword search. As a core of this effort, we
introduced a new document collection, coined Wikipedia-LOD v1.1, of XML-ified
Wikipedia articles which were additionally annotated with RDF-style
resource-property pairs from both DBpedia and YAGO2. This document collection serves
as the basis for three tasks: i) the Ad-hoc Retrieval Task, ii) the Faceted Search
Task, and iii) a new Jeopardy Task, which were all held as part of this year's
Linked Data Track. We believe that this track encourages further research
towards applications that exploit semantic annotations over large text collections
and thus facilitates the development of effective retrieval techniques for the same.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. MediaWiki. http://www.mediawiki.org/wiki/MediaWiki.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. SPARQL FullText, W3C Working Draft. http://www.w3.org/2009/sparql/wiki/Feature:FullText.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Wikipedia To XML Extension. http://www.mediawiki.org/wiki/Extension:Wiki2xml.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          , G. Kobilarov,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z. G.</given-names>
            <surname>Ives</surname>
          </string-name>
          .
          <article-title>DBpedia: A Nucleus for a Web of Open Data</article-title>
          .
          <source>In ISWC/ASWC</source>
          , pages
          <fpage>722</fpage>
          -
          <lpage>735</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. J. Hoffart,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Berberich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lewis-Kelham</surname>
          </string-name>
          , G. de Melo, and
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>YAGO2: exploring and querying world knowledge in time, space, context, and many languages</article-title>
          .
          <source>In WWW (Companion Volume)</source>
          , pages
          <fpage>229</fpage>
          -
          <lpage>232</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>V.</given-names>
            <surname>Lopez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Uren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sabou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Motta</surname>
          </string-name>
          .
          <article-title>Is Question Answering fit for the Semantic Web?: A survey</article-title>
          .
          <source>Semantic Web</source>
          ,
          <volume>2</volume>
          (
          <issue>2</issue>
          ):
          <fpage>125</fpage>
          -
          <lpage>155</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>A.</given-names>
            <surname>Schuth</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Marx</surname>
          </string-name>
          . SPARQL FullText, W3C Working Draft. http://staff.science.uva.nl/~marx/pub/INEX/facetedtaskproposal.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          , G. Ramírez, M. Marx,
          <string-name>
            <given-names>M.</given-names>
            <surname>Theobald</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          .
          <article-title>Overview of the INEX 2010 Data Centric Track</article-title>
          .
          <source>In INEX</source>
          , volume
          <volume>7424</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>118</fpage>
          -
          <lpage>137</lpage>
          . Springer, Heidelberg,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>