<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Squiggle: a Semantic Search Engine at work</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Irene Celino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Turati</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emanuele Della Valle</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dario Cerizza</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <collab>CEFRIEL</collab>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>I. Celino, A. Turati, E. Della Valle and D. Cerizza are researchers of the Semantic Web Activities group (see</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present Squiggle, a Semantic Web framework that eases the deployment of semantic search engines. Search engines have become such an easy way to find textual resources that we would like to use them for multimedia content as well; however, syntactic techniques, although promising, are not up to the task. With Squiggle we show that Semantic Web technologies provide real benefits to end users in terms of easier and more effective access to information. The effectiveness of our approach is demonstrated by real-world deployments available on the Web.</p>
      </abstract>
      <kwd-group>
        <kwd>Information retrieval</kwd>
        <kwd>Semantic search</kwd>
        <kwd>Multimedia content</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION AND MOTIVATIONS</title>
      <p>
        Searching everything everywhere is becoming our habit
whenever we need to find something. However, finding what
we need is often a hard job. Current search engine technology
is very good at finding complete Web pages, but it lacks the
desired precision<sup>1</sup> and recall<sup>2</sup> when searching for multimedia
resources. For instance, searching for “jaguar” in an image search
engine returns a mix of felines and cars, which are difficult
to tell apart. Squiggle [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is an extensible framework for the development of semantic search engines.
By adding a conceptual flavor to the crawling and
indexing of resources, Squiggle can exploit ontological
elements to improve and enrich the search, without
undermining the user experience. These features, together
with the adoption of the SKOS [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] model, make Squiggle a
powerful and reusable framework for building engines with both
syntactic and semantic functionality.
      </p>
    </sec>
    <sec id="sec-2">
      <title>II. SEARCHING WITH SQUIGGLE</title>
      <p>The interaction of an end user with Squiggle is intuitive and
very similar to using a traditional search engine; however,
the results are more precise and more meaningful. In the following,
we walk through example searches with Squiggle and explain
how it works.</p>
    </sec>
    <sec id="sec-3">
      <p><sup>1</sup> Precision is the proportion of relevant data among all retrieved data. <sup>2</sup> Recall is the proportion of all available relevant data that is retrieved.</p>
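      <p>Writing R for the set of relevant items in the collection and A for the set of items retrieved by a query (notation introduced here for illustration), the two definitions above correspond to the standard formulas:</p>
      <preformat preformat-type="math">
```latex
\mathrm{Precision} = \frac{|R \cap A|}{|A|},
\qquad
\mathrm{Recall} = \frac{|R \cap A|}{|R|}
```
      </preformat>
      <p>As a hypothetical worked example, if an image search for “jaguar” returns 10 images, 4 of which show the feline the user was looking for, and the collection holds 8 such images in total, precision is 4/10 = 0.4 and recall is 4/8 = 0.5.</p>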
      <sec id="sec-3-1">
        <title>A. 1st step: Syntactic search and query analysis</title>
        <p>
          The user is presented with a simple search form in which he
can enter some keywords. Like syntactic search
engines, Squiggle can immediately retrieve all results containing those
keywords, since part of Squiggle is based on the well-known
search engine library Lucene [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. For example, a ski fan
could search for images of “Herminator”, the nickname of the famous Austrian
athlete Hermann Maier. With Squiggle, however, the user not only
obtains syntactically matching results (images tagged
with the word “herminator”), but his query is also analyzed in
order to identify its meaning. This is possible thanks to
Squiggle’s disambiguation capabilities: the search engine can
access an ontology of the domain (e.g., an ontology of the
athletes in the skiing domain) and try to identify the concepts
that could be connected with the query (in the
previous example, Squiggle identifies “herminator” as a
nickname, i.e. an alternative label, of Hermann Maier).
        </p>
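        <p>The query analysis step can be illustrated as a lookup of the normalized query against the skos:prefLabel and skos:altLabel values of each concept in the domain ontology. The following Python fragment is a minimal sketch under stated assumptions, not Squiggle’s actual code: Squiggle relies on Lucene and on a Sesame repository, whereas here the ontology is a hypothetical in-memory list, and the Concept class, the disambiguate function and the example.org URIs are illustrative names introduced only for this example.</p>
        <preformat preformat-type="code">
```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical in-memory stand-in for a SKOS concept."""
    uri: str
    pref_label: str                                       # skos:prefLabel
    alt_labels: list[str] = field(default_factory=list)   # skos:altLabel values


def normalize(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def disambiguate(query: str, concepts: list[Concept]) -> list[Concept]:
    """Return the concepts whose preferred or alternative label equals the query."""
    q = normalize(query)
    return [
        c for c in concepts
        if any(normalize(label) == q for label in [c.pref_label, *c.alt_labels])
    ]


# Hypothetical fragment of a ski-domain ontology.
SKI_ONTOLOGY = [
    Concept("http://example.org/ski#HermannMaier", "Hermann Maier", ["Herminator"]),
    Concept("http://example.org/ski#AlpineSkiing", "Alpine skiing"),
]

# Candidate meanings, e.g. for a "Did you mean...?" box.
for concept in disambiguate("herminator", SKI_ONTOLOGY):
    print(f"Did you mean {concept.pref_label}? ({concept.uri})")
```
        </preformat>
        <p>Exact matching on whole labels keeps the sketch short; a production engine would more likely match individual query tokens and rank the candidate concepts, which is why disambiguate returns a list rather than a single concept.</p>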
      </sec>
      <sec id="sec-3-2">
        <title>B. 2nd step: Semantic search</title>
        <p>The results of the disambiguation phase are
displayed alongside the syntactic results, in a lateral box
under the heading “Did you mean...?”. The user can therefore
manually choose the intended meaning of his query among the
proposed ones. In response, he obtains results that are both more
precise and more numerous; this is possible because, during content
indexing, Squiggle semantically annotates the
resources with respect to the domain ontology. In the previous
example, during the conceptual indexing phase, all the images
whose syntactic annotations contained Hermann Maier’s
full name (the concept’s skos:prefLabel), as well as those containing his
nickname “herminator” (represented as a skos:altLabel),
were annotated with his concept URI.</p>
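        <p>The effect of conceptual indexing on the number of results can be sketched with an inverted index keyed by concept URI. In this minimal, hypothetical example, an image is annotated with a concept whenever its free-text annotation contains any of the concept’s labels (a naive case-insensitive substring test chosen for brevity); the CONCEPT_LABELS and IMAGES tables, the file names and the example.org URI are invented for illustration and do not reflect Squiggle’s data structures.</p>
        <preformat preformat-type="code">
```python
from collections import defaultdict

# Hypothetical concept table: concept URI -> all its labels (prefLabel + altLabels).
CONCEPT_LABELS = {
    "http://example.org/ski#HermannMaier": ["Hermann Maier", "Herminator"],
}

# Hypothetical images with their syntactic (free-text) annotations.
IMAGES = {
    "img1.jpg": "Hermann Maier wins the super-G",
    "img2.jpg": "The Herminator is back",
    "img3.jpg": "Slalom training in Sestriere",
}


def conceptual_index(images, concept_labels):
    """Map each concept URI to the images whose annotation mentions any of its labels."""
    index = defaultdict(set)
    for image, text in images.items():
        lowered = text.lower()
        for uri, labels in concept_labels.items():
            if any(label.lower() in lowered for label in labels):
                index[uri].add(image)
    return index


def keyword_search(images, keyword):
    """Purely syntactic search: images whose annotation contains the keyword."""
    return {img for img, text in images.items() if keyword.lower() in text.lower()}


index = conceptual_index(IMAGES, CONCEPT_LABELS)
print(sorted(keyword_search(IMAGES, "herminator")))          # ['img2.jpg']
print(sorted(index["http://example.org/ski#HermannMaier"]))  # ['img1.jpg', 'img2.jpg']
```
        </preformat>
        <p>A search for the concept URI returns both images, whereas the keyword search for “herminator” returns only one; a real indexer would tokenize the annotations instead of using substring matching, which can produce false matches inside longer words.</p>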
      </sec>
      <sec id="sec-3-3">
        <title>C. 3rd step: Semantic suggestions</title>
        <p>
          Moreover, by accessing the domain ontology, Squiggle
is able to exploit all of its content, i.e. not only the alternative
labels of its concepts, but also the relations between them. This capability
lets Squiggle expand the user query by following the
relations between the concepts identified in it and other
ontological elements, and propose to the user further
searches that may be of interest. For example, a fan of the band Queen
looking for audio files of their songs could be presented with
a lateral box suggesting an expansion of his query to include
results related to Freddie Mercury (who could be related
to the band Queen by a skos:relatedPartOf property in an
ontology of the music domain). This suggestion of meanings, as
well as the disambiguation phase described before, is possible
because Squiggle accesses a semantic repository, built on
Sesame [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], that contains the domain ontology.
        </p>
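        <p>Query expansion can be sketched as a traversal of the ontology relations that start or end at the concepts identified in the query. In this hypothetical fragment the ontology is a plain list of (subject, predicate, object) triples rather than a Sesame repository, the ex: identifiers are invented, and the set of properties to follow (EXPANSION_PROPERTIES) is an assumption of the sketch; relations are followed in both directions so that a query about the band also reaches its members.</p>
        <preformat preformat-type="code">
```python
# Hypothetical triples (subject, predicate, object) from a music-domain ontology.
TRIPLES = [
    ("ex:Queen", "skos:prefLabel", "Queen"),
    ("ex:FreddieMercury", "skos:prefLabel", "Freddie Mercury"),
    ("ex:BrianMay", "skos:prefLabel", "Brian May"),
    ("ex:FreddieMercury", "skos:relatedPartOf", "ex:Queen"),
    ("ex:BrianMay", "skos:relatedPartOf", "ex:Queen"),
]

# Properties followed when expanding a query (an assumption of this sketch).
EXPANSION_PROPERTIES = {"skos:related", "skos:relatedPartOf"}


def pref_label(uri):
    """Return the skos:prefLabel of a concept, or the URI itself if none is found."""
    for s, p, o in TRIPLES:
        if s == uri and p == "skos:prefLabel":
            return o
    return uri


def suggest(concept_uri):
    """Return concepts linked to concept_uri by an expansion property, in either direction."""
    related = set()
    for s, p, o in TRIPLES:
        if p not in EXPANSION_PROPERTIES:
            continue
        if s == concept_uri:
            related.add(o)
        elif o == concept_uri:
            related.add(s)
    return sorted(related)


for uri in suggest("ex:Queen"):
    print("You may also search for:", pref_label(uri))
```
        </preformat>
        <p>Restricting expansion to a configurable set of properties keeps the suggestions focused: following every property would also pull in label and classification links that make poor search suggestions.</p>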
      </sec>
    </sec>
    <sec id="sec-4">
      <title>III. SQUIGGLE REAL-WORLD DEPLOYMENTS</title>
      <p>To validate our approach, we built two different
domain-specific search engines with the Squiggle framework. The
following sections explain some of their distinctive
characteristics.</p>
      <sec id="sec-4-1">
        <title>A. Squiggle Ski</title>
        <p>CEFRIEL, as Official Supplier of Applied Academic
Research for the Torino 2006 Olympic Winter Games, took the
opportunity to demonstrate Squiggle by developing Squiggle
Ski, which helps the international public find images of
the athletes competing in the alpine skiing races.</p>
        <p>To instantiate Squiggle Ski, we built a SKOS-based
domain ontology, partly by hand (developing a small
multilingual taxonomy of the alpine skiing disciplines)
and partly by collecting information from the
FIS-Ski web site. We then built an experimental focused
crawler that exploits the knowledge in the ski ontology to
collect images of skiers from sports news Web sites all over the
world. Knowing all the relevant terms (names of
athletes, disciplines, places, etc., with their possible alternative
labels in different languages) helps both the focused crawler
to select the appropriate photos and the conceptual indexer to
semantically annotate them before indexing.</p>
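        <p>The filtering decision of such a focused crawler can be sketched as a whole-word search for ontology labels, in any language, inside the text that accompanies a photo. This is an illustrative sketch, not the crawler used for Squiggle Ski: it assumes each photo comes with a caption, and the ONTOLOGY_TERMS table, its example.org URIs and the matched_concepts and should_keep functions are hypothetical. The matched concepts could also serve as the initial annotations handed to the conceptual indexer.</p>
        <preformat preformat-type="code">
```python
import re

# Hypothetical multilingual labels taken from a SKOS ski ontology
# (skos:prefLabel / skos:altLabel values in English, Italian and German).
ONTOLOGY_TERMS = {
    "http://example.org/ski#Slalom": ["slalom", "slalom speciale", "torlauf"],
    "http://example.org/ski#Downhill": ["downhill", "discesa libera", "abfahrt"],
    "http://example.org/ski#HermannMaier": ["hermann maier", "herminator"],
}


def matched_concepts(caption):
    """Return the URIs of ontology concepts whose labels occur in a caption."""
    text = caption.lower()
    found = set()
    for uri, labels in ONTOLOGY_TERMS.items():
        for label in labels:
            # Whole-word match: avoids hits on labels embedded in unrelated words.
            if re.search(r"\b" + re.escape(label) + r"\b", text):
                found.add(uri)
                break
    return found


def should_keep(caption):
    """Focused-crawler filter: keep a photo only if its caption mentions the domain."""
    return bool(matched_concepts(caption))


for caption in ["Hermann Maier in der Abfahrt von Kitzbühel", "Stock market closes higher"]:
    print(should_keep(caption), sorted(matched_concepts(caption)))
```
        </preformat>
        <p>Word-boundary matching trades some recall for precision: it rejects accidental substrings, but it would also miss German compounds such as “Abfahrtslauf” unless they are added as alternative labels.</p>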
        <p>Squiggle Ski is freely available at
http://squiggle.cefriel.it/ski.</p>
      </sec>
      <sec id="sec-4-2">
        <title>B. Squiggle Music</title>
        <p>Squiggle Music is an instantiation of the Squiggle framework
that indexes audio files (mainly mp3 files), enriching them
with information about artists, song titles and music genres.</p>
        <p>
          We created a SKOS-based ontology by merging two freely
available meta-databases: MusicBrainz [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and MusicMoz [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
For each song, MusicBrainz provides its TRM identifier, an
audio fingerprint that uniquely identifies an audio track. Using tools
such as QuickNamer, which can compute the TRM of a file,
and searching the MusicBrainz database for a match, it is
possible to associate an mp3 file with the song’s
metadata. By combining the smart data from
MusicBrainz and MusicMoz with a smart machine such as
QuickNamer, we built an automatic semantic annotator that
acts as a domain-dependent plug-in of the Squiggle framework
during the conceptual indexing phase. This annotator can
thus attach to each file all of its metadata (artist, song
title, etc.).
        </p>
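        <p>The role of such a domain-dependent plug-in can be sketched as an annotator interface invoked by the indexer for every file. In this hypothetical Python sketch, the Annotator, FingerprintAnnotator and conceptual_indexing names are invented, the metadata table stands in for the merged MusicBrainz and MusicMoz data, and a SHA-1 content hash replaces the TRM fingerprint computed by tools such as QuickNamer; unlike an acoustic fingerprint, a byte hash does not recognize the same recording in a different encoding, so it only illustrates the lookup flow.</p>
        <preformat preformat-type="code">
```python
import hashlib
from abc import ABC, abstractmethod


class Annotator(ABC):
    """Hypothetical interface for a domain-dependent annotation plug-in."""

    @abstractmethod
    def annotate(self, data: bytes) -> dict:
        """Return metadata (e.g. artist, title) for a media file's content."""


def stand_in_fingerprint(data: bytes) -> str:
    # Placeholder only: a content hash, NOT an acoustic fingerprint such as TRM.
    return hashlib.sha1(data).hexdigest()


class FingerprintAnnotator(Annotator):
    """Look up a file's fingerprint in a table of song metadata."""

    def __init__(self, metadata_by_fingerprint: dict, fingerprint=stand_in_fingerprint):
        self.metadata_by_fingerprint = metadata_by_fingerprint
        self.fingerprint = fingerprint

    def annotate(self, data: bytes) -> dict:
        # Unknown fingerprints yield no metadata; return a copy so callers cannot mutate the table.
        return dict(self.metadata_by_fingerprint.get(self.fingerprint(data), {}))


def conceptual_indexing(files: dict, plugins: list) -> dict:
    """Merge the metadata produced by every plug-in for each file."""
    index = {}
    for name, data in files.items():
        metadata = {}
        for plugin in plugins:
            metadata.update(plugin.annotate(data))
        index[name] = metadata
    return index


song = b"placeholder audio bytes"
table = {stand_in_fingerprint(song): {"artist": "Queen", "title": "Bohemian Rhapsody"}}
annotator = FingerprintAnnotator(table)
print(conceptual_indexing({"track01.mp3": song, "unknown.mp3": b"other"}, [annotator]))
```
        </preformat>
        <p>Passing the fingerprint function as a constructor parameter means an acoustic fingerprinting tool could be plugged in without changing the indexer, in the spirit of the extensibility described above.</p>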
        <p>Squiggle Music is on-line at http://squiggle.cefriel.it/music.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>IV. CONCLUSIONS</title>
      <p>We have shown how employing Semantic Web
technologies in the development of search engines provides
real benefits to end users, enabling easier and more
effective access to information; a semantic search engine like
Squiggle appears to be more usable, in that users are
supported by semantic “suggestions”, as our test-beds
show.</p>
      <p>Moreover, we designed Squiggle with the
particular needs of multimedia search in mind:
its extensible nature fully enables the
joint use of smart machines that process media and of
domain ontologies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Arjohn</given-names>
            <surname>Kampman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Frank</given-names>
            <surname>van Harmelen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jeen</given-names>
            <surname>Broekstra</surname>
          </string-name>
          .
          <article-title>Sesame: A generic architecture for storing and querying RDF and RDF Schema</article-title>
          .
          <source>In Proceedings of ISWC 2002</source>
          , 7-10 October, Sardinia, Italy,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <collab>CEFRIEL Semantic Web Activities group</collab>
          .
          <source>Squiggle home</source>
          . http://squiggle.cefriel.it/,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Miles</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Brickley</surname>
          </string-name>
          .
          <article-title>SKOS Core Guide</article-title>
          . W3C Working Draft, 2 November 2005. http://www.w3.org/TR/swbp-skos-core-guide
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] MusicBrainz. http://www.musicbrainz.org,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] MusicMoz. http://www.musicmoz.org,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Otis</given-names>
            <surname>Gospodnetic</surname>
          </string-name>
          and
          <string-name>
            <given-names>Erik</given-names>
            <surname>Hatcher</surname>
          </string-name>
          .
          <source>Lucene in Action</source>
          .
          <publisher-name>Manning Publications</publisher-name>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>