Squiggle: a Semantic Search Engine at work

Irene Celino, Andrea Turati, Emanuele Della Valle and Dario Cerizza, CEFRIEL

I. Celino, A. Turati, E. Della Valle and D. Cerizza are researchers of the Semantic Web Activities group (see http://swa.cefriel.it) at CEFRIEL – Politecnico di Milano, via Fucini 2, 20127 Milano (Italy); their email contacts are: {irene.celino, andrea.turati, emanuele.dellavalle, dario.cerizza}@cefriel.it

Abstract—We present Squiggle, a Semantic Web framework that eases the deployment of semantic search engines. Search engines have become such an easy way to find textual resources that we would like to use them for multimedia content as well; however, syntactic techniques, even if promising, are not up to the task. With Squiggle we prove that Semantic Web technologies provide real benefits to end users in terms of easier and more effective access to information. The effectiveness of our approach is fully demonstrated by real-world deployments available on the Web.

Index Terms—Information retrieval, Semantic search, Multimedia content

I. INTRODUCTION AND MOTIVATIONS

SEARCHING everything everywhere is becoming our habit when we need to find something. However, finding what we need is often a hard job. Current search engine technology is very good at finding complete Web pages, but it lacks the desired precision¹ and recall² when searching for multimedia resources. For instance, searching for "jaguar" in an image search engine returns a mix of felines and cars, which are difficult to tell apart. Squiggle [2] is an extensible framework for the development of semantic search engines. By adding a conceptual flavor to the crawling and indexing of resources, Squiggle can exploit ontological elements at search time to improve and enrich the results, without undermining the user experience. These features, together with the use of the SKOS [3] model, make Squiggle a powerful and reusable framework for building engines with both syntactic and semantic functionality.

II. SEARCHING WITH SQUIGGLE

The interaction of an end user with Squiggle is intuitive and very similar to the use of a traditional search engine; however, the results are better and more meaningful. In the following, we give examples of searches with Squiggle and explain how it works.

A. 1st step: Syntactic search and query analysis

The user is presented with a simple search form in which he can enter some keywords. Like syntactic search engines, Squiggle can immediately retrieve all results containing those keywords, since part of Squiggle is based on the well-known search engine library Lucene³ [6]. For example, a ski fan could search for images of "Herminator", the famous Austrian athlete Hermann Maier. With Squiggle, however, the user not only obtains syntactically matching results (images tagged with the word "herminator"), but his query is also analyzed in order to identify its meaning. This is possible thanks to Squiggle's disambiguation capabilities: the search engine can access an ontology of the domain (e.g., an ontology of the athletes in the skiing domain) and try to identify the concepts that could be connected with the query (in the previous example, Squiggle identifies "herminator" as a nickname, i.e. an alternative label, for Hermann Maier).

B. 2nd step: Semantic search

The results of the disambiguation phase are displayed, together with the syntactic results, in a lateral box under the heading "Did you mean...?". The user can then manually choose the meaning of his query among the proposed ones. In response, he obtains more precise and more numerous results; this is possible because, during content indexing, Squiggle semantically annotates the resources with regard to the domain ontology. In the previous example, during the conceptual indexing phase, all the images whose syntactic annotations contained Hermann Maier's complete name (the concept's skos:prefLabel), as well as those containing his nickname "herminator" (represented with a skos:altLabel property), were annotated with his concept URI.

C. 3rd step: Semantic suggestions

But there is more: by accessing the domain ontology, Squiggle can exploit all of its content, i.e. not only the alternative labels of its concepts, but also their relations. This capability lets Squiggle expand the user query by following the relations between the concepts identified in it and other ontological elements, and suggest further searches that may interest the user. For example, a fan of the band Queen looking for audio files of their songs could be presented with a lateral box suggesting an expansion of his query to include results related to Freddie Mercury (who could be linked to Queen by a skos:relatedPartOf property in an ontology of the music domain). This suggestion, as well as the disambiguation phase described above, is possible because Squiggle accesses a semantic repository built on Sesame⁴ [1] that contains the domain ontology.

III. SQUIGGLE REAL-WORLD DEPLOYMENTS

To prove our approach, we built two different domain-specific search engines with the Squiggle framework. The following sections explain some of their distinctive characteristics.

A. Squiggle Ski

CEFRIEL, as Official Supplier of Applied Academic Research for the Torino 2006 Olympic Winter Games⁵, took the opportunity to demonstrate Squiggle by developing Squiggle Ski, which helps the international public find images of the athletes taking part in the alpine skiing races.

To instantiate Squiggle Ski, we built a SKOS-based domain ontology partly by hand, developing a small multilingual taxonomy of the alpine skiing disciplines, and partly by collecting information from the FIS-Ski Web site⁶. We then built an experimental focused crawler that exploits the knowledge in the ski ontology to collect images of skiers from sports news Web sites all over the world. Knowing all the relevant terms (names of athletes, disciplines, places, etc., with possible alternative labels in different languages) helps both the focused crawler to select the appropriate photos and the conceptual indexer to semantically annotate them before indexing. Squiggle Ski is freely available at http://squiggle.cefriel.it/ski.

B. Squiggle Music

Squiggle Music is an instantiation of the Squiggle framework that indexes audio files (mainly MP3 files), enriching them with information about artists, song titles and music genres. We created a SKOS-based ontology by merging two freely available metadata databases: MusicMoz [5] and MusicBrainz [4]. For each song, MusicBrainz provides its TRM ID⁷, an audio fingerprint that uniquely identifies an audio track. Using a tool such as QuickNamer⁸, which can compute the TRM of a file, and searching the MusicBrainz database for a match, it is possible to relate an MP3 file to the song's metadata. By combining the smart data from MusicBrainz and MusicMoz with a smart machine like QuickNamer, we built an automatic semantic annotator that acts as a domain-dependent plug-in of the Squiggle framework during the conceptual indexing phase. This annotator is therefore able to add all of its metadata (artist, song title, etc.) to each file. Squiggle Music is online at http://squiggle.cefriel.it/music.

IV. CONCLUSIONS

We have highlighted how applying Semantic Web technologies to the development of search engines provides real benefits to end users, enabling easier and more effective access to information; a semantic search engine like Squiggle appears to be more usable, in that users are supported with semantic "suggestions", as our test-beds demonstrate at a glance.

Moreover, we designed Squiggle keeping in mind the particular needs of searching multimedia content: the extensible nature of Squiggle fully enables the joint use of smart machines that process media and of domain ontologies.

REFERENCES

[1] Arjohn Kampman, Frank van Harmelen, and Jeen Broekstra. Sesame: A generic architecture for storing and querying RDF and RDF Schema. In Proceedings of ISWC 2002, October 7-10, Sardinia, Italy, 2002.
[2] CEFRIEL Semantic Web Activities group. Squiggle home. http://squiggle.cefriel.it/, 2005.
[3] A. Miles and D. Brickley. SKOS Core Guide, W3C Working Draft. http://www.w3.org/TR/swbp-skos-core-guide, 2 November 2005.
[4] MusicBrainz. http://www.musicbrainz.org, 2005.
[5] MusicMoz. http://www.musicmoz.org, 2005.
[6] Otis Gospodnetic and Erik Hatcher. Lucene in Action. Manning Publications, 2004.

¹ Precision is the proportion of relevant data among all data retrieved.
² Recall is the proportion of relevant data retrieved out of all available relevant data.
³ http://lucene.apache.org
⁴ http://openrdf.org
⁵ http://www.cefriel.it/press/olimpiadi2006.html
⁶ http://www.fis-ski.com
⁷ http://www.relatable.com/tech/trm.html
⁸ http://phonascus.sourceforge.net
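The query analysis of Section II.A (Syntactic search and query analysis) can be illustrated with a minimal sketch, assuming the domain ontology has already been loaded into memory as a list of concepts with their SKOS labels. The `Concept` class, the `disambiguate` function and the `example.org` URIs are hypothetical, and exact case-insensitive label matching is a simplification; this is not Squiggle's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Simplified SKOS concept: a URI, its skos:prefLabel and its skos:altLabels."""
    uri: str
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)


def normalize(text: str) -> str:
    """Case-fold and trim, so that 'Herminator' and 'herminator' match."""
    return text.strip().casefold()


def disambiguate(query: str, concepts: list[Concept]) -> list[Concept]:
    """Return the concepts having a label equal to a query keyword or to the whole query.

    The returned concepts are the candidate meanings for a "Did you mean...?" box.
    """
    keywords = {normalize(word) for word in query.split()}
    keywords.add(normalize(query))  # whole query, for multi-word labels such as "Hermann Maier"
    matches = []
    for concept in concepts:
        labels = {normalize(concept.pref_label)}
        labels.update(normalize(alt) for alt in concept.alt_labels)
        if labels & keywords:  # any shared label makes the concept a candidate meaning
            matches.append(concept)
    return matches


if __name__ == "__main__":
    # Hypothetical ski ontology fragment; the URIs are illustrative only.
    ski_ontology = [
        Concept("http://example.org/ski#HermannMaier", "Hermann Maier", ["Herminator"]),
        Concept("http://example.org/ski#Downhill", "Downhill"),
    ]
    for concept in disambiguate("herminator", ski_ontology):
        print(f"Did you mean {concept.pref_label}?")  # prints: Did you mean Hermann Maier?
```

Treating both skos:prefLabel and skos:altLabel as equally valid entry points is what lets the nickname resolve to the same concept as the full name.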
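The conceptual indexing described in Section II.B (Semantic search) can be sketched as an inverted index from concept URIs to resources. This sketch assumes that resources carry plain-text tags and that a tag matches a concept when it equals one of the concept's labels, ignoring case. The names `LABEL_TO_CONCEPT`, `annotate` and `build_concept_index`, as well as the URIs and image names, are hypothetical; a real deployment would store these annotations in the Lucene index rather than in a Python dictionary.

```python
from collections import defaultdict

# Hypothetical label-to-concept table derived from a SKOS ontology:
# the skos:prefLabel and every skos:altLabel map to the same concept URI.
LABEL_TO_CONCEPT = {
    "hermann maier": "http://example.org/ski#HermannMaier",
    "herminator": "http://example.org/ski#HermannMaier",
}


def annotate(tags: list[str], label_to_concept: dict[str, str]) -> set[str]:
    """Return the concept URIs whose labels appear (case-insensitively) among the tags."""
    return {label_to_concept[t.casefold()] for t in tags if t.casefold() in label_to_concept}


def build_concept_index(resources: dict[str, list[str]],
                        label_to_concept: dict[str, str]) -> dict[str, set[str]]:
    """Invert the annotations into a map: concept URI -> ids of annotated resources."""
    index: dict[str, set[str]] = defaultdict(set)
    for resource_id, tags in resources.items():
        for uri in annotate(tags, label_to_concept):
            index[uri].add(resource_id)
    return dict(index)


if __name__ == "__main__":
    images = {
        "img1.jpg": ["Hermann Maier", "downhill"],
        "img2.jpg": ["Herminator"],
        "img3.jpg": ["Bode Miller"],
    }
    index = build_concept_index(images, LABEL_TO_CONCEPT)
    # Searching by concept finds images tagged with either label.
    print(sorted(index["http://example.org/ski#HermannMaier"]))  # ['img1.jpg', 'img2.jpg']
```

Once the user picks a meaning, a lookup by concept URI returns the union of resources tagged with any of its labels, which is why the semantic results are more numerous than a keyword match on "herminator" alone.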
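The semantic suggestions of Section II.C amount to following ontology relations one hop away from the concepts recognized in the query. In this minimal sketch a plain list of (subject, predicate, object) triples stands in for the Sesame repository, and the set of predicates worth following is an assumption (the text names only skos:relatedPartOf). The `ex:` identifiers and the function `suggest_expansions` are hypothetical.

```python
from collections.abc import Iterable

Triple = tuple[str, str, str]

# Assumed set of relations that trigger a suggestion.
EXPANSION_PREDICATES = {"skos:related", "skos:relatedPartOf", "skos:relatedHasPart"}


def suggest_expansions(concepts: Iterable[str], triples: Iterable[Triple],
                       predicates: set[str] = EXPANSION_PREDICATES) -> list[str]:
    """Return concepts linked to the seed concepts by one of the given predicates.

    Relations are followed in both directions; order of first appearance is kept.
    """
    seeds = set(concepts)
    suggestions: list[str] = []
    for subject, predicate, obj in triples:
        if predicate not in predicates:
            continue  # labels and other non-relational statements are ignored
        if subject in seeds and obj not in seeds and obj not in suggestions:
            suggestions.append(obj)
        elif obj in seeds and subject not in seeds and subject not in suggestions:
            suggestions.append(subject)
    return suggestions


if __name__ == "__main__":
    music = [
        ("ex:FreddieMercury", "skos:relatedPartOf", "ex:Queen"),
        ("ex:BrianMay", "skos:relatedPartOf", "ex:Queen"),
        ("ex:Queen", "skos:prefLabel", "Queen"),
    ]
    print(suggest_expansions(["ex:Queen"], music))  # ['ex:FreddieMercury', 'ex:BrianMay']
```

Following relations in both directions means the suggestion works whether the ontology states the link from the member to the band or the other way round.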
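Footnotes 1 and 2 define precision and recall. Applied to the "jaguar" image search of Section I, they can be computed as below for a user who wants felines. The image sets are invented for illustration, and returning 0.0 when a denominator is empty is a convention chosen here.

```python
def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Proportion of the retrieved items that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Proportion of all relevant items that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


if __name__ == "__main__":
    # Hypothetical results for "jaguar": two felines and two cars are returned,
    # while three feline images exist in the collection.
    retrieved = {"cat1", "cat2", "car1", "car2"}
    relevant = {"cat1", "cat2", "cat3"}
    print(precision(retrieved, relevant))         # 0.5
    print(round(recall(retrieved, relevant), 3))  # 0.667
```

The mixed result set halves precision, while the missing third feline image lowers recall; a concept-based search aims to improve both at once.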
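The fingerprint-based annotator of Section III.B (Squiggle Music) can be sketched as a lookup from fingerprint to song metadata. Computing a real TRM requires an acoustic fingerprinting tool such as QuickNamer, so this sketch substitutes a SHA-1 digest, which only matches byte-identical files. The `FingerprintAnnotator` plug-in, the in-memory catalogue standing in for the merged MusicBrainz/MusicMoz data, and the sample track are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SongMetadata:
    artist: str
    title: str
    genre: str


def fingerprint(audio: bytes) -> str:
    """Stand-in for a TRM acoustic fingerprint.

    A SHA-1 digest identifies only byte-identical files, unlike a real acoustic
    fingerprint; it is used here purely so that the sketch runs.
    """
    return hashlib.sha1(audio).hexdigest()


class FingerprintAnnotator:
    """Hypothetical domain-dependent plug-in invoked during conceptual indexing."""

    def __init__(self, catalogue: dict[str, SongMetadata]):
        # fingerprint -> metadata, standing in for the merged music databases
        self.catalogue = catalogue

    def annotate(self, audio: bytes) -> Optional[SongMetadata]:
        """Return the metadata of the matching track, or None if it is unknown."""
        return self.catalogue.get(fingerprint(audio))


if __name__ == "__main__":
    track = b"...raw audio bytes..."  # placeholder content, not real audio
    annotator = FingerprintAnnotator(
        {fingerprint(track): SongMetadata("Queen", "Bohemian Rhapsody", "Rock")}
    )
    print(annotator.annotate(track))
    # SongMetadata(artist='Queen', title='Bohemian Rhapsody', genre='Rock')
    print(annotator.annotate(b"unknown"))  # None
```

Keeping the fingerprint function separate from the lookup means a real acoustic fingerprinter could replace the SHA-1 stand-in without changing the annotator.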