
Ideas for a Standard LL4IR Extension
A LL4IR Component for Popular Search Environments

Philipp Schaer

philipp.schaer@th-koeln.de

Narges Tavakolpoursaleh

narges.tavakolpoursaleh@gesis.org

Cologne University of Applied Sciences, Cologne, Germany
GESIS Leibniz Institute for the Social Sciences, Cologne, Germany

We introduce the idea of developing a standard extension for common search engines and repository systems. Such an extension would not only increase the number of potential Living Labs participants on the site level but would also bring other benefits, such as common standards and practices. We have already developed such an extension for the repository system DSpace, which might serve as a basis for future implementations.

The idea of Living Labs for Information Retrieval (LL4IR) has been successfully implemented in international IR evaluation campaigns such as CLEF and TREC. To provide a robust and stable evaluation environment, the LL4IR API is publicly available and professionally hosted, thanks to funding from ESF ELIAS and Microsoft Azure. However, to date only five platforms have implemented the API in their systems: REGIO JATEK and Seznam [1], and the three academic sites CiteSeerX, Microsoft Academic Search, and the Social Science Open Access Repository SSOAR (see http://trec-open-search.org/sites/). The latter is developed by the authors.

While implementing the LL4IR component for SSOAR we learned two things. (1) Extracting head queries, compiling the JSON markup, establishing a workflow for uploading the feedback, implementing the interleaving, and many other tasks add up and make taking part in LL4IR quite an effort. (2) The query distribution of our system is highly skewed, and few typical head queries, i.e. queries issued several hundred times a day, are present. This might be due to the rather specific range of topics covered by the repository and to the nature of academic search itself. Another reason might be that many users arrive through direct links provided by other search engines such as Microsoft Academic Search.

SSOAR is a DSpace-based repository system that uses a Solr search engine. While implementing the LL4IR component, we took care to keep it as minimally invasive and as encapsulated as possible. This led to a reusable piece of software that might serve as an official extension for DSpace. We tested the extension with the stable DSpace branches 3 and 5, both in an out-of-the-box vanilla installation and in the specific SSOAR implementation. We believe this benefits the whole repository community, as it allows other repository operators to join the LL4IR community with little effort.
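
The first task named in lesson (1), extracting head queries, can be sketched as follows. This is a minimal Python illustration, not the SSOAR code: it assumes the search log is simply a list of raw query strings, and the threshold, the query identifiers, and the JSON field names (`queries`, `site_qid`, `qstr`) are placeholders that should be checked against the LL4IR API documentation rather than taken as its schema.

```python
from collections import Counter
import json


def extract_head_queries(log_lines, min_count):
    """Return (query, count) pairs issued at least `min_count` times, most frequent first.

    Queries are normalized by lower-casing and collapsing whitespace;
    blank entries are skipped.
    """
    counts = Counter(" ".join(q.lower().split()) for q in log_lines if q.strip())
    return [(q, c) for q, c in counts.most_common() if c >= min_count]


def to_query_payload(head_queries, prefix="q"):
    """Serialize head queries as JSON.

    Field names are placeholders, not the official LL4IR schema.
    """
    queries = [
        {"site_qid": f"{prefix}{i}", "qstr": q}
        for i, (q, _count) in enumerate(head_queries, start=1)
    ]
    return json.dumps({"queries": queries})


# Toy log; a real site would read its search log and use a much higher threshold.
log = ["Migration", "migration", "social capital", "migration ", "gender gap", "social  capital"]
heads = extract_head_queries(log, min_count=2)
print(heads)  # [('migration', 3), ('social capital', 2)]
print(to_query_payload(heads))
```

With a skewed distribution like the one described in (2), a realistic threshold of several hundred issues per day would leave very few queries in this list.
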
There are more than 1,350 systems listed in OpenDOAR, a registry of Open Access repositories, that are based on DSpace: a large pool of candidates for next year's CLEF or TREC LL4IR campaigns. These installations share a common system setup but feature different content (e.g. repositories from the social sciences, the arts, or the sciences). This makes it possible to test ranking mechanisms in very different domains.

Because many comparable systems would take part in the same campaign, it might be possible to compensate for the missing head queries: each system might share only its top n queries, which would then be pooled with the head queries of the other DSpace installations. This would not yield 100 head queries but perhaps thousands. Why not use that many different queries and then later decide which of them are statistically stable enough to be included in the final evaluation round?
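
A minimal sketch of such pooling in Python, assuming each installation shares a dictionary of its top n queries with their counts. The site names other than SSOAR, the counts, and both thresholds (a minimum number of contributing installations and a minimum pooled count) are made up for illustration; they stand in for whatever statistical stability criterion a campaign would actually adopt.

```python
from collections import defaultdict


def pool_queries(site_top_queries, min_sites=2, min_total=50):
    """Pool the top-n query counts shared by several installations.

    site_top_queries maps a site name to {query: count} for that site's top n queries.
    A query is kept if at least `min_sites` installations report it and its pooled
    count reaches `min_total` (illustrative thresholds, not a statistical test).
    Results are sorted by pooled count, descending, then alphabetically.
    """
    total = defaultdict(int)
    reported_by = defaultdict(set)
    for site, counts in site_top_queries.items():
        for query, count in counts.items():
            total[query] += count
            reported_by[query].add(site)
    stable = [q for q in total if len(reported_by[q]) >= min_sites and total[q] >= min_total]
    return sorted(stable, key=lambda q: (-total[q], q))


# Hypothetical top-n lists with made-up counts from three DSpace installations.
shared = {
    "ssoar": {"migration": 40, "social capital": 25},
    "repo_b": {"migration": 30, "climate change": 80, "open access": 60},
    "repo_c": {"social capital": 30, "climate change": 10},
}
print(pool_queries(shared))  # ['climate change', 'migration', 'social capital']
```

Here "open access" is dropped despite its high count because only one installation reports it; requiring several sites is one simple way to favour queries that are not artefacts of a single repository.
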

A standard LL4IR implementation makes it possible to establish common standards and practices, such as identical timeout configurations, a guarantee that every system uses the same interleaving algorithm, and so on.
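
One way to guarantee identical interleaving across installations is to ship a single shared implementation. The Python sketch below shows Team Draft Interleaving, a widely used interleaving method chosen here for illustration; whether a standard extension would use exactly this variant is an assumption. Passing in a seeded `random.Random` instance makes the coin flips reproducible, which helps when comparing results from different sites.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng, length=None):
    """Team Draft Interleaving of two rankings.

    Returns the interleaved list and a dict mapping each shown document to
    the team ("A" or "B") that contributed it. In each round, the team with
    fewer picks so far (ties broken by a coin flip from `rng`) adds its
    highest-ranked document that is not yet in the list.
    """
    limit = length if length is not None else len(set(ranking_a) | set(ranking_b))
    interleaved, team_of = [], {}
    picks = {"A": 0, "B": 0}
    while len(interleaved) < limit:
        next_a = next((d for d in ranking_a if d not in team_of), None)
        next_b = next((d for d in ranking_b if d not in team_of), None)
        if next_a is None or next_b is None:
            break  # stop once either ranking is exhausted
        a_turn = picks["A"] < picks["B"] or (picks["A"] == picks["B"] and rng.random() < 0.5)
        team, doc = ("A", next_a) if a_turn else ("B", next_b)
        interleaved.append(doc)
        team_of[doc] = team
        picks[team] += 1
    return interleaved, team_of


def credit_clicks(team_of, clicked_docs):
    """Credit each click to the team that contributed the clicked document."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_docs:
        if doc in team_of:
            wins[team_of[doc]] += 1
    return wins


# Site ranking (A) vs. a participant's ranking (B); the seed makes the run reproducible.
rng = random.Random(42)
site_ranking = ["d1", "d2", "d3", "d4"]
participant_ranking = ["d3", "d1", "d5", "d2"]
shown, team_of = team_draft_interleave(site_ranking, participant_ranking, rng)
print(shown, credit_clicks(team_of, clicked_docs=[shown[0]]))
```

Clicks on documents contributed by team A count for ranking A and vice versa; the ranking with more credited clicks wins that impression. Because both teams pick alternately, their pick counts never differ by more than one.
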

By having both Microsoft Academic Search as a search engine and DSpace systems as the content-bearing repositories, it might be possible to link search sessions across systems. A user searching for scientific documents in Academic Search is later transferred to the repository, where they can find the full text or additional document information. If both systems take part in the Living Labs campaign, they might include a common URL parameter indicating that this specific request is to be counted in the LL4IR campaign. This way we could find out whether users interact with the document, e.g. by bookmarking it, recommending it, tweeting about it, or downloading the PDF file.
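
A minimal sketch of such a shared URL parameter, using only Python's standard `urllib.parse` module. The parameter name `ll4ir_sid` and the example URL are hypothetical; in practice the participating sites would have to agree on the name and on how session identifiers are generated.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

# Hypothetical parameter name; participating sites would have to agree on it.
LL4IR_PARAM = "ll4ir_sid"


def tag_outgoing_link(url, session_id):
    """Search engine side: add the shared LL4IR session parameter to a result link."""
    parts = urlparse(url)
    params = parse_qs(parts.query)
    params[LL4IR_PARAM] = [session_id]
    return urlunparse(parts._replace(query=urlencode(params, doseq=True)))


def ll4ir_session(url):
    """Repository side: return the LL4IR session id of a request, or None if untagged."""
    values = parse_qs(urlparse(url).query).get(LL4IR_PARAM)
    return values[0] if values else None


link = tag_outgoing_link("https://repository.example.org/handle/123/456?locale=en", "s42")
print(link)  # https://repository.example.org/handle/123/456?locale=en&ll4ir_sid=s42
print(ll4ir_session(link))  # s42
```

The repository would then attach the extracted session id to any logged interaction (download, bookmark, share), so that these events can be associated with the original search session.
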

Conclusion and Outlook

All the points listed above also hold for other popular search environments, such as Solr-based systems, or content management systems like TYPO3 or WordPress. We would therefore like to discuss whether the idea of a standard extension for popular search environments is worth trying and what other positive or negative outcomes it might have.

References

1. Schuth, A., Balog, K., Kelly, L.: Overview of the Living Labs for Information Retrieval Evaluation (LL4IR) CLEF Lab 2015. In: CLEF 2015 - 6th Conference and Labs of the Evaluation Forum. Lecture Notes in Computer Science (LNCS), Springer (September 2015)