Second International Workshop on Searching and Integrating New Web Data Sources (VLDS 2012)
Co-located with VLDB (Very Large DataBases) 2012
http://vlds.search-computing.org

Marco Brambilla, Stefano Ceri
Politecnico di Milano, DEI
Milano, Italy
firstname.lastname@polimi.it

Tim Furche, Georg Gottlob
Dept. of Computer Science, Oxford University
Oxford, UK
firstname.lastname@cs.ox.ac.uk

1. CONTEXT

Recent years have witnessed an exponential growth in the number of data providers available on the Web. These providers offer a plethora of different ways of accessing their data sources. They range from APIs over proprietary query languages (such as Yahoo! Query Language, YQL) to endpoints accessible through standard query languages (e.g., SPARQL). At the same time, data is increasingly being labeled, tagged, and linked with existing data, partly because of social networking applications. These data sources expose their data as semi-structured information. An increasing number of them also publish the information in the linked data cloud, with URI-based references between resources. Linked Open Data (LOD) has emerged as a best practice for exposing, sharing, and connecting pieces of data, information, and knowledge.

This is a major paradigm shift. On the one hand, it increases the power of search methods that access and query information, compared with the old-fashioned page-based Web paradigm. On the other hand, it challenges current information retrieval, data integration, and Web search practices to adapt to the new shape and capabilities of Web data sources. Searching for data over such new, Web-enabled data sources could reshape current Web applications. It could go beyond the capabilities of conventional search engines in solving search problems. It also presents new technical challenges, both for search and for surfacing techniques. Current web pages all too often stick with the old-fashioned page-based Web paradigm. Therefore, methods for turning such web pages into search services or other forms of knowledge are very much needed if search services are to be universally useful.

2. GOAL

As in previous years, this year’s VLDS workshop gathers leading researchers and practitioners from the diverse fields related to data integration, deep web search, and the construction of knowledge bases from the web. Its purpose is to discuss innovative strategies for combining search facilities with integration aspects for Web data sources. The workshop is a unique venue for discussing three areas:
– all aspects of the surfacing, publication, and orchestration of services over new Web data sources
– the most suitable paradigms for improving the user experience in context
– the application scenarios that may benefit most from these new technologies

Solving these problems requires new solutions at the intersection of data integration, multi-domain search, deep web extraction, and information extraction. In this edition, a particular focus is the construction of search services and knowledge bases from unstructured web data and the deep web.

3. TOPICS OF INTEREST

The topics of interest for this workshop include:

Methods and tools for search services, including:
– Modeling and exposing search functionalities as services
– Deploying and using search services
– Languages and platforms for composing search services
– Best practices and methodologies for designing and composing search services
– Mashup platforms and practices applied to search

Methods and tools for deep web information access, including:
– Exploitation of public APIs for search (e.g., Google APIs, Yahoo Query Language YQL)
– Implementation issues of ranking, ordering, and chunking in queries on data sources
– Use of query languages (including SQL, SPARQL, XQuery) for deep web data sources
– Mashup platforms and practices for deep web data

Methods and tools for domain-specific search, including:
– Algorithms and tools for domain-specific or purpose-specific search
– Best practices and methodologies for domain- or purpose-specific search

Methods and tools for Open Linked Data, including:
– Algorithms and tools for search and exploration over linked and semantically-enriched data
– Methods for preparing and labeling data to support search applications

User experience of search:
– User interfaces for search, including purpose- or domain-specific services
– Information exploration and exploratory search over Web structured, semi-structured and unstructured data
– Continuous, incremental and push-based search

Applications of search:
– Warehousing and integration of searchable data
– Enterprise search applications
– Social search
– Web recommender systems

Benchmarks for search applications on integrated data

VLDS’12, Istanbul, August 31st, 2012. Copyright © 2012 for the individual papers by the papers’ authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

4. PROGRAM COMMITTEE

We wish to thank the PC members who contributed to the success of the workshop. They carefully reviewed the submitted papers and gave the authors useful suggestions for improving them:

Robert Baumgartner, Lixto Software GmbH
Michael Benedikt, Oxford University
Florian Daniel, University of Trento
Anish Das Sarma, Google Research
Arjen de Vries, CWI
Sergio Flesca, DEIS - University of Calabria
Alejandro Jaimes, Yahoo! Research
Arnd Christian König, Microsoft Research
Jens Lehmann, Universität Leipzig
Ioana Manolescu, INRIA Saclay–Île-de-France and LRI, Université Paris Sud-11
Hamid Motahari, HP Labs
Neoklis Polyzotis, University of California Santa Cruz
David Robertson, University of Edinburgh
Mike Rosner, University of Malta
Sebastian Schaffert, Salzburg Research Forschungsgesellschaft
Klara Weiand, University of Munich
Gerhard Weikum, MPI
Clement Yu, University of Illinois at Chicago

We also wish to thank the additional reviewers who kindly helped the PC select the best papers for VLDS 2012.

5. WORKSHOP PROGRAM

The workshop received about 15 submissions, of which only about 50% (7 papers) were accepted. The accepted papers are divided into three sessions: “Web Knowledge Bases”, “Deep Web”, and “Wrappers”. Two invited keynotes complement them, given by Gerhard Weikum (Max-Planck-Institut, Germany) and Raghu Ramakrishnan (Microsoft).

Invited Speakers

Gerhard Weikum
Semantic Search: from Names and Phrases to Entities and Relations

Raghu Ramakrishnan
The Future of Information Discovery and Search: Content Optimization, Interactivity, Semantics, and Social Networks

Web Knowledge Bases

Marilena Oita, Antoine Amarilli and Pierre Senellart
Cross-Fertilizing Deep Web Analysis and Ontology Enrichment

Jianfeng Si, Qing Li, Tieyun Qian and Xiaotie Deng
Hierarchical Clustering on HDP Topics to build a Semantic Tree from Text

Ndapandula Nakashole, Mauro Sozio, Fabian Suchanek and Martin Theobald
Query-Time Reasoning in Uncertain RDF Knowledge Bases with Soft and Hard Rules

Deep Web

Meghyn Bienvenu, Daniel Deutch, Davide Martinenghi, Pierre Senellart and Fabian Suchanek
Dealing with the Deep Web and all its Quirks

Feng Niu, Ce Zhang, Christopher Re and Jude Shavlik
DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference

Wrappers

Tim Furche, Giovanni Grasso, Christian Schallhart, Andrew Sellers and Antonino Rullo
Think before you Act! Minimising Action Execution in Wrappers

Rolando Creo, Valter Crescenzi, Disheng Qiu and Paolo Merialdo
Minimizing the Costs of the Training Data for Learning Web Wrappers

6. ACKNOWLEDGEMENTS

This workshop was partially supported by the Search Computing project (SeCo, http://www.search-computing.org) and by the DIADEM project (http://diadem.cs.ox.ac.uk). Both are funded by the ERC under the Advanced Grant programme. We would also like to thank Sun SITE Central Europe for hosting these proceedings on http://ceur-ws.org.