Second International Workshop on Searching and Integrating New Web Data Sources (VLDS 2012)
Co-located with VLDB (Very Large DataBases) 2012
http://vlds.search-computing.org

Marco Brambilla, Stefano Ceri
Politecnico di Milano, DEI
Milano, Italy
firstname.lastname@polimi.it

Tim Furche, Georg Gottlob
Dept. of Computer Science, Oxford University
Oxford, UK
firstname.lastname@cs.ox.ac.uk

1.    CONTEXT
   Recent years witnessed an exponential growth of data providers available on the Web. These providers offer a plethora of different ways of accessing their data sources, spanning from APIs over proprietary query languages (such as Yahoo! Query Language, YQL) to endpoints accessible through standard query languages (e.g., SPARQL). At the same time, data is increasingly being labeled, tagged, and linked with existing data, partially due to social networking applications. These data sources expose their data as semi-structured information and an increasing number also provide the information in the linked data cloud, with URI-based references between resources. Linked Open Data (LOD) emerges as a best practice for exposing, sharing, and connecting pieces of data, information, and knowledge.
   This is a major change of paradigm. On one side, this augments the power of search methods which access and query information with respect to the old-fashioned page-based Web paradigm. On the other side, though, this challenges the current information retrieval, data integration, and Web search practices to comply with the new shape and capabilities of new Web data sources. Searching for data upon such new, Web-enabled data sources has the potential of reshaping the scenario of current Web applications, going beyond the capabilities of conventional search engines in solving search problems, but it also presents new technical challenges, for search as well as for surfacing techniques. Current web pages all too often stick with the old-fashioned page-based Web paradigm. Therefore, methods for turning such web pages into search services or other forms of knowledge are very much necessary for search services to be universally useful.

2.    GOAL
   This year's VLDS workshop gathers, as in previous years, leading researchers and practitioners in the diverse fields related to data integration, deep web search, and the construction of knowledge bases from the web with the purpose of discussing innovative strategies for combining search facilities with integration aspects for Web data sources. The workshop represents a unique venue for discussing all the aspects related to the surfacing, publication, and orchestration of services over new Web data sources, the most suitable paradigms to improve the user experience in context, as well as the application scenarios which may better benefit from these new technologies.
   Solving these problems requires new solutions at the intersection of data integration, multi-domain search, deep web extraction, and information extraction. In this edition, a particular focus is the construction of search services and knowledge bases from unstructured web data and the deep web.

3.    TOPICS OF INTEREST
   The topics of interest for this workshop include:
Methods and tools for Search Services, including:
   – Modeling and Exposing search functionalities as services
   – Deploying and Using search services
   – Languages and platforms for composing search services
   – Best practices and methodologies for designing and composing search services
   – Mashup platforms and practices applied to search
Methods and tools for deep web information access:
   – Exploitation of public APIs for search (e.g., Google APIs, Yahoo Query Language YQL)
   – Implementation issues of ranking, ordering, and chunking in queries on data sources
   – Use of query languages (including SQL, SPARQL, XQuery) for deep web data sources
   – Mashup platforms and practices for deep web data
Methods and Tools for domain-specific search, including:
   – Algorithms and tools for domain-specific or purpose-specific search
   – Best practices and methodologies for domain or purpose-specific search
Methods and Tools for Open Linked Data, including:
   – Algorithms and tools for search and exploration over linked and semantically-enriched data
   – Methods for preparing and labeling data to support search applications
User experience of search
   – User interfaces for search, including purpose- or domain-specific services
   – Information exploration and exploratory search over Web structured, semi-structured and unstructured data
   – Continuous, incremental and push-based search
Applications of search
   – Warehousing and integration of searchable data
   – Enterprise search applications
   – Social search
   – Web recommender systems
Benchmarks for search applications on integrated data

VLDS’12. Istanbul, August 31st, 2012.
Copyright © 2012 for the individual papers by the papers’ authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

4.    PROGRAM COMMITTEE
   We wish to thank the PC members who contributed to the success of the workshop by carefully reviewing the submitted papers and providing the authors with useful suggestions for improving the papers:
  Robert Baumgartner      Lixto Software GmbH
  Michael Benedikt        Oxford University
  Florian Daniel          University of Trento
  Anish Das Sarma         Google Research
  Arjen de Vries          CWI
  Sergio Flesca           DEIS - University of Calabria
  Alejandro Jaimes        Yahoo! Research
  Arnd Christian König    Microsoft Research
  Jens Lehmann            Universität Leipzig
  Ioana Manolescu         INRIA Saclay–Île-de-France and LRI, Université Paris Sud-11
  Hamid Motahari          HP Labs
  Neoklis Polyzotis       University of California Santa Cruz
  David Robertson         University of Edinburgh
  Mike Rosner             University of Malta
  Sebastian Schaffert     Salzburg Research Forschungsgesellschaft
  Klara Weiand            University of Munich
  Gerhard Weikum          Max-Planck-Institut
  Clement Yu              University of Illinois at Chicago
   We also wish to thank the additional reviewers who kindly helped the PC select the best papers for VLDS 2012.

5.    WORKSHOP PROGRAM
   The workshop received about 15 submissions, of which only about 50% (7 papers) were accepted. For the workshop, the papers are divided into three sessions on “Web Knowledge Bases”, “Deep Web”, and “Wrappers”. These papers are complemented by two invited keynotes by Gerhard Weikum (Max-Planck-Institut, Germany) and Raghu Ramakrishnan (Microsoft).

Invited Speakers
Gerhard Weikum
     Semantic Search: from Names and Phrases to Entities and Relations
Raghu Ramakrishnan
     The Future of Information Discovery and Search: Content Optimization, Interactivity, Semantics, and Social Networks

Web Knowledge Bases
Marilena Oita, Antoine Amarilli and Pierre Senellart
     Cross-Fertilizing Deep Web Analysis and Ontology Enrichment
Jianfeng Si, Qing Li, Tieyun Qian and Xiaotie Deng
     Hierarchical Clustering on HDP Topics to build a Semantic Tree from Text
Ndapandula Nakashole, Mauro Sozio, Fabian Suchanek and Martin Theobald
     Query-Time Reasoning in Uncertain RDF Knowledge Bases with Soft and Hard Rules

Deep Web
Meghyn Bienvenu, Daniel Deutch, Davide Martinenghi, Pierre Senellart and Fabian Suchanek
     Dealing with the Deep Web and all its Quirks
Feng Niu, Ce Zhang, Christopher Re and Jude Shavlik
     DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference

Wrappers
Tim Furche, Giovanni Grasso, Christian Schallhart, Andrew Sellers and Antonino Rullo
     Think before you Act! Minimising Action Execution in Wrappers
Rolando Creo, Valter Crescenzi, Disheng Qiu and Paolo Merialdo
     Minimizing the Costs of the Training Data for Learning Web Wrappers

6.    ACKNOWLEDGEMENTS
   This workshop was partially supported by the Search Computing project (SeCo, http://www.search-computing.org) and by the DIADEM project (DIADEM, http://diadem.cs.ox.ac.uk), both funded by the ERC under the Advanced Grant programme. We would also like to thank Sun SITE Central Europe for hosting these proceedings on http://ceur-ws.org.