Overview of Graph Search and Beyond Omar Alonso Marti A. Hearst Jaap Kamps Microsoft UC Berkeley University of Amsterdam Mountain View, CA Berkeley, CA Amsterdam USA USA The Netherlands ABSTRACT 1. INTRODUCTION Modern Web data is highly structured in terms of entities Information on the Web is increasingly structured in terms and relations from large knowledge resources, geo-temporal of entities and relations from large knowledge resources, geo- references and social network structure, resulting in a mas- temporal references and social network structure, resulting sive multidimensional graph. This graph essentially uni- in a massive multidimensional graph. This graph essentially fies both the searcher and the information resources that unifies both the searcher and the information resources that played a fundamentally different role in traditional IR, and played a fundamentally different role in traditional IR, and “Graph Search” offers major new ways to access relevant offers major new ways to access relevant information. In ser- information. Graph search affects both query formulation vices that rely on personalized information like social net- (complex queries about entities and relations building on works, the graph plays an even more important role, in other the searcher’s context) as well as result exploration and dis- words: you are the query. covery (slicing and dicing the information using the graph Graph search affects both query formulation as well as structure) in a completely personalized way. This new graph result exploration and discovery. On the one hand, it al- based approach introduces great opportunities, but also great lows for incrementally expressing complex information needs challenges, in terms of data quality and data integration, that triangulate information about multiple entities or entity user interface design, and privacy. types, relations between those entities, with various filters We view the notion of “graph search” as searching in- on geo-temporal constraints or the sources of information formation from your personal point of view (you are the used (or ignored), and taking into account the rich profile query) over a highly structured and curated information and context information of the searcher (and his/her peers, space. This goes beyond the traditional two-term queries and peers of peers, etc). On the other hand, it allows for and ten blue links results that users are familiar with, re- more powerful ways to explore the results from various as- quiring a highly interactive session covering both query for- pects and viewpoints, by slicing and dicing the information mulation and result exploration. The workshop attracted using the graph structure, and using the same structure for a range of researchers working on this and related topics, explaining why results are retrieved or recommended, and and made concrete progress working together on one of the by whom. greatest challenges in the years to come. This new graph based information seeking approach in- troduces great opportunities, but also great challenges, both Categories and Subject Descriptors technical ranging from data quality and data integration to user interface design, as well as ethical challenges in terms of H.3.3 [Information Storage and Retrieval]: Information privacy; transparency, bias and control; and avoiding the so- Search and Retrieval—Query formulation, Search process, called filter bubbles. Graph search is already available today Selection process in many flavors with different levels of interactivity. Social network-based services like Facebook and LinkedIn provide Keywords flexibility to search their personal network form many di- verse angles. Web search engines like Google and Bing rely Graph search; Semantic search; Personalization; Exploration; more on using graphs to show related content as a mecha- Query suggest nism to include other possible contexts for a given query. Clearly, it is not limited to web, and can be applied to other highly structured data. Just to give an example, the hansards or parliamentary proceedings are fully public data with a clear graph structure linking every speech to the re- spective speaker, their role in parliament and their political party. Graph search allows to explore politics from the view- point of individual members of parliament or government. At a high level, graph search seems limited to familiar Copyright c 2015 for the individual papers by the papers’ authors. Copying permit- ted for private and academic purposes. This volume is published and copyrighted by entity types (e.g., Facebook entities) and templates. How its editors. far can this scale? Will this work on truly open domains? SIGIR Workshop on Graph Search and Beyond ’15 Santiago, Chile There is a huge potential to use the graph to go beyond rec- Published on CEUR-WS: http://ceur-ws.org/Vol-1393/. ommendations for new friends and contacts or semantically ther aspects? How can we augment query autocomple- related content. Unlocking the potential of richer knowledge tion to actively prompt user to interactively construct sources for new search strategies requires us to think outside longer queries exploring different aspects? the box, by combining different insights from IR, semantic Result Exploration There is a radical shift towards the search, data integration, query expansion and user interfaces control of the searcher—small changes in the query can to name a few. lead to radically different result sets—how can we sup- The rest of this report is structured in the following way. port active exploration of slices of the data to explore Section 2 discusses the open research questions raised by further aspects? Unlike traditional facetted search op- searching highly structured data from a personal point of tions, the result space is highly dynamic, how can we view. Next, in Section 3) we discuss the four keynotes who provide adaptive exploration options tailored to the helped frame the problems and reach a shared understand- context and searcher, at every stage of the process? ing of the issues involved amongst all workshop attendees. Rose Marie Philip talked about personalized post search at Evaluation How do we know the system is any good? How Facebook, Swee Lim about graph search at Linkedin, Doug to evaluate the overall process, given its personalized Oard about good uses for crummy knowledge graphs, and and interactive nature? Can we rely on the direct Alex Wade about Microsoft academic graph. Section 4 dis- evaluation of query suggestions and query recommen- cusses the six contributed papers, which were presented in a dations? Are there suitable behavioral criteria for in boaster and poster session. Finally, Section 5 provides pre- the wild testing, such as longer queries, multiple fil- liminary discussion of the results and progress made during ters, longer dwell-time, more active engagement, more the workshop. structured-query templates? Can we use are stan- dard experimental evaluation methods from HCI and UI/UX design? 2. OPEN RESEARCH QUESTIONS We view the notion of “graph search” as searching in- Privacy Access to personal data is fraught ethical and pri- formation from your personal point of view (you are the vacy concerns, is there is similarly structured public query) over a highly structured and curated information data for scientific research? As an extreme form of space. This goes beyond the traditional two-term queries personalization, how to avoid the uncanny cave, filter and ten blue links results that users are familiar with, re- bubbles and echo chambers? How ethical is it to priv- quiring a highly interactive session covering both query for- ilege a particular query refinement suggestion over the mulation and result exploration. many other possible candidates? This raises many open questions: Further discussion on the challenges of graph based search can be found in [1]. IR Theory What happens if search gets personal? Does this break the classic dichotomy between users and documents, as users are nodes in the social network 3. KEYNOTES data themselves? What is the consequence of ultimate Four invited speakers helped frame the problems and reach personalization, as the local graph differs for all users? a shared understanding of the issues involved amongst all As the local graph structure is key, does this obviate workshop attendees. the need for large central indexes? Do these types of requests fit in the classic paradigm (e.g., Broder’s tax- 3.1 Personalized Post Search at Facebook onomy)? How does this shift the balance between the The opening keynote was given by Rose Marie Philip control of the searcher and the ranker over the result (Facebook) on “personalized post search at Facebook” [7]. set? There are over a billion people and over a trillion posts on Data Integration Building a knowledge graph requires mas- Facebook. Among these posts, there are uniquely personal- sive data integration at many levels: are there trade- ized answers to many search queries. The goal of Facebook offs in simplicity and level of detail (such as the classic post search is to help people find the most personally rele- knowledge representation trade-off)? What levels of vant posts for each individual query, tailored to the content granularity and comprehensiveness are needed for ef- of people’s networks. In this talk, I will present some of our fective deployment? What quality is needed: is any work to build a search product that uses personalized graph noise acceptable? How to deal with near duplicate de- signals in ranking. I will also give an overview of query tection, conflation, or entity disambiguation? modification, posts retrieval and ranking of results. Use Cases and Applications Rather than a universal so- 3.2 Graph Search at Linkedin lution, graph search is particularly useful for specific The second keynote in the morning was given by Swee types of information needs and queries. What are Lim (Linkedin) on “graph search at Linkedin” [5]. the data and tasks that make graph search works? Linkedin is the largest professional social network. Linkedin’s What kind of scenarios that would benefit from a graph graph and search systems help our users discover other users, model? In what context can switching perspectives by jobs, companies, schools, and relevant professional informa- showing results from the vista of other persons useful? tion. I will present the evolution of these systems, how they Query formulation How to move from singular queries to support current use cases, their strengths and weaknesses, highly interactive sessions with multiple variant queries? our next generation systems, and how we intend to leverage What new tools are needed to help a searcher construct these systems to perform graph searches. the appropriate graph search query using refinements or filters to better articulate their needs, or explore fur- 3.3 Good Uses for Crummy Knowledge Graphs The first keynote in the afternoon was given by Doug Oard archies. Similarity measures are central in IR, and related (University of Maryland) on “good uses for crummy knowl- distance measures central in graph data. The discussion is edge graphs” [6]. motivated by a use case of “paradigmatic” structures. In 1993, Ken Church and Ed Hovy suggested that be- Tong et al. [12] investigate category and word relation fore we ask how well some new technology meets the need graphs for retrieving trouble shooting information/documents, we envision for it, we should pause and first reflect on the addressing the classical IR problem of human assigned con- question of whether – now that we know something about trolled terms versus document free text in the context of a what can be built – we are envisioning the right uses for curated data space and rich context (at least in principle). what we have. They titled their paper “Good Applications The paper offers an interesting graph approach is outlined, for Crummy Machine Translation” [3]. At about that same mapping terms to categories, for both requests and docu- time, information retrieval researchers obliged them by (gen- ments. Making this graph level explicitly available to users erally without having read their paper) starting to work offers interesting new possibilities, and opens up ways to on cross-language information retrieval; arguably the best map the noisy term occurrence space to the curated, con- application for crummy machine translation ever invented. cept and entity based space of the category codes. Hence Now we have some crummy knowledge graphs – and this this paves the way to a semantic, entity based view. time we have read the Church and Hovy paper – so perhaps Yu et al. [14] investigates the strength of connections in an the time is right for us to ask whether we have yet envisioned entity graph, specifically a scholarly network with a rich en- good uses for crummy knowledge graphs. In this talk, I will tity graph available as public data. This is an interesing use seek to seed that discussion. case with a curated data space and rich context, plus an in- teresting dynamic structure over time. The paper proposes 3.4 Overview of Microsoft Academic Graph to take the strength (or weakness) of connections into ac- The second afternoon keynote was given by Alex Wade count — here as simulated blind feedback — turning a net- (Microsoft Research) with an “overview of Microsoft aca- work into a weighted network of the simple graph into a demic graph” [13] valued graph. The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relation- ships between those publications, as well as authors, insti- 5. CONCLUSIONS tutions, journals and conference “venues” and fields of study. The workshop brought together researchers from a range of areas in information access, who worked together on search- 4. ACCEPTED PAPERS ing information from your personal point of view over a We requested the submission of short, 4–5 page papers to highly structured and curated information space. One of be presented as boaster and poster. We accepted a total the main lines of discussion was the considerable industrial of 6 papers out of 8 submissions after peer review (a 75% activity around social graphs. The most famous example acceptance rate). is Facebook Graph Search, a feature that allows users to Jadeja and Shah [4] investigate data driven ways to visu- perform more sophisticated searches on their social network alize and navigate graph or tree structured data. Navigating [11]. Bing has been integrating Facebook into their web or traversing highly curated graph data is an understudied search results for the last couple of years. Similarly, Google problem, and hierarchical or tree visualizations can help cre- has been annotating search results with Google+ profiles. ate order and overview. When visualizing the data from the And all the rest of the search industry is moving in the viewpoint of a particular node makes any graph data (such same direction. as social network data) look like a tree with the starting There are also crucial links with work on searching struc- node (a person with all context) as point of origin. tured data, and work on the appropriate query languages, Sabetghadam et al. [8] investigates ways of “reranking” in particular as part of semantic search. These branches of results based on a graph traversal approach for multimodal research in particular focus on complex querying of struc- IR, that is hoped to be robust over different distributions tured text or data, whereas the graph search addresses also, of modalities. The use case of multimodal IR in a curated and perhaps primarily, the process of constructing series of data space with rich context presents a challenge, as different complex queries interactively. This is directly related to ex- features and scores on different modalities will be very dif- ploratory search and sense making. The graph structure ferently distributed in very different probability spaces. An provides natural facets for exploring the data, from a lo- application of Metropolis-Hastings as sampling/estimation cal point of view, allowing for a more dynamic structure method is suggested as (partial) solution. than traditional faceted search using rigid, global, hierarchi- Sakamoto et al. [9] investigate captioning or summarizing cal structure. This challenges our understanding of search results in highly curated graph data. Succint descriptions user interfaces design and evaluation, with search results are essential for effective graph exploration, and requires to moving from the found links, to the HIT page as snippets, take the context and structure into account. The paper dis- and now to query suggestion as previews of possible query cusses a particular graph of words, sentences and documents, extensions. and also touches upon semantic annotations, which would Graph Search has fundamental consequences for informa- move the document and text space to an entity space, with tion access and offers tremendous opportunities for building all documents and text linked to a particular category or new systems and tools that allow users to explore informa- entity. tion from many different angles, shifting control back to the Santisteban and Cárcamo [10] investigates a variant of the user. This is a radical departure from current systems where classic Tanimoto or Jaccard similarity measure able to deal the machine learning dominate the interaction: the entire in- with assymmetry in directed graphs and subsumption hier- formation space is determined by the user, and the user is in the driver’s seat when expressing her needs and exploring the space of options interactive. Acknowledgments This research is funded in part by the Netherlands Organization for Scientific Research (ExPoSe project, NWO CI # 314.99.108; DiLiPaD project, NWO Digging into Data # 600.006.014). References [1] O. Alonso and J. Kamps. Beyond graph search: Ex- ploring and exploiting rich connected data sets. In ICWE’15: Engineering the Web in the Big Data Era, volume 9114 of LNCS, pages 3–12. Springer, 2015. URL http://dx.doi.org/10.1007/978-3-319-19890-3_1. [2] O. Alonso, M. A. Hearst, and J. Kamps, editors. GSB’15: Proceedings of the SIGIR’15 Workshop on Graph Search and Beyond, 2015. CEUR-WS. URL http://ceur-ws.org/Vol-1393/. [3] K. W. Church and E. H. Hovy. Good applications for crummy machine translation. Machine Translation, 8:239–258, 1993. URL http://dx.doi.org/10.1007/ BF00981759. [4] M. Jadeja and K. Shah. Tree-map: A visualization tool for large data. In Alonso et al. [2], pages 9–13. URL http://ceur-ws.org/Vol-1393/. [5] S. Lim. Graph search at linkedin. In Alonso et al. [2], page 5. URL http://ceur-ws.org/Vol-1393/. [6] D. W. Oard. Good uses for crummy knowledge graphs. In Alonso et al. [2], page 6. URL http://ceur-ws.org/ Vol-1393/. [7] R. M. Philip. Personalized post search at facebook. In Alonso et al. [2], page 7. URL http://ceur-ws.org/ Vol-1393/. [8] S. Sabetghadam, M. Lupu, and A. Rauber. Leveraging metropolis-hastings algorithm on graph-based model for multimodal ir. In Alonso et al. [2], pages 14–18. URL http://ceur-ws.org/Vol-1393/. [9] K. Sakamoto, H. Shibuki, T. Mori, and N. Kando. Fu- sion of heterogeneous information in graph-based rank- ing for query-biased summarization. In Alonso et al. [2], pages 19–22. URL http://ceur-ws.org/Vol-1393/. [10] J. Santisteban and J. T. Cárcamo. Unilateral jaccard similarity coefficient. In Alonso et al. [2], pages 23–27. URL http://ceur-ws.org/Vol-1393/. [11] N. V. Spirin, J. He, M. Develin, K. G. Karahalios, and M. Boucher. People search within an online so- cial network: Large scale analysis of facebook graph search query logs. In CIKM’14, pages 1009–1018. ACM, 2014. URL http://doi.acm.org/10.1145/2661829. 2661967. [12] B. Tong, T. Yanase, H. Ozaki, and M. Iwayama. Infor- mation retrieval boosted by category for troubleshoot- ing search system. In Alonso et al. [2], pages 28–32. URL http://ceur-ws.org/Vol-1393/. [13] A. D. Wade. Overview of microsoft academic graph. In Alonso et al. [2], page 8. URL http://ceur-ws.org/ Vol-1393/. [14] Y. Yu, Z. Jiang, and X. Liu. Random walk and feedback on scholarly network. In Alonso et al. [2], pages 33–37. URL http://ceur-ws.org/Vol-1393/.