Overview of Graph Search and Beyond

                               Omar Alonso                            Marti A. Hearst                 Jaap Kamps
                               Microsoft                                  UC Berkeley            University of Amsterdam
                            Mountain View, CA                             Berkeley, CA                 Amsterdam
                                  USA                                         USA                   The Netherlands


ABSTRACT                                                                             1.   INTRODUCTION
Modern Web data is highly structured in terms of entities                               Information on the Web is increasingly structured in terms
and relations from large knowledge resources, geo-temporal                           of entities and relations from large knowledge resources, geo-
references and social network structure, resulting in a mas-                         temporal references and social network structure, resulting
sive multidimensional graph. This graph essentially uni-                             in a massive multidimensional graph. This graph essentially
fies both the searcher and the information resources that                            unifies both the searcher and the information resources that
played a fundamentally different role in traditional IR, and                         played a fundamentally different role in traditional IR, and
“Graph Search” offers major new ways to access relevant                              offers major new ways to access relevant information. In ser-
information. Graph search affects both query formulation                             vices that rely on personalized information like social net-
(complex queries about entities and relations building on                            works, the graph plays an even more important role, in other
the searcher’s context) as well as result exploration and dis-                       words: you are the query.
covery (slicing and dicing the information using the graph                              Graph search affects both query formulation as well as
structure) in a completely personalized way. This new graph                          result exploration and discovery. On the one hand, it al-
based approach introduces great opportunities, but also great                        lows for incrementally expressing complex information needs
challenges, in terms of data quality and data integration,                           that triangulate information about multiple entities or entity
user interface design, and privacy.                                                  types, relations between those entities, with various filters
   We view the notion of “graph search” as searching in-                             on geo-temporal constraints or the sources of information
formation from your personal point of view (you are the                              used (or ignored), and taking into account the rich profile
query) over a highly structured and curated information                              and context information of the searcher (and his/her peers,
space. This goes beyond the traditional two-term queries                             and peers of peers, etc). On the other hand, it allows for
and ten blue links results that users are familiar with, re-                         more powerful ways to explore the results from various as-
quiring a highly interactive session covering both query for-                        pects and viewpoints, by slicing and dicing the information
mulation and result exploration. The workshop attracted                              using the graph structure, and using the same structure for
a range of researchers working on this and related topics,                           explaining why results are retrieved or recommended, and
and made concrete progress working together on one of the                            by whom.
greatest challenges in the years to come.                                               This new graph based information seeking approach in-
                                                                                     troduces great opportunities, but also great challenges, both
Categories and Subject Descriptors                                                   technical ranging from data quality and data integration to
                                                                                     user interface design, as well as ethical challenges in terms of
H.3.3 [Information Storage and Retrieval]: Information                               privacy; transparency, bias and control; and avoiding the so-
Search and Retrieval—Query formulation, Search process,                              called filter bubbles. Graph search is already available today
Selection process                                                                    in many flavors with different levels of interactivity. Social
                                                                                     network-based services like Facebook and LinkedIn provide
Keywords                                                                             flexibility to search their personal network form many di-
                                                                                     verse angles. Web search engines like Google and Bing rely
Graph search; Semantic search; Personalization; Exploration;
                                                                                     more on using graphs to show related content as a mecha-
Query suggest
                                                                                     nism to include other possible contexts for a given query.
                                                                                     Clearly, it is not limited to web, and can be applied to
                                                                                     other highly structured data. Just to give an example, the
                                                                                     hansards or parliamentary proceedings are fully public data
                                                                                     with a clear graph structure linking every speech to the re-
                                                                                     spective speaker, their role in parliament and their political
                                                                                     party. Graph search allows to explore politics from the view-
                                                                                     point of individual members of parliament or government.
                                                                                        At a high level, graph search seems limited to familiar
Copyright c 2015 for the individual papers by the papers’ authors. Copying permit-
ted for private and academic purposes. This volume is published and copyrighted by   entity types (e.g., Facebook entities) and templates. How
its editors.                                                                         far can this scale? Will this work on truly open domains?
SIGIR Workshop on Graph Search and Beyond ’15 Santiago, Chile                        There is a huge potential to use the graph to go beyond rec-
Published on CEUR-WS: http://ceur-ws.org/Vol-1393/.
ommendations for new friends and contacts or semantically                ther aspects? How can we augment query autocomple-
related content. Unlocking the potential of richer knowledge             tion to actively prompt user to interactively construct
sources for new search strategies requires us to think outside           longer queries exploring different aspects?
the box, by combining different insights from IR, semantic         Result Exploration There is a radical shift towards the
search, data integration, query expansion and user interfaces          control of the searcher—small changes in the query can
to name a few.                                                         lead to radically different result sets—how can we sup-
   The rest of this report is structured in the following way.         port active exploration of slices of the data to explore
Section 2 discusses the open research questions raised by              further aspects? Unlike traditional facetted search op-
searching highly structured data from a personal point of              tions, the result space is highly dynamic, how can we
view. Next, in Section 3) we discuss the four keynotes who             provide adaptive exploration options tailored to the
helped frame the problems and reach a shared understand-               context and searcher, at every stage of the process?
ing of the issues involved amongst all workshop attendees.
Rose Marie Philip talked about personalized post search at         Evaluation How do we know the system is any good? How
Facebook, Swee Lim about graph search at Linkedin, Doug                to evaluate the overall process, given its personalized
Oard about good uses for crummy knowledge graphs, and                  and interactive nature? Can we rely on the direct
Alex Wade about Microsoft academic graph. Section 4 dis-               evaluation of query suggestions and query recommen-
cusses the six contributed papers, which were presented in a           dations? Are there suitable behavioral criteria for in
boaster and poster session. Finally, Section 5 provides pre-           the wild testing, such as longer queries, multiple fil-
liminary discussion of the results and progress made during            ters, longer dwell-time, more active engagement, more
the workshop.                                                          structured-query templates? Can we use are stan-
                                                                       dard experimental evaluation methods from HCI and
                                                                       UI/UX design?
2.   OPEN RESEARCH QUESTIONS
  We view the notion of “graph search” as searching in-            Privacy Access to personal data is fraught ethical and pri-
formation from your personal point of view (you are the                 vacy concerns, is there is similarly structured public
query) over a highly structured and curated information                 data for scientific research? As an extreme form of
space. This goes beyond the traditional two-term queries                personalization, how to avoid the uncanny cave, filter
and ten blue links results that users are familiar with, re-            bubbles and echo chambers? How ethical is it to priv-
quiring a highly interactive session covering both query for-           ilege a particular query refinement suggestion over the
mulation and result exploration.                                        many other possible candidates?
  This raises many open questions:                                 Further discussion on the challenges of graph based search
                                                                   can be found in [1].
IR Theory What happens if search gets personal? Does
    this break the classic dichotomy between users and
    documents, as users are nodes in the social network            3.    KEYNOTES
    data themselves? What is the consequence of ultimate             Four invited speakers helped frame the problems and reach
    personalization, as the local graph differs for all users?     a shared understanding of the issues involved amongst all
    As the local graph structure is key, does this obviate         workshop attendees.
    the need for large central indexes? Do these types of
    requests fit in the classic paradigm (e.g., Broder’s tax-      3.1    Personalized Post Search at Facebook
    onomy)? How does this shift the balance between the               The opening keynote was given by Rose Marie Philip
    control of the searcher and the ranker over the result         (Facebook) on “personalized post search at Facebook” [7].
    set?                                                              There are over a billion people and over a trillion posts on
Data Integration Building a knowledge graph requires mas-          Facebook. Among these posts, there are uniquely personal-
    sive data integration at many levels: are there trade-         ized answers to many search queries. The goal of Facebook
    offs in simplicity and level of detail (such as the classic    post search is to help people find the most personally rele-
    knowledge representation trade-off)? What levels of            vant posts for each individual query, tailored to the content
    granularity and comprehensiveness are needed for ef-           of people’s networks. In this talk, I will present some of our
    fective deployment? What quality is needed: is any             work to build a search product that uses personalized graph
    noise acceptable? How to deal with near duplicate de-          signals in ranking. I will also give an overview of query
    tection, conflation, or entity disambiguation?                 modification, posts retrieval and ranking of results.

Use Cases and Applications Rather than a universal so-             3.2    Graph Search at Linkedin
    lution, graph search is particularly useful for specific          The second keynote in the morning was given by Swee
    types of information needs and queries. What are               Lim (Linkedin) on “graph search at Linkedin” [5].
    the data and tasks that make graph search works?                  Linkedin is the largest professional social network. Linkedin’s
    What kind of scenarios that would benefit from a graph         graph and search systems help our users discover other users,
    model? In what context can switching perspectives by           jobs, companies, schools, and relevant professional informa-
    showing results from the vista of other persons useful?        tion. I will present the evolution of these systems, how they
Query formulation How to move from singular queries to             support current use cases, their strengths and weaknesses,
    highly interactive sessions with multiple variant queries?     our next generation systems, and how we intend to leverage
    What new tools are needed to help a searcher construct         these systems to perform graph searches.
    the appropriate graph search query using refinements
    or filters to better articulate their needs, or explore fur-   3.3    Good Uses for Crummy Knowledge Graphs
  The first keynote in the afternoon was given by Doug Oard       archies. Similarity measures are central in IR, and related
(University of Maryland) on “good uses for crummy knowl-          distance measures central in graph data. The discussion is
edge graphs” [6].                                                 motivated by a use case of “paradigmatic” structures.
  In 1993, Ken Church and Ed Hovy suggested that be-                 Tong et al. [12] investigate category and word relation
fore we ask how well some new technology meets the need           graphs for retrieving trouble shooting information/documents,
we envision for it, we should pause and first reflect on the      addressing the classical IR problem of human assigned con-
question of whether – now that we know something about            trolled terms versus document free text in the context of a
what can be built – we are envisioning the right uses for         curated data space and rich context (at least in principle).
what we have. They titled their paper “Good Applications          The paper offers an interesting graph approach is outlined,
for Crummy Machine Translation” [3]. At about that same           mapping terms to categories, for both requests and docu-
time, information retrieval researchers obliged them by (gen-     ments. Making this graph level explicitly available to users
erally without having read their paper) starting to work          offers interesting new possibilities, and opens up ways to
on cross-language information retrieval; arguably the best        map the noisy term occurrence space to the curated, con-
application for crummy machine translation ever invented.         cept and entity based space of the category codes. Hence
Now we have some crummy knowledge graphs – and this               this paves the way to a semantic, entity based view.
time we have read the Church and Hovy paper – so perhaps             Yu et al. [14] investigates the strength of connections in an
the time is right for us to ask whether we have yet envisioned    entity graph, specifically a scholarly network with a rich en-
good uses for crummy knowledge graphs. In this talk, I will       tity graph available as public data. This is an interesing use
seek to seed that discussion.                                     case with a curated data space and rich context, plus an in-
                                                                  teresting dynamic structure over time. The paper proposes
3.4    Overview of Microsoft Academic Graph                       to take the strength (or weakness) of connections into ac-
  The second afternoon keynote was given by Alex Wade             count — here as simulated blind feedback — turning a net-
(Microsoft Research) with an “overview of Microsoft aca-          work into a weighted network of the simple graph into a
demic graph” [13]                                                 valued graph.
  The Microsoft Academic Graph is a heterogeneous graph
containing scientific publication records, citation relation-
ships between those publications, as well as authors, insti-      5.   CONCLUSIONS
tutions, journals and conference “venues” and fields of study.
                                                                     The workshop brought together researchers from a range
                                                                  of areas in information access, who worked together on search-
4.    ACCEPTED PAPERS                                             ing information from your personal point of view over a
   We requested the submission of short, 4–5 page papers to       highly structured and curated information space. One of
be presented as boaster and poster. We accepted a total           the main lines of discussion was the considerable industrial
of 6 papers out of 8 submissions after peer review (a 75%         activity around social graphs. The most famous example
acceptance rate).                                                 is Facebook Graph Search, a feature that allows users to
   Jadeja and Shah [4] investigate data driven ways to visu-      perform more sophisticated searches on their social network
alize and navigate graph or tree structured data. Navigating      [11]. Bing has been integrating Facebook into their web
or traversing highly curated graph data is an understudied        search results for the last couple of years. Similarly, Google
problem, and hierarchical or tree visualizations can help cre-    has been annotating search results with Google+ profiles.
ate order and overview. When visualizing the data from the        And all the rest of the search industry is moving in the
viewpoint of a particular node makes any graph data (such         same direction.
as social network data) look like a tree with the starting           There are also crucial links with work on searching struc-
node (a person with all context) as point of origin.              tured data, and work on the appropriate query languages,
   Sabetghadam et al. [8] investigates ways of “reranking”        in particular as part of semantic search. These branches of
results based on a graph traversal approach for multimodal        research in particular focus on complex querying of struc-
IR, that is hoped to be robust over different distributions       tured text or data, whereas the graph search addresses also,
of modalities. The use case of multimodal IR in a curated         and perhaps primarily, the process of constructing series of
data space with rich context presents a challenge, as different   complex queries interactively. This is directly related to ex-
features and scores on different modalities will be very dif-     ploratory search and sense making. The graph structure
ferently distributed in very different probability spaces. An     provides natural facets for exploring the data, from a lo-
application of Metropolis-Hastings as sampling/estimation         cal point of view, allowing for a more dynamic structure
method is suggested as (partial) solution.                        than traditional faceted search using rigid, global, hierarchi-
   Sakamoto et al. [9] investigate captioning or summarizing      cal structure. This challenges our understanding of search
results in highly curated graph data. Succint descriptions        user interfaces design and evaluation, with search results
are essential for effective graph exploration, and requires to    moving from the found links, to the HIT page as snippets,
take the context and structure into account. The paper dis-       and now to query suggestion as previews of possible query
cusses a particular graph of words, sentences and documents,      extensions.
and also touches upon semantic annotations, which would              Graph Search has fundamental consequences for informa-
move the document and text space to an entity space, with         tion access and offers tremendous opportunities for building
all documents and text linked to a particular category or         new systems and tools that allow users to explore informa-
entity.                                                           tion from many different angles, shifting control back to the
   Santisteban and Cárcamo [10] investigates a variant of the    user. This is a radical departure from current systems where
classic Tanimoto or Jaccard similarity measure able to deal       the machine learning dominate the interaction: the entire in-
with assymmetry in directed graphs and subsumption hier-          formation space is determined by the user, and the user is
in the driver’s seat when expressing her needs and exploring
the space of options interactive.
Acknowledgments This research is funded in part by the
Netherlands Organization for Scientific Research (ExPoSe
project, NWO CI # 314.99.108; DiLiPaD project, NWO
Digging into Data # 600.006.014).
References
 [1] O. Alonso and J. Kamps. Beyond graph search: Ex-
     ploring and exploiting rich connected data sets. In
     ICWE’15: Engineering the Web in the Big Data Era,
     volume 9114 of LNCS, pages 3–12. Springer, 2015. URL
     http://dx.doi.org/10.1007/978-3-319-19890-3_1.
 [2] O. Alonso, M. A. Hearst, and J. Kamps, editors.
     GSB’15: Proceedings of the SIGIR’15 Workshop on
     Graph Search and Beyond, 2015. CEUR-WS. URL
     http://ceur-ws.org/Vol-1393/.
 [3] K. W. Church and E. H. Hovy. Good applications
     for crummy machine translation. Machine Translation,
     8:239–258, 1993. URL http://dx.doi.org/10.1007/
     BF00981759.
 [4] M. Jadeja and K. Shah. Tree-map: A visualization tool
     for large data. In Alonso et al. [2], pages 9–13. URL
     http://ceur-ws.org/Vol-1393/.
 [5] S. Lim. Graph search at linkedin. In Alonso et al. [2],
     page 5. URL http://ceur-ws.org/Vol-1393/.
 [6] D. W. Oard. Good uses for crummy knowledge graphs.
     In Alonso et al. [2], page 6. URL http://ceur-ws.org/
     Vol-1393/.
 [7] R. M. Philip. Personalized post search at facebook. In
     Alonso et al. [2], page 7. URL http://ceur-ws.org/
     Vol-1393/.
 [8] S. Sabetghadam, M. Lupu, and A. Rauber. Leveraging
     metropolis-hastings algorithm on graph-based model
     for multimodal ir. In Alonso et al. [2], pages 14–18.
     URL http://ceur-ws.org/Vol-1393/.
 [9] K. Sakamoto, H. Shibuki, T. Mori, and N. Kando. Fu-
     sion of heterogeneous information in graph-based rank-
     ing for query-biased summarization. In Alonso et al. [2],
     pages 19–22. URL http://ceur-ws.org/Vol-1393/.
[10] J. Santisteban and J. T. Cárcamo. Unilateral jaccard
     similarity coefficient. In Alonso et al. [2], pages 23–27.
     URL http://ceur-ws.org/Vol-1393/.
[11] N. V. Spirin, J. He, M. Develin, K. G. Karahalios,
     and M. Boucher. People search within an online so-
     cial network: Large scale analysis of facebook graph
     search query logs. In CIKM’14, pages 1009–1018. ACM,
     2014. URL http://doi.acm.org/10.1145/2661829.
     2661967.
[12] B. Tong, T. Yanase, H. Ozaki, and M. Iwayama. Infor-
     mation retrieval boosted by category for troubleshoot-
     ing search system. In Alonso et al. [2], pages 28–32.
     URL http://ceur-ws.org/Vol-1393/.
[13] A. D. Wade. Overview of microsoft academic graph. In
     Alonso et al. [2], page 8. URL http://ceur-ws.org/
     Vol-1393/.
[14] Y. Yu, Z. Jiang, and X. Liu. Random walk and feedback
     on scholarly network. In Alonso et al. [2], pages 33–37.
     URL http://ceur-ws.org/Vol-1393/.