=Paper= {{Paper |id=Vol-1628/DC_3 |storemode=property |title=Intelligent Nudging to Support Interactive Exploration of a Data Graph |pdfUrl=https://ceur-ws.org/Vol-1628/DC_3.pdf |volume=Vol-1628 |authors=Marwan Al-Tawil |dblpUrl=https://dblp.org/rec/conf/ht/Al-Tawil16 }} ==Intelligent Nudging to Support Interactive Exploration of a Data Graph== https://ceur-ws.org/Vol-1628/DC_3.pdf
                    Intelligent Nudging to Support Interactive
                          Exploration of Big Data Graphs
                                                          Marwan Al-Tawil
                                          School of Computing, University of Leeds, UK
                                                      scmata@leeds.ac.uk

ABSTRACT
This research investigates how to support user exploration through big data graphs. Current successful approaches to interactive exploration take into account utility from the user’s point of view. In this PhD, we focus on knowledge utility: how useful the trajectories in a data graph are for expanding the user’s domain knowledge. The main goal of this research is to design intelligent nudging techniques to direct the user to ‘good’ trajectories for knowledge expansion. Our earlier work investigating empirical nudging strategies for users’ exploration suggests that paths which start with familiar and highly inclusive entities and bring something unfamiliar are likely to increase the learning effect of users’ exploration. This directs us to investigate subsumption theory for meaningful learning and adopt it as our underpinning theoretical model to generate good trajectories, where familiar and highly inclusive entities are used as knowledge anchors to introduce and learn new knowledge. This calls for developing an automatic approach to identify knowledge anchors in a data graph. We follow an analogy with basic level objects in domain taxonomies, which underpins our automated approach for identifying knowledge anchors. Several metrics for extracting knowledge anchors in a data graph are developed and examined.

CCS Concepts
G.2.2 [Mathematics of Computing]: Graph Theory - Graph algorithms; H.1.2, 4.7 [Information systems]: Data management systems - Graph-based database models, WWW - RDF.

Keywords
Big Data graphs; interactive exploration; knowledge utility; knowledge anchors.

1. INTRODUCTION AND MOTIVATION
With the emergence and growth of RDF Linked Data graphs, many applications take advantage of the knowledge encoded in the graphs to support users’ interactive exploration [3, 4, 17]. Consequently, more and more big data graphs are being exposed to users for exploratory search tasks such as learning or investigating, where the users usually discover new connections and associations [12]. Layman users who are engaged in exploratory search sessions will usually have no (or limited) familiarity with the specific domain and little (or no) awareness of the knowledge encoded in the graph. In other words, the users’ cognitive structures about the domain may not match the semantic structure of the data graph. This can pose major obstacles to interactive exploration, especially when the users need to learn new things, resulting in confusion and frustration.

This research aims to support users' interactive exploration in big data graphs by directing the users to trajectories that can bring some benefit (utility) for the users (e.g. efficiency, effectiveness, motivation, knowledge expansion) [12, 19]. Specifically, we focus on knowledge utility: how useful a trajectory in a data graph is for expanding one’s knowledge in the domain. Earlier research has acknowledged that data graph exploration can promote expansion of domain knowledge through serendipitous learning (e.g. users discover concepts or relationships they were unaware of) [5, 15]. However, not all paths are beneficial for knowledge expansion, and ways of identifying ‘good’ trajectories are required.

Identifying good trajectories in a data graph requires anchoring entities that serve as knowledge bridges to learn new things. Our earlier work has acknowledged that when the user explores familiar entities, nudging should direct the user to explore something new [15]. This calls for investigating the subsumption theory for meaningful learning [6]. According to this theory, to incorporate new knowledge, the most familiar and inclusive entities in the user’s cognition are used as knowledge anchors to subsume and learn new knowledge. Subsequently, the new knowledge can take on meaning by becoming anchored to the basic concepts in the user’s cognitive structures.

However, identifying knowledge anchors in big data graphs is not a trivial task and brings forth various research challenges, including dealing with a much larger number of entities (hundreds of entities in a typical ontology versus millions of entities in a typical data graph) and the need to exploit the large number of data instances in the data graph.

The broader challenge of this PhD is to design intelligent nudging techniques to direct the user to ‘good’ trajectories through big data graphs for knowledge expansion. To meet this challenge, we address two research questions:

Question 1: How to develop automatic ways to identify data graph entities that provide knowledge anchors for navigation paths? This question can be seen as focusing on the Cognitive Science notion of basic level objects¹ [7], to develop metrics to automatically identify knowledge anchors in a data graph.

Question 2: How can we use knowledge anchors in a data graph to design navigation paths for expanding users’ domain knowledge? In the second question, subsumption theories will be adopted to nudge the user through navigation paths. We aim to maximize serendipitous learning by bringing the users first to the anchoring entities and then directing them to new and interesting concepts at different levels of abstraction in the graph.

¹ The term “basic level objects” has been used in Cognitive Science. Other developments, e.g. Formal Concept Analysis, call them “concepts”.

2. RELATED WORK
Recent research on exploratory search through linked data graphs has been examining different ways to provide intelligent support for users’ navigation. This has brought together research from the
Semantic Web, personalization, HCI and Cognitive Science to shape novel tools for interactive exploration of semantic data [14]. Work on personalized exploration includes improving search efficiency by considering user interests [8, 9, 17] or diversifying the user exploration paths with recommendations based on the browsing history [10]. Extracting semantic patterns from linked data sources to improve diversity in recommendation results to users has been proposed in [18]. Diversity is measured based on the semantic distance of topics and genres of the results. The work in [13] presents an approach to rank RDF statements with the expectation that some statements will be more valuable or interesting to users than other statements within some context. A wide range of tools for offering interactive exploration using linked data technologies can be found in a recent survey [12].

Our work brings a new dimension to this research stream by looking at the knowledge utility of the exploration path. We hypothesize that the cognitive learning theory of ‘meaningful learning’ [6] can be used to design paths with high knowledge utility, where new knowledge is subsumed under familiar and highly inclusive abstract entities. To identify knowledge anchors in a data graph, we operationalize the notion of basic level objects [7]. The problem of extracting important concepts from information spaces using the notion of basic level objects has been tackled in two areas: ontology summarization [2, 11] and Formal Concept Analysis (FCA) [22, 23]. These approaches utilize basic level objects with the aim of identifying key concepts to help domain experts in understanding and reengineering an ontology or a concept lattice, respectively.

In our work, we apply the notion of basic level objects in a data graph to identify anchoring entities which are likely to be familiar to layman users who are not domain experts. The formal framework that maps Rosch’s definitions of basic level objects and cue validity [7] to data graphs is a major contribution of our work. We are unique in our use of these anchoring entities to support interactive exploration of a data graph.

3. PROPOSED APPROACH
We follow two approaches to address the two research questions in this work, respectively:

1. Develop automatic ways to identify data graph entities that can be used as knowledge anchors for navigation paths. We achieve this objective by adopting the notion of basic level objects, which was introduced in Cognitive Science research [7], illustrating that domain taxonomies include category objects which are at the basic level of abstraction. Basic level categories “carry the most information, possess the highest category cue validity, and are, thus, the most differentiated from one another” [7]. We adopt two approaches to identify knowledge anchors: the distinctiveness approach, which is based on the formal definition of cue validity, to identify the most differentiated basic categories whose attributes are associated amongst the category members but are not associated with members of other categories; and the homogeneity approach, to identify basic categories whose members share many attributes. The homogeneity approach is complementary to the distinctiveness feature. A basic category object with high cue validity will have a high number of entities common to its members.

2. Develop navigation paths using knowledge anchors. To achieve this objective we adopt subsumption strategies for meaningful learning [6]. Two subsumption strategies, the subordinate and the super-ordinate strategies, will be used [24]. On the one hand, the subordinate strategy can be used to direct the user to explore unfamiliar members linked to anchoring entities in the data graph (i.e. nudge to explore subclass entities of an anchor). On the other hand, the super-ordinate strategy will be used when the anchoring entities are members of new, unfamiliar and more inclusive entities (i.e. nudge to explore superclass entities of an anchor).

4. CURRENT OUTCOMES
We formally describe and implement the metrics and the corresponding algorithms for identifying knowledge anchors [1]. The metrics were implemented by running SPARQL queries over the MusicPinta data graph stored in a triplestore [5]. The performance of the algorithms is examined using benchmarking sets with basic level entities identified by humans, corresponding to the cognitive structures humans form on the part of the world represented in the data graph. A user study based on free-naming tasks in the music domain, using the MusicPinta data graph, was carried out to identify the benchmarking sets. This resulted in two such benchmarking sets: a StrongAnchors set that includes entities closest to the human cognitive structures, and a WeakAnchors set that includes entities people are likely to recognize when they are at a lower level of abstraction in the graph. Based on quantitative and qualitative analysis, the strengths and limitations of each metric are assessed, and a hybridization approach is proposed.

5. CONCLUSION AND FUTURE WORK
Interactive data exploration is becoming a key daily life activity. It involves exploring large amounts of data and deciding where to go next in the graph. The success of big data graphs in supporting interactive exploration brings forth the challenge of building intelligent approaches to nudge the user through beneficial paths with high knowledge utility. This emphasizes the importance of identifying anchoring entities in the graph that can be used to subsume and learn new knowledge.

Moving forward, the immediate future work is to apply the metrics for identifying knowledge anchors in another domain. The INSPIRE² data graph about career options will be used to identify anchoring career points that can be used in assisting the users in identifying paths that will be beneficial for expanding their awareness of their career options, including short or longer-term career paths. In the long run, we aim to develop nudging techniques using the subsumption strategies. An initial probing algorithm can be used to identify users’ familiarity [20] (i.e. use knowledge anchors to identify new or interesting entities for the users). Another option is to have a quick probing at every knowledge anchor during the exploration. A user study will be conducted to evaluate the navigation strategies. Random paths will be our baseline for evaluating the navigation strategies.

The impact of this work is not limited to supporting data graph exploration. It can also be applied to ontology summarization, where anchoring entities allow capturing a lay person’s view of the domain. Also, knowledge anchors can be used to initiate a dialog to solve the ‘cold start’ problem in personalization and adaptation.

² INSPIRE is a system under development at Birkbeck, University of London, for the career guidance domain, particularly career transitions.
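
As an illustration of the distinctiveness and homogeneity approaches described in Section 3, the following is a minimal sketch under stated assumptions, not the metrics formalized in [1]. It assumes category cue validity is computed in Rosch's sense, as the sum over a category's attributes of the proportion of attribute holders that belong to the category, and it uses mean pairwise Jaccard similarity as a stand-in for homogeneity. The toy data and the names `MEMBERS`, `ATTRIBUTES`, `cue_validity`, `homogeneity` and `rank_anchor_candidates` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data (not taken from MusicPinta):
# category -> member entities, and entity -> attributes attached to it.
MEMBERS = {
    "Guitar": ["acoustic_guitar", "electric_guitar"],
    "Drum": ["snare_drum", "bass_drum"],
    "Piano": ["grand_piano", "upright_piano"],
}
ATTRIBUTES = {
    "acoustic_guitar": {"has_strings", "plucked", "hollow_body"},
    "electric_guitar": {"has_strings", "plucked", "amplified"},
    "snare_drum": {"has_membrane", "struck"},
    "bass_drum": {"has_membrane", "struck", "pedal"},
    "grand_piano": {"has_strings", "struck"},
    "upright_piano": {"has_strings", "struck"},
}


def cue_validity(category):
    """Distinctiveness (assumed form): sum over the category's attributes
    of P(category | attribute)."""
    in_category = set(MEMBERS[category])
    category_attrs = set().union(*(ATTRIBUTES[m] for m in in_category))
    total = 0.0
    for attr in category_attrs:
        # All entities in the toy graph that carry this attribute.
        holders = {e for e, attrs in ATTRIBUTES.items() if attr in attrs}
        total += len(holders & in_category) / len(holders)
    return total


def homogeneity(category):
    """Homogeneity (assumed form): mean pairwise Jaccard similarity of the
    members' attribute sets. Assumes every member has at least one attribute."""
    pairs = list(combinations(MEMBERS[category], 2))
    if not pairs:
        return 0.0
    scores = [
        len(ATTRIBUTES[a] & ATTRIBUTES[b]) / len(ATTRIBUTES[a] | ATTRIBUTES[b])
        for a, b in pairs
    ]
    return sum(scores) / len(scores)


def rank_anchor_candidates():
    """Order categories by cue validity, most distinctive first."""
    return sorted(MEMBERS, key=cue_validity, reverse=True)
```

In this toy data the two scores disagree: "Piano" has the most homogeneous members but the lowest cue validity, because both of its attributes are shared with other categories. This is one way to see why the two approaches are treated as complementary.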
6. REFERENCES
[1] Marwan Al-Tawil, Vania Dimitrova, Dhavalkumar Thakker and Brandon Bennett. Identifying Knowledge Anchors in a Data Graph. In HT2016, Halifax, Canada.
[2] Troullinou, G., Kondylakis, H., Daskalaki, E., Plexousakis, D. RDF Digest: Efficient Summarization of RDF/S KBs. In ESWC, 2015.
[3] Popov, I.O., Schraefel, M., Hall, W., Shadbolt, N. Connecting the Dots: A Multi-pivot approach to data exploration. In ISWC, 2011.
[4] Thellmann, K., Galkin, M., Orlandi, F., and Auer, S. LinkDaViz – Automatic Binding of Linked Data to Visualizations. In ISWC, 2015.
[5] Thakker, D., Dimitrova, V., Lau, L., Yang-Turner, F. & Despotakis, D. Assisting User Browsing over Linked Data: Requirements Elicitation with a User Study. In ICWE 2013.
[6] Ausubel, D. A Subsumption Theory of Meaningful Verbal Learning and Retention. In Journal of General Psychology. Volume 66, Issue 2, 1962, pp. 213-224.
[7] Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. Basic objects in natural categories. Cognitive Psychology, 1976, 8, 382-439.
[8] Sah, M. & Wade, V. Personalized Concept-based Search and Exploration on the Web of Data using Results Categorization. In ESWC 2013.
[9] Rossel, O. Implemention of a “search and browse” scenario for the LinkedData. In IESD, 2014.
[10] Vocht, et al. A Visual Exploration Workflow as Enabler for the Exploitation of Linked Open Data. In IESD, 2014.
[11] Peroni, S., Motta, E., d'Aquin, M. Identifying key concepts in an ontology through the integration of cognitive principles with statistical and topological measures. In ASWC, 2008.
[12] Marie, N., Gandon, F. Survey of linked data based exploration. In IESD@ISWC2014.
[13] Dean, M., Basu, P., Carterette, B., Partridge, C. and Hendler, J. What to Send First? A Study of Utility in the Semantic Web. In (LHD+SemQuant), 2012, @ ISWC2012.
[14] Waitelonis, J., Knuth, M., Wolf, L., Hercher, J., and H. Sack. The Path is the Destination - Enabling a New Search Paradigm with Linked Data. In LD in the Future Internet @ Future Internet Assembly, 2010.
[15] Al-Tawil, M., Thakker, D. and Dimitrova, V. Nudging to Expand User’s Domain Knowledge while Exploring Linked Data. In (IESD), 2014, @ ISWC2014.
[16] Tanaka, J., & Taylor, M. Object Categories and Expertise: Is the Basic Level in the Eye of the Beholder? Cognitive Psychology, 1991, 23, pp 457-482.
[17] Marie, N., Corby, O., Gandon, F. and Ribiere, M. Composite interests’ exploration thanks to on-the-fly linked data spreading activation. In Hypertext 2013.
[18] Maccatrozzo, V., Aroyo, L., Robert, R. Crowdsourced Evaluation of Semantic Patterns for Recommendations. In UMAP 2013. LBR.
[19] Nunes, T., Schwabe, D. Exploration of Semi-Structured Data Sources. In (IESD), 2014, @ ISWC2014.
[20] Al-Tawil, M., Dimitrova, V., Thakker, D. Using Basic Level Concepts in a Linked Data Graph to Detect User's Domain Familiarity. In UMAP2015, Dublin, Ireland.
[21] Thakker, D., Despotakis, D., Dimitrova, V., Lau, L., Brna, P. (2012). Taming digital traces for informal learning: A semantic-driven approach. In Proceedings of EC-TEL 2012.
[22] Belohlavek, R., Trnecka, M. Basic Level in Formal Concept Analysis: Interesting Concepts and Psychological Ramifications. In IJCAI 2013.
[23] Belohlavek, R., Trnecka, M. Basic level of concepts in formal concept analysis. In ICFCA, 2012, pp 28-44.
[24] Ausubel D., Novak, J., Hanesian, H. Educational Psychology: a cognitive view - Rinehart Winston, New York, 1978.