<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Basic Level Concepts in a Linked Data Graph to Detect User's Domain Familiarity</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Marwan Al-Tawil, Vania Dimitrova, Dhavalkumar Thakker School of Computing, University of Leeds</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We investigate how to provide personalized nudges to aid a user's exploration of linked data in a way leading to expanding her domain knowledge. This requires a model of the user's familiarity with domain concepts. The paper examines an approach to detect user domain familiarity by exploiting anchoring concepts which provide a backbone for probing interactions over the linked data graph. Basic level concepts studied in Cognitive Science are adopted. A user study examines how such concepts can be utilized to deal with the cold start user modelling problem, which informs a probing algorithm.</p>
      </abstract>
      <kwd-group>
        <kwd>Linked data</kwd>
        <kwd>knowledge utility</kwd>
        <kwd>user modeling</kwd>
        <kwd>basic level concepts</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The recent growth of the Web of Linked Data1(LD), which provides access to big data
graphs representing domain entities and their relationships, has opened a new avenue
of research on developing computational models to facilitate data exploration by
layman users [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This has brought together research from Semantic Web and HCI to
shape novel tools for interactive exploration of semantic data2. One of the key
challenges is ensuring that the interaction with linked data brings benefits for the users.
Hence, personalization and adaptation can play a crucial role. Research in
personalized exploration of linked data is still in an embryonic stage. Current work includes
improving search efficiency by considering user interests [
        <xref ref-type="bibr" rid="ref4 ref7">4, 7</xref>
        ] or diversifying the
user exploration paths with recommendations based on the browsing history [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Our research brings a new dimension of personalization and adaptation to enhance
the benefits of linked data exploration, namely knowledge utility. We investigate how
to aid a user’s exploration of linked data in a way leading to expanding her domain
knowledge. This can have a broad implication for facilitating sense making while
exploring linked data. Learning is an inevitable part of exploratory search, as users are
discovering new connections and associations. Our earlier research has shown that
although linked data exploration can promote domain knowledge expansion
(‘serendipitous learning’ effect), not all paths can be beneficial. We derived empirically
strategies to nudge the user to beneficial paths [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The user familiarity with the
entities in the linked data graph (LDG) was identified as a crucial input for the nudging
strategies we aim to develop – profitable exploration sequences include a start
(anchor) in a familiar entity followed by bringing a new (unexpected/interesting) entity.
      </p>
      <p>
        Identifying the user familiarity with the domain entities (domain concepts or
instances) from the LDG is not a trivial task because LDGs usually include thousands
of entities at different levels of abstraction. This brings forth the well-known cold
start problem of user modelling, which is aggravated by the sheer number of LDG
entities. One way to address cold start is via a probing dialogue. While LDG can
provide a knowledge pool to implement a probing dialogue for user modelling (c.f. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]),
it is not clear what domain entities to select from the vast amount of possibilities for
probing. Consequently, the interactions can be too long and may refer to entities that
do not bring high value for modelling a user’s domain familiarity.
      </p>
      <p>
        This paper examines an approach to detect user domain familiarity by using
anchoring concepts in the LDG around which a probing dialogue can be developed. We
adopt the Cognitive Science notion of basic level concepts (BLCs) – domain concepts
that are highly informative and can be easily retrieved from memory. An example of a
basic level concept in the Music domain is Guitar [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]; it has Musical Instrument as a
superordinate concept (more abstract) and Classical Guitar as a subordinate concept
(more specific). BLCs are likely to provide knowledge bridges to learn new concepts
in big information spaces and to serve as indicators for user modelling. Cognitive
science research has shown that the use of BLCs may indicate domain familiarity, e.g.
experts tend to recognise subordinate concepts [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        To get insights into how BLCs can be utilized to identify user's domain familiarity,
we conduct a user study that adopts earlier Cognitive Science methods which derive
BLCs in a specific domain [
        <xref ref-type="bibr" rid="ref3 ref5">3, 5</xref>
        ] to identify the BLCs in a LDG. Based on the study,
we derive heuristics how BLCs can be related to user domain familiarity. We then
suggest a user modelling probing algorithm that utilizes BLCs.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Identifying Basic Level Concepts in a Linked Data Graph</title>
      <p>We conducted a user study to examine how BLCs in a LDG can be utilized to model a
user’s domain familiarity.</p>
      <sec id="sec-2-1">
        <title>2.1. Study Design</title>
        <p>
          Dataset.We have used a dataset from the music domain which underpins a linked
data browser (MusicPinta) developed by us in an earlier research [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. The MusicPinta
LDG is fairly large and diverse, yet of manageable size for experimentation. It
contains 2.4M entities and 38M triple statements, and includes facts about 876 musical
instruments from various categories, including many country-specific instruments.
Musical instruments, which have been used by Cognitive Science studies in BLC,
provide a suitable domain for cognitive activities linked with BLCs [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
Participants. The study involved 40 participants recruited on a voluntary basis,
varied in Gender (28 male and 12 female), cultural background (1 Belgian, 10 British, 5
Bulgarian, 1 French, 1 German, 5 Greek, 1 Indian, 2 Italian, 6 Jordanian, 1 Libyan , 2
Malaysian, 1 Nigerian, 1 Polish, and 3 Saudi Arabian), and age (18 – 55, mean = 25).
Method. We follow the experimental set up in earlier Cognitive Science studies
which derived BLCs using free-naming tasks in a specific domains [
          <xref ref-type="bibr" rid="ref3 ref5">3, 5</xref>
          ], including
the Music domain. Participants were asked to freely name objects that were shown in
image stimuli, under limited response time (10s). 364 taxonomical musical
instruments were extracted from the MusicPinta dataset by running SPARQL queries from
the MusicPinta SPARQL endpoint to get all musical instrument concepts linked via
the rdfs:subClass relationship. The musical instrument concepts were classified either
into leaf (l) instruments (total=265) or category (c) instruments (total= 108). Leaf
instruments are found at the bottom of a hierarchy and do not have children, whereas
category instruments have at least one child. For each leaf instrument l, a
representative image (stimuli) was collected from the Musical Instrument Museums Online
(MIMO)3 and Wikipedia4. For a category c, all leafs from that category were shown
as a group. Following the Cognitive Science studies, additional objects, outside the
domain, were included to minimize response bias - 64 additional images were
randomly chosen from the most occurring concepts in artificial and natural categories
from the Battig and Montague category norms [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], including vehicles, clothing,
furniture, tools, fruits, vegetables, animals and birds.
        </p>
        <p>Ten online surveys were run adopting two strategies: (i) Strategy 1 – leaf
instruments: eight surveys presented the leaf instruments – each survey presented 32 leaves
and 8 additional images. (ii) Strategy 2 – category instruments: two surveys
presentedthe category instruments- each survey showed 54 categories and 14 additional
images. The image allocation in surveys was random. Every survey had 4
participants; each participant conducted one survey following an online link, including:
• Pre-task questionnaire-collecting information about user profile (e.g.
agegroup, nationality, and gender).
• Free-naming task- Each image was shown for 10 seconds on the participant's
screen and he/she was asked to type the name of the given object(s) in the
image as quickly as possible. Figures 1-4 show example instrument images and
participant’s answers from the study. For this task, we recorded accuracy (i.e.
the participant answered correctly) and frequency (i.e. how many times a
particular instrument name was mentioned correctly) of their accurate answers.
• Post-task questionnaire- collected information about the participant's
familiarity level for the six top level musical instrument categories (String
Instruments, Wind Instruments, Percussion Instruments, Electronic Instruments, and Other
Instruments). Participants were asked to rate their knowledge in these
categories on a scale of 1 to 7 (1=No Knowledge and 7=Expert).</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Basic Level Concepts Identified</title>
        <p>
          To extract BLCs from the MusicPinta dataset we considered accuracy and frequency
of the participants' answers [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], grouping the answers into:
• Group1: Naming a leaf instrument with its category instead of its own name. In
this group, we calculated the frequency of exact matches between the
partici3http://www.mimo-international.com/MIMO/
4Wikipedia images were used only in the cases when a MIMO image did not exist.
•
•
pants' answers and the category of instruments seen. For example, as shown in
Fig.1, a participant has named the leaf instrument Violotta with its parent category
Violin. We counted how many times Violin was named when its leaves were seen.
Group 2:Exact naming of categories. In this group, we considered the cases
when participants were able to exactly name the category of the instrument they
saw, e.g.Fig. 3 shows a response where the category Violins was seen and named.
Group 3: Naming a category level instrument with its parent or children
instrument name. This is illustrated in Fig. 2 and Fig. 4 - the participant saw a category
level instrument (Fiddle and Plucked String Instruments)and named its parent (Violin
and String Instruments, respectively).
        </p>
        <p>Fig1. Leaf Violotta seen, named as Violin.</p>
        <p>Fig2. Category Fiddle seen, named as Violin.</p>
        <p>Fig3. Category Violins seen, named as Violins.</p>
        <p>Fig4. Category Plucked String Instruments seen,
named as String Instruments.</p>
        <p>In each of the groups, LDG entities with frequency above 2 (i.e. they were named by
at least two users) were included. The entities identified in Group 1 are derived from
Strategy 1, while Group 2 and Group 3 give complementary output for the LDG
entities derived by Strategy 2 (i.e. when the participants saw categories of instruments).
Hence, the union of Group 2 and Group 3 gives the likely BLCs identified with
Strategy 2, which is then intersected with the output from Strategy 1 to obtain the final
BLC list. This included: Accordion, Bells, Bouzouki, Clarinet, Drums, Flute, Guitars,
Harmonica, Harp, Saxophone, String instruments, Trumpet, Violins and Xylophone.
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Using Basic Level Concepts for User Modeling</title>
      <p>We compared the user survey answers and user's familiarity for the six top level
musical instrument categories. The findings were used to derive probing heuristics.</p>
      <p>When the user is able to name an instance (leaf instrument as seen in Strategy 1)
instead of its corresponding BLC, she has high familiarity in the corresponding top
level category. There were 27 cases (out of 41) where the participants could name leaf
instruments rather than using their BLCs (as the majority of users did). For example, a
participant named the leaf Electric cello instead of naming its BLC Violin. In 67% of
these cases users had high familiarity with the top level instrument category.</p>
      <p>When the user successfully names children of a basic level concept from images of
the corresponding categories (as seen in Strategy 2), she has high familiarity in the
corresponding top level category. There were 34 cases where the participants named
children that belonged to the basic level. For example, one participant named the child
Cello instead of naming it with its BLC Violin. In 62% of these cases participants had
high familiarity with the top level instrument category.</p>
      <p>When the user cannot name a basic level concept from the corresponding BLC
images (as seen in Strategy 2), she has low familiarity in the corresponding top level
category. There were 11 cases (out of 64) where participants were shown a BLC and
were unable to name it. In these cases, the participants had indicated low or no
knowledge with the top level instrument category.</p>
      <p>When the user can name a basic level concept from the corresponding images for
the BLC category (as seen in Strategy 2), she is likely to have high familiarity with the
corresponding top level category.There were 43 (out of 64) cases where participants
were shown a BLC and named it correctly. In half (58%) of these cases the
participants had high familiarity with the top level instrument category of the BLC.</p>
      <p>Based on the above heuristics, we propose a probing algorithm based on BLC.</p>
      <sec id="sec-3-1">
        <title>Input</title>
        <p>Domain - a linked data graph
G = (V , E) whereV = {v1,v2,...,vn}
Set of Images I = {i1,i2,...,in} and a function
image :V → I assigning an image i to each
vertex v
Set of Basic level concepts:
B = {b1,b2,...,bk}
User diagnosis is a mapping that
overlays G’s vertices with a familiarity
level – none, low, medium, high.
familiarity :V → W whereV = {v1,v2,...,vn} and
W = {none,low, medium, high}
For every vertex from G, we define the
following functions which are
implemented with simple inferences using
hierarchical relationships:
- P(v) – returns all parent concepts
for v
- C(v) – returns all children concepts
(including the leafs) for v
- L(v) – returns the leaves
(instances) for v
- T(v) – top level categories for v</p>
      </sec>
      <sec id="sec-3-2">
        <title>Output</title>
        <p>
          User model U = V ×W where d :V → W
Processing
//initialization
for all v ∈V do d(v) = none
for all b ∈ B do //BLC naming
show image(b) and ask to name it
if user_answer≠ b do //cannot name BLC
familiarity(b) = none
for all t ∈T (b) familiarity(t) = low
else do //names BLC
familiarity(b) = medium
for all t ∈T (v) familiarity(t) = medium
In this work, we examine the advantage of using basic level concepts to detect user
familiarity in a linked data graph. The user study identifies the BLCs in a Music
domain in a free-naming task and illustrates how these concepts can be utilized to detect
the user familiarity with a subset of entities from the LDG. Obviously, these findings
can only be applied if it is possible to automatically detect BLCs from a LDG.
Following the Cognitive Science definition of BLCs–domain concepts that carry the most
information, possess the highest category cue validity, and are, thus, the most
differentiated from one another are highly informative [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]-we have implemented eight
algorithms for extracting BLCs from the LDG. The algorithms search for basic categories
at the most inclusive level at which attributes are common to most categories'
members and basic categories which are most differentiated from other categories
(categories with highest cue validity, i.e. their members have attributes common to the
category and not belonging to other categories). We have implemented appropriate
SPARQL queries over the MusicPinta dataset adopting several semantic relationships
and similarity measures. The set of BLCs identified in the study is used as a ‘ground
truth’ to benchmark the algorithms. Current results show that the best performing
algorithms achieve precision of 0.48, which is promising but insufficiently high.
        </p>
        <p>
          Our immediate future work is to tune the BLC algorithms and explore various
fusion methods to improve the precision results. We will then be able to implement the
probing algorithm and utilise it in developing the nudging strategies derived in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Van</given-names>
            <surname>Overschelde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            ,
            <surname>Rawson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. A.</given-names>
            , &amp;
            <surname>Dunlosky</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>Category norms: An updated and expanded version of the Battig and Montague (1969) norms</article-title>
          .
          <source>Journal of Memory and Language</source>
          ,
          <year>2004</year>
          ,
          <volume>50</volume>
          ,
          <fpage>289</fpage>
          -
          <lpage>335</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Thakker</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimitrova</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lau</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang-Turner</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Despotakis</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>Assisting User Browsing over Linked Data: Requirements Elicitation with a User Study</article-title>
          .
          <source>In proceedings of ICWE</source>
          <year>2013</year>
          , pp.
          <fpage>376</fpage>
          -
          <lpage>383</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Rosch</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mervis</surname>
            ,
            <given-names>C. B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gray</surname>
            ,
            <given-names>W. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>D. M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Boyes-Braem</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <article-title>Basic objects in natural categories</article-title>
          .
          <source>Cognitive Psychology</source>
          ,
          <year>1976</year>
          ,
          <volume>8</volume>
          ,
          <fpage>382</fpage>
          -
          <lpage>439</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Sah</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Wade</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <article-title>Personalized Concept-based Search and Exploration on the Web of Data using Results Categorization</article-title>
          .
          <source>In ESWC</source>
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Tanaka</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Taylor</surname>
          </string-name>
          , M.
          <article-title>Object Categories and Expertise: Is the Basic Level in the Eye of the Beholder?</article-title>
          <source>Cognitive Psychology</source>
          ,
          <year>1991</year>
          ,
          <volume>23</volume>
          (
          <issue>3</issue>
          ).
          <fpage>457</fpage>
          -
          <lpage>482</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.DhavalThakker, Lydia Lau, Ronald Denaux, VaniaDimitrova, Paul Brna, Christina M.
          <article-title>Steiner:Using DBpedia as a Knowledge Source for Culture-Related User Modelling Questionnaires</article-title>
          .
          <source>UMAP</source>
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Rossel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <article-title>Implemention of a “search and browse” scenario for theLinkedData</article-title>
          .
          <source>In Intelligent Exploration of Linked Data (IESD)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Vocht1</surname>
          </string-name>
          , et, al. A
          <article-title>Visual Exploration Workflow as Enablerfor the Exploitation of Linked Open Data</article-title>
          .
          <source>In Intelligent Exploration of Linked Data (IESD)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>MC</given-names>
            <surname>Schraefel</surname>
          </string-name>
          ,
          <article-title>What does it look like, really? Imagining how citizens might effectively, usefully and easily find, explore, query and re-present open/linked data</article-title>
          .
          <source>In ISWC</source>
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Al-Tawil</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thakker</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Dimitrova</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <article-title>Nudging to Expand User's Domain Knowledge while Exploring Linked Data</article-title>
          .
          <source>In Intelligent Exploration of Linked Data (IESD)</source>
          ,
          <year>2014</year>
          , @
          <fpage>ISWC2014</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Palmer</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hennessy</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Unze</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Pick</surname>
            ,
            <given-names>A. D.</given-names>
          </string-name>
          <article-title>How is a trumpet known? the “basic object level” concept and the perception of musical instruments</article-title>
          .
          <source>American Journal of Psychology</source>
          ,
          <volume>102</volume>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>