<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detection of Online Learning Activity Scopes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Syeda Sana e Zainab</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mathieu D'Aquin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data Science Institute, National University of Ireland Galway</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>During the last ten years, online learning has been on the ascent as the advantages of access, accommodation, and quality learning are beginning to take shape. A key challenge is to identify learning activities and recognize how they contribute to the learner's progress. In this paper, we look at the way this problem becomes even more challenging when considering the full set of online activities carried out by a learner, as compared to what is achieved on specific online platforms that are dedicated to learning. In particular, we show how the integration of linked data-based information can help resolve the issue of representing activities for the purpose of identifying the key topics on which a learner is focusing, using a hierarchical clustering method. We apply this approach in the context of the AFEL project, and show how it performs in realistic use cases of an online learner's profile, comparing general browsing with the use of a dedicated learning platform.</p>
      </abstract>
      <kwd-group>
        <kwd>eLearning</kwd>
        <kwd>Text mining</kwd>
        <kwd>Hierarchical Clustering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Online learning, often called "eLearning", is a new class of learning that has been
flourishing during the last ten years, following the growing adoption of eLearning
in universities, schools, and companies. Online learning provides a convenient
environment where instructors and students may access the course, study, and
perform interactive learning activities with fewer time and space restrictions.
However, the analysis of such online learning activities often requires systems
and applications using learning analytics and data mining approaches. In many
cases, learning activities that contribute to the learner's progress are not
well detected and fall outside of these systems and applications.</p>
      <p>In the AFEL (Analytics for Everyday Learning) project<sup>1</sup>, we aim to address
this challenge by defining a process for detecting online learning activities
enriched with specific "topic-based" learning trajectories and visualizing them in
an application that enables potential learners to reflect on their activities and
ultimately improve the way they focus their learning.</p>
      <p>The aim is to apply this to both online platforms dedicated to learning and
to general online activities, showing how learning happens in everyday activities.</p>
    </sec>
    <sec id="sec-2">
      <title>1 http://afel-project.eu</title>
      <p>Didactalia<sup>2</sup> is the online social learning platform we use as a testbed. We can
directly connect to Didactalia through its API and retrieve metadata about its
resources, allowing us to classify them into different topics, and to assess their
value to the learning of the user.</p>
      <p>However, when generalising this approach to multiple platforms, we cannot
guarantee that a meaningful description will be available for the resource
considered. To take an extreme case, we are testing the detection and assessment of
learning activities within the overall web browsing activities of a user. In this
case, we need to face issues related to the differences in scale and velocity of the
data we receive. We also need to find ways to retrieve information that can help
in understanding the content of resources and assessing them with respect to learning.</p>
      <p>In this paper, we describe a method to identify key areas in which users of
online resources are learning, by: (1) enriching the resources considered based
on linked data, (2) clustering activities using this enriched information, and
(3) identifying the clusters that are most representative of the user's learning through
learning indicators that also use the enriched information.</p>
      <sec id="sec-2-1">
        <title>Related Work</title>
        <p>
          Research demonstrates that assessment of student learning is key to the
success of any education system, and is "one of the main considerations separating
capable schools and instructors from incapable ones" [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. This also applies to
online learning, e.g. Helic et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] contend that good online learning requires
continuous monitoring of a student's progress with the material and testing of the
knowledge and aptitudes gained. Various measurements enable educators
to monitor student input, feedback, and progress towards goals, which is crucial
in online education [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. A few works have analyzed learning through
social media [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], but they focus on the learning perspective of
social networks rather than on detecting learning aspects in online activities. Some
research considers the combination of semantic web technologies with learning
activity analysis [
          <xref ref-type="bibr" rid="ref5 ref8">5,8</xref>
          ], but it focuses on the integration of
data.
        </p>
        <p>
          Ferguson and Buckingham Shum [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] proposed the concept of learning
activities analysis to specifically capture the social interactions underlying social
learning processes. Similarly, a model for learning analysis in mobile and ubiquitous learning
environments has been proposed by Aljohani and Davis [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
A further step towards learning analysis methods that take into account the
amount of data produced by the learner's activities, in both formal and
informal settings, is offered by the semantic web [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], with work pointing out its practical
educational benefits for teaching and learning. Some learning analytics
work has focused on health care systems. Such systems [
          <xref ref-type="bibr" rid="ref17 ref19">17,19</xref>
          ] typically
employ sensors (e.g. wearable sensors and visual sensors), collect data on a user's activities
such as eating or exercise, and apply machine learning algorithms to identify
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2 https://didactalia.net</title>
      <p>
        activities progress. Significant work has been done on the development of applications and tools
[
        <xref ref-type="bibr" rid="ref11 ref12 ref13 ref20 ref9">9,20,11,13,12</xref>
        ] for learning activities analysis. However, these
applications only enable learners to track their progress in certain areas (games, music,
document writing, etc.) or communities (Players, Moodles, etc.).
      </p>
      <p>Among the diverse works that have been proposed on the analysis of learning
activities, few have focused on scope detection in online learning
activities. Our work focuses on the identification of online learning activity
scopes and on developing an application that helps learners in their online learning
progression.</p>
      <sec id="sec-3-1">
        <title>Motivation: The AFEL Project</title>
        <p>In several areas other than learning where self-directed activities are
prominent (e.g. fitness), there has been a trend in recent years following the
technological development of tools for self-tracking. Those tools quantify a specific
user's activities with respect to a certain goal (e.g. being physically fit) to enable
self-awareness and reflection, with the purpose of turning them into behavioral
changes. While the actual benefits of self-tracking in those areas are still
debatable, our understanding of how such approaches could benefit learning behaviors
as they become more self-directed remains very limited.</p>
        <p>AFEL is a European Horizon 2020 project whose aim is to address both
the theoretical and technological challenges arising from applying learning
analytics in the context of online, social learning. The pillars of the project are the
technologies to capture large-scale, heterogeneous data about learners' online
activities across multiple platforms (including social media) and the
operationalization of theoretical cognitive models of learning to measure and assess those
online learning activities. One of the key planned outcomes of the project is
therefore a set of tools enabling self-tracking of online learning by a wide range
of potential learners, to enable them to reflect and ultimately improve the way
they focus their learning.</p>
        <p>Below is a specific scenario considering a learner not formally engaged in a
specific study program, but who is, in a self-directed and explicit way, engaged in
online learning. The objective is to describe in a simple way how the envisioned
AFEL tools could be used for self-awareness and reflection, but also to explore
what the expected benefits of enabling this for users/learners are:</p>
        <p>Jane is 37 and works as an administrative assistant in a local medium-sized
company. As hobbies, she enjoys sewing and cycling in the local forests. She is
also interested in business management, and is considering either developing in
her current job to a more senior level or making a career change. Jane spends a
lot of time online at home and at her job. She has friends on Facebook with whom
she shares and discusses local places to go cycling, and others with whom she
discusses sewing techniques and possible projects, often through sharing YouTube
videos. Jane also follows MOOCs and forums related to business management,
on different topics. She often uses online resources such as Wikipedia and online
magazines. At school, she was not very interested in maths, which is needed
if she wants to progress in her job. She is therefore registered on Didactalia<sup>3</sup>,
connecting to resources and communities on maths, especially statistics.</p>
        <p>Jane has decided to take her learning seriously: She has registered to use
the AFEL dashboard through the Didactalia interface. She has also installed the
AFEL browser extension to include her browsing history, as well as the Facebook
app. She has not included in her dashboard her emails, as they are mostly related
to her current job, or Twitter, since she rarely uses it.</p>
        <p>Jane looks at the dashboard more or less once a day, as she is prompted by a
notification from the AFEL smart phone application or from the Facebook app,
to see how she has been doing the previous day in her online social learning. It
might for example say "It looks like you progressed well with sewing yesterday!
See how you are doing on other topics..." Jane, as she looks at the dashboard,
realizes that she has been focusing a lot on her hobbies and procrastinated on the
topics she enjoys less, especially statistics. Looking specifically at statistics, she
realizes that she almost only works on it on Friday evenings, because she feels
guilty of not having done much during the week. She also sees that she is not
putting as much effort into her learning of statistics as other learners, and not
making as much progress. She therefore makes a conscious decision to put more
focus on it. She adds new goals on the dashboard of the form "Work on
statistics during my lunch break every week day" or "Have achieved a 10% progress
compared to now by the same time next week". The dashboard will remind her
of how she is doing against those goals as she goes about her usual online social
learning activities. She also gets recommendations of things to do on Didactalia
and Facebook based on the indicators shown on the dashboard and her stated
goals.</p>
        <p>While this is obviously a fictitious scenario, it highlights the key challenges
faced by the project. The one specifically addressed by this paper is the
identification of the topics on which the learning of the user is mostly focusing, so
that the activities related to those topics can be assessed in the context of the
relevant learning scope.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Overview</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3 http://didactalia.net</title>
      <p>the learning scopes (the topics on which they are learning) and in assessing
their learning progress within those scopes. Once all this information is
stored and indexed, the role of the GET API is to retrieve it and provide
the AFEL application with all the relevant data for a given user, including the
activities performed, the learning scope to which they belong, and indicators of how
much they contribute to the learning trajectory of the learner.</p>
      <p>
        Although not the topic of this paper, a key challenge here is in identifying
the indicators that can support assessing the progress of a learner in a certain
learning scope. Based on theoretical work in educational psychology within the
AFEL project (see for example [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]), the approach taken is to try to recognize to
what extent encountering and processing a certain artifact (a resource) induced
learning, based on representing "frictions" that bring new knowledge or
new forms of knowledge to the learning. At the moment, we distinguish three
forms of "frictions", leading to three categories of indicators of learning (see [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]):
        <list list-type="bullet">
          <list-item><p>New concepts and topics: The simplest way in which we can think about
how an artifact could lead to learning is through its introduction of new
knowledge unknown to the learner. This is consistent with the traditional
"knowledge acquisition metaphor" of learning. In our scenario, this kind of
friction happens for example when Jane watches a video about a sewing
technique previously unknown to her. We call the indicator associated with
this form of friction coverage.</p></list-item>
          <list-item><p>Increased complexity: While not necessarily introducing new concepts, an
artifact might relate to known concepts in a more complex way, where
complexity might relate to the granularity, specificity or interrelatedness with
which those concepts are treated in the artifact. In a social system, the
assumption of the co-evolution model is that the interaction between
individuals might enable such increases in understanding of the concepts being
considered through iteratively refining them. In our scenario, this kind of
friction happens for example when Jane follows a statistics course which
is more advanced than the ones she had encountered before. We call the
indicator associated with this form of friction complexity.</p></list-item>
          <list-item><p>New views and opinions: Similarly, known concepts might be introduced "in
a different light", through varying points of view and opinions enabling a
refinement of the understanding of the concepts treated. This is consistent
with the co-evolution model in the sense that it can be seen either as a
widening of the social system in which the learner is involved, or as the integration
into different social systems. In our scenario, this kind of friction happens
for example when Jane reads a critical review of a business management
methodology she has been studying. We call the indicator associated with
this form of friction diversity.</p></list-item>
        </list>
      </p>
      <p>While all three indicators are used in the implementation of the AFEL application,
we will here focus only on the first one, as the two others are
the subject of specific work. To simplify, we will therefore consider the problem
tackled as:</p>
      <p>How to identify learning scopes in a stream of activities, where a learning scope
represents a set of activities by the learner that cover a particular topic?</p>
      <p>While there are many ways in which this question could be answered, we
here consider the key requirement that the method should work equally well in cases
where resources are already described with rich metadata (e.g. tags) and in cases
where we have no other information about the resources than their content. It
also needs to take into account the dynamic aspect of the scenario (that new
activities are constantly being added).</p>
      <p>We therefore describe in the next sections the three main steps of the method
represented in Figure 2.</p>
      <sec id="sec-4-1">
        <title>Enrichment</title>
        <p>The goal of the enrichment phase of the process described in Figure 2 is to extract
from the resources used in learning activities information about their content,
in case such information is not already available. Indeed, in many cases, if
we focus on specific platforms, resources will be classified and associated with
subjects and topics. This is the case for example of Didactalia, which associates
with each resource a set of tags, as provided by the users who contributed the
resources. Those tags can then be used to represent a general overview of the
content of the resource. However, when working with general online resources,
we cannot rely on the availability of such metadata.</p>
        <p>Using named entity recognition is a common approach to this problem. By
extracting from the textual content of the resource the key entities and concepts that
are being mentioned, we can build a profile of the resource that can potentially
be used to replace the tags available in existing platforms.</p>
        <p>We use DBpedia Spotlight<sup>4</sup> as an off-the-shelf named entity recognition tool.
The practical advantage of DBpedia Spotlight is that it is open source and
can be deployed locally, removing the need to rely on an external service. This
is especially important in our scenario as we need to process thousands of
potentially large texts for each user. In addition, since it is based on Wikipedia,
the vocabulary of entities being covered is very wide and domain-independent,
with millions of entities being recognizable on all sorts of topics.</p>
        <p>The other advantage of DBpedia Spotlight is that the entities extracted are
part of the DBpedia Linked Dataset.<sup>5</sup> DBpedia is a linked data version of
Wikipedia and, as such, also contains the taxonomy of categories of Wikipedia.
This is especially useful here since our goal is to group activities based on their
associated resources covering similar topics. To achieve this, however, the entities
extracted are insufficient. Indeed, resources might be on the same topic (e.g.
geography) without mentioning the same entities. Nevertheless, if they are indeed
on the same topic, it is more likely that the categories of the entities mentioned,
or their super-categories, will overlap.</p>
        <p>The goal of the generalization process is therefore to augment the profile
created by named entity recognition by adding the categories that relate to the
entities found through named entity recognition to the profile of each resource.
To make this efficient, it relies on an index of the "direct" categories of each
entity and on an index of the DBpedia category hierarchy. Each entity is then</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4 https://github.com/dbpedia-spotlight/dbpedia-spotlight</title>
    </sec>
    <sec id="sec-6">
      <title>5 http://dbpedia.org</title>
      <p>iteratively assigned categories at a higher level than its direct ones by climbing
up this hierarchy until a given level. This process is of course carried out offline,
with the online process only executing named entity recognition, and adding the
pre-computed set of categories to the entity.</p>
      <p>It is important to note here that the profile of a resource in this case is richer
than the set of tags that might be available in online platforms. It is not only
larger (the number of entities and categories found will generally be bigger than
the number of tags manually used to describe a resource), it also contains more
information, as multiple mentions of a given entity are counted, and multiple
references to a category are also taken into consideration. In this way, if more
than one entity is related to a given category, this category will have a stronger
weight during the clustering phase.</p>
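      <p>The profile construction described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the two in-memory indexes (DIRECT_CATEGORIES and PARENT_CATEGORIES), their toy contents, and the function names are hypothetical, and the list of entity mentions is assumed to have already been produced by the named entity recognition step.</p>
      <preformat><![CDATA[
```python
from collections import Counter

# Hypothetical toy indexes; the real ones would be precomputed offline from DBpedia.
DIRECT_CATEGORIES = {
    "Paris": ["Capitals_in_Europe"],
    "Danube": ["Rivers_of_Europe"],
}
PARENT_CATEGORIES = {
    "Capitals_in_Europe": ["Geography_of_Europe"],
    "Rivers_of_Europe": ["Geography_of_Europe"],
    "Geography_of_Europe": ["Geography"],
}


def categories_up_to(entity, max_level):
    """Direct categories of an entity plus those up to max_level levels above them."""
    found = set(DIRECT_CATEGORIES.get(entity, []))
    frontier = set(found)
    for _ in range(max_level):
        # Climb one level, ignoring categories that were already collected.
        frontier = {p for c in frontier for p in PARENT_CATEGORIES.get(c, [])} - found
        found |= frontier
    return found


def build_profile(mentions, max_level=1):
    """Term-frequency profile from a list of entity mentions (repeats are counted)."""
    profile = Counter()
    for entity in mentions:
        profile[entity] += 1
        for category in categories_up_to(entity, max_level):
            profile[category] += 1
    return profile


profile = build_profile(["Paris", "Danube", "Paris"], max_level=1)
```
]]></preformat>
      <p>In this toy example, the two resources' entities do not overlap, but the shared super-category accumulates weight from every mention, which is the effect the generalization step relies on.</p>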
      <sec id="sec-6-1">
        <title>Clustering</title>
        <p>The objective of the enrichment method described above is to build a profile for
the resources that the learner has been using that captures their key topics, so that
they can be grouped into general learning scopes. The next step is therefore to
cluster activities based on those profiles. Here too, there are many ways to cluster
based on the kind of resource profiles we have constructed in the previous step,
which very much represent vectors of term frequencies. However, considering the
nature of the scenario in which we are operating, we need to consider three key
requirements:
          <list list-type="bullet">
            <list-item><p>The clustering cannot be static: As the learning scopes are to be
built from the stream of user activities, the clustering obviously cannot be computed
in advance.</p></list-item>
            <list-item><p>The clustering needs to be incremental: Closely related to the previous
requirement, but adding that the clustering mechanism needs to be fast, the
approach is required to be incremental, i.e. new activities and resources need
to be added into an already constructed set of clusters, built from previous
activities and resources. It is also important that the set of clusters does not evolve
dramatically due to the appearance of new activities and resources.</p></list-item>
            <list-item><p>The number of clusters cannot be fixed in advance: Many clustering
mechanisms take as a given parameter the target number of clusters. However, this
should in our case be automatically set, since we cannot decide in advance
for every user how many topics they are learning about.</p></list-item>
          </list>
        </p>
        <p>
          For those reasons, we adopted an incremental hierarchical clustering
approach. Hierarchical clustering [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] is a method by which clusters are formed
by progressively grouping items, creating a hierarchy of larger and larger
clusters. One advantage of hierarchical clustering is that it creates many clusters
of different sizes and levels, from which we can select afterwards based on their
properties (see next section), rather than having to choose a number of clusters
in advance.
        </p>
        <p>However, the basic algorithm for hierarchical clustering assumes that the
whole set of items to cluster is available. We therefore adapt it so that it can
work incrementally, and fulfill our first two requirements. The basic algorithm
we use is described below in broad terms. It represents the function to add a
new item to an already existing clusterset.</p>
        <preformat>function add(item, clusterset)
  if clusterset is empty then
    c = new cluster([item])
    insert c in clusterset
  else
    find c the cluster in clusterset most similar to item
    nc1 = new cluster(concat(c, item))
    nc2 = new cluster(item)
    p = parent(c)
    remove c from children of p
    add nc1 as child of p
    add c as child of nc1
    add nc2 as child of nc1</preformat>
        <p>By applying this algorithm to every new item, one after the other, a set of
clusters organized as a hierarchy is built. The advantage is that the clusterset
can be kept from one activity to the next, meaning that a new activity only
has to be added to it, which does not require large amounts of computation.
It is useful to note that:
          <list list-type="bullet">
            <list-item><p>The similarity measure used can vary. In our initial tests, we use a
Euclidean distance on the frequency vectors of terms (entities and categories)
as obtained above. Further tests with other similarity measures (e.g. based
on a cosine distance over TF.IDF vectors) are being conducted, giving better
results.</p></list-item>
            <list-item><p>The creation of a new cluster computes a term frequency vector aggregating
the vectors of all included items using their average. Therefore, the similarity
is applied between those aggregated vectors and the original vector of the
item to be added.</p></list-item>
          </list>
        </p>
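        <p>As an illustration, the add function and the two notes above can be sketched in Python as follows. This is a minimal sketch rather than the AFEL implementation: the Cluster class, the walk helper and the representation of items as dictionaries mapping terms to frequencies are assumptions made for the example. The pseudocode leaves open what happens when the most similar cluster has no parent and whether ancestors are refreshed; here the merged cluster simply becomes the new root, and ancestors absorb the new item so that their averaged vectors stay current.</p>
        <preformat><![CDATA[
```python
import math


class Cluster:
    """Node of the cluster tree; its vector averages the vectors of its items."""

    def __init__(self, items, children=()):
        self.items = list(items)        # item vectors: dicts mapping term -> frequency
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self
        self.refresh()

    def refresh(self):
        """Recompute the averaged term-frequency vector from the current items."""
        terms = {t for v in self.items for t in v}
        n = len(self.items)
        self.vector = {t: sum(v.get(t, 0) for v in self.items) / n for t in terms}


def distance(u, v):
    """Euclidean distance between two sparse term-frequency vectors."""
    return math.sqrt(sum((u.get(t, 0) - v.get(t, 0)) ** 2 for t in set(u) | set(v)))


def walk(node):
    """Yield every cluster in the tree rooted at node."""
    yield node
    for child in node.children:
        yield from walk(child)


def add(item, root):
    """Insert item into the tree rooted at root; return the (possibly new) root."""
    leaf = Cluster([item])                                      # nc2 in the pseudocode
    if root is None:
        return leaf
    closest = min(walk(root), key=lambda c: distance(c.vector, item))
    parent = closest.parent                                     # save before re-parenting
    merged = Cluster(closest.items + [item], [closest, leaf])   # nc1 in the pseudocode
    if parent is None:
        return merged                                           # assumption: nc1 becomes the root
    parent.children[parent.children.index(closest)] = merged
    merged.parent = parent
    # Assumption: ancestors absorb the new item so their averages stay current.
    node = parent
    while node is not None:
        node.items.append(item)
        node.refresh()
        node = node.parent
    return root


root = None
for vector in [{"a": 1.0}, {"b": 5.0}, {"a": 1.1}]:
    root = add(vector, root)
```
]]></preformat>
        <p>Each call only touches the most similar cluster and its ancestors, which reflects the localized nature of the update; the exhaustive search over all clusters in walk is kept for simplicity and would need an index in a large-scale setting.</p>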
        <p>
          It is also useful to mention that this approach is not guaranteed to obtain
the same clusterset as one that would have applied hierarchical clustering in a
non-incremental, static manner. Methods exist to achieve better approximations
than the proposed algorithm (see e.g. [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]) but those tend to be computationally
expensive. Also, as mentioned above, an advantage of this approach is that the
clusterset only changes in a very localized manner, keeping the established
structure mostly intact and is therefore less likely to lead to dramatically changing
results for the user.
        </p>
      </sec>
      <sec id="sec-6-2">
        <title>Cluster selection and labelling</title>
        <p>The last step to be considered is the selection of the set of clusters of activities
that are most representative of the learner's interests, and their labelling. Indeed,
we have obtained from the previous step a clusterset which is hierarchically
organised (it is, actually, a binary tree). The idea is to identify the ones that
seem to be most interesting from the point of view of the learning indicators
considered (see earlier description of the learning indicators).</p>
        <p>We therefore start by ranking the clusters in the clusterset according to a
score. While this score does not take the hierarchy into account, an important aspect
is that it takes into account the temporal order of activities. Indeed, if
considering only the coverage indicator, as described above, the idea is
to measure for each activity how many new concepts (entities and categories)
it introduces into each of the clusters, as a proportion of the concepts already
present there from previous activities. From this, we can compute
the average coverage of each cluster based on the activities it includes, effectively
obtaining a measure of how much, on average, an activity in this cluster
increases the coverage of the topics of the cluster. It is worth noting that, while
new activities (and therefore new clusters) affect the score of certain clusters
and, consequently, their ranking based on this score, this can still be achieved
incrementally, i.e. we can update the scores of clusters based on new activities,
and update the ranking based on those changes, without having to recompute everything
every time.</p>
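        <p>A minimal Python sketch of such an incrementally updated coverage score is given below. The CoverageTracker class and its attribute names are hypothetical, and the value given to the first activity of a cluster (for which there are no previous concepts to compare against) is an assumption of the sketch, since the text does not specify it.</p>
        <preformat><![CDATA[
```python
class CoverageTracker:
    """Running average of the coverage contributed by the activities of one cluster."""

    def __init__(self):
        self.seen = set()    # concepts introduced by earlier activities
        self.total = 0.0     # sum of the per-activity coverage values
        self.count = 0       # number of activities processed so far

    def update(self, concepts):
        """Record one activity (an iterable of concepts) and return the new average."""
        new = set(concepts) - self.seen
        # Assumption: with no previous concepts, the activity gets a coverage of 1.0.
        value = len(new) / len(self.seen) if self.seen else 1.0
        self.seen |= new
        self.total += value
        self.count += 1
        return self.total / self.count


tracker = CoverageTracker()
for activity in [{"Statistics", "Mean"}, {"Mean", "Variance"}, {"Statistics"}]:
    score = tracker.update(activity)
```
]]></preformat>
        <p>Because only a set of seen concepts and two counters are kept per cluster, the score can be updated when a new activity arrives without revisiting earlier activities.</p>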
        <p>The result of the previous step is a ranked clusterset rankedcl, sorted in
descending order of the score considered. The selection process from there can be
reduced to selecting non-overlapping clusters that are highly ranked. To support
the non-overlapping property, we exploit the hierarchical relationships that exist
between clusters from the hierarchical clustering process, i.e. selected clusters
should be taken from distinct branches of the tree, as per the algorithm below:</p>
        <preformat>function select(rankedcl) returns selected
  foreach c in rankedcl
    select = true
    foreach s in selected
      if c is a (direct/indirect) child/parent of s
        select = false
    if select
      insert c in selected</preformat>
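        <p>The selection procedure can be sketched in Python as follows. In this illustrative version, clusters are identified by hypothetical string names and the tree is given as a dictionary mapping each cluster to its parent (None for the root); both are assumptions made to keep the example self-contained.</p>
        <preformat><![CDATA[
```python
def ancestors(node, parent):
    """Yield the direct and indirect parents of node."""
    node = parent.get(node)
    while node is not None:
        yield node
        node = parent.get(node)


def is_related(a, b, parent):
    """True if a is a (direct or indirect) child or parent of b."""
    return a in ancestors(b, parent) or b in ancestors(a, parent)


def select(ranked, parent):
    """Keep, in rank order, the clusters lying on distinct branches of the tree."""
    selected = []
    for c in ranked:
        if not any(is_related(c, s, parent) for s in selected):
            selected.append(c)
    return selected


# Toy tree: root has children A and B; A has children A1 and A2.
parent = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A"}
chosen = select(["A", "A1", "root", "B", "A2"], parent)
```
]]></preformat>
        <p>Here A is kept first, which rules out its children A1 and A2 as well as the root, while B lies on a separate branch and is kept too.</p>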
        <p>This approach has two advantages. First, because of the non-overlapping
property of the clusters being selected, we do not need to set a threshold on the
cluster score for selection, since, once a cluster is selected, all overlapping
clusters are automatically removed from the candidate list. This means that the
results are a limited subset of all the clusters that are complementary in terms
of topic, and have the best coverage scores. Second, clusters are
selected to represent learning activities, and further filters can be applied later
based on other indicators to emphasise the clusters mostly related to learning
activities.</p>
        <p>The final result of this phase is a set of selected clusters grouping similar
activities that have on average a good contribution to the coverage of the topic of
the cluster, each characterised by a vector of term frequencies from the entities
and categories in DBpedia. The last step is to label those clusters. For this, we
take the simple approach of choosing as label for a given cluster the entity or
category in its vector with the greatest weight.</p>
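        <p>This labelling rule amounts to taking the highest-weighted term of the cluster vector, as in the short Python sketch below. The label function and the example vector are hypothetical; ties, which the text does not discuss, are resolved here in favour of the first term encountered.</p>
        <preformat><![CDATA[
```python
def label(vector):
    """Return the term (entity or category) with the greatest weight in the vector."""
    return max(vector, key=vector.get)


cluster_vector = {"Statistics": 4.0, "Probability": 2.5, "Variance": 1.0}
cluster_label = label(cluster_vector)
```
]]></preformat>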
      </sec>
      <sec id="sec-6-3">
        <title>Application in two case studies</title>
        <p>In order to test and validate the method described above, we apply it in two
different scenarios with different sources of data. Those two scenarios actually
rely on the same application, and on the same architecture depicted in Figure 1,
in the context of the AFEL project. These scenarios however differ in the data
sources they consider. In one case, the activity data is taken from an online social
platform, Didactalia, where there is a limited number of activities relating to
well-described resources, and clearly associated with learning. In the other case, the
data is taken from the general browsing history of the user, generating many
more and much more frequent activities, associated with resources that have no or
very little metadata, and which are not necessarily directly related to learning.</p>
        <p>Didactalia is an online, social learning platform. It includes more than 100,000
resources that are contributed and annotated by users, and shared within various
communities. The whole platform relies on linked data technologies, which enable
connecting resources with each other. We collect activity data from Didactalia
using a JavaScript snippet similar in style to the ones used by web
analytics tools such as Google Analytics or Piwik. The information about the resources
is provided through an RDF-based API on the Didactalia platform.</p>
        <p>The results of applying our application to Didactalia are shown in an example
in Figure 3. The application first shows the detected learning scopes for the user
in a word cloud depending on the indicators considered. Each learning scope
can then be further explored, showing specific indicators, recommendations, etc.
The detection of learning scopes here is therefore critical. While there are tens
of thousands of users of Didactalia, and we therefore collect millions of data
points over time, each user will only carry out between a few and on the order of
one hundred activities. The tags associated with the resources can be used
as the initial profile vector for each activity/resource, possibly in combination
with the entities and categories from DBpedia. Considering the low number of
activities, the complexity of the process is less critical, although it needs to be
handled for a large number of users. Also, compared to the second scenario, each
of the learning scopes is more or less guaranteed to be related to learning, since
they only capture activities that are carried out on a learning platform.</p>
        <p>On the other hand, the application of our method to the user's whole
browsing history (shown in an example in Figure 4) has to deal with a large number of
activities for each user (up to thousands per day), with very little overlap and
not much description. The data in this case is collected through a specially
developed browser extension<sup>6</sup> (compatible with browsers supporting web-ext).
The application functions in very much the same way, except that in this case the
learning scopes detected are much broader and have to be based on the entities
and categories from DBpedia. In this case as well, the scale is much more of
an issue. The resulting hierarchical cluster set can have tens of thousands of nodes
for one user, and it would be infeasible to rebuild it every time. It is therefore
important, as we have chosen to do, to update it incrementally every time a new
activity appears.</p>
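        <p>To make the benefit of incremental updates concrete, the sketch below inserts one activity at a time into a binary cluster tree and only touches the nodes on the path from the root to the insertion point. It is a deliberately simplified illustration, not the clustering algorithm used in this work: the descent rule (go down while the new activity is closer to one child than the two children are to each other), the class and function names, and the example activities are all assumptions.</p>
        <preformat preformat-type="code">
```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse profiles stored as Counters."""
    dot = sum(weight * v[key] for key, weight in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


class Node:
    """A leaf holds one activity; an internal node sums the profiles below it."""

    def __init__(self, profile, left=None, right=None, label=None):
        self.profile = Counter(profile)
        self.left = left
        self.right = right
        self.label = label  # only set on leaves

    def is_leaf(self):
        return self.left is None


def insert(node, leaf):
    """Insert a leaf below node and return the root of the updated subtree."""
    if node is None:
        return leaf
    if not node.is_leaf():
        sim_left = cosine(leaf.profile, node.left.profile)
        sim_right = cosine(leaf.profile, node.right.profile)
        cohesion = cosine(node.left.profile, node.right.profile)
        if max(sim_left, sim_right) > cohesion:
            # The activity fits inside this cluster: descend into the closer child.
            if sim_left >= sim_right:
                node.left = insert(node.left, leaf)
            else:
                node.right = insert(node.right, leaf)
            node.profile.update(leaf.profile)
            return node
    # Leaf reached, or the activity fits neither child: add it as a sibling.
    return Node(node.profile + leaf.profile, left=node, right=leaf)


def leaves(node):
    """Labels of all activities below node, left to right."""
    if node.is_leaf():
        return [node.label]
    return leaves(node.left) + leaves(node.right)


# Hypothetical stream of activities, processed one at a time.
stream = [
    ("a1", ["algebra", "equations"]),
    ("a2", ["algebra", "polynomials"]),
    ("v1", ["volcano", "lava"]),
    ("a3", ["algebra", "equations", "matrix"]),
]
root = None
for label, features in stream:
    root = insert(root, Node(features, label=label))
print(leaves(root.left), leaves(root.right))
```
        </preformat>
        <p>In this toy run the three algebra activities end up under one subtree and the volcano activity in its own branch, and each insertion costs at most one similarity comparison per level of the tree rather than a full rebuild.</p>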
        <p>While the accuracy of the learning scope detection is subjective, the
applications described above demonstrate that the proposed method, which combines
incremental hierarchical clustering with activity profiles based on named entity
recognition and abstraction through DBpedia categories, can indeed be applied
and adapts well to different contexts.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <p><sup>6</sup> https://github.com/afel-project/browsing-history-webext</p>
      <sec id="sec-7-1">
        <title>Conclusion and Future Work</title>
        <p>In this paper, we described a method for dynamically and
efficiently detecting the main topics a user is learning about (the learning scopes).
It can be used in applications such as the ones developed in the AFEL project,
which rely on varied datasets, from small, well-described ones to large, open
ones for which metadata is not available. This method is currently deployed in
real-case scenarios within the AFEL project, and an evaluation is being carried
out of the benefits these applications can generate with respect to the learning
process of the user, beyond the basic validation presented here through the two
case studies described.</p>
        <p>Naturally, the proposed method currently suffers from a number of
limitations, which we intend to address as part of our future work. In particular, the
named entity recognition method currently employed, as well as the methods used to
compute the learning indicators in the applications (other than the one
related to coverage), are designed only for the English language.
Alternatives and versions of the techniques and tools used are available for other
languages, but a language detection mechanism will be needed in order to
direct the processing of a resource to the right version depending on the
main language of the textual component of that resource.</p>
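        <p>The routing step described above could be sketched as follows. The stopword-counting detector is only a toy stand-in for a real language identification component, and the pipeline functions, the language codes and the registry are hypothetical names introduced for this example.</p>
        <preformat preformat-type="code">
```python
# Tiny stopword lists, used only to make the example self-contained.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "to", "in"},
    "es": {"el", "la", "de", "y", "es", "en", "los"},
    "fr": {"le", "la", "de", "et", "est", "les", "des"},
}


def detect_language(text, default="en"):
    """Guess the main language by counting stopword hits (toy heuristic)."""
    words = text.lower().split()
    scores = {lang: sum(w in stop for w in words) for lang, stop in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def process_english(text):
    """Placeholder for the English entity recognition and indicator pipeline."""
    return f"processed with English pipeline: {text}"


def process_spanish(text):
    """Placeholder for a Spanish version of the same pipeline."""
    return f"processed with Spanish pipeline: {text}"


# Registry of the language-specific versions that are available.
PIPELINES = {"en": process_english, "es": process_spanish}


def process_resource(text):
    """Route a resource to the pipeline matching its main language.

    Returns (language, result); result is None when no pipeline exists
    for the detected language, so the caller can skip or queue the resource.
    """
    language = detect_language(text)
    pipeline = PIPELINES.get(language)
    if pipeline is None:
        return language, None
    return language, pipeline(text)


print(process_resource("the history of the volcano is long")[0])
print(process_resource("la historia de los volcanes es larga")[0])
print(process_resource("les volcans et le magma des iles"))
```
        </preformat>
        <p>Keeping the detector separate from the registry means that support for a new language only requires registering one more pipeline.</p>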
        <p>This points to another clear limitation, namely that our method is dedicated to
online resources whose content is mainly textual. This restricts its application,
especially in eLearning, where more and more multimedia resources
are being used. While this can be seen as a problem, methods are available to
retrieve textual descriptions and content from many types of media, and these can be
used to alleviate this issue.</p>
        <p>Finally, as already mentioned, the set of topics one
is mostly interested in learning about is a very subjective matter. We therefore
cannot expect our method to ever be entirely accurate. Enabling
interaction with the constructed learning scopes and clusters therefore seems to be a
promising solution here. Indeed, we plan to integrate feedback from the user into
our approach, allowing them to mark certain clusters as "persistent" (i.e. they
should not disappear even if other clusters appear more important), irrelevant
(i.e. their activities should be either moved to other clusters or considered as
not interesting from the point of view of learning), merged (i.e. two learning
scopes might need to be combined) or as containing a different set of activities
(i.e. the user should be able to re-assign activities manually). The incremental
hierarchical clustering method described in this paper would therefore have to
be updated to be able to integrate that feedback.</p>
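        <p>The four kinds of feedback listed above could be represented along the following lines. Since this is planned future work, the sketch is purely illustrative: the class names, the flat (non-hierarchical) registry of scopes and the ranking rule are all assumptions, and the sketch does not show how the feedback would be propagated into the incremental hierarchical clustering itself.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    """One learning scope with the user feedback attached to it."""
    name: str
    activities: set = field(default_factory=set)
    persistent: bool = False  # must stay visible even if other scopes grow
    irrelevant: bool = False  # not interesting from a learning point of view


class ScopeFeedback:
    """Applies user feedback to a set of detected learning scopes."""

    def __init__(self, scopes):
        self.scopes = {scope.name: scope for scope in scopes}

    def mark_persistent(self, name):
        self.scopes[name].persistent = True

    def mark_irrelevant(self, name):
        self.scopes[name].irrelevant = True

    def merge(self, keep, absorb):
        """Combine two scopes, keeping the name and flags of the first."""
        source = self.scopes.pop(absorb)
        target = self.scopes[keep]
        target.activities |= source.activities
        target.persistent = target.persistent or source.persistent

    def reassign(self, activity, source, target):
        """Move one activity from one scope to another."""
        self.scopes[source].activities.discard(activity)
        self.scopes[target].activities.add(activity)

    def visible(self, limit):
        """Scopes to display: persistent ones first, then by size."""
        shown = [s for s in self.scopes.values() if not s.irrelevant]
        shown.sort(key=lambda s: (not s.persistent, -len(s.activities), s.name))
        return [s.name for s in shown[:limit]]


# Hypothetical scopes and feedback.
feedback = ScopeFeedback([
    Scope("algebra", {"a1", "a2", "a3"}),
    Scope("equations", {"e1"}),
    Scope("geology", {"g1"}),
    Scope("news", {"n1", "n2"}),
])
feedback.mark_persistent("geology")
feedback.mark_irrelevant("news")
feedback.merge("algebra", "equations")
feedback.reassign("a3", "algebra", "geology")
print(feedback.visible(2))
```
        </preformat>
        <p>Storing feedback as flags on the scopes, rather than editing the clusters directly, would allow the feedback to be re-applied after each incremental update.</p>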
      </sec>
      <sec id="sec-7-2">
        <title>Acknowledgment</title>
        <p>This work has received funding from the European Union's Horizon 2020
research and innovation programme as part of the AFEL (Analytics for Everyday
Learning) project under grant agreement No 687916, and is supported by the
Insight Centre for Data Analytics, funded by Science Foundation Ireland.</p>
      </sec>
    </sec>
  </body>
  <back>
  </back>
</article>