<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Knowledge Base for Personal Information Management</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thomas Pellissier Tanon</string-name>
          <email>ttanon@enst.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Montoya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabian M. Suchanek</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierre Senellart</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Serge Abiteboul</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Square Sense</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LTCI</institution>
          ,
          <addr-line>Télécom ParisTech</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>DI ENS, CNRS, PSL Research University &amp; Inria Paris &amp; LTCI</institution>
          ,
          <addr-line>Télécom ParisTech</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Inria Paris &amp; DI ENS, CNRS, PSL Research University</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <abstract>
        <p>Internet users have personal data spread over several devices and across several web systems. In this paper, we introduce a novel open-source framework for integrating the data of a user from different sources into a single knowledge base. Our framework integrates data of different kinds into a coherent whole, starting with email messages, calendar, contacts, and location history. We show how event periods in the user's location data can be detected and how they can be aligned with events from the calendar. This allows users to query their personal information within and across different dimensions, and to perform analytics over their emails, events, and locations. Our system models data using RDF, extending the schema.org vocabulary and providing a SPARQL interface.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Internet users commonly have their personal data spread over
several devices and services. This includes emails, messages, contact
lists, calendars, location histories, and many others. However,
commercial systems often function as data traps, where it is easy to
check information in but difficult to query and exploit it. For
example, a user may have all her emails stored with an email provider
– but cannot find out which of her colleagues she interacts most
frequently with. She may have all her location history on her phone
– but cannot find out which of her friends’ places she spends the
most time at. Thus, a user paradoxically often has no means to
make full use of data that she has created or provided. As more and
more of our lives happen in the digital sphere, users are actually
giving away part of their life to external data services.</p>
      <p>We aim to put the user back in control of her own data. We
introduce a novel framework that integrates and enriches personal
information from different sources into a single knowledge base
(KB) that lives on the user’s machine, a machine she controls. Our
system, Thymeflow, replicates data of different kinds from outside
services and thus acts as a digital home for personal data. This
provides the user with a high-level global view of that data, which
she can use for querying and analysis. All of this integration and
analysis happens locally on the user’s computer, thus guaranteeing
her privacy.</p>
      <p>Designing such a personal KB is not easy: Data of completely
different nature has to be modeled in a uniform manner, pulled into
the knowledge base, and integrated with other data. For example,
we have to find out that the same person appears with different
email addresses in address books from different sources. Standard
KB alignment algorithms do not perform well in our scenario, as we
show in our experiments. Furthermore, integration spans data of
different modalities: to create a coherent user experience, we need
to align calendar events (temporal information) with the user’s
location history (spatiotemporal) and place names (spatial).</p>
      <p>
        We provide a fully functional and open-source personal
knowledge management system. A first contribution of our work is the
management of location data. Such information is becoming
commonly available through the use of mobile applications such as
Google’s Location History [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. We believe that such data becomes
useful only if it is semantically enriched with events and people in
the user’s personal space. We provide such an enrichment.
      </p>
      <p>A second contribution is the adaptation of ontology alignment
techniques to the context of personal KBs. The alignment of persons
and organizations is rather standard. More novel are alignments
based on time (a meeting in the calendar and a GPS location), or
space (an address in contacts and a GPS location).</p>
      <p>Our third contribution is an architecture that allows the
integration of heterogeneous personal data sources into a coherent whole.
This includes the design of incremental synchronization, where
a change in a data source triggers the loading and treatment of
just these changes in the central KB. Conversely, the user is able to
perform updates on the KB, which are made persistent wherever
possible in the sources. We also show how to integrate knowledge
enrichment components into this process, such as entity resolution
and spatio-temporal alignments.</p>
      <p>As implemented, our system can provide answers to questions
such as: Who have I contacted the most in the past month (requires
alignment of different email addresses)? How many times did I go
to Alice’s place last year (requires alignment between contact list
and location history)? Where did I have lunch with Alice last week
(requires alignment between calendar and location history)?</p>
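      <p>As an illustration of the first of these queries, the following Python sketch (standard library only; the header values and the alignment table are invented for the example) counts correspondents once different addresses of the same agent have been unified:</p>

```python
from collections import Counter
from email.utils import getaddresses

# Invented "To"/"Cc" header values from one month of sent mail.
headers = [
    "Jane Doe <jane.doe@inria.fr>, j4569@gmail.com",
    "jane.doe@inria.fr",
    "Bob <bob@example.org>",
]

# Result of agent matching: each address maps to a canonical one.
same_as = {"j4569@gmail.com": "jane.doe@inria.fr"}

counts = Counter()
for header in headers:
    for _name, addr in getaddresses([header]):
        counts[same_as.get(addr, addr)] += 1

# Without the alignment, Jane would be counted as two separate contacts.
print(counts.most_common(1))
```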
      <p>
        Our system, Thymeflow, was previously demonstrated in [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ].
It is based on an extensible framework available under an
open-source software license1. People can therefore freely use it, and
researchers can build on it.
      </p>
      <p>
        We first introduce our data model and sources in Section 2, and
then present the system architecture of Thymeflow in Section 3.
Section 4 details our knowledge enrichment processes, and Section 5
our experimental results. Related work is described in Section 6.
Before concluding in Section 8, we discuss lessons learnt while
building and experimenting with Thymeflow in Section 7.
1https://github.com/thymeflow/thymeflow
      </p>
    </sec>
    <sec id="sec-2">
      <title>DATA MODEL</title>
      <p>In this section, we briefly describe the schema of the knowledge
base, and discuss the mapping of data sources to that schema.</p>
      <p>
        Schema. We use the RDF standard [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] for knowledge
representation. We use the namespace prefixes schema for http://schema.
org/, and rdf and rdfs for the standard namespaces of RDF and
RDF Schema, respectively. A named graph is a set of RDF triples
associated with a URI (its name). A knowledge base (KB) is a set of
named graphs.
      </p>
      <p>For modeling personal information, we use the schema.org
vocabulary when possible. This vocabulary is supported by Google,
Microsoft, Yahoo, and Yandex, and documented online. Wherever
this vocabulary is not fine-grained enough for our purposes, we
complement it with our own vocabulary, that lives in the namespace
http://thymeflow.com/personal# with prefix personal.</p>
      <p>Figure 1 illustrates a part of our schema. Nodes represent classes,
rounded colored ones are non-literal classes, and an edge with
label p from X to Y means that the predicate p links instances
of X to instances of type Y . We use locations, people,
organizations, and events from schema.org, and complement them with
more fine-grained types such as Stay, EmailAddress, and
PhoneNumber. Person and Organization classes are aggregated into a
personal:Agent class.</p>
      <p>
        Emails and contacts. We treat emails in the RFC 822 format [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
An email is represented as a resource of type schema:Email with
properties such as schema:sender, personal:primaryRecipient,
and personal:copyRecipient, which link to personal:Agent
instances. Other properties are included for the subject, the sent and
received dates, the body, the attachments, the threads, etc.
      </p>
      <p>Email addresses are great sources of knowledge. An email
address such as “jane.doe@inria.fr” provides the given and family
names of a person, as well as her affiliation. However, some email
addresses provide less knowledge and some almost none, e.g.,
“j4569@gmail.com”. Sometimes, email fields contain a name, as
in “Jane Doe &lt;j4569@gmail.com&gt;”, which gives us a name triple. In
our model, personal:Agent instances extracted from emails with
the same combination of email address and name are considered
indistinguishable (i.e., they are represented by the same URI). An
email address does not necessarily belong to an individual; it can
also belong to an organization, as in edbt-school-2013@imag.fr or
fancy_pizza@gmail.com. This is why, for instance, the sender, in
our data model, is a personal:Agent, and not a schema:Person.</p>
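      <p>The extraction heuristic described above can be sketched as follows (the rules and the free-mail domain list are simplifications for illustration, not the system's exact logic):</p>

```python
from email.utils import parseaddr

# Free-mail providers whose domain carries no affiliation signal (illustrative list).
FREEMAIL = {"gmail", "yahoo", "hotmail", "outlook"}

def agent_facet(field: str) -> dict:
    """Build an agent facet from an email header field.

    A given.family@domain address yields name parts and an affiliation
    guess; an address like j4569@gmail.com yields almost nothing.
    """
    name, addr = parseaddr(field)
    facet = {"email": addr, "name": name or None,
             "givenName": None, "familyName": None, "affiliation": None}
    local, _, domain = addr.partition("@")
    if "." in local and not any(ch.isdigit() for ch in local):
        given, _, family = local.partition(".")
        facet["givenName"] = given.capitalize()
        facet["familyName"] = family.capitalize()
    org = domain.partition(".")[0]
    if org and org not in FREEMAIL:
        facet["affiliation"] = org
    return facet
```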
      <p>
        A vCard contact [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] is represented as an instance of
personal:Agent with properties such as schema:familyName, and
schema:address. We normalize telephone numbers, based on a
country setting provided by the user.
      </p>
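      <p>Telephone normalization can be sketched as below: a toy E.164-style normalizer with a hand-picked country table; a real implementation would rely on complete numbering-plan metadata rather than these simplified trunk-prefix rules:</p>

```python
import re

# Country calling codes (illustrative subset, not a complete table).
COUNTRY_CODES = {"FR": "33", "DE": "49", "GB": "44"}

def normalize_phone(raw: str, country: str) -> str:
    """Normalize a phone number using the user's country setting."""
    digits = re.sub(r"[^0-9+]", "", raw)
    if digits.startswith("+"):
        return digits                                      # already international
    if digits.startswith("00"):
        return "+" + digits[2:]                            # 00 international prefix
    if digits.startswith("0"):
        return "+" + COUNTRY_CODES[country] + digits[1:]   # drop trunk prefix
    return "+" + COUNTRY_CODES[country] + digits
```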
      <p>
        Calendar. The iCalendar format [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] can represent events. We
model them as instances of schema:Event, with properties such
as name, location, organizer, attendee, and date. The location is
typically given as a postal address, and we will discuss later how
to associate it with geo-coordinates and richer place semantics. The
Facebook Graph API [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] also models events the user is attending
or interested in, with richer location data and a list of attendees (a
list of names).
      </p>
      <p>Location history. Smartphones are capable of tracking the user’s
location over time using different positioning technologies:
satellite navigation, Wi-Fi, and cellular. Location history applications
continuously run in the background, and store the user’s location
either locally or on a distant server. Each point in the user’s location
history is represented by time, longitude, latitude, and horizontal
accuracy (the measurement’s standard error). We use the Google
Location History format, in JSON, as Google users can easily export
their history in this format. A point is represented by a resource
of type personal:Location with properties schema:geo, for
geographic coordinates with accuracy, and personal:time for time.</p>
    </sec>
    <sec id="sec-3">
      <title>SYSTEM ARCHITECTURE</title>
      <p>
        A personal knowledge base could be seen as a view defined over
personal information sources. The user would query this view in a
mediation style [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and the data would be loaded only on demand.
However, accessing, analyzing and integrating these data sources
on the fly would be expensive tasks. For this reason, Thymeflow
uses a warehousing approach. Data is loaded from external sources
into a persistent store and then enriched.
      </p>
      <p>Thymeflow is a web application that the user installs, providing it
with a list of data sources, together with credentials to access them
(such as tokens or passwords). The system accesses the data sources
and pulls in the data. All code runs locally on the user’s machine.
None of the data leaves the user’s computer. Thus, the user remains
in complete control of her data. The system uses adapters to access
the sources, and to transform the data into RDF. We store the data
in a persistent triple store, which the user can query using SPARQL.</p>
      <p>One of the main challenges in the creation of a personal KB is
the temporal factor: data sources may change, and these updates
should be reflected in the KB. Changes can happen during the initial
load time, while the system is asleep, or after some inferences have
already been computed. To address these dynamics, Thymeflow
uses software modules called synchronizers and enrichers. Figure 2
shows synchronizers on the left, and enrichers in the center.
Synchronizers are responsible for accessing data sources, enrichers
(see Section 4) for inferring new statements, such as alignments
between entities obtained by entity resolution.</p>
      <p>
        Modules are scheduled dynamically and may be triggered by
updates in the data sources (e.g., calendar entries) or by new pieces
of information derived in the KB (e.g., the alignment of a position in
the location history with a calendar event). The modules may also be
started regularly for particularly costly alignment processes. When
a synchronizer detects a change in a source, a pipeline of enricher
modules is triggered, as shown in Figure 2. Enrichers can also
use knowledge from external data sources, such as Wikidata [
        <xref ref-type="bibr" rid="ref44">44</xref>
        ],
Yago [
        <xref ref-type="bibr" rid="ref41">41</xref>
        ], or OpenStreetMap.
      </p>
      <p>Synchronizer modules are responsible for retrieving new data
from a data source. For each data source that has been updated, the
adapter for that particular source transforms the source updates
since the last synchronization into a set of insertions/deletions in RDF.
This is relatively simple for data sources that track
modifications, e.g., CalDAV (calendar), CardDAV (contacts), and IMAP
(email). For others, it requires more processing. The result of this
process is a delta update, i.e., a set of updates to the KB since the
last time that particular source was considered.</p>
      <p>[Figure 1 appeared here: the schema diagram, with classes such as
EmailMessage, EmailAddress, PhoneNumber, Stay, Event, Location, Place,
PostalAddress, GeoCoordinates, GeoVector, Agent, Person, Organization,
and Country, linked by schema: and personal: predicates (sender,
recipient, attendee, organizer, location, geo, address, etc.); its legend
distinguishes schema:, personal:, and xsd: names, and separates personal
information sources from external sources.]</p>
      <sec id="sec-3-9">
        <title>Provenance and Updates</title>
        <p>The KB records the provenance of each newly obtained piece
of information. Synchronizers record a description of the data
source, and enrichers record their own name. We use named graphs
to store the provenance. For example, the statements extracted
from an email message in the user’s email server will be
contained in a graph named with the concatenation of the server’s
email folder URL and the message id. The graph’s URI is itself an
instance of personal:Document, and is related to its source via
the personal:documentOf property. The source is an instance of
personal:Source and is in this case the email server’s URL.
Account information is included in an instance of personal:Account
via the personal:sourceOf property. Account instances allow us
to gather different kinds of data sources (e.g., CardDAV, CalDAV,
and IMAP servers) belonging to one provider (e.g., corporate IT
services) which the user accesses through a single identification. This
provenance can be used to answer queries such as “What meetings
were recorded in my work calendar for next Monday?”.</p>
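        <p>The named-graph bookkeeping can be sketched with plain data structures (the graph and source URIs below are invented examples of the folder-URL-plus-message-id convention):</p>

```python
from typing import NamedTuple

class Triple(NamedTuple):
    s: str
    p: str
    o: str

kb = {}            # named graph URI -> set of triples
META = "urn:metadata"

def record(graph: str, source: str, triples):
    """Store statements under a named graph and attach their provenance."""
    kb.setdefault(graph, set()).update(triples)
    meta = kb.setdefault(META, set())
    meta.add(Triple(graph, "rdf:type", "personal:Document"))
    meta.add(Triple(graph, "personal:documentOf", source))

def sources_of(s: str, p: str, o: str):
    """Provenance query: which sources contributed this statement?"""
    docs = {g for g, ts in kb.items() if g != META and Triple(s, p, o) in ts}
    return {t.o for t in kb.get(META, set())
            if t.p == "personal:documentOf" and t.s in docs}

record("imap://mail.example.org/INBOX/msg-42",   # folder URL + message id
       "imap://mail.example.org/",               # the email server
       [Triple("mail:msg-42", "schema:sender", "agent:jane")])
```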
        <p>Finally, the system allows the propagation of information from
the KB to the data sources. These can either be insertions/deletions
derived by the enrichers, or insertions/deletions explicitly specified
by the user. For instance, consider the information that different
email addresses correspond to the same person. This information
can be pushed to data sources, which may for example result in
the merge of two contacts in the user’s list of contacts.
To propagate the information to the source, we translate from the
structure and terminology of the KB back to that of the data source
and use the API of that source. The user has the means of controlling
this propagation, e.g., specifying whether contact information in
our system should be synchronized to her phone’s contact list.</p>
        <p>
          The user can directly update the KB by inserting or deleting
knowledge statements. Such updates to the KB are specified in the
SPARQL Update language [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. When no source is specified for
recording this new information, the system considers all the sources
that know the subject of the particular statement. For an insertion, if
no source is able to register a corresponding insertion, the system
performs the insertion in a special locally persistent graph, called
the overwrite graph. For deletions, if a source fails to perform
a deletion (e.g., because the statement is read-only), the system
removes the statement from the KB anyway (even if the data is
still in some upstream source). A negative statement is added to
the overwrite graph. This negative statement will prevent using
a source statement to reintroduce the corresponding statement in the
KB: the negative statement overwrites the source statement.
        </p>
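        <p>The overwrite-graph behavior for deletions can be sketched as follows (statements are reduced to plain tuples and sources to a try_delete hook; both are simplifications of the system's actual interfaces):</p>

```python
kb = {("agent:bob", "schema:name", "Bob")}
negative_graph = set()   # locally persistent negative statements

class ReadOnlySource:
    """A source that refuses deletions (e.g., read-only statements)."""
    def try_delete(self, stmt):
        return False

def delete(stmt, sources):
    """Remove from the KB even if every source refuses the deletion."""
    for src in sources:
        src.try_delete(stmt)
    kb.discard(stmt)
    negative_graph.add(stmt)     # blocks re-introduction on later syncs

def synchronize(stmt):
    """A source pushes a statement; negative statements overwrite it."""
    if stmt not in negative_graph:
        kb.add(stmt)

stmt = ("agent:bob", "schema:name", "Bob")
delete(stmt, [ReadOnlySource()])
synchronize(stmt)   # the upstream source still holds the statement
```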
      </sec>
    </sec>
    <sec id="sec-4">
      <title>ENRICHERS</title>
      <p>We describe the general principles of enricher modules. We then
describe two specific enrichments: agent matching and event
geolocation.</p>
      <p>After loading, enricher modules perform inference tasks such as
entity resolution, event geolocation, and other knowledge
enrichment tasks. An enricher works in a differential manner: it takes as
input the current state of the KB, and a collection of changes ∆i
that have recently happened. It computes a new collection ∆i+1 of
enrichments. Intuitively, this allows reacting to changes in a data
source. When some ∆0 is detected (typically by some synchronizer),
the system runs a pipeline of enrichers to take these changes into
consideration. For instance, when a new entry is entered in the
calendar with an address, a geocoding enricher is called to locate it.
Another enricher will later attempt to match it with a position in
the location history. For performance, particularly costly enrichers
wait until there are enough changes, or until no more changes are
happening, before running on a batch of changes. This is the case
for the entity resolution enricher. We now present this enricher
and another one that has been incorporated into the system.</p>
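      <p>The differential behavior can be sketched as a pipeline of functions mapping (KB, ∆i) to ∆i+1; the toy geocoding step below fakes its lookup, since only the control flow is the point here:</p>

```python
def geocoding_enricher(kb, delta):
    """Toy enricher: derive a geo statement for each new address statement."""
    return {(s, "schema:geo", "geo-of:" + o)
            for s, p, o in delta if p == "schema:address"}

def run_pipeline(kb, delta0, enrichers):
    """Apply each stage's changes to the KB and feed them to the next stage."""
    delta = delta0
    for enrich in enrichers:
        kb |= delta
        delta = enrich(kb, delta)
    kb |= delta
    return kb

kb = run_pipeline(set(),
                  {("event:1", "schema:address", "311 Moffett Blvd")},
                  [geocoding_enricher])
```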
    </sec>
    <sec id="sec-5">
      <title>Agent Matching</title>
      <p>
        Facets. The KB keeps information as close to the original data as
possible. Thus, the knowledge base will typically contain several
entities for the same person, if that person appears with different
names or different email addresses. We call such resources facets of
the same real-world agent. Different facets of the same agent will
be linked by the personal:sameAs relation. The task of identifying
equivalent facets has been intensively studied under different names
such as record linkage, entity resolution, or object matching [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In
our case, we use techniques that are tailored to the context of
personal KBs: identifier-based matching and attribute-based matching.
      </p>
      <p>Identifier-based matching. We can match two facets if they have
the same value for some particular attribute (such as an email
address or a telephone number), which, in some sense, identifies or
determines the entity. This approach is commonly used in personal
information systems (in research and industry) and gives fairly
good results for linking, e.g., facets extracted from emails and the
ones extracted from contacts. Such a matching may occasionally
be incorrect, e.g., when two spouses share a mobile phone or two
employees share the same customer relations email address. In our
experience, such cases are rare, and we postpone their study to
future work.</p>
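      <p>Identifier-based matching amounts to computing connected components over shared identifiers, e.g., with a union-find structure (the facet data below is invented for illustration):</p>

```python
facets = {
    "f1": {"email": "jane.doe@inria.fr", "tel": "+33123456789"},
    "f2": {"email": "j4569@gmail.com", "tel": "+33123456789"},
    "f3": {"email": "bob@example.org"},
}

parent = {f: f for f in facets}

def find(x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

seen = {}   # identifier value -> first facet carrying it
for fid, attrs in facets.items():
    for key in ("email", "tel"):
        value = attrs.get(key)
        if value is not None:
            if value in seen:
                union(fid, seen[value])   # shared identifier: same agent
            else:
                seen[value] = fid
```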
      <p>Attribute-based matching. Two agent facets with the same given and
family names have, for instance, a higher probability of representing
the same agent than two agent facets with different names, all other
attributes held constant. Besides names, attributes that can help
determine a matching include schema:birthDate, schema:gender, and
schema:email.</p>
      <p>
        We tried holistic matching algorithms for graph alignments [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ]
that we adapted to our setting. The results turned out to be
disappointing (see Section 5). We believe this is due to the
following: (i) almost all agent facets have a schema:email, and possibly
a schema:name, but most of them lack the other attributes, which are
thus almost useless; (ii) names extracted from emails may contain
pseudonyms, abbreviations, or lack family names, which reduces
matching precision; (iii) we cannot reliably compute name
frequency metrics from the knowledge base, since a rare name may
appear many times for different email addresses if a person
happens to be a friend of the user. Therefore, we developed our own
algorithm, AgentMatch, which works as follows:
(1) We partition Agents using the equivalence relation
computed by matching identifying attributes.
(2) For each Agent equivalence class, we compute its
corresponding set of names, and, for each name, its number of
occurrences (in email messages, etc.).
(3) We compute Inverse Document Frequency (IDF) scores,
where the documents are the equivalence classes, and the
terms are the name occurrences.
(4) For each pair of equivalence classes, we compute a numerical
similarity between each pair of names using an approximate
string distance that finds the best matching of words between
the two names and then compares matching words using
another string similarity function (discussed below). The
similarity between two names is computed as a weighted mean
using the sum of word IDFs as weights. The best matching
of words corresponds to a maximum weight matching in
the bipartite graph of words where weights are computed
using the second string similarity function. The similarity
(in [0, 1]) between two equivalence classes is computed as a
weighted mean of name pair similarities using the product of
word occurrences as weights.
(5) Pairs for which the similarity is above a certain threshold
are considered to correspond to two equivalent facets.
The second similarity function we use is based on the Levenshtein
edit distance, after string normalization (accent removal and
lowercasing). In our experiments, we have also tried the Jaro–Winkler
distance. For performance reasons, we use 2- or 3-gram-based
indexing of words in agent names, and only consider in step (4) of the
process those Agent parts with some ratio S of q-grams in common
in at least one word. For instance, two Agent parts with names
“Susan Doe” and “Susane Smith” would be candidates.
      </p>
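      <p>Steps (4) and (5) can be sketched as follows. This is a simplification of AgentMatch: greedy word matching stands in for the maximum-weight bipartite matching, difflib's ratio stands in for the normalized Levenshtein similarity, and the IDF weights are invented:</p>

```python
import unicodedata
from difflib import SequenceMatcher

def normalize(word):
    """Lowercase and strip accents, as in the normalization step."""
    nfkd = unicodedata.normalize("NFKD", word.lower())
    return "".join(c for c in nfkd if not unicodedata.combining(c))

def name_similarity(name_a, name_b, idf):
    """IDF-weighted mean of word-pair similarities between two names."""
    words_a = [normalize(w) for w in name_a.split()]
    words_b = [normalize(w) for w in name_b.split()]
    if len(words_a) > len(words_b):
        words_a, words_b = words_b, words_a
    total = weight = 0.0
    remaining = list(words_b)
    for w in words_a:
        # Greedily pick the most similar unmatched word of the other name.
        best = max(remaining, key=lambda v: SequenceMatcher(None, w, v).ratio())
        remaining.remove(best)
        wgt = idf.get(w, 1.0)
        total += wgt * SequenceMatcher(None, w, best).ratio()
        weight += wgt
    return total / weight if weight else 0.0

idf = {"jane": 2.0, "doe": 3.0}   # invented IDF scores
```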
    </sec>
    <sec id="sec-6">
      <title>Geolocating Events</title>
      <p>We discuss how to geolocate events, e.g., how we can detect that
Monday’s lunch was at “Shana Thai Restaurant, 311 Moffett
Boulevard, Mountain View, CA 94043”. For this, we first analyze the
location history from the user’s smartphone to detect places where
the user stayed for a prolonged period of time. We then perform
a spatiotemporal alignment between such stays and the events
in the user’s calendar. Finally, we use geocoding to provide location
semantics to the events, e.g., a restaurant name and a street address.</p>
      <p>Detecting stays. Locations in the user’s location history can be
put into two categories: stays and moves. Stays are locations where
the user remained for some period of time (e.g., dinner at a
restaurant, gym training, office work), and moves are the others. Moves
usually correspond to locations along a journey from one place to
another, but might also correspond to richer outdoor activity (e.g.,
jogging, sightseeing). Figure 3 illustrates two stay clusters located
inside the same building.</p>
      <p>
        To transform the user’s location history into a sequence of stays
and moves, we perform time-based spatial clustering [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. The idea
is to create clusters along the time axis. Locations are sorted by
increasing time, and each new location is either added to an
existing cluster (that is geographically close and that is not too old), or
added to a new cluster. To do so, a location is spatially represented
as a two-dimensional unimodal normal distribution N(µ, σ²). The
assumption of a normally distributed error is typical in the field of
processing location data. For instance, a cluster of size 1 formed
by location point p = (t, x, y, a), where t is the time, a the
accuracy, and (x, y) the coordinates, is represented by the distribution
P = N(µP = (x, y), σP² = a²). When checking whether location
p can be added to an existing cluster C represented by the normal
distribution Q = N(µQ, σQ²), the process computes the Hellinger
distance [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ] between P and Q:
H²(P, Q) = 1 − √(2σPσQ / (σP² + σQ²)) · exp(−d(µP, µQ)² / (4(σP² + σQ²))) ∈ [0, 1],
where d(µP, µQ) is the geographical distance between cluster centers.
The Hellinger distance takes into account both the accuracy and the
geographical distance between cluster centers, which allows us
to handle outliers no matter the location accuracy. The location
is added to C if this distance is below a certain threshold λ, i.e.,
H²(P, Q) ⩽ λ² &lt; 1. In our system, we used a threshold of 0.95.</p>
      <p>When p is added to cluster C, the resulting cluster is defined by
a normal distribution whose expectation is the arithmetic mean of
the location point centers weighted by the inverse accuracy squared,
and whose variance is the harmonic mean of the accuracies squared.
Formally, if a cluster C is formed by locations {p1, . . . , pn}, where
pi = (ti, xi, yi, ai), then C is defined by the distribution N(µ, σ²) with
µ = (Σᵢ₌₁ⁿ (xi, yi) / ai²) · (Σᵢ₌₁ⁿ 1 / ai²)⁻¹ and σ² = (Σᵢ₌₁ⁿ 1 / ai²)⁻¹.
The coordinates are assumed to have been projected to a Euclidean
plane locally approximating distances and angles on Earth around
the cluster points. If n = 1, then µ = (x1, y1) and σ² = a1², which
corresponds to the definition of a cluster of size 1.</p>
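      <p>The clustering step can be sketched as follows; the Hellinger test and the cluster update follow the formulas above, the maximum time gap is an invented parameter, and coordinates are assumed already projected to a local Euclidean plane in meters:</p>

```python
import math

def hellinger2(mu_p, var_p, mu_q, var_q):
    """Squared Hellinger distance between two isotropic normal distributions."""
    d2 = (mu_p[0] - mu_q[0]) ** 2 + (mu_p[1] - mu_q[1]) ** 2
    coeff = math.sqrt(2 * math.sqrt(var_p * var_q) / (var_p + var_q))
    return 1 - coeff * math.exp(-d2 / (4 * (var_p + var_q)))

def cluster_history(points, threshold=0.95, max_gap=900):
    """Time-based spatial clustering of points (t, x, y, a) sorted by time."""
    clusters = []
    for t, x, y, a in sorted(points):
        last = clusters[-1] if clusters else None
        if (last and t - last["end"] <= max_gap
                and hellinger2((x, y), a ** 2, last["mu"], last["var"]) <= threshold):
            # Merge: inverse-accuracy-squared weighted mean and combined variance.
            last["weights"].append(1 / a ** 2)
            last["points"].append((x, y))
            w = sum(last["weights"])
            last["mu"] = (sum(px * wi for (px, _), wi in zip(last["points"], last["weights"])) / w,
                          sum(py * wi for (_, py), wi in zip(last["points"], last["weights"])) / w)
            last["var"] = 1 / w
            last["end"] = t
        else:
            clusters.append({"mu": (x, y), "var": a ** 2, "points": [(x, y)],
                             "weights": [1 / a ** 2], "start": t, "end": t})
    return clusters
```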
      <p>A cluster that lasted more than a certain threshold is a candidate
for being a stay. A difficulty is that a single location history (e.g.,
Google Location History) may record the locations of different devices,
e.g., a telephone and a tablet. The identity of the device may not be
recorded. The algorithm understands that two far-away locations,
very close in time, must come from different devices. Typically, one
of the devices is considered to be stationary, and we try to detect a
movement of the other. Another difficulty comes when traveling in
high-speed trains with poor network connectivity. Location trackers
will often give the same location for a few minutes, which leads to
the detection of an incorrect stay.</p>
      <p>Matching stays with events. After the extraction of stays using
the previous algorithm, the next step is to match these with calendar
events. Such a matching turns out to be difficult because: (i) the
location of an event (address or geo-coordinates) is often missing;
(ii) when present, an address often does not identify a geographical
entity, as in “John’s home” or “room C110”; (iii) in our experience,
starting times are generally reasonable (although a person may be
late or early for a meeting) but durations are often not meaningful
(around 70% of events in our test datasets were scheduled for 1 hour;
among the 1-hour events that we aligned, only 9% lasted between
45 and 75 minutes); (iv) some stays are incorrect.</p>
      <p>Because of (i) and (ii), we do not rely much on the location
explicitly listed in the user’s calendars. We match a stay with an
event primarily based on time: the time overlap (or proximity) and
the duration. In particular, we match the stay and the event if the
ratio of the overlap duration to the entire stay duration is greater
than a threshold θ. As we have seen, event durations are often
unreliable because of (iii). Our method still yields reasonable results,
because it tolerates errors on the start of the stay for long stays
(because of their duration) and for short ones (because calendar
events are usually scheduled for at least one hour). If the event has
geographical coordinates, we filter out stays that are too far away
from that location (i.e., when the distance is greater than δ). We
discuss the choice of θ and δ for this process in Section 5.</p>
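      <p>A sketch of this matching test; the θ and δ defaults here are placeholders rather than the values selected in Section 5, and distances assume locally projected coordinates in meters:</p>

```python
import math

def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def stay_matches_event(stay, event, theta=0.5, delta=500.0):
    """Match if the overlap covers at least a fraction theta of the stay,
    and the stay lies within delta meters of the event's coordinates, if any."""
    duration = stay["end"] - stay["start"]
    if duration <= 0:
        return False
    ratio = overlap(stay["start"], stay["end"], event["start"], event["end"]) / duration
    if ratio < theta:
        return False
    if event.get("geo") is not None:
        dx, dy = stay["x"] - event["geo"][0], stay["y"] - event["geo"][1]
        if math.hypot(dx, dy) > delta:
            return False
    return True
```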
      <p>
        Geocoding event addresses. Once stays associated with events,
we enrich events with rich place semantics (country, street name,
postal code, place name). If an event has an explicit address, we use
a geocoder. Thymeflow allows using diferent geocoders, e.g., the
Google Maps Geocoding API [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], which returns the geographic
coordinates of an address, along with structured place and address
data. The enricher only keeps the geocoder’s most relevant result
and adds its data (geographic coordinates, identifier, street address,
etc.) to the location in the knowledge base. For events that do not
have an explicit address but that have been matched to a stay, we
use the geocoder to transform the geographic coordinates of the
stay into a list of nearby places. The most precise result is added
as the event location. If the event has both an explicit address and
a match with a stay, we call the geocoder on this address, while
restricting the search to a small area around the stay coordinates.
      </p>
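The input-selection logic of this paragraph can be sketched as follows. The `geocode` callable is a hypothetical stand-in for a real geocoder client (e.g., one for the Google Maps Geocoding API); its keyword interface is our assumption, and it is assumed to return a list of results ranked by relevance.

```python
def enrich_event_location(event, stay, geocode):
    """Choose the geocoder input according to what the event and stay provide."""
    if event.get("address") is not None and stay is not None:
        # Explicit address and a matched stay: geocode the address while
        # restricting the search to a small area around the stay coordinates.
        results = geocode(address=event["address"], near=stay["coords"])
    elif event.get("address") is not None:
        # Explicit address only: plain forward geocoding.
        results = geocode(address=event["address"])
    elif stay is not None:
        # Matched stay only: reverse-geocode the stay coordinates into
        # nearby places; the most precise result becomes the event location.
        results = geocode(coordinates=stay["coords"])
    else:
        return None
    # Keep only the geocoder's most relevant result.
    return results[0] if results else None
```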
    </sec>
    <sec id="sec-7">
      <title>EXPERIMENTS</title>
      <p>In this section, we present the results of our experiments. We used
datasets from two real users, whom we call Angela and Barack.
Angela’s dataset consists of 7,336 emails, 522 calendar events, 204,870
location points, and 124 contacts extracted from Google’s email,
contact, calendar, and location history services. This corresponds
to 1.6M triples in our schema. Barack’s dataset consists of 136,301
emails, 3,080 calendar events, 1,229,245 location points, and 582
contacts extracted from the same sources. Barack’s emails cover a
period of 5,540 days; his locations cover 1,676 days. This corresponds to
10.3M triples, where 70.9 % come from the location history, 28.8 %
from emails, 0.3 % from calendars and less than 0.1 % from contacts.</p>
      <p>We measured the loading times of Angela’s dataset into the
system in two different scenarios: from source data on the Internet
(using Google API, except for the location history which is not
provided by the API and was loaded from a file), and from source
data stored in local files. Loading took 19 and 4 minutes, respectively,
on a desktop computer (Intel i7-2600k 4-core, 3.4 GHz, 20 GB RAM,
SSD).</p>
      <p>
We evaluated the precision and recall of the AgentMatch algorithm
(Section 4) on Barack’s dataset. This dataset contains 40,483 Agent
instances with a total of 25,381 schema:name values, of which 17,706
are distinct; it also contains 40,455 schema:email values, of which
24,650 are distinct. To compute the precision and recall, we sampled
2,000 pairs of distinct Agents, and asked Barack to manually assign
to each possible pair a ground truth value (true/false). Barack was
provided with the email address and name of each agent, and was
allowed to query the KB to get extra information.</p>
      <p>We tested both Levenshtein and Jaro–Winkler as secondary
string distance, with and without IDF term weights. The term
q-gram match ratio (S) was set to 0.6. We varied λ so as to maximize
the F1 value. Precision decreases while recall increases for
decreasing threshold values. Our baseline is IdMatch, which matches two
contacts if they have the same email address.</p>
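The IdMatch baseline just described can be sketched as follows. The input shape (a mapping from agent identifiers to their email addresses) is our assumption; since matching by shared email address applies transitively across facets, a union-find structure yields the clusters.

```python
class UnionFind:
    """Standard disjoint-set structure with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def id_match(agents):
    """agents: dict of agent id -> set of email addresses.
    Returns clusters of agent ids sharing (transitively) an email address."""
    uf = UnionFind()
    owner = {}  # first agent seen for each email address
    for agent_id, emails in agents.items():
        uf.find(agent_id)
        for email in emails:
            if email in owner:
                uf.union(agent_id, owner[email])
            else:
                owner[email] = agent_id
    clusters = {}
    for agent_id in agents:
        clusters.setdefault(uf.find(agent_id), set()).add(agent_id)
    return list(clusters.values())
```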
      <p>
        As competitor, we considered PARIS [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ], an ontology alignment
algorithm that is parametrized by a single threshold. We used string
similarity for email addresses, and the name similarity metric used
by AgentMatch, except that it is applied to single Agent instances.
PARIS computes the average number of outgoing edges for each
relation. Since our dataset contains duplicates, we gave PARIS an
advantage by computing these values upfront.
      </p>
      <p>We also considered Google’s “Find duplicates” feature. Google
was not able to handle more than 27,000 contacts at the same time,
and so we had to run it multiple times in batches. Since the
final output depends on the order in which contacts were loaded,
we present two results, one for which the contacts were supplied
sorted by email address (Google1), and another for a random order
(Google2). Since Google’s algorithm failed to merge contacts that
IdMatch did merge, we also tested running IdMatch on Google’s
output (GoogleId) for both runs. We also tested the Mac OS X contact
de-duplication feature. However, its result did not contain all the
metadata from the original contacts, so we could not evaluate
this feature.</p>
      <p>The results are shown in Table 1. As expected, our baseline
IdMatch has a perfect precision, but a low recall (43%). Google,
likewise, gives preference to precision, but achieves a higher recall
than the baseline (50%). The recall improves further if Google’s
output is combined with IdMatch (61%). PARIS, in contrast, favors recall
(92%) over precision (83%), and achieves a better F1 value overall.
The highest F1-measure (95%) is reached for AgentMatch with the
Jaro–Winkler distance for a threshold of 0.825. It has a precision
comparable to Google’s, and a recall comparable to PARIS’s.
</p>
    </sec>
    <sec id="sec-8">
      <title>Detecting Stays</title>
      <p>
        We evaluated the extraction of stays from the location history on
Barack’s dataset. We randomly chose 15 days, and presented him with
a web interface showing (1) the raw locations on a map, (2) the stays
detected by Thymeflow, and (3) the stays detected by his Google
Timeline [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Barack was then asked to annotate each stay as “true”,
i.e., corresponding to an actual stay, or “false”, and to report missing
stays, based on his memories. He was also allowed to use any other
available knowledge (e.g., his calendar). The exact definition of an
actual stay was left to Barack, and the reliability of his annotations
was dependent on his recollection. In total, Barack found 64
actual stays. Sometimes, an algorithm would detect an actual stay
as multiple consecutive stays. In that case, we counted one true
stay and counted the number of duplicates and the resulting move
duration in-between. For instance, an actual stay of 2 hours output
as two stays of 29 min and 88 min, with a short move of 3 minutes
in-between would count as 1 true stay, 1 duplicate and 3 minutes
of duplicate move duration.
      </p>
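Under this convention, the counts for one actual stay detected as several fragments can be computed as follows (a sketch; the interval representation is our assumption). For the example above, 29-minute and 88-minute fragments separated by a 3-minute move, it returns 1 true stay, 1 duplicate, and 3 minutes of duplicate move duration.

```python
from datetime import datetime

def count_fragments(fragments):
    """fragments: chronologically ordered (start, end) pairs detected for
    one actual stay. Returns (true_stays, duplicates, duplicate_move_minutes)."""
    duplicates = len(fragments) - 1
    # Duplicate move duration: the gaps between consecutive fragments.
    move_seconds = sum(
        (fragments[i + 1][0] - fragments[i][1]).total_seconds()
        for i in range(len(fragments) - 1)
    )
    return 1, duplicates, move_seconds / 60.0
```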
      <p>Table 2 shows the resulting precision and recall for each method,
with varying stay duration thresholds for Thymeflow. We also show
the duplicate ratio #D (number of duplicates over the number of true
stays) and the move duration ratio DΘ (duplicate move duration
over the total duration of true stays). Overall, Thymeflow obtains
results comparable to Google Timeline for stay extraction, with
better precision and recall, but more duplicates.</p>
      <p>Matching stays with events. We also evaluated the matching of
stays in the user’s location history with the events in their calendar.
We sampled stays from Angela and Barack’s datasets, and produced
all possible matchings to events, i.e., all matchings produced by the
algorithm whatever the threshold. Angela and Barack were then
asked to manually label the matches as correct or incorrect. The
matching process relies on two parameters, namely the duration
ratio threshold θ and the filtering distance δ. We varied θ and found
that a value of 0.2 leads to the best F1 values. With this value, we varied
δ, and found that the performance improves consistently with larger
values. This indicates that filtering out stays that are too far from
the event’s location coordinates (where available) is not beneficial.
With these settings, the matching performs
quite well: we achieve a precision and recall of around 70%.
</p>
    </sec>
    <sec id="sec-9">
      <title>Geocoding</title>
      <p>We evaluated diferent geocoding enrichers for (event, stay) pairs
described in Section 4. We used the Google Maps Geocoding API.
We considered three diferent inputs to this API: the event
location address attribute (Event), the stay coordinates (Stay), and the
event location attribute with the stay coordinates given as a bias
(StayEvent). We also considered a more precise version of Event,
which produces a result only if the geocoder returns a single
unambiguous location (EventSingle). Finally, we devised the method
StayEvent+Stay, which returns the StayEvent result if it exists, and
the Stay result otherwise.</p>
      <p>
        For each input, the geocoder gave either no result (M), a false
result (F), a true place (T), or just a true address (A). For instance,
an event occurring in “Hôtel Ritz Paris” is true if the output is for
instance “Ritz Paris”, while an output of “15 Place Vendôme, Paris”
would count as a true address. For comparison, we also evaluated
the place given by Google Timeline [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>Due to limitations of the API, geocoding from stay coordinates
mostly yielded address results (99.2% of the time). To better evaluate
these methods, we computed the number of times in which the
output was either a true place, or a true address (denoted T|A). For
those methods that did not always return a result, we computed a
precision metric PT|A (resp., PT), that is equal to the ratio of T|A
(resp., T) to the number of times a result was returned. We computed
an F1-measure based on the PT|A precision, and a recall taken to be
the fraction of times the geocoder returned a result (1 − M).</p>
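These metrics can be sketched as follows. The outcome counts in the usage note are illustrative only: they reproduce the reported 50.0% return rate and 67.2% PT|A of the StayEvent method, but the actual split between T and A is not given in the text.

```python
def geocoding_metrics(m, f, t, a):
    """Outcome counts over the sampled pairs:
    m: no result, f: false result, t: true place, a: true address only."""
    total = m + f + t + a
    returned = f + t + a                    # pairs for which a result came back
    recall = returned / total               # 1 - M: a result was returned at all
    p_t_a = (t + a) / returned if returned else 0.0  # precision over T|A
    p_t = t / returned if returned else 0.0          # precision over T
    f1 = (2 * p_t_a * recall / (p_t_a + recall)) if (p_t_a + recall) else 0.0
    return {"recall": recall, "PT|A": p_t_a, "PT": p_t, "F1": f1}
```

For instance, with 250 pairs, 125 misses, and an (illustrative) 41/42/42 split of F/T/A outcomes, this yields a recall of 0.5 and a PT|A of 0.672.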
      <p>The evaluation was performed on 250 randomly picked (stay,
event) pairs in Barack’s dataset. The results are shown in
Table 3. The Google Timeline gave the right place or address only
17.2% of the time. The EventSingle method, likewise, performs
poorly, indicating that the places are indeed highly ambiguous. The
best precision (PT|A of 67.2%) is obtained by Geocoding with the
StayEvent, but this method returns a result only 50.0% of the time.
StayEvent+Stay, in contrast, can find the right place 28.4% of the
time, and the right place or address 50.0% of the time, which is our
best result. We are happy with this performance, considering that
around 45% of the event locations were room numbers without
mention of a building or place name (e.g., C101).</p>
    </sec>
    <sec id="sec-10">
      <title>Use Cases</title>
      <p>
        The user can query the KB using SPARQL [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. Additionally,
Thymeflow uses a modern triple store supporting full-text and
geospatial queries. Since the KB unites different data sources,
queries can seamlessly span multiple sources and data types. This
allows the user (Angela) to ask, for instance:
• What are the phone numbers of her birthday party guests?
(combining information from the contacts and the emails)
• What places did she visit during her last trip to London?
(combining geocoding information with stays)
• For each person she meets more than 3 times a week, what
are the top 2 places where she usually meets that particular
person? (based on her calendar and location history)
Such queries are not supported by current proprietary cloud
services, which do not allow arbitrary queries.
      </p>
      <p>Hub of personal knowledge. Finally, the user can use the
bidirectionality of synchronization to enrich her existing services. For
instance, she can enrich her personal address book (CardDav) with
knowledge inferred by the system (e.g., a friend’s birth date
extracted from Facebook) using a SPARQL/Update query.
</p>
    </sec>
    <sec id="sec-11">
      <title>RELATED WORK</title>
      <p>We now review the related work, on personal information
management, on information integration, and on the specific tasks of
location analysis and calendar matching.
</p>
    </sec>
    <sec id="sec-12">
      <title>Personal Information Management</title>
      <p>
        This work is motivated by the concept of personal information
management (PIM), taking the viewpoint of [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] as to what a PIM system
should be. [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] groups PIM research and development into the
following problems: finding and re-finding, keeping and organizing
personal information. We now present some notable contributions.
      </p>
      <p>Finding and Re-finding. PIM has been concerned with improving
how individuals go about retrieving a piece of information to meet
a particular need.</p>
      <p>
        For searching within the user’s personal computer, desktop
full-text search tools have been developed for various operating
systems and platforms [
        <xref ref-type="bibr" rid="ref46">46</xref>
        ]. Search entries are for instance the files
and folders on the file-system, email messages, browser history
pages, calendar events, contacts, and applications. Search may be
performed on both the content and the metadata. In particular, the
IRIS [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and NEPOMUK [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] projects used knowledge
representation technologies to provide semantic search facilities and go
beyond search to provide facilities for exchanging data between
different applications within a single desktop computer.
      </p>
      <p>
        Other research efforts have focused on improving the process
of finding things the user has already seen, using whatever context
or meta-information that the user remembers [
        <xref ref-type="bibr" rid="ref14 ref39 ref42">14, 39, 42</xref>
        ].
      </p>
      <p>Keeping. PIM has also addressed the question: What kind of
information should be captured and stored in digital form?</p>
      <p>
        A central idea of [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]’s vision is creating a device that is able to
digitally capture all of the experiences and acquired knowledge of the
user, so that it can act as a supplement to her memory. Lifelogging
attempts to fulfil this vision by visually capturing the world that we
see in our everyday lives [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. MyLifeBits is a notable documented
example [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. The advent of cheaper, more advanced, and more efficient
wearable devices able to capture the different aspects of one’s life
has made lifelogging indiscriminate in what it logs. Lifelogging
activities include recording a history of machine enabled tasks (e.g.,
communications, editing, web browser’s history), passively
capturing what we see and hear (e.g., via a wearable camera), monitoring
personal biometrics (e.g., steps taken, sleep quality), and logging
mobile device and environmental context (e.g., the user’s location,
smart home sensing).
      </p>
      <p>
        Different from lifelogging, which does not focus on the analysis
of the logged information, the quantified self is a movement to
incorporate data acquisition technology on certain focused aspects
of one’s daily life [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. The quantified self focuses on logging
experiences with a clearer understanding of the goals, such as exercise
levels for fitness and health care.
      </p>
      <p>Organizing. PIM also deals with the management and
organization of information. It is for instance concerned with the
management of privacy, security, distribution, and enrichment of
information.</p>
      <p>
        A personal data service (PDS) lets the user store, manage and
deploy her information in a structured way. The PDS may be used
to manage different identities and/or as a central point of information
exchange between services. For instance, an application that
recommends new tracks based on what the user likes to listen to may
need to use adapters and authenticate with the different services that keep
a listening history. Instead, the PDS centralizes this information,
and the application only needs an adapter to connect to this PDS.
Higgins [
        <xref ref-type="bibr" rid="ref43">43</xref>
        ], OpenPDS [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and the Hub of All Things [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] are
examples of PDSs.
      </p>
      <p>
        MyData [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ] describes a consent management framework that
lets the user control the flow of data between a service that has
information about her and a service that uses this information. In
this framework, which is still at its early stage of development, a
central system holds credentials to access the diferent services on
the user’s behalf. The user specifies the rules by which flows of
information between any two of those services are authorized. The
central system is in charge of providing or revoking the necessary
authorizations on each of those services to implement these rules.
Contrary to a PDS, the actual data does not need to flow through
the central system. Two services may spontaneously share
information about the user with each other if legally entitled (e.g., two
public bodies), in which case the central system is notified. It is
an all-or-nothing approach that represents a paradigm shift from
currently implemented ad-hoc flows of personal information across
organizations.
      </p>
      <p>
        Organizing information as a time-ordered stream of documents
(a lifestream) has been proposed as a simple scheme for reducing
the time the user spends in manually organizing documents into
a classic hierarchical file system [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. It has the advantage of
providing a unified view of the user’s personal information. Lifestreams
can be seen as a natural representation of lifelog information. The
Digital Me system uses this kind of representation to unify data
from diferent loggers [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ].
      </p>
      <p>
        For managing personal information, different levels of
organization and abstraction have been proposed. Personal data lakes [
        <xref ref-type="bibr" rid="ref45">45</xref>
        ]
and personal data spaces [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] offer little integration and focus on
handling storage, metadata, and search. On the other end, personal
knowledge bases, which include more semantics, have been used:
Haystack [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], SEMEX [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], IRIS [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and NEPOMUK [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Such a
structure allows the flexible representation of things like “this file,
authored by this person, was presented at this meeting about this
project”. They integrate several sources of information, including
documents, media, email messages, contacts, calendars, chats, and
web browser’s history. However, these projects date from 2007 or
earlier and assume that most of the user’s personal information is
stored on her personal computer. Today, most of it is spread across
several devices [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Some proprietary service providers, such as Google and Apple,
have arguably come quite close to our vision of a personal
knowledge base. They integrate calendars, emails, and address books,
and allow smart exchanges between them. Some of them even
provide intelligent personal assistants that proactively interact with the
user. However, these are closed source proprietary solutions that
promote vendor lock-in. In response, open-source alternative
solutions have been developed, to cloud storage in particular, such as
ownCloud [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] and Cozy [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. These have evolved into application
platforms that host various kinds of other user-oriented services
(e.g., email, calendar, contacts). They leverage multi-device
synchronization facilities and standard protocols to facilitate integration
with existing contact managers and calendars. Cozy is notable for
providing adapters for importing data from different kinds of
services (e.g., activity trackers, finance, social) into a
document-oriented database. These tools bring the convenience of
modern software-as-a-service solutions while letting the user stay in
control, keep her privacy, and free herself from
vendor lock-in.
      </p>
    </sec>
    <sec id="sec-13">
      <title>Information Integration</title>
      <p>
        Data matching (also known as record or data linkage, entity
resolution, object/field matching) is the task of finding records that refer
to the same entity across different sources. It is extensively used
in data mining projects and in large-scale information systems by
businesses, public bodies, and governments [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Example application
areas include national census, the health sector, or fraud
detection. In the context of personal information, SEMEX [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] integrates
entity resolution facilities. SEMEX imports information from
documents, bibliography, contacts, and email, and uses attributes as well
as associations found between persons, institutions, and conferences
to reconcile references. However, different from our work, the
integration is done at import time, so the user cannot later manually
revoke it through an update, and incremental synchronization is not
handled. Recently, contact managers from known service providers
have started providing de-duplication tools for finding duplicate
contacts and merging them in bulk. However, these tools are often
restricted to contacts present in the user’s address book and do not
merge contacts from social networks or emails.
      </p>
      <p>
        Common standards, such as vCards and iCalendar, have
advanced the state of the art by allowing provider-independent
administration of personal information. There is also a proposed standard
for mapping vCard content and iCalendars into RDF [
        <xref ref-type="bibr" rid="ref27 ref6">6, 27</xref>
        ]. While
such standards are useful in our context, they do not provide the
means to match calendars, emails, and events, as we do. The only
set of vocabularies besides schema.org which provides a broad
coverage of all entities we are dealing with is the OSCAF
ontologies [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]. But their development was stopped in 2013 and they
are not maintained anymore, contrary to schema.org which is
actively supported by companies like Google and widely used on the
web [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Recently, a personal data service has been proposed that
reuses the OSCAF ontologies, but it uses a relational database
instead of a knowledge base [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ].
      </p>
    </sec>
    <sec id="sec-14">
      <title>Location Analysis and Calendar Matching</title>
      <p>
        The ubiquity of networked mobile devices able to track users’
locations over time has been greatly utilized for estimating traffic
and studying mobility patterns in urban areas. Improvements in
accuracy and battery efficiency of mobile location technologies
have made possible the estimation of user activities and visited
places on a daily basis [
        <xref ref-type="bibr" rid="ref2 ref20 ref29">2, 20, 29</xref>
        ]. Most of these works have mainly
exploited sensor data (accelerometer, location, network) and readily
available geographic data. Few of them, however, have exploited the
user’s calendar and other available data for creating richer and more
semantic activity histories. Recently, a study has recognized the
importance of using the location history and social network
information for improving the representation of information contained
in the user’s calendar: e.g., for distinguishing genuine real-world
events from reminders [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ].
      </p>
    </sec>
    <sec id="sec-15">
      <title>DISCUSSION</title>
      <p>The RDF model seems well-suited for building a personal
knowledge base. However powerful the model, making full use of it relies on
being able to write and run performant queries. At this point, we cannot
focus on optimizing for a specific application. It is not yet clear
to what extent Thymeflow should hold raw data (such as the
entire location history), which, depending on how it is used, may
be loaded on demand, in mediation style. Additionally, we would
like to raise the following issues that could drive future research
and development:
• Gathering data for experiments: The research community
might benefit from building and maintaining sets of
annotated multi-dimensional personal information for use in
different kinds of tasks. This is challenging, especially due to
privacy concerns.
• Opportunities: Internet companies that already hold a lot of
user data are not yet integrating everything they have in a
coherent whole, and are not performing as well as we think
they could. For instance, Google Location History does not
integrate the user’s calendar, whereas we do. We think that
there are still many opportunities to create new products
and functionalities from existing data alone.
• Inaccessible information: The hard truth is that many
popular Internet-based services still do not provide an API for
conveniently retrieving user data out of them, or that such
an API is not feature-complete (e.g., Facebook, WhatsApp).
</p>
    </sec>
    <sec id="sec-16">
      <title>CONCLUSION</title>
      <p>The Thymeflow system integrates data from emails, calendars,
address books, and location history, providing novel functionalities
on top of them. It can merge different facets of the same agent,
determine prolonged stays in the location history, and align them
with events in the calendar.</p>
      <p>Our work is a unique attempt at building a personal knowledge
base. First, the system is complementary to and does not pretend
to replace the existing user experience, applications, and
functionalities, e.g., for reading/writing emails, managing a calendar,
organizing files. Second, we embrace personal information as being
fundamentally distributed and heterogeneous, and focus on the need
for providing knowledge integration on top, enabling completely
new services (query answering, analytics). Finally, while the system
could benefit from more advanced analysis, such as the extraction of
entities from rich text (e.g., emails) and linking them to elements
of the KB, our first focus is on enriching existing semi-structured
data, which improves the quality of data for use by other services.</p>
      <p>Our system can be extended in a number of directions,
including incorporating more data sources, extracting semantics from
text, complex analysis of users’ data and behavior. Future
applications include personal analytics, cross-vendor search, intelligent
event planning, recommendation, and prediction. Also, our system
could use a simpler query language, perhaps natural language, or
even proactively interact with the user, in the style of Apple’s Siri,
Google’s Google Now, Microsoft’s Cortana, or Amazon Echo.</p>
      <p>While the data obtained by Thymeflow remains under the user’s
direct control, fully respecting her privacy, the data residing outside
of it may not. However, using Thymeflow, the user could have a
better understanding of what other systems know about her, which
is an important first step in gaining control over it.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Serge</given-names>
            <surname>Abiteboul</surname>
          </string-name>
          , Benjamin André, and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Kaplan</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Managing your digital life</article-title>
          .
          <source>CACM 58</source>
          ,
          <issue>5</issue>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Ashbrook</surname>
          </string-name>
          and
          <string-name>
            <given-names>Thad</given-names>
            <surname>Starner</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Using GPS to learn significant locations and predict movement across multiple users</article-title>
          .
          <source>Personal and Ubiquitous Computing</source>
          <volume>7</volume>
          ,
          <issue>5</issue>
          (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Vannevar</given-names>
            <surname>Bush</surname>
          </string-name>
          .
          <year>1945</year>
          .
          <article-title>As We May Think</article-title>
          .
          <source>The Atlantic</source>
          (
          <year>1945</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Adam</given-names>
            <surname>Cheyer</surname>
          </string-name>
          , Jack Park, and Richard Giuli.
          <year>2005</year>
          . IRIS: Integrate. Relate. Infer. Share.
          <source>Technical Report. DTIC Document.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Peter</given-names>
            <surname>Christen</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Data matching: concepts and techniques for record linkage, entity resolution, and duplicate detection</article-title>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Dan</given-names>
            <surname>Connolly</surname>
          </string-name>
          and
          <string-name>
            <given-names>Libby</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>RDF Calendar - an application of the Resource Description Framework to iCalendar Data</article-title>
          . http://www.w3.org/TR/rdfcal/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Cozy</given-names>
            <surname>Cloud</surname>
          </string-name>
          .
          <year>2016</year>
          . Cozy - Simple, versatile, yours. (
          <year>2016</year>
          ). https://cozy.io/
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>David H.</given-names>
            <surname>Crocker</surname>
          </string-name>
          .
          <year>1982</year>
          .
          <article-title>Standard for the format of ARPA Internet text messages</article-title>
          .
          <source>RFC 822</source>
          . IETF. https://tools.ietf.org/html/rfc822
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Richard</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David</given-names>
            <surname>Wood</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Markus</given-names>
            <surname>Lanthaler</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>RDF 1.1 Concepts and Abstract Syntax</article-title>
          . http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Yves-Alexandre</given-names>
            <surname>de Montjoye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Erez</given-names>
            <surname>Shmueli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Samuel S.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alex Sandy</given-names>
            <surname>Pentland</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>openPDS: Protecting the Privacy of Metadata through SafeAnswers</article-title>
          .
          <source>PLoS ONE 9</source>
          ,
          <issue>7</issue>
          (
          <year>2014</year>
          ), e98790
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B.</given-names>
            <surname>Desruisseaux</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Internet Calendaring and Scheduling Core Object Specification (iCalendar)</article-title>
          .
          <source>RFC 5545</source>
          . IETF. https://tools.ietf.org/html/rfc5545
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Jens-Peter</given-names>
            <surname>Dittrich</surname>
          </string-name>
          and
          <string-name>
            <given-names>Marcos Antonio</given-names>
            <surname>Vaz Salles</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>iDM: A Unified and Versatile Data Model for Personal Dataspace Management</article-title>
          .
          <source>In Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, September 12-15, 2006</source>
          .
          <fpage>367</fpage>
          -
          <lpage>378</lpage>
          . http://dl.acm.org/citation.cfm?id=1164160
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Xin</given-names>
            <surname>Dong</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alon Y.</given-names>
            <surname>Halevy</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>A Platform for Personal Information Management and Integration</article-title>
          . In CIDR.
          <fpage>119</fpage>
          -
          <lpage>130</lpage>
          . http://www.cidrdb.org/cidr2005/papers/P10.pdf
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Susan T.</given-names>
            <surname>Dumais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Edward</given-names>
            <surname>Cutrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jonathan J.</given-names>
            <surname>Cadiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Gavin</given-names>
            <surname>Jancke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Raman</given-names>
            <surname>Sarin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Daniel C.</given-names>
            <surname>Robbins</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Stuff I've Seen: a system for personal information retrieval and re-use</article-title>
          .
          <source>In SIGIR 2003: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 28 - August 1, 2003, Toronto, Canada</source>
          .
          <fpage>72</fpage>
          -
          <lpage>79</lpage>
          . DOI:https://doi.org/10.1145/860435.860451
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Facebook</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>The Graph API</article-title>
          . (
          <year>2016</year>
          ). https://developers.facebook.com/docs/graph-api/
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Eric</given-names>
            <surname>Freeman</surname>
          </string-name>
          and
          <string-name>
            <given-names>David</given-names>
            <surname>Gelernter</surname>
          </string-name>
          .
          <year>1996</year>
          .
          <article-title>Lifestreams: A storage model for personal data</article-title>
          .
          <source>ACM SIGMOD Record 25</source>
          ,
          <issue>1</issue>
          (
          <year>1996</year>
          ),
          <fpage>80</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Hector</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          , Yannis Papakonstantinou, Dallan Quass, Anand Rajaraman, Yehoshua Sagiv, Jeffrey Ullman, Vasilis Vassalos, and
          <string-name>
            <given-names>Jennifer</given-names>
            <surname>Widom</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>The TSIMMIS approach to mediation: Data models and languages</article-title>
          .
          <source>Journal of intelligent information systems 8</source>
          ,
          <issue>2</issue>
          (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Paula</given-names>
            <surname>Gearon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alexandre</given-names>
            <surname>Passant</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Axel</given-names>
            <surname>Polleres</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>SPARQL 1.1 Update</article-title>
          . https://www.w3.org/TR/sparql11-update/.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Jim</given-names>
            <surname>Gemmell</surname>
          </string-name>
          , Gordon Bell, and
          <string-name>
            <given-names>Roger</given-names>
            <surname>Lueder</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>MyLifeBits: a personal database for everything</article-title>
          .
          <source>Commun. ACM 49</source>
          ,
          <issue>1</issue>
          (
          <year>2006</year>
          ),
          <fpage>88</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Google</surname>
          </string-name>
          .
          <year>2016</year>
          . Google Maps Timeline. (
          <year>2016</year>
          ). https://www.google.fr/maps/timeline
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Google</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Google Maps APIs</article-title>
          . (
          <year>2017</year>
          ). https://developers.google.com/maps/documentation/
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>R. V.</given-names>
            <surname>Guha</surname>
          </string-name>
          , Dan Brickley, and
          <string-name>
            <given-names>Steve</given-names>
            <surname>Macbeth</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Schema.org: Evolution of structured data on the web</article-title>
          .
          <source>CACM 59</source>
          ,
          <issue>2</issue>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Cathal</given-names>
            <surname>Gurrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alan F.</given-names>
            <surname>Smeaton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Aiden R.</given-names>
            <surname>Doherty</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Lifelogging: Personal big data</article-title>
          .
          <source>Foundations and trends in information retrieval 8</source>
          ,
          <issue>1</issue>
          (
          <year>2014</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Siegfried</given-names>
            <surname>Handschuh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Knud</given-names>
            <surname>Möller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Tudor</given-names>
            <surname>Groza</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>The NEPOMUK project - on the way to the social semantic desktop</article-title>
          .
          <source>In I-SEMANTICS.</source>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Steve</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andy</given-names>
            <surname>Seaborne</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Eric</given-names>
            <surname>Prud'hommeaux</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>SPARQL 1.1 Query Language</article-title>
          . http://www.w3.org/TR/sparql11-query/.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>HATDeX</given-names>
            <surname>Ltd</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Hub of All Things</article-title>
          . (
          <year>2017</year>
          ). https://hubofallthings.com
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Renato</given-names>
            <surname>Iannella</surname>
          </string-name>
          and
          <string-name>
            <given-names>James</given-names>
            <surname>McKinney</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>vCard Ontology - for describing People and Organizations</article-title>
          . http://www.w3.org/TR/2014/NOTE-vcard-rdf-20140522/.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>William</given-names>
            <surname>Jones</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jaime</given-names>
            <surname>Teevan</surname>
          </string-name>
          .
          <year>2011</year>
          . Personal Information Management. University of Washington Press, Seattle, WA U.S.A.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29] Jong Hee Kang,
          <string-name>
            <given-names>William</given-names>
            <surname>Welbourne</surname>
          </string-name>
          , Benjamin Stewart, and
          <string-name>
            <given-names>Gaetano</given-names>
            <surname>Borriello</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Extracting Places from Traces of Locations</article-title>
          . In WMASH.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>David R.</given-names>
            <surname>Karger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Karun</given-names>
            <surname>Bakshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David</given-names>
            <surname>Huynh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dennis</given-names>
            <surname>Quan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Vineet</given-names>
            <surname>Sinha</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Haystack: A customizable general-purpose information management tool for end users of semistructured data</article-title>
          .
          <source>In Proc. of the CIDR Conf.</source>
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Tom</given-names>
            <surname>Lovett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Eamonn</given-names>
            <surname>O'Neill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>James</given-names>
            <surname>Irwin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>David</given-names>
            <surname>Pollington</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>The calendar as a sensor: analysis and improvement using data fusion with social networks and location</article-title>
          .
          <source>In UbiComp.</source>
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>David</given-names>
            <surname>Montoya</surname>
          </string-name>
          , Thomas Pellissier Tanon, Serge Abiteboul, and
          <string-name>
            <given-names>Fabian</given-names>
            <surname>Suchanek</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Thymeflow, A Personal Knowledge Base with Spatio-temporal Data</article-title>
          .
          <source>In CIKM. Demonstration paper.</source>
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <surname>Nepomuk Consortium and OSCAF</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>OSCAF Ontologies</article-title>
          . (
          <year>2007</year>
          ). http://oscaf.sourceforge.net/
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Mikhail S.</given-names>
            <surname>Nikulin</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Hellinger distance</article-title>
          .
          <source>Encyclopedia of Mathematics</source>
          (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35] ownCloud.
          <year>2016</year>
          .
          <article-title>ownCloud - A safe home for all your data</article-title>
          . (
          <year>2016</year>
          ). https://owncloud.org/
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>S.</given-names>
            <surname>Perreault</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>vCard Format Specification</article-title>
          .
          <source>RFC 6350</source>
          . IETF. https://tools.ietf.org/html/rfc6350
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>A.</given-names>
            <surname>Poikola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kuikkaniemi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Honko</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>MyData - A Nordic Model for human-centered personal data management and processing</article-title>
          . (
          <year>2014</year>
          ). https://www.lvm.fi/documents/20181/859937/MyData-nordic-model/2e9b4eb0-68d7-463b-9460-821493449a63?version=1.0
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>Mats</given-names>
            <surname>Sjöberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hung-Han</given-names>
            <surname>Chen</surname>
          </string-name>
          , Patrik Floréen, Markus Koskela, Kai Kuikkaniemi, Tuukka Lehtiniemi, and
          <string-name>
            <given-names>Jaakko</given-names>
            <surname>Peltonen</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Digital Me: Controlling and Making Sense of My Digital Footprint</article-title>
          . (
          <year>2016</year>
          ). http://reknow.fi/dime/
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>Craig A. N.</given-names>
            <surname>Soules</surname>
          </string-name>
          and
          <string-name>
            <given-names>Gregory R.</given-names>
            <surname>Ganger</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Connections: using context to enhance file search</article-title>
          .
          <source>ACM SIGOPS operating systems review 39</source>
          ,
          <issue>5</issue>
          (
          <year>2005</year>
          ),
          <fpage>119</fpage>
          -
          <lpage>132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>Fabian M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Serge</given-names>
            <surname>Abiteboul</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Pierre</given-names>
            <surname>Senellart</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>PARIS: Probabilistic alignment of relations, instances, and schema</article-title>
          .
          <source>PVLDB 5</source>
          ,
          <issue>3</issue>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>Fabian M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Gjergji</given-names>
            <surname>Kasneci</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Gerhard</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Yago: a core of semantic knowledge</article-title>
          .
          <source>In WWW.</source>
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>Jaime</given-names>
            <surname>Teevan</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>The Re:Search Engine: simultaneous support for finding and re-finding</article-title>
          .
          <source>In Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology, Newport, Rhode Island, USA, October 7-10, 2007</source>
          .
          <fpage>23</fpage>
          -
          <lpage>32</lpage>
          . DOI:https://doi.org/10.1145/1294211.1294217
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>Paul</given-names>
            <surname>Trevithick</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mary</given-names>
            <surname>Ruddy</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Higgins - Personal Data Service</article-title>
          . (
          <year>2012</year>
          ). http://www.eclipse.org/higgins/
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>Denny</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          and
          <string-name>
            <given-names>Markus</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Wikidata: A Free Collaborative Knowledgebase</article-title>
          .
          <source>CACM 57</source>
          ,
          <issue>10</issue>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>Coral</given-names>
            <surname>Walker</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hassan</given-names>
            <surname>Alrehamy</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Personal Data Lake with Data Gravity Pull</article-title>
          .
          <source>In Fifth IEEE International Conference on Big Data and Cloud Computing, BDCloud 2015, Dalian, China, August 26-28, 2015</source>
          .
          <fpage>160</fpage>
          -
          <lpage>167</lpage>
          . DOI:https://doi.org/10.1109/BDCloud.2015.62
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <surname>Wikipedia contributors</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>List of search engines - Desktop search engines</article-title>
          . (
          <year>2016</year>
          ). https://en.wikipedia.org/w/index.php?title=List_of_search_engines
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>