=Paper= {{Paper |id=Vol-2063/dal-paper3 |storemode=property |title=A Study of Intensional Concept Drift in Trending DBpedia Concepts |pdfUrl=https://ceur-ws.org/Vol-2063/dal-paper3.pdf |volume=Vol-2063 |authors=Albert Meroño-Peñuela,Efstratios Kontopoulos,Sándor Darányi,Yiannis Kompatsiaris |dblpUrl=https://dblp.org/rec/conf/i-semantics/Merono-PenuelaK17 }} ==A Study of Intensional Concept Drift in Trending DBpedia Concepts== https://ceur-ws.org/Vol-2063/dal-paper3.pdf
       A Study of Intensional Concept Drift in Trending DBpedia
                               Concepts
                    Albert Meroño-Peñuela                                                  Efstratios Kontopoulos
                 Department of Computer Science                                      Information Technologies Institute
                  Vrije Universiteit Amsterdam                                             Thessaloniki, Greece
                  Amsterdam, The Netherlands                                                  skontopo@iti.gr
                      albert.merono@vu.nl

                          Sándor Darányi                                                    Ioannis Kompatsiaris
       Swedish School of Library and Information Science                             Information Technologies Institute
                      University of Borås                                                  Thessaloniki, Greece
                        Borås, Sweden                                                           ikom@iti.gr
                    sandor.daranyi@hb.se

ABSTRACT
Concept drift refers to the phenomenon that concepts change their intensional composition, and
therefore meaning, over time. It is a manifestation of content dynamics, and an important problem
with regard to access and scalability in the Web of Data. Such drifts go back to contextual
influences due to social embedding as suggested by e.g. topic analysis, news detection, and trends
in social networks. Using DBpedia as a source of timestamped Linked Open Data, we analyze the
interaction between a sample of popular keywords, as recorded by Google Trends, and their
respective concept drifts in DBpedia. For the latter task, we deploy SemaDrift, an ontology
evolution platform for detecting and measuring content dislocation dependent on context
modification. Our hypothesis is that social embedding and awareness is an important trigger for
concept drift in crowdsourced knowledge bases on the Web.

KEYWORDS
Concept Drift, Semantic Web, DBpedia, Wikipedia, Google Trends

© 2017 Copyright held by the author/owner(s).
SEMANTiCS 2017 workshop proceedings: Drift-a-LOD
September 11-14, 2017, Amsterdam, Netherlands

1   INTRODUCTION
Rather than remaining stable, permanent, and fixed, the meaning of concepts changes over time. The
Historical Thesaurus of the Oxford Dictionary of English¹ shows how definitions attributed to words
are different in different periods of history. In the Dutch historical censuses (1795-1971) [15]
the taxonomy of occupations shows an extraordinary variation every decade, in line with the major
transformations of labor in the society of that time. We call the change of meaning of concepts
over time concept drift. Concept drift can have drastic effects on the performance of a system,
such as changing queries and inconsistent analyses.

    ¹ http://public.oed.com/historical-thesaurus-of-the-oed/

    What causes concept drift to occur in these systems? In the specific setting of the Semantic
Web [3] (now also referred to as Web of Data), concepts in ontologies and taxonomies are regularly
updated by humans in order to “reflect changes in the real world, changes in user requirements, and
drawbacks in the initial design” [23]. Hence, concept drift in semantic systems has a traceable and
direct origin in humans. However, the more recent trend of Linked Data [9] in the Semantic Web,
rather than manually building these ontologies and taxonomies, has automated the way in which
semantic systems obtain their concepts. A canonical example is DBpedia [14], which relies largely
on automated knowledge extraction methods to create Linked Data out of Wikipedia². In this
situation, the causes of concept drift become more difficult to trace.

    ² https://www.wikipedia.org/

    There are various plausible explanations for the origin of concept drift in complex systems.
One of them is the interaction of evolving context with evolving content. Social awareness
(instigated by events or the media) triggers a process of knowledge sharing on the Web. This
process often results in changes in knowledge bases, which may have an impact on the meaning of
concepts. Wikipedia, the biggest collaboratively-built knowledge base of the Web, has been
criticized for “allegedly exhibiting systemic bias, presenting a mixture of truths, half truths,
and some falsehoods, and, in controversial topics, being subject to manipulation and spin” [18]. It
is then worth considering whether the controversy, novelty or burst of a topic has an impact on how
reality is formally defined in knowledge bases derived from Wikipedia, such as DBpedia [14].
    In this paper we propose a framework to measure the influence of user engagement on the Web on
concept drift in crowd-sourced Web databases. We are interested in the process of public
Drift-a-LOD2017, September 2017, Amsterdam, The Netherlands                                                                   A. Meroño-Peñuela et al.


opinion influencing the feature composition of concepts as captured by automatic means. Hence, our
research question is: what patterns of influence can we discern between trends in queries by Web
users, and concept drift in crowd-sourced databases? To address this question, we propose a tool
chain that quantifies the trendiness of Web queries, and confronts it with measures of concept
drift for Linked Data. This tool chain consists of SemaDrift [20], a concept drift measuring
platform; Google Trends³, an index of the popularity of Web user queries over time; and the
different versions of DBpedia accessible via Linked Data Fragments (LDF) [27].

    ³ https://trends.google.com/

    Concretely, the contributions of this paper are:
     • An automated and systematic way for retrieving time-specific concept intensions from Linked
       Data sources (Section 3.1);
     • A framework for studying the relationship between the popularity of Web user queries and the
       drift of their associated concepts over time (Section 3);
     • An experimental application of this framework to recent Web trending queries and the latest
       snapshots of DBpedia (Section 4).

2    RELATED WORK
The problems of semantic change and drift concern various research fields. In the areas of Semantic
Web and knowledge representation, ontology evolution [13] addresses “the timely adaptation of an
ontology and consistent propagation of changes to dependent artifacts” [1]. Features of evolution
have been studied [22] and used for prediction using machine learning [17]. Gonçalves et al. [7]
use Description Logics to calculate differences between ontologies (so-called semantic diffs). Wang
et al. [28] define the semantics of concept change and drift, and how to identify them. General
surveys of semantic change in other fields, including language, have recently appeared [20]. On the
use of trends of Web user queries and changing semantics, the work by Tiddi et al. [26] illustrates
the use of knowledge from the Semantic Web to explain patterns in data, in particular on finding
causes for trending queries in Google Trends. To the best of our knowledge, no previous work
addresses the cause-effect relationship between trends and concept drift.
    Standard means of observing changes in content include e.g. recognizing news in texts by topic
detection and tracking [2], and new event or burst detection [16], which are in essence similar to
time series analysis. Significant solutions range from extracting time-varying features from texts
[24] to constructing timelines for event classification based on word usage statistics [25] and
personalized newsfeeds based on information novelty [6]. In the latter, the inter- and
intra-document dynamics of documents are considered to model how information evolves over time from
article to article, as well as within individual articles. Such methods can be applied to the
analysis of temporal dynamics in online text streams such as newsfeeds or e-mail [11, 12], or
chronologically ordered documents [5]. These models are typically based on graph theory, vector
space methods, or probability theory, and capture the local or global context of content as the
basis of their results; our current models of content are therefore context-dependent. However,
this dependency, although acknowledged, is typically not quantified, which is a precondition for
improved models.
    In general terms, another relevant track is research into time series of content. From a
Natural Language Processing (NLP) perspective, a typical example is to study diachronic
collocations: a word’s company (its collocates) may change over time, reflecting changes in that
word’s meaning and/or in the focus of the discourse in which it is embedded. However, traditional
collocation extractors treat the underlying text corpus as a homogeneous whole, and thus cannot
adequately account for such diachronic changes in a word’s collocation behavior, hence the need for
a combination of diachrony and contextuality [10]. From an information science perspective, the
study of conceptual dynamics [4] offers another comprehensive set of considerations. By the
mathematical models they exploit, both tracks preserve the underlying contextual dependency of word
content or meaning, ultimately going back to Harris’ distributional hypothesis [8].

3     TRENDING CONCEPTS AND CONCEPT DRIFT
In this section we describe a workflow for studying the relationship between the popularity of Web
user queries and the drift in concepts contained therein:

    (1) We use an extended LDF client to systematically retrieve time-specific concept intensions
        of a chosen concept C (see Section 3.2) from compatible Linked Data sources with the Linked
        Data Fragments backend⁴;
    (2) Using the concept intensions retrieved in the previous step, we use SemaDrift [19] to
        measure intensional concept drift over int(C). This represents how much the concept C has
        drifted in a certain time period;
    (3) Finally, we confront values of Trend and Drift, and we observe the relationship between
        measurements of concept drift for the concept C, and measurements of popularity for a Web
        user query q(C) that matches C.

    ⁴ To the best of our knowledge, DBpedia is currently the only Linked Data source with such
      support.

3.1     Temporal DBpedia Concepts with LDF
Wikipedia is “a free online encyclopedia with the aim to allow anyone to edit articles”⁵. Aligning
with the mission of Linked Data and the Semantic Web, DBpedia [14] aims at extracting structured
content from Wikipedia, providing a means for semantically querying relationships and properties of
its content. We assume this structured content of DBpedia resources to formally represent the
meaning of their associated concepts. In this first step, we select a concept of interest C, and we
query DBpedia to get the intension of C, int(C) (i.e. its defining properties), at various points
in time.

    ⁵ https://en.wikipedia.org/wiki/Wikipedia

    Querying massive Linked Data sources like DBpedia entails various challenges. One approach
includes submitting processing-intensive queries to SPARQL endpoints; another approach is to
download and locally query massive data dumps that are possibly not up-to-date. Linked Data
Fragments (LDF) provide a conceptual framework that delivers a uniform view on RDF interfaces,
aiming to minimize server resource usage while still enabling clients
to query data sources efficiently [27]. In this work, we have deployed an openly available Java
LDF client⁶, which we extended for measuring intensional drift via the SemaDrift API.
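
The retrieval step of this section can be illustrated with a minimal sketch. It is not the extended Java LDF client used in this work: it assumes that the triples of each DBpedia snapshot have already been fetched (for instance through triple-pattern requests of the form "C ?p ?o") and are available in memory. The function names, the snapshot labels, and the example triples are hypothetical and serve illustration only; int(C) is taken to be the set of (predicate, object) pairs asserted about C.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Property = Tuple[str, str]      # (predicate, object)


def intension(triples: Iterable[Triple], concept: str) -> FrozenSet[Property]:
    """Return int(C): every (predicate, object) pair asserted about `concept`."""
    return frozenset((p, o) for s, p, o in triples if s == concept)


def intensions_over_time(
    snapshots: Dict[str, List[Triple]], concept: str
) -> Dict[str, FrozenSet[Property]]:
    """Map each snapshot label to int(C) in that snapshot, in sorted label order."""
    return {label: intension(snapshots[label], concept) for label in sorted(snapshots)}


# Hypothetical, hand-made snapshots (illustrative only, not real DBpedia data).
SNAPSHOTS: Dict[str, List[Triple]] = {
    "2014": [
        ("dbr:Mona_Lisa", "rdf:type", "dbo:Artwork"),
        ("dbr:Mona_Lisa", "dbo:author", "dbr:Leonardo_da_Vinci"),
        ("dbr:IPhone", "rdf:type", "dbo:Device"),
    ],
    "2015": [
        ("dbr:Mona_Lisa", "rdf:type", "dbo:Artwork"),
        ("dbr:Mona_Lisa", "dbo:author", "dbr:Leonardo_da_Vinci"),
        ("dbr:Mona_Lisa", "dbo:museum", "dbr:Louvre"),
        ("dbr:IPhone", "rdf:type", "dbo:Device"),
    ],
}

history = intensions_over_time(SNAPSHOTS, "dbr:Mona_Lisa")
added = history["2015"] - history["2014"]      # properties gained between snapshots
removed = history["2014"] - history["2015"]    # properties lost between snapshots
print(sorted(added))
print(sorted(removed))
```

Representing each int(C) as a frozen set keeps the comparison between two points in time down to plain set operations, which is the kind of input a drift measure over intensions needs.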

    ⁶ https://github.com/LinkedDataFragments/Client.Java

3.2    Concept Drift and SemaDrift
To measure concept change between two versions of an ontology, we use the concept drift framework
proposed by Wang et al. [28], which quantifies the change of meaning of concepts over time. In this
framework, the meaning of a concept C is defined as the combination of its intension, extension,
and label. The intension of C, int(C), is the set of formal, explicit properties that axiomatically
define C. The extension of C, ext(C), is the set of its instances. The label of C, label(C), is a
human-readable string representing C.
   Over time, int(C), ext(C) and label(C) can change, and compromise the identity and traceability
of C. To address this, the framework assumes that int(C) is the disjoint union of rigid and
non-rigid sets of properties, $int(C) = int_r(C) \cup int_{nr}(C)$. $int_r(C)$ uniquely identifies
C by some essential properties that do not change. This allows the comparison of two variants of a
concept at different points in time, even if $int_{nr}(C)$, ext(C) or label(C) change.
   If two variants of C at two different times have identical int(C), ext(C) and label(C), then
there is no concept drift. Otherwise, the framework defines intensional, extensional, and label
similarity functions $sim_{int} \mapsto [0, 1]$, $sim_{ext} \mapsto [0, 1]$,
$sim_{label} \mapsto [0, 1]$ to quantify meaning similarity. Then, there is extensional
(intensional, label) concept change between two variants of C, $C'$ and $C''$, iff
$sim_{ext}(C', C'') \neq 1$.
   Using the above definitions as its foundation, SemaDrift [20] constitutes a cutting-edge suite
of metrics and tools for measuring concept drift in different versions of an ontology, under an
ontology evolution perspective. As demonstrated in [21], SemaDrift is domain-agnostic, offering the
capability of applying the underlying metrics and methods to any ontology originating from any
domain of application. The platform consists of (a) an API for programmatically accessing the core
drift measuring methods, (b) a Protégé plug-in [19], and (c) a standalone desktop application. The
full suite is available at
http://mklab.iti.gr/project/semadrift-measure-semantic-drift-ontologies.
   In this work we deploy the core SemaDrift API and monitor only the intensional drift of DBpedia
concepts: each DBpedia entry is essentially a class instance with associated properties, so it
makes no sense to measure drift in its extension (instances have no extension) or in its label
(entries in DBpedia keep their labels unaltered).

3.3    Confronting Trends with Drift
Google Trends (GT) is a Web service that shows how often a particular search-term is entered
relative to the total search-volume of the Google Search engine. For example, it is possible to
compare the relative volume of queries between the search terms Donald Trump and climate change in
a certain time period. These relative volumes of search-terms are given with a measurement from 0
(no volume) to 100 (maximum volume). In order to obtain these, a matching needs to be made between
the chosen concept of interest C and its corresponding search-term query, q(C), which is not
trivial. For example, the DBpedia concept Terrorism in the European Union only matches the
search-term terrorism in europe in GT. In this workflow we align C and q(C) manually. Next, we
normalize the GT scores by picking a comparatively popular and stable topic over time that sets the
maximum score (e.g. iPhone)⁷. All subsequent trend scores for other concepts are relative to this
reference concept. We define the GT score for a concept C at time t as GT(C, t). Finally, we define
the two proxies of popularity and trendiness of a concept, p(C) and t(C), as the arithmetic mean
and standard deviation over the GT scores, respectively:

$$p(C) = \frac{1}{n}\sum_{t} GT(C, t), \qquad t(C) = \sqrt{\frac{1}{n}\sum_{t}\bigl(GT(C, t) - p(C)\bigr)^{2}}$$

where the sums run over the n time points t of the period.

    ⁷ We do this by using GT’s Most searched feature over matching time periods.

Figure 1: Chosen concepts and GT scores (2014-01 – 2016-04).

4     PRELIMINARY EVALUATION
In order to evaluate our framework, we propose a preliminary experiment to measure the relationship
between Web user queries and the intensional concept drift of DBpedia concepts between January 2014
and April 2016. By this we adapt Harris’ distributional hypothesis to RDF statements, i.e. we
assume that intensional concept drifts go back to the social embedding of the detection
environment; in other words, the feature composition of concepts is context-dependent.
   To do so, we sample a small (N = 11) set of DBpedia concepts C and the GT scores of their
equivalent search-terms q(C) over that period. The chosen concepts, together with their GT scores
over time, are shown in Figure 1. We chose these concepts considering one interest group, with both
trendy and popular concepts (iPhone, Donald Trump, Pokemon); and a control group, with concepts of
scarce trendiness and popularity (Mona Lisa, Colonization of Mars, Battle of Stalingrad).

4.1     Results
We use SemaDrift to calculate the intensional concept drift values of int(C) for the chosen set of
concepts of Figure 1⁸. Figure 2 confronts these intensional concept drift values with their
popularity/trendiness p(C), t(C) scores derived from GT.

    ⁸ A detailed table with all drifting values and relevant predicates can be found at
      https://goo.gl/yQ531r.

   In Figure 2 we can observe an expected distribution over the x-axis of non-trendy vs. trendy
concepts, to the left and the right, respectively. However, the patterns of intensional concept
drift with respect to variations in trends are not as expected. Quite
the contrary: the highest concept drift measurements correspond to concepts with the lowest
popularity/trend scores. In particular, concepts like Mona Lisa, climate change and Battle of
Stalingrad have very low t(C) scores (0.13, 0.33, 0.09) but very high concept drift (1.56, 1.73,
1.74). Conversely, concepts with the highest t(C) scores, such as Donald Trump (8.99), Pokemon
(2.29) and iPhone (6.75), have increasing values of concept drift (1.53, 1.16, 1.36) but never
reach that of the non-trendy concepts. Less popular, but very trendy concepts such as Donald Trump
change their relevance when observing p(C), but the tendency to score less concept drift prevails.

       Figure 2: Trendiness vs. intensional concept drift.

   These two unexpected patterns could be explained by the experts vs. crowds hypothesis. Under
this hypothesis, the most significant edits in Wikipedia to a concept C (which result in high drift
scores) are poorly explained by querying trends over C, but are closely related to a small number
of Wikipedia curators (the “experts”) taking care of domain-expert content (e.g. Mona Lisa, Battle
of Stalingrad). So, experts would be responsible for concept drift in less trendy topics. However,
the “crowds” seem to be able to influence concept drift approximately linearly (Pokemon, iPhone,
Donald Trump) beyond a certain trendiness threshold. This would explain high-quantity/low-quality
edits in Wikipedia derived from controversy and popularity, and relate to the popularity required
for non-experts to produce some increasing concept drift. Despite this, the highest trend values do
not seem to involve deep intensional changes in concepts, which only occur in expert-curated,
low-trendiness concepts.

5    CONCLUSION AND FUTURE WORK
In this paper, we study the influence of trending Web queries over the fundamental properties of
collaborative Web knowledge bases. For the period from January 2014 to April 2016 and a small
sample of concepts with variable popularity, we find patterns that fit the possible explanation of
two conflicting trends (“experts vs. crowds”) with competing influence on intensional concept
drift. We plan to add scalability to our framework in order to confirm the above findings, and to
investigate automatic mapping methods between concepts and their corresponding search-term queries.

REFERENCES
 [1] Alexander Mäedche, Boris Motik, and Ljiljana Stojanovic. 2003. Managing multiple and
     distributed ontologies in the Semantic Web. The VLDB Journal - The International Journal on
     Very Large Data Bases 12, 4 (2003), 286–300.
 [2] J. Allan, J. Carbonell, G. Doddington, J. Yamron, and Y. Yang. 1998. Topic detection and
     tracking pilot study: Final report. In Proceedings of the DARPA Broadcast News Transcription
     and Understanding Workshop.
 [3] Tim Berners-Lee, James Hendler, and Ora Lassila. 2001. The Semantic Web. Scientific American
     284, 5 (2001), 34–43.
 [4] S. Darányi and P. Wittek. 2013. Demonstrating Conceptual Dynamics in an Evolving Text
     Collection. Journal of the American Society for Information Science and Technology 64, 12
     (2013), 2564–2572. DOI:http://dx.doi.org/10.1002/asi.22940
 [5] G.P.C. Fung, J.X. Yu, P.S. Yu, and H. Lu. 2005. Parameter free bursty events detection in text
     streams. In Proceedings of VLDB-05, 31st International Conference on Very Large Data Bases.
     Trondheim, Norway, 181–192.
 [6] E. Gabrilovich, S. Dumais, and E. Horvitz. 2004. Newsjunkie: providing personalized newsfeeds
     via analysis of information novelty. In Proceedings of WWW-04, 13th Int. Conf. on the World
     Wide Web. New York City, NY, USA, 482–490.
 [7] R. S. Gonçalves, B. Parsia, and U. Sattler. 2011. Analysing Multiple Versions of an Ontology:
     A Study of the NCI Thesaurus. In Proceedings of the 24th Int. Workshop on Description Logics
     (DL 2011), Vol. 745. CEUR Workshop Proceedings.
 [8] Z. Harris. 1970. Distributional structure. In Papers in structural and transformational
     Linguistics, Z. Harris (Ed.). Humanities Press, NY, USA, 775–794.
 [9] Tom Heath and Christian Bizer. 2011. Linked Data: Evolving the Web into a Global Data Space
     (1st ed.). Morgan and Claypool. 1–136 pages.
[10] Bryan Jurish. 2016. Diachronic Collocations and Genre: a case for DiaCollo?. In Diachronic
     Corpora, Genre, and Language Change, Richard Jason Whitt (Ed.). 22–24.
     http://kaskade.dwds.de/~jurish/pubs/jurish2016genre.pdf
[11] J. Kleinberg. 2003. Bursty and hierarchical structure in streams. Data Mining and Knowledge
     Discovery 7, 4 (2003), 373–397.
[12] J. Kleinberg. 2006. Temporal dynamics of on-line information streams. Data Stream Management:
     Processing High-Speed Data Streams (2006).
[13] P. De Leenheer and T. Mens. 2008. Ontology Evolution: State of the Art and Future Directions.
     In Ontology Management for the Semantic Web, Semantic Web Services, and Business Applications.
     Springer.
[14] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann, M.
     Morsey, P. van Kleef, S. Auer, and C. Bizer. 2014. DBpedia - A Large-scale, Multilingual
     Knowledge Base Extracted from Wikipedia. Semantic Web – Interoperability, Usability,
     Applicability (2014). http://www.semantic-web-journal.net/system/files/swj558.pdf.
[15] Albert Meroño-Peñuela, Christophe Guéret, Ashkan Ashkpour, and Stefan Schlobach. 2015. CEDAR:
     The Dutch Historical Censuses as Linked Open Data. Semantic Web – Interoperability, Usability,
     Applicability (2015). In press.
[16] R. Papka. 1999. On-line new event detection, clustering, and tracking. Ph.D. Dissertation.
     University of Massachusetts Amherst.
[17] Catia Pesquita and Francisco M. Couto. 2012. Predicting the Extension of Biomedical
     Ontologies. PLoS Computational Biology 8, 9 (2012), e1002630.
[18] Michael Petrilli. 2008. Wikipedia or Wickedpedia?
     http://educationnext.org/wikipedia-or-wickedpedia/, Education Next 8, 2 (2008).
[19] T. G. Stavropoulos, S. Andreadis, E. Kontopoulos, M. Riga, P. Mitzias, and I. Kompatsiaris.
     2017. The SemaDrift Protégé Plugin to Measure Semantic Drift in Ontologies: Lessons Learned.
     In Knowledge Engineering and Knowledge Management (EKAW 2016), Vol. 10180. Springer, Cham,
     29–39.
[20] T. G. Stavropoulos, S. Andreadis, M. Riga, E. Kontopoulos, P. Mitzias, and I. Kompatsiaris.
     2016. A Framework for Measuring Semantic Drift in Ontologies. In Proceedings of SuCCESS-16,
     1st Int. Workshop on Semantic Change & Evolving Semantics, co-located with the 12th European
     Conference on Semantics Systems (SEMANTiCS-16).
[21] T. G. Stavropoulos, E. Kontopoulos, A. Meroño Peñuela, S. Tachos, S. Andreadis, and I.
     Kompatsiaris. 2017. Cross-domain Semantic Drift Measurement in Ontologies Using the SemaDrift
     Tool and Metrics. In 3rd Workshop on Managing the Evolution and Preservation of the Data Web
     (MEPDaW 2017).
[22] Ljiljana Stojanovic. 2004. Methods and Tools for Ontology Evolution. Ph.D. Dissertation.
     University of Karlsruhe.
[23] Ljiljana Stojanovic and Boris Motik. 2002. Ontology Evolution within Ontology Editors. In
     Evaluation of Ontology-based Tools Workshop, 13th Int. Conf. on Knowledge Engineering and
     Knowledge Management (EKAW 2002), Vol. 62. CEUR-WS.
[24] R. Swan and J. Allan. 1999. Extracting significant time varying features from text. In
     Proceedings of CIKM-99, 8th International Conference on Information and Knowledge Management.
     Kansas City, MO, USA, 38–45.
[25] R. Swan and D. Jensen. 2000. Timemines: Constructing timelines with statistical models of word
     usage. In Proceedings of KDD-2000 Workshop on Text Mining. Boston, MA, USA, 73–80.
[26] Ilaria Tiddi. 2016. Explaining Data Patterns using Knowledge from the Web of Data. Ph.D.
     Dissertation. Knowledge Media Institute, The Open University.
[27] R. Verborgh, M. van der Sande, O. Hartig, J. Van Herwegen, L. De Vocht, B. De Meester, G.
     Haesendonck, and P. Colpaert. 2016. Triple Pattern Fragments: a Low-cost Knowledge Graph
     Interface for the Web. Journal of Web Semantics 37–38 (2016), 184–206.
     DOI:http://dx.doi.org/doi:10.1016/j.websem.2016.03.003
[28] S. Wang, S. Schlobach, and M. C. A. Klein. 2010. What Is Concept Drift and How to Measure It?.
     In Knowledge Engineering and Management by the Masses - 17th Int. Conf., EKAW 2010.
     Proceedings. LNCS 6317, Springer, 241–256.
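
As a supplementary illustration of Sections 3.3 and 4.1, the following sketch computes the popularity and trendiness proxies p(C) and t(C) as the mean and (population) standard deviation of a series of GT scores, and then confronts t(C) with drift through a Pearson correlation. Several points are assumptions rather than part of the reported method: the function names are hypothetical, the three-point GT series is made up, the (t(C), drift) pairs are the six values quoted in Section 4.1 matched by their order of appearance, and the Pearson coefficient is only one possible way of summarizing the relationship that Figure 2 shows graphically.

```python
import math
from typing import Dict, Sequence, Tuple


def popularity(scores: Sequence[float]) -> float:
    """p(C): arithmetic mean of the GT scores of one concept."""
    return sum(scores) / len(scores)


def trendiness(scores: Sequence[float]) -> float:
    """t(C): population standard deviation of the GT scores of one concept."""
    mean = popularity(scores)
    return math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    mx, my = popularity(xs), popularity(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / norm


# Hypothetical GT series for a single concept (illustrative only).
example_scores = [10.0, 20.0, 30.0]
print(popularity(example_scores), trendiness(example_scores))

# (t(C), drift) pairs quoted in Section 4.1; pairing by position is an assumption.
REPORTED: Dict[str, Tuple[float, float]] = {
    "Mona Lisa": (0.13, 1.56),
    "climate change": (0.33, 1.73),
    "Battle of Stalingrad": (0.09, 1.74),
    "Donald Trump": (8.99, 1.53),
    "Pokemon": (2.29, 1.16),
    "iPhone": (6.75, 1.36),
}

t_values = [t for t, _ in REPORTED.values()]
drift_values = [d for _, d in REPORTED.values()]
r = pearson(t_values, drift_values)
print(f"Pearson r between t(C) and drift: {r:.2f}")
```

On these six pairs the coefficient is negative, in line with the observation that the least trendy concepts drift the most. With so few points it should be read as a description of the sample, not as a statistical test.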