=Paper= {{Paper |id=Vol-2063/dal-paper4 |storemode=property |title=Exploring Concept Representations for Concept Drift Detection |pdfUrl=https://ceur-ws.org/Vol-2063/dal-paper4.pdf |volume=Vol-2063 |authors=Oliver Becher,Laura Hollink,Desmond Elliott |dblpUrl=https://dblp.org/rec/conf/i-semantics/BecherHE17 }} ==Exploring Concept Representations for Concept Drift Detection== https://ceur-ws.org/Vol-2063/dal-paper4.pdf
 Exploring Concept Representations for Concept Drift Detection
                Oliver Becher                                   Laura Hollink                               Desmond Elliott
     Centrum Wiskunde & Informatica                  Centrum Wiskunde & Informatica                      University of Edinburgh
       Amsterdam, The Netherlands                      Amsterdam, The Netherlands                           United Kingdom
             becher@cwi.nl                                   hollink@cwi.nl                                d.elliott@ed.ac.uk
ABSTRACT
We present an approach to estimating concept drift in online news. Our method is to construct temporal concept vectors from topic-annotated news articles, and to correlate the distance between the temporal concept vectors with edits to the Wikipedia entries of the concepts. We find improvements in the correlation when we split the news articles based on the number of articles mentioning a concept, instead of calendar-based units of time.

KEYWORDS
Concept drift, Vector representations, News, Wikipedia edits

1   INTRODUCTION
Concepts in Knowledge Organisation Systems (KOSs) are used to provide structured annotations and background knowledge in a wide variety of applications. They enhance interoperability between datasets and enable structured access to annotated document collections. These benefits, however, are compromised when concept change (or drift) occurs. Wang et al. [8] define three types of concept drift: (1) change in the intension of the concept, defined as the definition or the properties of the concept; (2) change in the extension, or the instances, of a concept; and (3) change in the label of the concept. Each type of concept drift may lead to problems for applications working with KOSs. For example, an annotation of a document may become invalid if the intension of the concept changes. Correspondences between two concepts in different KOSs may become incorrect if the extension of one of them changes. A user’s keyword query on a historic corpus may be interpreted incorrectly if the (prevalent) label to refer to a concept has changed.
   Significant progress has been made in the detection of meaning change of words (e.g., [3, 9]). These approaches are based on distributional methods, where the meaning of a word is defined as the context in which it appears. A change in context over time may then signify a change in meaning. In this paper, we study change in the meaning of concepts in a KOS. Drawing inspiration from work on word-change detection, we aim to explore whether the change of a concept can be measured from changes in how it appears in the context of a document collection. This is different from other work on concept change in KOSs in the sense that we ignore changes in the structure of the KOS.
   This paper is an initial step towards understanding how the context of a concept can be represented to effectively capture concept change. Our representation is based on the co-occurrence between concepts that appear as annotations of documents in a diachronic collection: two concepts co-occur if they are annotations of the same document. Hence, a concept can be seen as a vector of co-occurrence counts with other concepts in the KOS. Concept change can then be measured by comparing vectors created for different time spans in the collection. We experiment with various versions of this basic idea, and apply it to detect change in an annotated document collection: the ION dataset of 300k online news articles, annotated with Wikipedia pages [4].
   To evaluate our method, we use Wikipedia edit counts. This is based on the idea that a Wikipedia article is edited when a change to the page was needed; hence, a higher number of edits may signify a change in the underlying concept. Generally speaking, evaluation of concept drift detection methods is hampered by a lack of large-scale evaluation datasets. Wikipedia edits are not to be seen as a gold standard of concept drift. While some edits might be due to a change in the concept, others might be, for example, additions of missing information or corrections of previous mistakes. Our assumption is that even though Wikipedia edit counts are a noisy signal with respect to concept change, a correlation between our change scores and the edit counts does say something about the effectiveness of our method.

2   REPRESENTING CONCEPTS
2.1 Creating Concept Vectors
Given a concept vocabulary C with N concepts, we create vector representations of the concepts through their usage in a document collection.
   We assume there is a collection of time-ordered documents D. A document $d_i$ is annotated with M topic annotations $t_1, \ldots, t_M$, drawn from a total of T topics. Each document in the collection can be represented as a binary document topic vector, $d_i \in \mathbb{R}^{1 \times T}$. An element in the document topic vector takes a value of 1 if the document has been annotated with that topic. We also assume a function $f: T \to C$ that maps between the topic annotation and concept vocabulary.
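The binary document topic vector and the mapping f described above can be illustrated with a small sketch. The topic names, the helper `document_topic_vector`, and the choice of the identity function for `f` are assumptions made for illustration; they are not taken from the paper's implementation.

```python
# Hypothetical toy topic vocabulary (T = 4); the names are illustrative only.
topics = ["Police", "Crime", "London", "Sport"]
topic_index = {t: i for i, t in enumerate(topics)}


def document_topic_vector(annotations):
    """Return a binary vector of length T with a 1 for every topic annotating the document."""
    vec = [0] * len(topics)
    for t in annotations:
        vec[topic_index[t]] = 1
    return vec


def f(topic):
    """Mapping f: T -> C. Assumed to be the identity here (each topic is its own concept)."""
    return topic


# A document annotated with M = 2 topics.
d_1 = document_topic_vector(["Police", "Crime"])
print(d_1)
```

With an identity mapping the concept vocabulary is simply the topic vocabulary; a different `f` could map topics to another vocabulary.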
© 2017 Copyright held by the author/owner(s).
SEMANTiCS 2017 workshop proceedings: Drift-a-LOD
September 11-14, 2017, Amsterdam, Netherlands

   We construct a concept vector $c_j$ for each concept in our vocabulary $c_1, \ldots, c_N$ from co-occurrence counts of the topic annotations in documents in the document collection. The set of concept vectors forms a sparse matrix $C \in \mathbb{R}^{N \times N}$, where each row defines a concept through co-occurrence with other concepts.
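A minimal sketch of how such co-occurrence vectors could be built, assuming each document is given as a set of concept names and that a concept is not counted as co-occurring with itself (both are assumptions; the function name `cooccurrence_vectors` is hypothetical). A dictionary of dictionaries stands in for the sparse matrix C.

```python
from collections import defaultdict
from itertools import permutations


def cooccurrence_vectors(documents):
    """Map each concept to a sparse row {other concept: co-occurrence count}."""
    C = defaultdict(lambda: defaultdict(int))
    for concepts in documents:
        # Every ordered pair of distinct concepts annotating the same document co-occurs once.
        for ci, cj in permutations(sorted(concepts), 2):
            C[ci][cj] += 1
    return C


# Hypothetical toy collection of three annotated documents.
docs = [
    {"Police", "Crime"},
    {"Police", "Crime", "London"},
    {"Sport", "London"},
]
C = cooccurrence_vectors(docs)
print(dict(C["Police"]))
```

Only non-zero entries are stored, which mirrors the sparsity of the matrix when N is large.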
Drift-a-LOD’17, September 2017, Amsterdam, Netherlands                                                    Oliver Becher, Laura Hollink, and Desmond Elliott


   Our concept vectors are co-occurrence counts. We reduce the effect of frequently occurring concepts by re-weighting the vectors using a TF-IDF-like weighting scheme, so that $\text{tf-idf}(c_i, c_j) = \mathrm{tf}(c_i, c_j) \cdot \mathrm{idf}(c_i)$, where $\mathrm{tf}(c_i, c_j)$ is the number of times that concept $c_i$ co-occurs with concept $c_j$ and $\mathrm{idf}(c_i) = \log \frac{N}{\mathrm{df}(C, c_i)} + 1$, with $\mathrm{df}(C, c_i)$ as the count of $c_i$ concept annotations in the entire concept vocabulary C.

2.2      Temporal Concept Vectors
Recall that we are interested in measuring the change in the meaning of a concept over time. We redefine C to include a temporal dimension, $V \in \mathbb{R}^{N \times N \times K}$, where the third dimension represents K units of time, and $\sum_k V_k = V \in \mathbb{R}^{N \times N}$. There are many ways to define K: the document collection can be split into days, weeks, months, or any other valid approach to splitting the collection according to the sequential ordering of the documents. Note that the co-occurrence statistics over topic annotations need to be calculated such that only documents timestamped between consecutive units of time are used in the calculation, i.e. $t = s_1$ and $t = s_2$ are used to define a temporal concept vector $v_{j,s_2}$ at $t = s_2$.

2.3      Temporal Vector Distance
We measure the change in the meaning of concepts by comparing the vectors in the temporal concept matrix between subsequent units of time. Specifically, we measure the change in a concept $c_j$ between time $k$ and $k - 1$ using a similarity metric $\mathrm{sim}(\cdot, \cdot)$:

               $\mathrm{distance}(v_j, s, s-1) = \mathrm{sim}(v_{j,s}, v_{j,s-1})$        (1)

   We experiment with two similarity metrics: cosine similarity, previously used to detect concept drift [7], and KL-divergence (when the vectors represent distributions).

3  APPLICATION TO AN ANNOTATED NEWS COLLECTION
3.1 Dataset and Model Application
We explore our method for constructing concept representations and measuring concept change with a dataset of online news articles [4]. This dataset contains news articles together with topic annotations and images in their natural textual context. The richness of information and metadata in this dataset offers many ways to define and explore concepts, while the defined structure of the data helps to use it reliably and consistently.
   The dataset contains articles published online between August 2014 and August 2015. In total, it includes more than 300K articles from five publishers across British and US English sources: Daily Mail, The Independent, New York Times, Huffington Post, and the Washington Post. The articles are annotated with topics using TextRazor¹. TextRazor uses Wikipedia as a topic vocabulary. This vocabulary ranges from narrowly defined concepts, e.g., The United States Women’s Soccer Team or Electromagnetism, to broader concepts, e.g., Sport or Science. The average number of topic annotations per article is 25 broad ‘Category pages’ and 5 specific (non-category) pages, giving in total 122,000 distinct topic annotations on all articles.
¹ http://www.textrazor.com
   We define our concept vocabulary C as a subset of TextRazor’s topic vocabulary T: we retain only topics that are associated with at least 2 articles. In preliminary experiments, we found that concepts that are associated with too few articles have sparse representations, resulting in unrealistic change scores between the representations. This leaves us with N=70,000 concepts. The mapping function $f: T \to C$ is trivial in this case. However, the structured nature of Wikipedia, and the links that it provides to other concept vocabularies, provide starting points for other mapping functions, allowing us to explore other concept vocabularies in the future.
   We construct concept vectors using the method outlined in Section 2. The vocabulary of the concept vectors is defined over the Wikipedia entries; it is therefore trivial to map the topic annotations to the concept vectors.

3.2    Visualization
To visualize the change that a concept c has undergone, we create a stream graph [1] of the temporal concept vectors of c. Figure 1, for example, plots the temporal vectors of the Wikipedia concept Police. Each ‘stream’ represents a concept that co-occurs with Police in the document collection. The thickness of the line represents the co-occurrence count at a certain time period. Since stream graphs are suited to convey changes over time of only a limited number of concepts, we select only those that occur most frequently. Specifically, we create ‘streams’ for only those concepts that are among the top 5 most frequently co-occurring concepts in any of the temporal concept vectors of concept c.
   In Figures 1, 2, and 3 we plot two concepts for which the average change is low (measured as a high average cosine similarity between 12 temporal vectors) and one where the average change score is high. Figure 1 shows that Police is a stable concept: the top five most frequently occurring concepts remain frequent throughout the year, and the volume of documents in which they co-occur hardly fluctuates. However, a concept might change on a larger time scale than given in the data. Nonetheless, Police seems to be more stable than other concepts in the time span.
   The concept Labour_Party (Figure 2) is stable as well: although there is a burst in the volume of documents about this concept, there is hardly a change in which concepts co-occur in these documents. In other words, there is change in how much reporting there is about the Labour_Party, but not in how it is reported.
   Figure 3 shows the streamgraph of New York University. We can see that the most co-occurring topics are constantly changing in the streamgraph, both in periods with a high volume of documents and in periods with a low volume of documents. This suggests changes in how much and how New York University has been reported in the news.

4 TOWARDS A QUANTITATIVE EVALUATION
4.1 Measuring Concept Change
Concept change detection is hard to evaluate for a lack of gold standard datasets [5]. Kenter et al. [6] use a small set of 21 human-judged change scores. Frermann and Lapata [2] indirectly evaluate change detection by using it in an application for which a gold standard exists, namely the SemEval task for dating text. To the best
of our knowledge, large-scale datasets to directly evaluate change detection do not exist.

  Figure 1: WP:Police streamgraph shows a stable set of top-5 concepts in its temporal vectors. (See Section 3.2 for more details.)

  Figure 2: WP:Labour_Party_(UK) streamgraph has a stable set of top-5 concepts but a lot of activity centered around a specific time.

  Figure 3: WP:New_York_University streamgraph undergoes constantly shifting concept representation in our dataset.

   For our application, we explore the use of Wikipedia edit rates to evaluate our method of concept change representation. We believe that the act of editing a Wikipedia page can signal a change in the information that is relevant to that entry.
   Specifically, given a concept c and K pre-defined units of time, we measure change scores as the consecutive temporal vector distances for concept c (Section 2.3); then, we count the number of Wikipedia edits to the aligned article during each of the K units of time. We evaluate our method by measuring the Spearman correlation between the change scores (i.e. the temporal vector distances) and the Wikipedia edit counts. The higher the correlation, the more accurately the temporal concept vectors can estimate the rate of change of the Wikipedia entries.
   We perform an experiment on 964 concepts. Since it seems likely that the number of articles that a concept is related to plays a role, we draw a stratified random sample from our concept vocabulary to include both frequently and infrequently used concepts. We select three different strata of even size. Group 1 contains concepts which are related to more than 500 articles. Group 2 contains concepts which are related to at least 200 articles but not more than 500. Group 3 contains concepts with at least 24 articles but fewer than 200. The sample includes only concepts that map to ‘regular’ Wikipedia pages and not Category pages.

  Figure 4: Scatter plot of average cosine similarities and annotation count of all concepts

   Figure 4 plots the number of articles that a concept is related to against the average cosine similarity between the temporal vectors of that concept. This shows that the more frequently a concept is used as an annotation, the higher the average cosine similarity, i.e., the lower the change. This is analogous to the change of meaning of words [3], where the semantic changes of words scale with inverse frequency, known as the law of conformity.
   We compare four models, each with different settings regarding the way that time units are set, the use of TF-IDF, and the choice of similarity measure (either cosine similarity or KL-divergence).

4.2    Models
   4.2.1 Fixed Time Bins (Cosine). Starting with the most basic setup of our method, we calculate temporal concept vectors for time frames (or bins) of a fixed duration. With n time frames, each frame covers 1/n of the year spanned by the dataset. For example, with 52 frames, each frame covers exactly one week. We use the cosine similarity to calculate change scores between consecutive temporal concept vectors.

   4.2.2 Flexible Time Bins (Cosine). In this model, we calculate temporal concept vectors for time periods that each cover a fixed number of articles. Thus, time frames differ in their length in days rather than in the amount of data they contain. The number of articles per bin depends on the total number of articles available per concept. Analogous to Fixed Time Bins, we create n bins; we therefore assign one n-th of the total number of articles to each bin. However, the number of articles for a concept may not divide evenly into n bins, in which case the articles may be split into more than n bins. With these vectors, we use the cosine similarity to calculate change scores. We use the same time frames to bin the Wikipedia edits and estimate a correlation.
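The flexible binning and the consecutive cosine change scores can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: documents are assumed to be time-ordered sets of concept names, the bin size is taken as the integer part of (number of articles / n), leftover articles form an extra bin, and all function names are hypothetical.

```python
import math
from collections import Counter


def flexible_bins(documents, n):
    """Split time-ordered documents into bins of len(documents) // n articles each."""
    size = max(1, len(documents) // n)
    # Leftover articles form an additional bin, so there can be more than n bins.
    return [documents[i:i + size] for i in range(0, len(documents), size)]


def temporal_vector(bin_docs, concept):
    """Co-occurrence counts of `concept` with all other concepts inside one bin."""
    counts = Counter()
    for concepts in bin_docs:
        counts.update(c for c in concepts if c != concept)
    return counts


def cosine_similarity(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def change_scores(documents, concept, n):
    """Cosine similarity between each temporal vector and its predecessor."""
    relevant = [d for d in documents if concept in d]  # only articles mentioning the concept
    vectors = [temporal_vector(b, concept) for b in flexible_bins(relevant, n)]
    return [cosine_similarity(vectors[k], vectors[k - 1]) for k in range(1, len(vectors))]


# Hypothetical time-ordered toy collection: 7 articles about "Police".
docs = ([{"Police", "Crime"}] * 3
        + [{"Police", "Crime", "London"}]
        + [{"Police", "London"}] * 3)
bins = flexible_bins(docs, 3)
scores = change_scores(docs, "Police", 3)
print(len(bins), scores)
```

In this toy case 7 articles with n=3 give a bin size of 2 and therefore 4 bins, which illustrates how a concept can end up with more than n bins.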

Run / n bins                  100     52     24     12      6
Fixed Time Bins (Cos)        0.07   0.18   0.22   0.36   0.26
Flexible Time Bins (Cos)    -0.2   -0.19  -0.14   0.03   0.33
- TF-IDF (Cos)              -0.2   -0.2   -0.13   0.0    0.26
Flexible Time Bins (KL)      0.23   0.25   0.29   0.19  -0.3
Table 1: Average Spearman correlation between concept similarity scores and Wikipedia edits. Negative correlations are good for cosine similarity; positive correlations are good for KL-divergence.
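For a single concept, the Spearman correlation behind each entry of Table 1 could be computed roughly as follows. The sketch computes Spearman's rho as the Pearson correlation of average ranks; the similarity values and edit counts are invented for illustration, and the function names are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    return pearson(ranks(x), ranks(y))


# Invented example: one cosine similarity and one edit count per time bin.
similarities = [0.9, 0.4, 0.7, 0.2]
edit_counts = [3, 20, 8, 41]
rho = spearman(similarities, edit_counts)
print(rho)
```

Here low similarity (high change) coincides with many edits, so rho is negative, which is the desired direction for cosine similarity according to the table caption.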




   4.2.3 Flexible Time Bins (No TF-IDF, Cosine). Exactly the same as Flexible Time Bins (Cosine), except that we do not re-weight the temporal concept vectors using TF-IDF.
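For reference, the re-weighting that this variant omits (Section 2.1) can be sketched as below. The sketch assumes a natural logarithm and reads the formula so that, in the vector of a concept c_j, the entry for a co-occurring concept c_i is scaled by idf(c_i); the paper fixes neither choice explicitly. The function name and the toy counts are hypothetical.

```python
import math


def tf_idf(cooc, df, N):
    """Re-weight co-occurrence rows: entry (c_j, c_i) becomes tf * idf(c_i).

    cooc: {c_j: {c_i: co-occurrence count}}
    df:   {c_i: number of annotations with c_i}
    N:    number of concepts in the vocabulary
    """
    # idf(c_i) = log(N / df(C, c_i)) + 1, natural log assumed.
    idf = {c: math.log(N / count) + 1 for c, count in df.items()}
    return {cj: {ci: tf * idf[ci] for ci, tf in row.items()} for cj, row in cooc.items()}


# Hypothetical toy counts.
cooc = {"Police": {"Crime": 2, "London": 1}}
df = {"Crime": 2, "London": 4}
weighted = tf_idf(cooc, df, N=4)
print(weighted)
```

Concepts that annotate many articles (here London) receive a lower weight than rarer ones (here Crime).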

  Figure 5: Distribution of Spearman correlations for Flexible Time Bins (KL) with n=12 (left) or n=100 (right) time bins. The ratio of positive/negative correlations is much improved by having more time bins.

   4.2.4 Flexible Time Bins (KL-divergence). This model is identical to Flexible Time Bins (Cosine) except we measure the distance between temporal concept vectors using Kullback-Leibler divergence (KL) instead of cosine similarity.
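A minimal sketch of a KL-divergence distance between two temporal concept vectors. It assumes the count vectors are normalised into probability distributions over a shared vocabulary, with a small additive smoothing term so that zero counts do not produce undefined logarithms; the smoothing, the direction KL(current || previous), and the function names are assumptions, not details given in the paper.

```python
import math


def to_distribution(counts, vocabulary, eps=1e-6):
    """Normalise co-occurrence counts into a smoothed probability distribution."""
    total = sum(counts.get(c, 0) for c in vocabulary) + eps * len(vocabulary)
    return [(counts.get(c, 0) + eps) / total for c in vocabulary]


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for strictly positive p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical temporal vectors of one concept in two consecutive bins.
vocabulary = ["Crime", "London"]
v_prev = {"Crime": 2}
v_curr = {"Crime": 1, "London": 1}

p = to_distribution(v_curr, vocabulary)
q = to_distribution(v_prev, vocabulary)
distance = kl_divergence(p, q)
print(distance)
```

Unlike cosine similarity, this value grows with change, which is why positive correlations with edit counts are the desired outcome for the KL model.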


4.3    Results
We collect Spearman correlation coefficients for 964 concepts using different numbers of time frames (6, 12, 24, 52, and 100). Table 1 shows the average correlation over concepts that are significantly correlated with Wikipedia edits. Note that the experiments with the cosine similarity measure between temporal concept vectors should return a negative correlation, while the experiments with the KL-divergence distance should return a positive correlation. The results in Table 1 show that the performance of the models decreases as we decrease the number of time bins.
   The Fixed Time Bins (Cosine) model only returns positive correlations, indicating that fixed units of time (in this case, splitting the articles into months) do not act as a reliable proxy for concept change in our dataset. The Flexible Time Bins experiments (Cosine) and (-TF-IDF) are better correlated with Wikipedia edits than the Fixed Time Bins model. We do not find a difference when the concept vectors are not re-weighted using TF-IDF. Finally, we find a small improvement from using KL-divergence as the temporal vector distance metric instead of cosine similarity. Throughout, we can see that the number of temporal bins n is a crucial parameter in our experiment.
   We performed a follow-up analysis of the effect of the number of temporal bins. The histograms in Figures 5a and 5b show the distributions of the Spearman correlations for the Flexible Time Bins (KL) model with n=12 or n=100. We find that the ratio of positive correlations to negative correlations is substantially reduced by having more time bins. More time bins clearly improve the quality of the concept vectors.

5     CONCLUSION AND FUTURE WORK
We explored concept change using vector space concept representations. The concept vectors were constructed from topic co-occurrence in a large collection of online news articles. We introduced a temporal aspect to the vectors by requiring the co-occurrences to happen within pre-defined windows of time. We explored to what extent concept change can be evaluated by correlating the distance between its temporal concept vectors and edits to the Wikipedia article corresponding to the concept.
   We found that a flexible approach to defining a window of time was more successful than using calendar-based windows of time. We also found that having more windows of time resulted in better correlations between the temporal vector distances and Wikipedia article edits.
   Future work includes an analysis of which types of concepts correlate with Wikipedia edit counts, to get more insight into the use of Wikipedia as an evaluation tool. Similarly, we could look into the types of edits made on Wikipedia to distinguish actual change from simple growth of an article.

REFERENCES
 [1] Lee Byron and Martin Wattenberg. 2008. Stacked Graphs - Geometry and Aesthetics. IEEE transactions on visualization and computer graphics 14, 6 (2008), 1245–52.
 [2] Lea Frermann and Mirella Lapata. 2016. A Bayesian Model of Diachronic Meaning Change. Transactions of the ACL 4 (2016), 31–45.
 [3] William L Hamilton, Jure Leskovec, and Dan Jurafsky. 2016. Diachronic word embeddings reveal statistical laws of semantic change. arXiv preprint arXiv:1605.09096 (2016).
 [4] Laura Hollink, Adriatik Bedjeti, Martin van Harmelen, and Desmond Elliott. 2016. A Corpus of Images and Text in Online News. (2016).
 [5] Laura Hollink, Sándor Darányi, Albert Meroño Peñuela, and Efstratios Kontopoulos. 2017. First Workshop on Detection, Representation and Management of Concept Drift in Linked Open Data: Report of the Drift-a-LOD2016 Workshop: Front Matter. In Knowledge Engineering and Knowledge Management. EKAW 2016 (Lecture Notes in Computer Science), Vol. 10180. 15–18.
 [6] Tom Kenter, Melvin Wevers, Pim Huijnen, and Maarten de Rijke. 2015. Ad hoc monitoring of vocabulary shifts over time. In Proceedings of the 24th International Conference on Information and Knowledge Management. 1191–1200.
 [7] Astrid van Aggelen, Laura Hollink, and Jacco van Ossenbruggen. 2016. Combining distributional semantics and structured data to study lexical change. In European Knowledge Acquisition Workshop. Springer, 40–49.
 [8] Shenghui Wang, Stefan Schlobach, and Michel Klein. 2011. Concept drift and how to identify it. Web Semantics: Science, Services and Agents on the World Wide Web 9, 3 (2011), 247–265. https://doi.org/10.1016/j.websem.2011.05.003 Semantic Web Dynamics Semantic Web Challenge, 2010.
 [9] Yating Zhang, Adam Jatowt, and Katsumi Tanaka. 2016. Towards understanding word embeddings: Automatically explaining similarity of terms. 2016 IEEE International Conference on Big Data (Big Data) (2016), 823–832.