Visualizing and Quantifying Vocabulary Learning
During Search
Nilavra Bhattacharyaa , Jacek Gwizdkaa
a School of Information, The University of Texas at Austin, USA



Abstract
We report work in progress on visualizing and quantifying learning during search. Users initiate a search session with a Pre-Search Knowledge state. During search, they undergo a change in knowledge. Upon conclusion, users attain a Post-Search Knowledge state. We attempt to measure this dynamic knowledge change from a stationary reference point: Expert Knowledge on the search topic. Using word embeddings of searchers' written summaries, we show that, w.r.t. Expert Knowledge, there is an observable and quantifiable difference between the Pre-Search knowledge (Pre-Exp distance) and the Post-Search knowledge (Post-Exp distance).

Keywords
search as learning, quantifying learning, expert knowledge, word embedding


Proceedings of the CIKM 2020 Workshops, October 19-20, 2020, Galway, Ireland
email: nilavra@ieee.org (N. Bhattacharya); iwilds2020@gwizdka.com (J. Gwizdka)
url: https://nilavra.in (N. Bhattacharya); http://gwizdka.com (J. Gwizdka)
orcid: 0000-0001-7864-7726 (N. Bhattacharya); 0000-0003-2273-3996 (J. Gwizdka)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

Figure 1: Conceptual framework of Search-as-Learning. (The figure depicts Pre-Search Knowledge leading, via Searching, to Post-Search Knowledge, with the Post-Exp Distance measured against Expert Knowledge.)

1. Introduction

An important aspect of understanding learning during web search is to measure and quantify learning, possibly in an automated fashion. Recent literature adopts three broad approaches for this purpose. The first approach asks searchers to rate their self-perceived pre-search and post-search knowledge levels [1, 2]. This approach is the easiest to construct, and can be generalized over any search topic. However, self-perceptions may not objectively represent true learning. The second approach tests searchers' knowledge using factual multiple choice questions (MCQs). The answer options can be a mixture of fact-based responses (TRUE, FALSE, or I DON'T KNOW) [3, 4] or recall-based responses (I remember / don't remember seeing this information) [5, 6]. Constructing topic-dependent MCQs may take time and effort, which may be aided by automated question generation techniques [7]. For evaluation, this approach is the easiest, and often automated. However, MCQs allow respondents to answer correctly by guesswork. The third approach lets searchers write natural language summaries or short answers, before and after the search [8, 2]. Depending on experimental design, prompts for writing such responses can be generic (least effort) [9] or topic-specific (some effort) [7]. While this approach can provide the richest information about the searcher's knowledge state, evaluating such responses is the most challenging, and requires extensive human intervention.

We report progress on extending work by [9], and take the third approach mentioned above. We attempt to visualize and quantify vocabulary learning during search, using natural language Pre-Search and Post-Search responses. The previous authors used sentence embedding models, and reported not finding strong associations between search interactions and knowledge-change measures. A possible reason is that sentence embedding approaches are yet to attain maturity, and typically employ an average-pooling operation to generate sentence vectors from individual word vectors. Devising effective strategies to obtain vectors for compound units (phrases / sentences) from individual word vectors is always a challenge [10]. Differently from [9], we use word embedding vectors and max-pooling operations (taking the element-wise maximum of individual word vectors to form sentence vectors), which experimentally showed better results than average-pooling.
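To make the pooling step concrete, the short Python sketch below contrasts max-pooling with average-pooling over individual word vectors. It is only an illustration of the idea, not the authors' code: the toy embedding dictionary, the token clean-up, and the function name are our own assumptions; in the actual study the lookups would come from pre-trained word2vec or GloVe vectors.

    import numpy as np

    # Toy stand-in for a pre-trained embedding lookup (word -> vector).
    # In the actual study these would be word2vec / GloVe vectors; the tiny
    # 4-dimensional dictionary below is purely illustrative.
    EMB_DIM = 4
    toy_embeddings = {
        "vitamin":    np.array([0.1, 0.3, -0.2, 0.5]),
        "deficiency": np.array([-0.3, 0.2, 0.1, 0.4]),
        "blindness":  np.array([0.4, -0.1, 0.2, 0.0]),
    }

    def response_vector(text, embeddings, dim=EMB_DIM, pooling="max"):
        """Pool the word vectors of a free-recall response into one vector.

        Out-of-vocabulary words are skipped; an empty or fully unknown
        response yields a zero vector (handled later by the distance metric).
        """
        tokens = [t.strip(",.").lower() for t in text.split()]
        vectors = [embeddings[t] for t in tokens if t in embeddings]
        if not vectors:
            return np.zeros(dim)
        stacked = np.vstack(vectors)
        # Element-wise maximum (max pooling) vs. element-wise mean (average pooling).
        return stacked.max(axis=0) if pooling == "max" else stacked.mean(axis=0)

    print(response_vector("Vitamin deficiency, blindness,", toy_embeddings, pooling="max"))

One intuition for the difference is that max-pooling preserves, per dimension, the strongest signal contributed by any single word of the response, whereas average-pooling can dilute distinctive terms across many generic ones.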
Figure 2: Example of Pre-Search and Post-Search knowledge assessment responses from a participant, for Task T3 (Vitamin A), alongside Expert Knowledge.

Pre-Search Prompt: Think of what you already know on the topic of this search and list as many phrases or words as you can that come to your mind. For example, if you know about side effects, please do not just type the phrase "side effects", but rather type "side effects" and then list the specific side effects you know about. Please list only one word or phrase per line and end each line with a comma.

Post-Search Prompt: Now that you have completed this search task, think of the information that you found and list as many words or phrases as you can on the topic of the search task. This will be short ANSWERS to the search questions. For example, if you were searching for side effects, please do not just type the phrase "side effects", but rather type "side effects" and then list the specific side effects you found. Please list only one word (or phrase) per line and end each line with a comma.

Example Pre-Search Knowledge (Participant P03):
health benefits vitamin consumption is highly debated
I know nothing about Vitamin A specifically

Example Post-Search Knowledge (Participant P03):
Vitamin A deficiency can led to blindness
Vitamin A is not toxic if over ingested
if over consumed
vitamin A can decrease vitamin B absorption
and increase likelihood of hip fractures
Vitamin A can be found in leafy green vegetables
organ meats
and broccoli
Vitamin A contents can be found on nutritional labels

Expert Knowledge (Excerpt): Health benefits of using vitamin A: Vision, Breast cancer, Catarats, measles, Malaria, Diahrrea related to hiv, lower risk of complications during and after pregnancy, Retinitis pigmentosa, Ensures Healthy Eyes, soft skin, strong bones and teeth, acne, prevents muscular dystrophy, slow the aging process, lower risk of leukemia, good vision, Can prevent cancer, antioxidant, protects cells, maintain healthy skin, healthy immune system, healthy skeletal and soft tissue...

2. Experimental Design

We analyze data from the user-study reported in [8, 9]. Participants (N = 30, 16 females, mean age 24.5 years) searched for health-related information on the web, over two search tasks, T3 (topic: Vitamin A) and T4 (topic: Hypotension). Each search task began (Pre-Search) and ended (Post-Search) with a knowledge assessment, to gauge the participants' initial and final knowledge states. Participants entered natural language responses from free recall, as answers. A vocabulary of Expert Knowledge was also created for each topic, in consultation with a medical doctor. Example participant responses, and an excerpt from the Expert Knowledge, are shown in Fig. 2. After data cleaning, we obtained data from 49 participant-task pairs (N_T3 = 26; N_T4 = 23). Due to space limitations, please see [9] for more details about the study.

3. Data Analysis & Preliminary Results

We hypothesize that participants' learning during search can be assessed from the 'difference' in their Pre-Search and Post-Search responses. Since different participants may have different initial and final knowledge states, we measured this difference from a stationary reference point: the expert knowledge. Calculating such differences between pieces of natural language text is challenging, and is an active research topic. Word embedding is a popular method of computing semantic similarity (or distance) between two pieces of natural language text. A word embedding algorithm produces a numeric, high-dimensional vector for each word, which is assumed to encapsulate the 'meaning' of the word. In this work, we leverage two popular pre-trained word-embedding models, word2vec [11] and GloVe [12], to compute 'differences' or 'distances' between Pre-Search, Post-Search, and Expert Knowledge (Fig. 1). word2vec provides 300-dimensional vectors trained on about 100 billion words (tokens) from the Google News dataset, and is claimed to be the most stable word embedding [13]. GloVe offers multiple pre-trained word embeddings; we ran experiments with the 50, 100, and 300 dimensional versions.

Word embedding algorithms produce vectors for individual words. To obtain vectors for phrases and sentences, the individual word vectors are usually pooled or aggregated. As discussed in Sec. 1, we performed max pooling, to produce a single high-dimensional vector for a participant response (or expert knowledge). We employed two distance metrics – Euclidean and angular (cosine) – to compute distances between vectors of Pre-Search responses, Post-Search responses, and Expert Knowledge (Fig. 1). The Euclidean distance is unbounded, while the angular distance (Eqn. 1) ranges from 0 (no distance) to 1 (maximum distance).

    angular distance(𝐮, 𝐯) = arccos( (𝐮 ⋅ 𝐯) / (‖𝐮‖ ‖𝐯‖) ) / 𝜋          (1)

We manually set the angular distance to 1 (i.e., maximum) if one of the input vectors was a zero vector. This makes sense because zero vectors are obtained only if a participant's response does not contain any signs of knowledge (e.g., "none" or "i dont know").

To visualize the high-dimensional vectors of the various knowledge states, we employed the t-SNE algorithm. This algorithm projects a set of high-dimensional objects onto a 2D plane in such a way that similar objects are modelled by nearby points, and dissimilar objects are modelled by distant points. Using this algorithm, we obtained 2D representations of the Pre-Search, Post-Search, and Expert Knowledges (Fig. 3, left column). The visualization shows an almost clear separation between the Pre-Search (red circle) and Post-Search (green square) knowledge states, with Expert Knowledge (blue star) residing near the Post-Search knowledge states. This visually supports the hypothesis that participants gain knowledge during search, and move 'closer' to the Expert Knowledge state at the end of a search.
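As a concrete illustration of the distance step, here is a minimal Python/NumPy sketch of Eqn. (1) together with the zero-vector rule described above. It assumes pooled response vectors as input; the helper names and the numerical clipping guard are our own additions, not the authors' implementation.

    import numpy as np

    def angular_distance(u, v):
        """Angular distance of Eqn. (1): arccos(cosine similarity) / pi, in [0, 1].

        Following the zero-vector rule described above, the distance is set to
        the maximum value 1 when either input is a zero vector (a response
        that expressed no knowledge).
        """
        norm_u, norm_v = np.linalg.norm(u), np.linalg.norm(v)
        if norm_u == 0 or norm_v == 0:
            return 1.0
        cos_sim = np.clip(np.dot(u, v) / (norm_u * norm_v), -1.0, 1.0)
        return float(np.arccos(cos_sim) / np.pi)

    def euclidean_distance(u, v):
        """Unbounded Euclidean distance between two pooled response vectors."""
        return float(np.linalg.norm(np.asarray(u) - np.asarray(v)))

    # Hypothetical usage with pooled vectors (see the pooling sketch in Sec. 1):
    # pre_exp  = angular_distance(pre_search_vec,  expert_vec)
    # post_exp = angular_distance(post_search_vec, expert_vec)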
Figure 3: Results using word2vec 300d word embeddings, across tasks T3 and T4 combined. A clear separation can be observed between the majority of Pre-Search and Post-Search knowledge states (left column), as well as between Pre-Exp and Post-Exp distances (middle and right columns). (Left panel: 2D t-SNE visualization of the Pre-Search, Post-Search, and Expert Knowledge embeddings. Middle and right panels: Pre-Exp and Post-Exp distance magnitudes per participant-task pair, using the Euclidean distance metric and the angular distance metric [0 = min distance, 1 = max distance], respectively.)
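For readers wanting to reproduce a Figure-3-style projection, a minimal sketch using scikit-learn's t-SNE is given below. The random placeholder matrix, the assumed number of vectors, and the perplexity setting are illustrative assumptions, not the study's actual data or parameters.

    import numpy as np
    from sklearn.manifold import TSNE

    # Hypothetical input: one max-pooled vector per Pre-Search response, one per
    # Post-Search response, and one per expert vocabulary. Random placeholder
    # data stands in for the study's 300-dimensional embeddings.
    rng = np.random.default_rng(0)
    knowledge_vectors = rng.normal(size=(100, 300))

    # Project to 2D; perplexity must be smaller than the number of samples.
    points_2d = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(
        knowledge_vectors
    )
    print(points_2d.shape)  # (100, 2): coordinates for a Figure-3-style scatter plot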



The Euclidean and angular distances between Pre-Search and Expert (Pre-Exp distance), and between Post-Search and Expert (Post-Exp distance), are shown in the middle and right columns of Fig. 3, respectively. For both distance metrics, the majority of participants have lower Post-Exp distances than Pre-Exp distances (i.e., their Post-Search response is less distant from, or more similar to, Expert Knowledge). These metrics were calculated between the high-dimensional embedding vectors, which supports the observation that the clear separation between Pre- and Post-Search knowledge levels in the 2D visualizations (left column) is not merely due to random chance. Interestingly, for a few participants, the Post-Exp distance was higher than the Pre-Exp distance. This possibly demonstrates a 'loss' in knowledge level: these users were closer to Expert Knowledge before the search, and moved away from Expert Knowledge after the search.

We further tested whether these visual differences between Pre-Exp and Post-Exp distances were statistically significant. Since the distance values were not normally distributed, we employed the non-parametric Wilcoxon Signed-Rank test, which is used for comparing paired or related samples. The results are presented in Table 1. We can see that across different choices of word embeddings, there were significant differences between the Pre-Exp and Post-Exp distances. Thus, the results are not due to the choice of a particular word embedding model. The directionality of the differences in the Wilcoxon Signed-Rank test is expressed using the sum of the positive difference ranks (ΣR+) and the sum of the negative difference ranks (ΣR−). Since ΣR− was greater than ΣR+ in all the tests, the difference between Pre-Exp and Post-Exp distances is negative. This means that the majority of participants had a lower Post-Exp distance than Pre-Exp distance (i.e., they moved closer to expert knowledge at the end of the task). The magnitude of a phenomenon is measured by the effect size, which ranges from 0 (no effect) to 1 (maximum effect). All the tests had effect sizes greater than 0.8, signifying that searching online had a strong effect on minimizing the distance between participants' knowledge level and expert knowledge.

4. Conclusion and Future Work

We showed that word embeddings have promise for visualizing and quantifying vocabulary-based learning during search. A clear separation between users' Pre-Search and Post-Search knowledge states was seen, and was measured using simple distance metrics. Possible future directions include predicting these learning metrics from search-interaction measures. Another direction is to experiment with contextual embeddings (e.g., BERT). We also plan to investigate individual differences in learning during search.

4.0.1. Acknowledgements

We thank Sudipto Mukherjee, for technical and conceptual mentoring; Dr. Andrzej Kahl, our medical doctor consultant, for expert-vocabulary creation; and Yinglong Zhang, for contributing to experimental data collection. The research was partially funded by IMLS Award #RE-04-11-0062-11 to Jacek Gwizdka.
Table 1
Descriptive values of Pre-Exp and Post-Exp distances, and results of statistical significance tests, using different word-embeddings to model knowledges. As evident from Fig. 3, Pre-Exp and Post-Exp distances are significantly different for all the tested choices of word embedding models. All Wilcoxon Signed-Rank (SR) tests are significant at p < .05. The angular distance metric is normalized [0 = least distance; 1 = max distance].

| Word Embedding | Distance Metric | Pre-Exp mean (±SD) | Pre-Exp median | Post-Exp mean (±SD) | Post-Exp median | Wilcoxon SR Test |
|---|---|---|---|---|---|---|
| word2vec | Euclidean | 6.30 (±1.52) | 6.12 | 3.90 (±0.87) | 3.68 | ΣR+ = 20.0, ΣR− = 1205.0; 95% CI: -2.76 to -1.82; Effect Size: 0.84 |
| word2vec | Angular | 0.30 (±0.28) | 0.18 | 0.11 (±0.03) | 0.10 | ΣR+ = 28.0, ΣR− = 1197.0; 95% CI: -0.13 to -0.06; Effect Size: 0.83 |
| GloVe 6B 50d | Euclidean | 8.67 (±2.39) | 8.26 | 5.12 (±1.29) | 4.68 | ΣR+ = 37.0, ΣR− = 1188.0; 95% CI: -4.03 to -2.48; Effect Size: 0.82 |
| GloVe 6B 50d | Angular | 0.27 (±0.28) | 0.17 | 0.10 (±0.03) | 0.09 | ΣR+ = 43.0, ΣR− = 1182.0; 95% CI: -0.12 to -0.06; Effect Size: 0.81 |
| GloVe 6B 100d | Euclidean | 9.34 (±2.55) | 8.96 | 5.46 (±1.42) | 5.17 | ΣR+ = 30.0, ΣR− = 1195.0; 95% CI: -4.46 to -2.79; Effect Size: 0.83 |
| GloVe 6B 100d | Angular | 0.30 (±0.28) | 0.19 | 0.11 (±0.03) | 0.10 | ΣR+ = 32.0, ΣR− = 1193.0; 95% CI: -0.15 to -0.07; Effect Size: 0.82 |
| GloVe 6B 300d | Euclidean | 12.15 (±3.18) | 11.97 | 7.20 (±1.72) | 6.81 | ΣR+ = 29.0, ΣR− = 1196.0; 95% CI: -5.79 to -3.65; Effect Size: 0.83 |
| GloVe 6B 300d | Angular | 0.30 (±0.27) | 0.20 | 0.11 (±0.03) | 0.10 | ΣR+ = 35.0, ΣR− = 1190.0; 95% CI: -0.14 to -0.07; Effect Size: 0.82 |
| GloVe 42B 300d | Euclidean | 12.17 (±3.10) | 11.74 | 7.09 (±1.80) | 6.66 | ΣR+ = 29.0, ΣR− = 1196.0; 95% CI: -5.92 to -3.79; Effect Size: 0.83 |
| GloVe 42B 300d | Angular | 0.31 (±0.27) | 0.21 | 0.11 (±0.03) | 0.10 | ΣR+ = 38.0, ΣR− = 1187.0; 95% CI: -0.16 to -0.08; Effect Size: 0.82 |
| GloVe 840B 300d | Euclidean | 13.24 (±3.16) | 12.69 | 8.36 (±1.79) | 7.71 | ΣR+ = 28.0, ΣR− = 1197.0; 95% CI: -5.67 to -3.48; Effect Size: 0.83 |
| GloVe 840B 300d | Angular | 0.30 (±0.27) | 0.20 | 0.12 (±0.03) | 0.11 | ΣR+ = 38.0, ΣR− = 1187.0; 95% CI: -0.13 to -0.06; Effect Size: 0.82 |
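To indicate how statistics of the kind reported in Table 1 could be obtained, here is a SciPy sketch. It is not the authors' published analysis code: the function name is hypothetical and, as noted in the docstring, the effect-size formula is an assumption on our part.

    import numpy as np
    from scipy.stats import rankdata, wilcoxon

    def signed_rank_summary(pre_exp, post_exp):
        """Paired comparison of Pre-Exp vs. Post-Exp distances.

        Returns the rank sums of Table 1 (sum of positive and negative
        difference ranks), a Wilcoxon signed-rank p-value, and an effect size.
        The effect-size definition r = |Z| / sqrt(N), with Z from the normal
        approximation to the signed-rank statistic, is an assumption on our
        part; the paper does not state which formula was used.
        """
        pre_exp, post_exp = np.asarray(pre_exp), np.asarray(post_exp)
        d = post_exp - pre_exp              # negative => moved closer to the expert
        d = d[d != 0]                       # zero differences are discarded
        ranks = rankdata(np.abs(d))
        r_plus, r_minus = ranks[d > 0].sum(), ranks[d < 0].sum()

        n = len(d)
        mean_w = n * (n + 1) / 4
        sd_w = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
        z = (min(r_plus, r_minus) - mean_w) / sd_w
        effect_size = abs(z) / np.sqrt(n)

        p_value = wilcoxon(post_exp, pre_exp).pvalue
        return r_plus, r_minus, p_value, effect_size

    # Hypothetical toy usage with made-up distance values:
    pre  = [0.30, 0.25, 0.40, 0.35, 0.28, 0.33, 0.31, 0.29]
    post = [0.11, 0.12, 0.10, 0.13, 0.12, 0.11, 0.10, 0.12]
    print(signed_rank_summary(pre, post))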
References

[1] S. Ghosh, M. Rath, C. Shah, Searching as learning: Exploring search behavior and learning outcomes in learning-related tasks, in: Conference on Human Information Interaction & Retrieval (CHIIR), 2018.
[2] H. L. O'Brien, A. Kampen, A. W. Cole, K. Brennan, The role of domain knowledge in search as learning, in: Conference on Human Information Interaction and Retrieval (CHIIR), 2020.
[3] L. Xu, X. Zhou, U. Gadiraju, How does team composition affect knowledge gain of users in collaborative web search?, in: Conference on Hypertext and Social Media (HT), 2020.
[4] U. Gadiraju, R. Yu, S. Dietze, P. Holtz, Analyzing knowledge gain of users in informational search sessions on the web, in: Conference on Human Information Interaction & Retrieval (CHIIR), 2018.
[5] S. Kruikemeier, S. Lecheler, M. M. Boyer, Learning from news on different media platforms: An eye-tracking experiment, Political Communication 35 (2018) 75–96.
[6] N. Roy, F. Moraes, C. Hauff, Exploring users' learning gains within search sessions, in: Conference on Human Information Interaction and Retrieval (CHIIR), 2020.
[7] R. Syed, K. Collins-Thompson, P. N. Bennett, M. Teng, S. Williams, D. W. W. Tay, S. Iqbal, Improving learning outcomes with gaze tracking and automatic question generation, in: The Web Conference (WWW), 2020.
[8] N. Bhattacharya, J. Gwizdka, Relating eye-tracking measures with changes in knowledge on search tasks, in: Symposium on Eye Tracking Research & Applications (ETRA), 2018.
[9] N. Bhattacharya, J. Gwizdka, Measuring learning during search: differences in interactions, eye-gaze, and semantic similarity to expert knowledge, in: Conference on Human Information Interaction and Retrieval (CHIIR), 2019.
[10] D. Roy, D. Ganguly, M. Mitra, G. J. Jones, Representing documents and queries as sets of word embedded vectors for information retrieval, in: ACM SIGIR workshop on neural information retrieval (Neu-IR), 2016.
[11] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in: Advances in neural information processing systems, 2013, pp. 3111–3119.
[12] J. Pennington, R. Socher, C. D. Manning, GloVe: Global vectors for word representation, in: Conference on empirical methods in natural language processing (EMNLP), 2014, pp. 1532–1543.
[13] L. Burdick, J. K. Kummerfeld, R. Mihalcea, Factors influencing the surprising instability of word embeddings, in: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018, pp. 2092–2102.