=Paper= {{Paper |id=Vol-1798/paper4 |storemode=property |title=Interactively Producing Purposive Samples for Qualitative Research using Exploratory Search |pdfUrl=https://ceur-ws.org/Vol-1798/paper4.pdf |volume=Vol-1798 |authors=Orland Hoeber,Larena Hoeber,Ryan Snelgrove,Laura Wood |dblpUrl=https://dblp.org/rec/conf/chiir/HoeberHSW17 }} ==Interactively Producing Purposive Samples for Qualitative Research using Exploratory Search== https://ceur-ws.org/Vol-1798/paper4.pdf
        Interactively Producing Purposive Samples for Qualitative
                    Research using Exploratory Search
                               Orland Hoeber                                                              Larena Hoeber
                           University of Regina                                                        University of Regina
                     Department of Computer Science                                         Faculty of Kinesiology and Health Studies
                            Regina, SK, Canada                                                          Regina, SK, Canada
                        orland.hoeber@uregina.ca                                                    larena.hoeber@uregina.ca

                               Ryan Snelgrove                                                               Laura Wood
                      University of Waterloo                                                        University of Waterloo
            Department of Recreation and Leisure Studies                                  Department of Recreation and Leisure Studies
                      Waterloo, ON, Canada                                                          Waterloo, ON, Canada
                   ryan.snelgrove@uwaterloo.ca                                                    laura.wood@uwaterloo.ca

CHIIR 2017 Workshop on Supporting Complex Search Tasks, Oslo, Norway. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. Published on CEUR-WS, Volume 1798, http://ceur-ws.org/Vol-1798/.

ABSTRACT
An important step in conducting qualitative research on large collections of text is reducing the size of the collection to one that is manageable. While it is common to use a variety of simple sampling methods, the limitation of these approaches is that their mechanisms do not consider the relevance of the data. We have developed exploratory search methods that leverage visual analytics to produce purposive samples of large qualitative datasets. In this paper, we outline how exploratory search strategies lead to purposive sampling, and put this in the context of the interactive information retrieval literature.

CCS CONCEPTS
• Information systems → Users and interactive retrieval; • Human-centered computing → Visual analytics;

KEYWORDS
exploratory search, purposive sampling, qualitative research methods

1    INTRODUCTION
A common task of qualitative research on textual data is to assign codes to significant pieces of data (e.g., phrases, sentences, paragraphs) and then seek patterns and relationships among the codes. Doing so allows a researcher to translate a large set of textual data into a constrained vocabulary that is more easily kept in mind, and therefore more easily understood. The challenge is that coding data is tedious and time-consuming due to the need to carefully read the text before assigning codes. This challenge is greater when dealing with large textual datasets such as those from social media services; it is often necessary to take measures to reduce the amount of data to be coded and analyzed further.

The general approach for data reduction is to sample the data. Stratified sampling seeks to reduce the amount of data to consider by selecting a sub-population of the whole [8]. For the task of analyzing social media data, this may be done by categorizing the stakeholders (e.g., in the context of sporting events: fans, media, athletes, coaches, organizers), and choosing all the posts from those within a specific stakeholder category. Randomness may also be added to the stratified sampling method in two ways: random selection of the stratification (e.g., randomly choosing which stakeholders to follow), or random selection of the data within a specific stratum (e.g., random selection of posts from the selected stakeholders) [8]. Systematic sampling may be used as an alternative to random sampling, where a rule-based mechanism is employed (e.g., select every nth post). Fundamentally, these sampling methods may overlook or ignore important qualities of the larger dataset, such as important aspects of the data, interactions among key stakeholders, and the temporal relationships between the data and the concurrent events that inspire people to post their thoughts and opinions on social media [5].

A fundamentally different approach to data reduction is to perform purposive sampling of the data by carefully choosing a subset based on relevance to the topic of interest. However, doing so in the context of qualitative research requires that the entire dataset be considered, limiting the feasibility when the dataset is large. Our solution to this problem has been to leverage technology to support the human effort of qualitative research [9]. The interactive creation of purposive samples of the data allows the dataset to be reduced to include all relevant posts for a given topic of interest, ensuring that important features are not missed. Furthermore, by maintaining the temporal aspect of the data, the qualitative features can be studied in order, and in consideration of the real-world context in which the posts are situated. In the specific context of studying public opinion posted on Twitter, we have developed a software system called Vista (Visual Twitter Analytics) [4], which enables the visual exploration, creation, and export of purposive samples. In the remainder of this paper, we will explain how purposive sampling is enabled through exploratory search, and how visual analytics approaches can enhance the process. We will also discuss search strategies that allow a qualitative researcher to focus their search activities as they develop inductive research questions.
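
As a point of comparison for the purposive approach, the simple sampling methods described in the introduction can be sketched in a few lines of Python. This is a minimal illustration only and is not part of Vista: the post records, the field names (`id`, `stakeholder`, `text`), and the function names are hypothetical, and the stakeholder labels simply follow the sporting-event example in the text.

```python
import random

# Hypothetical posts; field names and stakeholder labels are illustrative only.
POSTS = [
    {"id": i, "stakeholder": s, "text": f"post {i}"}
    for i, s in enumerate(
        ["fan", "media", "fan", "athlete", "fan",
         "coach", "media", "fan", "organizer", "fan"],
        start=1,
    )
]


def stratified_sample(posts, category):
    """Keep every post written by one stakeholder category (one stratum)."""
    return [p for p in posts if p["stakeholder"] == category]


def random_within_stratum(posts, category, k, seed=0):
    """Randomly choose up to k posts from within a single stratum."""
    stratum = stratified_sample(posts, category)
    rng = random.Random(seed)  # seeded so the draw is repeatable
    return rng.sample(stratum, min(k, len(stratum)))


def systematic_sample(posts, n):
    """Rule-based selection: keep every nth post (the nth, 2nth, ...)."""
    return posts[n - 1::n]
```

None of these functions inspects the text of a post, which reflects the limitation noted above: the selection rule is independent of the relevance of the data.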
CHIIR 2017 Workshop on Supporting Complex Search Tasks, March 11, 2017, Oslo, Norway.                          Hoeber, Hoeber, Snelgrove, Wood


2    EXPLORATORY SEARCH LEADING TO
     PURPOSIVE SAMPLING
A necessary first step is to collect a large set of data that captures as much of the information relevant to a high-level interest as possible. For social media services such as Twitter, this may be done by choosing many hashtags and query terms in order to capture the full breadth of public interest in a topic, and collecting the data over an extended period of time. Doing so means that the researcher does not need to identify the specific research questions to be pursued a priori, but may instead collect a large dataset that has a high probability of capturing salient aspects that emerge as the topic or event of interest unfolds. This is important in situations such as critical event analysis in sport, where it is difficult to predict in advance what issues or micro-events might occur.

With such a large dataset, exploratory search [11] is a valuable approach for enabling a researcher to inductively develop specific research questions to pursue in detail. In particular, the interactive nature of the searcher engaging within the search process [3] allows for potential avenues of interest to be pursued, considered, saved, synthesized, and evaluated in the context of developing research questions. For example, one might collect all of the tweets posted during a mega-sporting event such as Le Tour de France using its official hashtag #tdf, and interactively explore the data to find what issues people are commenting upon. The discovery of gender issues within a predominantly male event may lead to the development of research questions to be pursued within the data, supported by searching for various embodiments of this issue within the tweets.

Our particular approach has been to use visual analytics [6] to enable the (re)searcher to take an active and informed role in the search process. Vista [4] provides a visual overview of the temporally changing sentiment of the collected Twitter data, presents visual overviews of the top terms, hashtags, user mentions, and authors, provides a geovisualization of the tweet source locations, and enables sub-querying and exclusions to create visually comparable sentiment timelines (see Figure 1). For the purposes of exploring the data to discover emergent events and issues, the sentiment timeline provides a pre-analysis of the data to draw attention to times during the event when there is strong positive, negative, or divisive sentiment. The visual overviews of the terms/hashtags/mentions/authors allow for the recognition of relevant and irrelevant topics, making it easy for the (re)searcher to isolate the data relevant to the topic, or exclude it from the whole. Textual querying is supported, allowing the (re)searcher to construct queries based on their knowledge of the possible issues and micro-events that may have occurred. If spatial and temporal aspects of the data are also relevant, queries can be generated that limit the temporal range and the spatial extent of the data.

(a) The entire Le Tour de France dataset (#tdf).
(b) Isolation of the tweets that mention ‘women’.
(c) Isolation of the tweets that mention ‘girls’.
Figure 1: Using Vista, samples of the data can be isolated focusing on the use of the terms ‘women’ and ‘girls’ and stacked upon one another, allowing visual comparison between them and back to the entire dataset under consideration.

A unique feature of Vista is the mechanism by which it supports the comparison and analysis of multiple sets of search results. Each query of the data adds a new section within the visual overview of the data, drawn as a sentiment timeline. As a result, the (re)searcher can generate multiple queries of the data and visually compare the temporal pattern of the stakeholders' engagement with the associated issues. Keeping the timelines synchronized enables the visual identification of patterns and relationships across the sets of search results.
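
The sub-querying and exclusion mechanism described in this section can be approximated as a filter over the collected tweets. The sketch below is an illustration under stated assumptions, not Vista's implementation: the `Tweet` record, the `sub_query` function, the case-insensitive substring matching, and the sample tweets are all hypothetical, and spatial filtering is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    text: str
    created_at: datetime


# Hypothetical tweets, invented for illustration only.
TWEETS = [
    Tweet("Great stage today #tdf", datetime(2016, 7, 10, 12, 0)),
    Tweet("Why is there no race for women? #tdf", datetime(2016, 7, 12, 9, 0)),
    Tweet("Girls deserve a Tour too #tdf", datetime(2016, 7, 11, 15, 0)),
    Tweet("Women and girls: click here, spam link", datetime(2016, 7, 13, 8, 0)),
]


def sub_query(tweets, include=(), exclude=(), start=None, end=None):
    """Select tweets that mention any include term, mention no exclude term,
    and fall inside the optional time range; results are in time order."""
    selected = []
    for tweet in tweets:
        text = tweet.text.lower()
        # An empty include list means "no textual restriction".
        if include and not any(term.lower() in text for term in include):
            continue
        if any(term.lower() in text for term in exclude):
            continue
        if start is not None and tweet.created_at < start:
            continue
        if end is not None and tweet.created_at > end:
            continue
        selected.append(tweet)
    # Preserve the temporal aspect of the data in the exported sample.
    return sorted(selected, key=lambda t: t.created_at)
```

Calling `sub_query(TWEETS, include=["women", "girls"], exclude=["spam"])` would isolate the tweets that mention either term while dropping those matched by the exclusion. Each such call corresponds loosely to one query whose results could be drawn as a separate timeline for comparison.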
Fruitful avenues of exploration of the data can be maintained and refined, and new hypotheses can be investigated and discarded if they do not reveal useful information. Through careful selection of, and experimentation with, the terms and hashtags to include in and exclude from the data, a purposive sample of the data can be selected and evaluated in an interactive manner, and ultimately exported for further analysis. In addition, during the traditional qualitative analysis of the exported data, the researcher can readily return to Vista to explore newly emergent topics, as well as inspect the details of embedded links and the authors to support their coding and analysis tasks.

In the context of supporting the qualitative study of emergent issues within large datasets, it is useful to consider the importance of serendipity within the (re)search process [1]. The ability to easily generate queries of the data and visually evaluate the results allows for new avenues of inquiry to be readily pursued. Should the searcher stumble upon some topic that was not considered, it can be isolated from the data, and new research questions may be inductively developed to study this aspect of the data.

Considering the use of Vista to produce a purposive sample in light of the theory on interactive information retrieval, we can consider the process to be strongly influenced by exploratory search [11] and sensemaking [7]. In particular, the researcher may start with a vague and under-defined goal for searching within the data, and may seek to develop knowledge and understanding by interactively querying the data to isolate potentially important subsets. The process of starting with a large set of data, identifying a potential research question, and iteratively sub-querying the data to both inductively refine the research question and isolate the tweets that are relevant to the issue is an evolutionary search process. Researchers using the system may employ berry-picking strategies [2] to learn and develop an understanding of what is being sought. This ultimately leads to the co-development of research questions to ask of the data and complex queries (textual, spatial, temporal) to isolate the data to answer the questions.

Because of the need to ensure that all relevant information is discovered, a structured information seeking process is beneficial, such as the task-based information seeking model proposed by Vakkari [10]. Initial assessments of the data, exploratory sub-querying, inspection of the tweets, and preliminary development of a set of possible research questions to pursue within the data can be considered pre-focus tasks. Once the researcher settles on a research question to delve into, they may issue a series of sub-queries to isolate the relevant data, and use the visualization of the sentiment timelines to verify the patterns in the data, representing the focus formulation tasks. With sufficient data selected via the purposive sample, the corresponding tweets may then be exported and analyzed using traditional qualitative research software and methods, which constitutes the post-focus tasks. Such a structured task-centric model of the information seeking process meshes well with the structured research methods that are commonplace in qualitative research.

3   CONCLUSION
The primary contribution of this paper is the presentation of a qualitative research mechanism that leverages interactive exploratory search and visual analytics to enable the dynamic development of purposive samples that address emergent research questions. Our ongoing work is to refine and enhance Vista to further support the interactive exploration, discovery, and isolation of subsets of textual data, producing topically-complete samples that make the application of traditional qualitative methods tractable.

REFERENCES
[1] Paul André, Jaime Teevan, and Susan T Dumais. 2009. From x-rays to silly putty via Uranus: serendipity and its role in web search. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, 2033–2036. DOI:http://dx.doi.org/10.1145/1518701.1519009
[2] Marcia J Bates. 1989. The design of browsing and berrypicking techniques for the on-line search interface. Online Review 13, 5 (1989), 407–431. DOI:http://dx.doi.org/10.1108/eb024320
[3] Nicholas J. Belkin. 2015. People, Interacting with Information. ACM SIGIR Forum 49, 2 (2015), 13–27. DOI:http://dx.doi.org/10.1145/2766462.2767854
[4] Orland Hoeber, Larena Hoeber, Maha El Meseery, Kenneth Odoh, and Radhika Gopi. 2016. Visual Twitter analytics (Vista): Temporally changing sentiment and the discovery of emergent themes within sport event tweets. Online Information Review 40, 1 (2016), 25–41. DOI:http://dx.doi.org/10.1108/OIR-02-2015-0067
[5] Brett Hutchins. 2014. Twitter: Follow the money and look beyond sports. Communication & Sport 2, 2 (2014), 122–126. DOI:http://dx.doi.org/10.1177/2167479514527430
[6] Daniel A Keim, Gennady Andrienko, Jean-Daniel Fekete, Carsten Görg, Jörn Kohlhammer, and Guy Melançon. 2008. Visual analytics: Definition, process, and challenges. In Information visualization: Human-centered issues and perspectives, Andreas Kerren, John T Stasko, Jean-Daniel Fekete, and Chris North (Eds.). Springer-Verlag, Berlin Heidelberg, 154–175. DOI:http://dx.doi.org/10.1007/978-3-540-70956-5_7
[7] Peter Pirolli and Daniel M. Russell. 2011. Introduction to this Special Issue on Sensemaking. Human-Computer Interaction 26, 1-2 (2011), 1–8. DOI:http://dx.doi.org/10.1080/07370024.2011.556557
[8] Jeremy C. Short, David J. Ketchen, and Timothy B. Palmer. 2002. The role of sampling in strategic management research on performance: A two study analysis. Journal of Management 28, 3 (2002), 363–385. DOI:http://dx.doi.org/10.1177/014920630202800306
[9] Ramine Tinati, Susan Halford, Leslie Carr, and Catherine Pope. 2014. Big data: Methodological challenges and approaches for sociological analysis. Sociology 48, 4 (2014), 663–681. DOI:http://dx.doi.org/10.1177/0038038513511561
[10] Pertti Vakkari. 2003. Task-based information searching. Annual Review of Information Science and Technology 37, 2 (2003), 413–464. DOI:http://dx.doi.org/10.1002/aris.1440370110
[11] Ryen W White and Resa A Roth. 2009. Exploratory Search: Beyond the Query-Response Paradigm. Morgan & Claypool Publisher, San Rafael, CA. DOI:http://dx.doi.org/10.2200/S00174ED1V01Y200901ICR003