<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Workshop on Supporting Complex Search Tasks, March</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Interactively Producing Purposive Samples for Qualitative Research using Exploratory Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Orland Hoeber</string-name>
          <email>orland.hoeber@uregina.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ryan Snelgrove</string-name>
          <email>ryan.snelgrove@uwaterloo.ca</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Larena Hoeber</string-name>
          <email>larena.hoeber@uregina.ca</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laura Wood</string-name>
          <email>laura.wood@uwaterloo.ca</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Regina, Department of Computer Science</institution>
          ,
          <addr-line>Regina, SK</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Regina, Faculty of Kinesiology and Health Studies</institution>
          ,
          <addr-line>Regina, SK</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Waterloo, Department of Recreation and Leisure Studies</institution>
          ,
          <addr-line>Waterloo, ON</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>11</volume>
      <issue>2017</issue>
      <abstract>
        <p>An important step in conducting qualitative research on large collections of text is reducing the size of the collection to one that is manageable. While it is common to use a variety of simple sampling methods, the limitation of these approaches is that their mechanisms do not consider the relevance of the data. We have developed exploratory search methods that leverage visual analytics to produce purposive samples of large qualitative datasets. In this paper, we outline how exploratory search strategies lead to purposive sampling, and put this in the context of the interactive information retrieval literature.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Users and interactive retrieval;
• Human-centered computing → Visual analytics.</p>
      <p>Keywords: exploratory search, purposive sampling, qualitative research
methods</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>A common task of qualitative research on textual data is to
assign codes to significant pieces of data (e.g., phrases, sentences,
paragraphs) and then seek patterns and relationships among
the codes. Doing so allows a researcher to translate a large set of
textual data into a constrained vocabulary that is more easily kept
in mind, and therefore more easily understood. The challenge is
that coding data is tedious and time consuming due to the need
to carefully read the text before assigning codes. This challenge
is greater when dealing with large textual datasets such as those
from social media services; it is often necessary to take measures
to reduce the amount of data to be coded and analyzed further.</p>
      <p>CHIIR 2017 Workshop on Supporting Complex Search Tasks, Oslo, Norway.
Copyright for the individual papers remains with the authors. Copying permitted
for private and academic purposes. This volume is published and copyrighted by its
editors. Published on CEUR-WS, Volume 1798, http://ceur-ws.org/Vol-1798/.</p>
      <p>
        The general approach for data reduction is to sample the data.
Stratified sampling seeks to reduce the amount of data to consider
by selecting a sub-population of the whole [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. For the task of
analyzing social media data, this may be done by categorizing the
stakeholders (e.g., in the context of sporting events: fans, media,
athletes, coaches, organizers), and choosing all the posts from those
within a specific stakeholder category. Randomness may also be
added to the stratified sampling method in two ways: random
selection of the stratification (e.g., randomly choosing which
stakeholders to follow), or random selection of the data within a specific
stratum (e.g., random selection of posts from the selected
stakeholders) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Systematic sampling may be used as an alternative to
random sampling, where a rule-based mechanism is employed (e.g.,
select every nth post). Fundamentally, these sampling methods
may overlook important qualities of the larger dataset,
such as key aspects of the data, interactions among
key stakeholders, and the temporal relationships between the data
and the other events that inspire people to post their
thoughts and opinions on social media [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
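      <p>The baseline sampling methods described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from any system discussed in this paper: the post records, their field names, and the three function names are hypothetical. Note that none of the functions inspects the text of a post, which is the limitation that motivates purposive sampling.</p>
      <preformat><![CDATA[
```python
import random
from collections import defaultdict

# Hypothetical post records; the field names are assumptions for illustration.
POSTS = [
    {"id": i, "stakeholder": group, "text": f"post {i}"}
    for i, group in enumerate(["fan", "media", "athlete", "fan", "coach", "fan",
                               "media", "organizer", "fan", "athlete"])
]


def stratified_sample(posts, strata):
    """Keep every post whose author belongs to one of the chosen strata."""
    return [p for p in posts if p["stakeholder"] in strata]


def random_within_strata(posts, per_stratum, seed=0):
    """Randomly draw up to `per_stratum` posts from each stakeholder stratum."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for p in posts:
        groups[p["stakeholder"]].append(p)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


def systematic_sample(posts, n):
    """Rule-based selection: keep every nth post (the nth, the 2nth, ...)."""
    return posts[n - 1::n]
```
]]></preformat>
      <p>For example, calling the hypothetical stratified_sample(POSTS, {"fan"}) keeps only the posts written by fans, regardless of whether their content is relevant to the topic being studied.</p>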
      <p>
        A fundamentally dierent approach to data reduction is to
perform purposive sampling of the data by carefully choosing a subset
based on relevance to the topic of interest. However, doing so in
the context of qualitative research requires that the entire dataset
be considered, limiting the feasibility when the dataset is large.
Our solution to this problem has been to leverage technology to
support the human eort of qualitative research [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. e interactive
creation of purposive samples of the data allow the dataset to be
reduced to include all relevant posts for a given topic of interest,
ensuring that important features are not missed. Furthermore, by
maintaining the temporal aspect of the data, the qualitative
features can be studied in order, and in consideration of the real-world
context in which the posts are situated. In the specic context of
studying public opinion posted on Twier, we have developed a
soware system called Vista (Visual Twier Analytics) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which
enables the visual exploration, creation, and export of purposive
samples. In the remainder of this paper, we will explain how
purposive sampling is enabled through exploratory search, and how
visual analytics approaches can enhance the process. We will also
discuss search strategies that allow a qualitative researcher to focus
their search activities as they develop inductive research questions.
      </p>
    </sec>
    <sec id="sec-4">
      <title>EXPLORATORY SEARCH LEADING TO PURPOSIVE SAMPLING</title>
      <p>A necessary rst step is to collect a large set of data that captures
as much of the information relevant to a high-level interest as
possible. For social media services such as Twier, this may be
done by choosing many hashtags and query terms in order to
capture the full breadth of public interest in a topic, and collecting
the data over an extended period of time. Doing so means that the
researcher does not need to identify the specic research questions
to be pursued a priori, but may instead collect a large dataset that
has a high probability of capturing salient aspects that emerge as
the topic or event of interest unfolds. is is important in situations
such as critical event analysis in sport, where it is dicult to predict
what issues or micro-events might occur in advance.</p>
      <p>
        With such a large dataset, exploratory search [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is a valuable
approach for enabling a researcher to inductively develop specific
research questions to pursue in detail. In particular, the interactive
nature of the searcher engaging within the search process [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] allows
for potential avenues of interest to be pursued, considered, saved,
synthesized, and evaluated in the context of developing research
questions. For example, one might collect all of the tweets posted
during a mega-sporting event such as Le Tour de France using their
official hashtag #tdf, and interactively explore the data to find what
issues people are commenting upon. The discovery of gender issues
within a predominantly male event may lead to the development
of research questions to be pursued within the data, supported by
searching for various different embodiments of this issue within
the tweets.
      </p>
      <p>
        Our particular approach has been to use visual analytics [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to
enable the (re)searcher to take an active and informed role in the
search process. Vista [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] provides a visual overview of the
temporally changing sentiment of the collected Twitter data, presents
visual overviews of the top terms, hashtags, user mentions, and
authors, provides a geovisualization of the tweet source locations,
and enables sub-querying and exclusions to create visually
comparable sentiment timelines (see Figure 1). For the purposes of
exploring the data to discover emergent events and issues, the
sentiment timeline provides a pre-analysis of the data to draw
attention to times during the event when there is strong
positive, negative, or divisive sentiment. The visual overviews of the
terms/hashtags/mentions/authors allow for the recognition of
relevant and irrelevant topics, making it easy for the (re)searcher to
isolate the data relevant to the topic, or exclude it from the whole.
Textual querying is supported, allowing the (re)searcher to
construct queries based on their knowledge of the possible issues and
micro-events that may have occurred. If spatial and temporal
aspects of the data are also relevant, queries can be generated that
limit the temporal range and the spatial extent of the data.
      </p>
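      <p>The sub-querying and exclusion operations described above can be illustrated with a small Python sketch. It is not Vista's implementation, which is not detailed here; the tweet texts are invented, and the names tokens and sub_query are hypothetical. It assumes that a sub-query keeps tweets containing any of the included terms and drops tweets containing any of the excluded terms.</p>
      <preformat><![CDATA[
```python
import re

# Hypothetical tweets; the texts are invented for illustration only.
TWEETS = [
    {"id": 1, "text": "Great stage win today #tdf"},
    {"id": 2, "text": "Why is there no women's race? #tdf"},
    {"id": 3, "text": "Podium girls again this year #tdf"},
    {"id": 4, "text": "Women deserve a full tour, not podium duty #tdf"},
]


def tokens(text):
    """Lowercase word and hashtag tokens, so 'Women' matches 'women'."""
    return set(re.findall(r"#?\w+", text.lower()))


def sub_query(tweets, include=(), exclude=()):
    """Keep tweets containing any `include` term and none of the `exclude` terms."""
    include = {t.lower() for t in include}
    exclude = {t.lower() for t in exclude}
    selected = []
    for tweet in tweets:
        words = tokens(tweet["text"])
        if include and not (words & include):
            continue  # none of the requested terms is present
        if words & exclude:
            continue  # an excluded term is present
        selected.append(tweet)
    return selected
```
]]></preformat>
      <p>Under these assumptions, sub_query(TWEETS, include=["women", "girls"], exclude=["podium"]) isolates the tweets about the absence of a women's race while setting aside those about podium ceremonies. The isolated subset is a candidate purposive sample.</p>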
      <p>Figure 1: (a) The entire Le Tour de France dataset (#tdf).
(b) Isolation of the tweets that mention ‘women’.
(c) Isolation of the tweets that mention ‘girls’.</p>
      <p>A unique feature of Vista is the mechanism by which it supports
the comparison and analysis of multiple sets of search results. Each
query of the data adds a new section within the visual overview of
the data, drawn as a sentiment timeline. As a result, the (re)searcher
can generate multiple queries of the data and visually compare
the temporal pattern of the stakeholders' engagement with the
associated issues. Keeping the timelines synchronized enables the
visual identification of patterns and relationships across the sets of
search results.</p>
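      <p>One way to keep several query timelines synchronized is to compute the time bins once from the whole dataset and reuse them for every query. The Python sketch below illustrates that idea only. The sentiment labels are assumed to be precomputed (how Vista scores sentiment is not covered here), and the tweets, field names, and the functions shared_bins and timeline are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical tweets with precomputed sentiment labels (an assumption).
TWEETS = [
    {"time": datetime(2014, 7, 5, 10, 5), "text": "women racing", "sentiment": "positive"},
    {"time": datetime(2014, 7, 5, 10, 40), "text": "podium girls", "sentiment": "negative"},
    {"time": datetime(2014, 7, 5, 12, 15), "text": "women sidelined", "sentiment": "negative"},
]


def shared_bins(tweets, width):
    """Bin start times spanning the whole dataset, reused by every query."""
    start = min(t["time"] for t in tweets)
    end = max(t["time"] for t in tweets)
    count = int((end - start) / width) + 1
    return [start + i * width for i in range(count)]


def timeline(tweets, term, bins, width):
    """Per-bin sentiment counts for the tweets that mention `term`."""
    rows = [Counter() for _ in bins]
    for t in tweets:
        if term in t["text"].lower().split():
            index = int((t["time"] - bins[0]) / width)
            rows[index][t["sentiment"]] += 1
    return rows
```
]]></preformat>
      <p>Because timeline(TWEETS, "women", bins, width) and timeline(TWEETS, "girls", bins, width) share the same bins, the two results have the same length and can be drawn one above the other for direct comparison.</p>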
      <p>Fruitful avenues of exploration of the data can be maintained and
refined, and new hypotheses can be investigated and discarded if
they do not reveal useful information. Through the careful selection
of, and experimentation with, the terms and hashtags to include in and
exclude from the data, a purposive sample of the data can be selected
and evaluated in an interactive manner, and ultimately exported
for further analysis. In addition, during the traditional qualitative
analysis of the exported data, the researcher can readily return
to Vista to explore newly emergent topics, as well as inspect the
details of embedded links and the authors to support their coding
and analysis tasks.</p>
      <p>
        In the context of supporting the qualitative study of emergent
issues within large datasets, it is useful to consider the importance
of serendipity within the (re)search process [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The ability to
easily generate queries of the data and visually evaluate the results
allows for new avenues of inquiry to be readily pursued. Should
the searcher stumble upon some topic that was not considered, it
can be isolated from the data, and new research questions may be
inductively developed to study this aspect of the data.
      </p>
      <p>
        Considering the use of Vista to produce a purposive sample in
light of the theory on interactive information retrieval, we can
consider the process to be strongly influenced by exploratory search
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and sensemaking [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In particular, the researcher may start
with a vague and under-defined goal for searching within the data,
and may seek to develop knowledge and understanding by
interactively querying the data to isolate potentially important subsets.
The process of starting with a large set of data, identifying a
potential research question, and iteratively sub-querying the data to
both inductively refine the research question and isolate the tweets
that are relevant to the issue is an evolutionary search process.
Researchers using the system may employ berry-picking strategies
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to learn and develop an understanding of what is being sought.
This ultimately leads to the co-development of research questions
to ask of the data and complex queries (textual, spatial, temporal)
to isolate the data to answer the questions.
      </p>
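      <p>The iterative narrowing of a complex query (textual, spatial, temporal) can be sketched as follows. This is an illustrative model rather than a description of Vista's query engine: the Query class, its field names, the bounding-box convention, and the functions run and refine are all assumptions introduced for this example.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class Query:
    """A hypothetical composite query; all field names are assumptions."""
    terms: Tuple[str, ...] = ()
    time_range: Optional[Tuple[datetime, datetime]] = None
    # Bounding box as (south, west, north, east) in decimal degrees.
    bbox: Optional[Tuple[float, float, float, float]] = None

    def matches(self, tweet):
        """True when the tweet satisfies every constraint that is set."""
        text = tweet["text"].lower()
        if self.terms and not any(term in text for term in self.terms):
            return False
        if self.time_range is not None:
            start, end = self.time_range
            if not (start <= tweet["time"] <= end):
                return False
        if self.bbox is not None:
            south, west, north, east = self.bbox
            lat, lon = tweet["lat"], tweet["lon"]
            if not (south <= lat <= north and west <= lon <= east):
                return False
        return True


def run(query, tweets):
    """Return the subset of tweets isolated by the query."""
    return [t for t in tweets if query.matches(t)]


def refine(query, **changes):
    """Derive a narrower query from an earlier one, leaving the original intact."""
    return replace(query, **changes)
```
]]></preformat>
      <p>Keeping each query immutable means that earlier, broader queries remain available for comparison while a refined copy is explored, which mirrors the way a research question and its query are developed together.</p>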
      <p>
        Because of the need to ensure that all relevant information is
discovered, a structured information seeking process is beneficial, such
as the task-based information seeking model proposed by Vakkari
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Initial assessments of the data, exploratory sub-querying,
inspection of the tweets, and preliminary development of a set
of possible research questions to pursue within the data can be
considered pre-focus tasks. Once the researcher settles on a
research question to delve into, they may issue a series of sub-queries
to isolate the relevant data, and use the visualization of the
sentiment timelines to verify the patterns in the data, representing
the focus formulation tasks. With sufficient data selected via the
purposive sample, the corresponding tweets may then be exported
and analyzed within traditional qualitative research software and
methods, which constitutes the post-focus tasks. Such a structured
task-centric model of the information seeking process meshes well
with the structured research methods that are commonplace in
qualitative research.
      </p>
    </sec>
    <sec id="sec-5">
      <title>CONCLUSION</title>
      <p>
        The primary contribution of this paper is the presentation of a
qualitative research mechanism that leverages interactive exploratory
search and visual analytics to enable the dynamic development of
purposive samples that address emergent research questions. Our
ongoing work is to refine and enhance Vista to further support
the interactive exploration, discovery, and isolation of subsets of
textual data, producing topically-complete samples that make the
application of traditional qualitative methods tractable.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Paul</given-names>
            <surname>André</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jaime</given-names>
            <surname>Teevan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Susan T</given-names>
            <surname>Dumais</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>From x-rays to silly putty via Uranus: serendipity and its role in web search</article-title>
          .
          <source>In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems</source>
          . ACM, New York, NY, USA,
          <fpage>2033</fpage>
          -
          <lpage>2036</lpage>
          . DOI: http://dx.doi.org/10.1145/1518701.1519009
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Marcia J</given-names>
            <surname>Bates</surname>
          </string-name>
          .
          <year>1989</year>
          .
          <article-title>The design of browsing and berrypicking techniques for the on-line search interface</article-title>
          .
          <source>Online Review</source>
          <volume>13</volume>
          ,
          <issue>5</issue>
          (
          <year>1989</year>
          ),
          <fpage>407</fpage>
          -
          <lpage>431</lpage>
          . DOI: http://dx.doi.org/10.1108/eb024320
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Nicholas J.</given-names>
            <surname>Belkin</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>People, Interacting with Information</article-title>
          .
          <source>ACM SIGIR Forum</source>
          <volume>49</volume>
          ,
          <issue>2</issue>
          (
          <year>2015</year>
          ),
          <fpage>13</fpage>
          -
          <lpage>27</lpage>
          . DOI: http://dx.doi.org/10.1145/2766462.2767854
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Orland</given-names>
            <surname>Hoeber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Larena</given-names>
            <surname>Hoeber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Maha</given-names>
            <surname>El Meseery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenneth</given-names>
            <surname>Odoh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Radhika</given-names>
            <surname>Gopi</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Visual Twitter analytics (Vista): Temporally changing sentiment and the discovery of emergent themes within sport event tweets</article-title>
          .
          <source>Online Information Review</source>
          <volume>40</volume>
          ,
          <issue>1</issue>
          (
          <year>2016</year>
          ),
          <fpage>25</fpage>
          -
          <lpage>41</lpage>
          . DOI: http://dx.doi.org/10.1108/OIR-02-2015-0067
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Brett</given-names>
            <surname>Hutchins</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Twitter: Follow the money and look beyond sports</article-title>
          .
          <source>Communication &amp; Sport</source>
          <volume>2</volume>
          ,
          <issue>2</issue>
          (
          <year>2014</year>
          ),
          <fpage>122</fpage>
          -
          <lpage>126</lpage>
          . DOI: http://dx.doi.org/10.1177/2167479514527430
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Daniel A</given-names>
            <surname>Keim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Gennady</given-names>
            <surname>Andrienko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jean-Daniel</given-names>
            <surname>Fekete</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Carsten</given-names>
            <surname>Görg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jörn</given-names>
            <surname>Kohlhammer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Guy</given-names>
            <surname>Melançon</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Visual analytics: Definition, process, and challenges</article-title>
          . In
          <source>Information visualization: Human-centered issues and perspectives</source>
          , Andreas Kerren, John T Stasko, Jean-Daniel Fekete, and Chris North (Eds.). Springer-Verlag, Berlin Heidelberg,
          <fpage>154</fpage>
          -
          <lpage>175</lpage>
          . DOI: http://dx.doi.org/10.1007/978-3-540-70956-5_7
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Peter</given-names>
            <surname>Pirolli</surname>
          </string-name>
          and
          <string-name>
            <given-names>Daniel M.</given-names>
            <surname>Russell</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Introduction to this Special Issue on Sensemaking</article-title>
          .
          <source>Human-Computer Interaction</source>
          <volume>26</volume>
          ,
          <issue>1-2</issue>
          (
          <year>2011</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . DOI: http://dx.doi.org/10.1080/07370024.2011.556557
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Jeremy C.</given-names>
            <surname>Short</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David J.</given-names>
            <surname>Ketchen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Timothy B.</given-names>
            <surname>Palmer</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>The role of sampling in strategic management research on performance: A two study analysis</article-title>
          .
          <source>Journal of Management</source>
          <volume>28</volume>
          ,
          <issue>3</issue>
          (
          <year>2002</year>
          ),
          <fpage>363</fpage>
          -
          <lpage>385</lpage>
          . DOI: http://dx.doi.org/10.1177/014920630202800306
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Ramine</given-names>
            <surname>Tinati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Susan</given-names>
            <surname>Halford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Leslie</given-names>
            <surname>Carr</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Catherine</given-names>
            <surname>Pope</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Big data: Methodological challenges and approaches for sociological analysis</article-title>
          .
          <source>Sociology</source>
          <volume>48</volume>
          ,
          <issue>4</issue>
          (
          <year>2014</year>
          ),
          <fpage>663</fpage>
          -
          <lpage>681</lpage>
          . DOI: http://dx.doi.org/10.1177/0038038513511561
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <source>[10] Peri Vakkari</source>
          .
          <year>2003</year>
          .
          <article-title>Task-based information searching</article-title>
          .
          <source>Annual Review of Information Science and Technology 37</source>
          ,
          <issue>2</issue>
          (
          <year>2003</year>
          ),
          <fpage>413</fpage>
          -
          <lpage>464</lpage>
          . DOI:hp://dx.doi.org/ 10.1002/aris.1440370110
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Ryen W</given-names>
            <surname>White</surname>
          </string-name>
          and
          <string-name>
            <given-names>Resa A</given-names>
            <surname>Roth</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <source>Exploratory Search: Beyond the Query-Response Paradigm</source>
          . Morgan &amp; Claypool Publisher, San Rafael, CA. DOI: http://dx.doi.org/10.2200/S00174ED1V01Y200901ICR003
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>