<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Social Set Visualizer (SoSeVi) II: Interactive Social Set Analysis of Big Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Benjamin Flesch</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ravi Vatrapu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raghava Rao Mukkamala</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abid Hussain</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre for Business Data Analytics (</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Westerdals Oslo School of Arts, Comm &amp; Tech</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Current state-of-the-art in big social data analytics is largely limited to graph theoretical approaches such as social network analysis (SNA) informed by the social philosophical approach of relational sociology. This paper proposes and illustrates an alternative holistic approach to big social data analytics, social set analysis (SSA), which is based on the sociology of associations, the mathematics of set theory, and advanced visual analytics of event studies. We first presented the SSA approach to a wider audience at IEEE Big Data [ ], IEEE EDOCW [ ], and IEEE EDOCW [ ]. Since then, we have worked on improving SoSeVi in order to further demonstrate its usefulness in large-scale visual analytics tasks concerning the individual and collective behavior of actors in social networks. The current iteration of the Social Set Visualizer (SoSeVi) builds upon some of the concepts laid out by the UpSet project [ ] and aims to further improve the capabilities of both researchers and practitioners in big social data analytics. We then illustrate our new approach by reporting on the design, development, and evaluation results of a state-of-the-art visual analytics dashboard, the Social Set Visualizer (SoSeVi). The development of the dashboard involved cutting-edge open source visual analytics libraries (D3.js) and the creation of new visualizations such as visualizations of actor mobility across time and space, conversational comets, and more. Evaluation of the dashboard consisted of technical testing, usability testing, and domain-specific testing with CSR students and yielded positive results. We conclude with a discussion of the new analytical approach of social set analysis and of the benefits of set theoretical approaches based on the social philosophical approach of associational sociology.</p>
      </abstract>
      <kwd-group>
        <kwd>Big Social Data</kwd>
        <kwd>Social Set Analysis</kwd>
        <kwd>Computational Set Analysis</kwd>
        <kwd>Big Data Visual Analytics</kwd>
        <kwd>Event Studies</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>This paper introduces a new research approach situated in the domains of
Data Science [ , , ] and Computational Social Science [ ] with practical
applications to Big Social Data Analytics in organizations [ , , ]. It addresses
one of the important theoretical and methodological limitations in the emerging
paradigm of Big Data Analytics of social media data [ ]. In particular, it
addresses the major limitation of existing research on Big Social Data analytics:
computational methods, formal models and software tools are largely limited
to graph theoretical approaches [ ] (such as SNA [ ]) and are informed by
the social philosophical approach of relational sociology [ ]. There are no other
unified modeling approaches to social data that integrate the conceptual, formal,
software, analytical and empirical realms [ ]. This results in a research problem
when analyzing Big Social Data from platforms like Facebook and Twitter, as
such data consists of not only dyadic relations but also individual associations [ ].
For Big Social Data analytics of Facebook or Twitter data, the fundamental
assumption of SNA, that social reality is constituted by dyadic relations and
interactions that are determined by structural positions of individuals in social
networks [ ], is neither necessary nor sufficient [ ].</p>
      <p>For example, consider a Facebook post made on the official Facebook wall of
Lionel Messi, the soccer prodigy who plays for FC Barcelona and Argentina’s
national football team. Each official post by Messi to his Facebook page typically
receives more than , likes, , comments and , shares. Such
association-based and content-driven social media interactions involving large
numbers of social actors are unlike other social interactions such as face-to-face,
email, phone and instant messaging, in the sense that what binds the interacting
social actors together in the first instance is not so much relational ties (strong
vs. weak ties) but associations ranging from the player himself and the teams that
he plays for to cultural, ethnic, national and linguistic attributes. Modeling
such Facebook interactions using affiliation networks creates the problem of an
extremely low number of hub nodes with an extremely high number of nodes as
spokes. Further, such SNA assumes the central social psychological concept of
"homophily", namely that social actors with similar interests (that is, associations) prefer
to interact with each other. To overcome this limitation and address the research
problem, this paper proposes an alternative holistic approach to Big Social Data
analytics that is based on the sociology of associations as well as the mathematics
of set theory and offers to develop fundamentally new methods and tools for
Big Social Data analytics, Social Set Analysis (SSA). Our overarching research
question is stated as: How, and in what way, can methods and tools for Social
Set Analysis, derived from the alternative holistic approach to Big Social Data
analytics based on the sociology of associations and the mathematics of set theory,
result in meaningful facts, actionable insights and valuable outcomes?</p>
      <p>The rest of the paper is organized as follows. First, we present a philosophical
template for holistic approaches to computational social sciences, compare and
contrast the dominant approach of social network analysis with the proposed
novel approach of social set analysis, and discuss the benefits of set theoretical
approaches based on the social philosophical approach of associational sociology
in Section 2. Then, we illustrate our new analytical approach by reporting on the
design and development of our state-of-the-art visual analytics dashboard, the
Social Set Visualizer (SoSeVi), which builds on and extends the UpSet visualizations
of set intersections.</p>
    </sec>
    <sec id="sec-2">
      <title>Theoretical Framework</title>
      <p>The theoretical concepts behind our proposed approach of Social Set Analysis
are discussed here.</p>
      <sec id="sec-2-1">
        <title>Set Theoretical Big Social Data Analytics</title>
        <p>Social Set Analysis (SSA) as employed in this paper is concerned with the mobility
of social actors across time and space. For mobility across time, we conduct SSA
of big social data from the Facebook walls of eleven companies from the same
industry with an analytical focus on the set of actors that interacted with the
company before, during and after the real-world events, and set theoretical
intersections of the three time periods. Similarly, for mobility across space, we
analyze set inclusion and exclusion of actors who interacted with different
Facebook walls. This allows us not only to uncover the interactional dynamics
over time and space but also to identify actor sets that correspond to marketing
segments such as brand loyalists, brand advocates, brand critics and social
activists.</p>
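        <p>The set operations described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the SoSeVi implementation: the record layout (actor, wall, period), the sample actors and all function names are assumptions introduced here for illustration.</p>
        <preformat preformat-type="code">
```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical interaction record: (actor id, Facebook wall, time period).
Interaction = Tuple[str, str, str]

PERIODS = ("before", "during", "after")


def actors_by_period(interactions: Iterable[Interaction], wall: str) -> Dict[str, Set[str]]:
    """Collect the set of actors seen on one wall in each time period."""
    sets: Dict[str, Set[str]] = {p: set() for p in PERIODS}
    for actor, w, period in interactions:
        if w == wall:
            sets[period].add(actor)
    return sets


def exclusive_regions(sets: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Split actors into disjoint regions keyed by the exact periods they appear in."""
    regions: Dict[FrozenSet[str], Set[str]] = {}
    for actor in set().union(*sets.values()):
        key = frozenset(p for p in PERIODS if actor in sets[p])
        regions.setdefault(key, set()).add(actor)
    return regions


def actors_on_wall(interactions: Iterable[Interaction], wall: str) -> Set[str]:
    """All actors who interacted with a wall, regardless of period."""
    return {actor for actor, w, _ in interactions if w == wall}


# Hypothetical sample data.
sample = [
    ("ann", "wall_a", "before"),
    ("ann", "wall_a", "during"),
    ("ann", "wall_a", "after"),
    ("bob", "wall_a", "during"),
    ("cat", "wall_a", "before"),
    ("cat", "wall_b", "during"),
]

# Mobility across time: period sets and their disjoint regions for one wall.
sets_a = actors_by_period(sample, "wall_a")
regions_a = exclusive_regions(sets_a)

# Mobility across space: set inclusion and exclusion between two walls.
only_wall_a = actors_on_wall(sample, "wall_a").difference(actors_on_wall(sample, "wall_b"))
both_walls = actors_on_wall(sample, "wall_a").intersection(actors_on_wall(sample, "wall_b"))
```
</preformat>
        <p>In this toy data, an actor who falls into the region covering all three periods on the same wall would be a candidate for a segment such as brand loyalists, whereas an actor who appears on several walls belongs to a cross-wall intersection. How regions map to marketing segments is a modeling decision that the sketch does not attempt to make.</p>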
      </sec>
      <sec id="sec-2-2">
        <title>Event Study Methodology</title>
        <p>The event study is a finance methodology for assessing the impact on corporate wealth
(e.g., stock prices) of events such as company restructuring, leadership
change, and mergers &amp; acquisitions [ , , ]. It has been a powerful tool since the
late 1960s for assessing the financial impact of changes in corporate policies and has been used
exclusively in the area of investments and accounting to examine stock price
performance and the dissemination of new information [ ].</p>
        <p>While there is no unique structure for event study methodology, at a higher
level of abstraction it involves identifying three important time periods or
windows: first, defining an event of interest and identifying the period over which
it is active (event window); second, identifying the estimation period
for the event (pre-event or estimation window); and third, identifying
the post-event window [ ]. In our social set analysis of a real-world event, we
apply event study methodology to identify the three important time periods of
user interactions on social media platforms: before (pre-event window), during
(event window) and after (post-event window).</p>
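        <p>As a minimal sketch of the three windows, the following Python fragment assigns timestamped interactions to the pre-event, event and post-event windows. The function names, the sample dates and the choice to treat both boundaries of the event window as inclusive are assumptions made for illustration; they are not taken from the tool.</p>
        <preformat preformat-type="code">
```python
from collections import defaultdict
from datetime import datetime


def classify_window(ts, event_start, event_end):
    """Map a timestamp to the pre-event, event or post-event window.

    Assumption: both boundaries of the event window are inclusive.
    """
    if event_start > ts:
        return "before"
    if ts > event_end:
        return "after"
    return "during"


def actors_per_window(interactions, event_start, event_end):
    """Group actor ids by the window in which they interacted."""
    windows = defaultdict(set)
    for actor, ts in interactions:
        windows[classify_window(ts, event_start, event_end)].add(actor)
    return dict(windows)


# Hypothetical sample data: (actor id, interaction timestamp).
sample = [
    ("ann", datetime(2014, 3, 1)),
    ("bob", datetime(2014, 3, 10)),
    ("ann", datetime(2014, 3, 12)),
    ("cat", datetime(2014, 4, 2)),
]

windows = actors_per_window(
    sample,
    event_start=datetime(2014, 3, 5),
    event_end=datetime(2014, 3, 20),
)
```
</preformat>
        <p>The resulting three actor sets are the inputs for the set theoretical intersections across time periods.</p>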
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>The improved version of the Social Set Visualizer (SoSeVi) has been influenced
by the previous work of the UpSet project [ ] and its visualizations of set
intersections based on innovative approaches to combination matrices. In contrast
to UpSet, SoSeVi uses server-side calculations on its underlying big social data
corpus, and therefore is able to handle much larger volumes of data with , s
to millions of actors moving between set intersections. Both projects strive to
provide a real-time visual analytics tool.</p>
      <p>The previous version of SoSeVi used Venn diagrams to showcase actor
migration between time periods and different Facebook walls, with focus shifting
towards the use of Euler visualizations as discussed in [ ] for version II of SoSeVi.</p>
    </sec>
    <sec id="sec-4">
      <title>The Visual Analytics Tool</title>
      <p>[Figure: DashboardView with labelled regions S, In, C, F and W.]</p>
      <p>We showcase version II of the Social Set Visualizer (SoSeVi) tool, which is
adapted to the challenges of visualizing user-selected and dynamically calculated
large-scale set intersections depicting aggregated actor behavior in social networks
such as Facebook. Figure illustrates the major design shift from the previous SoSeVi
version, which was presented in [ ].</p>
      <p>Figure depicts the DashboardView which is the central interface of the web
application. It contains all visualizations and is initially shown to the user. It
consists of both small and large overall social media activity visualizations [F].
The researcher can use the time period selection tool [S] to navigate the data, and
to toggle data sets from different Facebook walls depending on the analysis tasks
at hand. Based on the user-selected time period, which we label the During
period, the tool derives the Before and After time periods by looking at the
beginning (earliest event) and end (most recent event) of the underlying data. An
alphabetical word cloud [W] underneath the main activity chart [F] illustrates
the most important conversation topics in the During period. This showcases
the pluggable architecture of SoSeVi, which facilitates diverse real-time content
analysis tasks on the underlying data, all based on a user-selected time frame
for analysis. To the left of the main activity visualization [F], set intersections
[In] are dynamically visualized based on the user selection of the time period.
Set intersections are encoded in a combination matrix. Each data source uses a
distinct color.</p>
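      <p>To make the combination matrix encoding concrete, the following Python sketch builds one matrix row per non-empty set intersection. It assumes UpSet-style exclusive intersections, in which each actor is counted in exactly one row; whether SoSeVi uses exclusive or inclusive intersections is not stated here, and the function name and sample walls are hypothetical.</p>
      <preformat preformat-type="code">
```python
from itertools import combinations


def combination_matrix(named_sets):
    """Return (column names, rows) for a combination matrix.

    named_sets maps a data source name to a Python set of actor ids.
    Each row is (membership flags, cardinality), where the flags mark
    which data sources take part in the intersection.
    Assumption: intersections are exclusive, so an actor counts only
    in the row that matches exactly the sources it appears in.
    """
    names = sorted(named_sets)
    rows = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Actors present in every source of this combination ...
            members = set.intersection(*(named_sets[n] for n in combo))
            # ... and absent from every other source.
            for other in names:
                if other not in combo:
                    members = members.difference(named_sets[other])
            if members:
                flags = tuple(n in combo for n in names)
                rows.append((flags, len(members)))
    return names, rows


# Hypothetical sample data: actors per Facebook wall.
walls = {
    "wall_a": {"ann", "bob", "cat"},
    "wall_b": {"cat", "dan"},
    "wall_c": {"ann", "cat"},
}

names, rows = combination_matrix(walls)
total = sum(cardinality for _, cardinality in rows)
```
</preformat>
      <p>Because the rows are disjoint under this assumption, their cardinalities add up to the number of distinct actors, which is a convenient sanity check for the rendered matrix.</p>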
      <p>For each set intersection, we render up to three bar graphs in [C]. One bar
is drawn for each of the Before, During and After periods whenever the underlying
set has a cardinality of at least one actor. The horizontal bars are stacked vertically, with
the topmost bar signifying the Before period, the center bar During, and the
lowest After. More information on the visualization of actor migration through
time and space sets is shown in figure .</p>
      <p>When clicking on or hovering over the set cardinality visualizations in [C], a
visualization of actor migration between periods and set intersections is displayed
as shown in figure . SoSeVi calculates the intersections of each set
with all sets of the following period (migrations from Before to During, and
migrations from During to After) based on the time period that the user selected with
the selection tool [S]. Cardinality numbers illustrate the actual migration volume,
whereas the right-hand side bar labels indicate the destination of each migration.</p>
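      <p>The migration calculation can be sketched as a pairwise intersection between the sets of one period and the sets of the following period. The Python fragment below is a simplified illustration under assumed inputs: the set labels ("A", "B", "A+B"), the sample actors and the function name are hypothetical, and the server-side implementation may differ.</p>
      <preformat preformat-type="code">
```python
def migrations(source_period, target_period):
    """Compute migration volumes between two consecutive periods.

    Both arguments map a set label to a set of actor ids. The result
    maps (source label, target label) to the number of actors found in
    both sets; empty migrations are omitted.
    """
    flows = {}
    for src_label, src_actors in source_period.items():
        for dst_label, dst_actors in target_period.items():
            volume = len(src_actors.intersection(dst_actors))
            if volume:
                flows[(src_label, dst_label)] = volume
    return flows


# Hypothetical sample data: set intersections per period.
before = {"A": {"ann", "bob"}, "A+B": {"cat"}}
during = {"A": {"ann"}, "B": {"bob", "cat"}, "A+B": {"dan"}}

before_to_during = migrations(before, during)
```
</preformat>
      <p>Each key names the origin and destination of a migration and each value is its volume, corresponding to the cardinality numbers and the destination labels in the visualization. Applying the same function to the During and After periods yields the second group of migrations.</p>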
      <p>Augmenting the extensive visual analytics features of SoSeVi, RawdataView
presents a detailed search interface for the underlying Facebook activity data. It is
accessible to the user through various means by interacting with the visualizations
of the DashboardView. ActorsView presents a dedicated interface for analysis
tasks related to actor mobility across time and space of companies’ Facebook
walls. The visualizations of actor mobility in DashboardView refer to ActorsView
in order to provide the user with further details when requested. ActorsView
offers a set of tools for the analysis of actor mobility and cross-postings
between different time periods and Facebook walls.</p>
      <p>The Social Set Visualizer can be accessed at http://bigdata:bigdata@ . . . /
with user name and password bigdata. It has been tested with WebKit- and
Gecko-based web browsers.</p>
      <sec id="sec-4-1">
        <title>Data Acquisition</title>
        <p>Facebook data was collected through the Social Data Analytics Tool (SODATO) [ , , ].
SODATO generates Facebook activity datasets as independent files
for each company’s Facebook wall; we combined these files into a single
data set that can be filtered or expanded on demand.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Software Development</title>
      <p>The SoSeVi dashboard is implemented as a client-side web application and
uses the D3.js JavaScript SVG visualization framework. D3.js is a
lightweight and highly extensible JavaScript visualization framework which can
display visualizations for a multitude of browser-based clients. The flexibility
provided by D3.js enables the creation of new kinds of interactive visualizations
which are able to run on any device with decent processing resources including
Windows, MacOS and Linux based systems with screen sizes up to K, which
gives SoSeVi the flexibility needed for various purposes in different research areas.</p>
      <p>After thorough (re-)evaluation of Apache Spark and NoSQL-based
storage solutions, we retained PostgreSQL for data storage, because the
alternatives showed no empirically measured benefit in execution time with
our test data. In version II of SoSeVi, all set intersection calculations have been
moved from PostgreSQL to Redis. A dedicated Redis instance now performs
memory-intensive set intersection calculations with significantly better execution
speed and pipes the calculation results back to the user-facing dashboard in real
time.</p>
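      <p>The idea of delegating set intersections to a key-value store with native set commands can be sketched as follows. To keep the example runnable without a server, it uses a small in-memory stand-in that mimics the Redis SADD and SINTER commands. The stand-in class, the key scheme and the helper names are assumptions made for illustration and do not describe the actual SoSeVi code.</p>
      <preformat preformat-type="code">
```python
class InMemorySetStore:
    """Tiny stand-in that mimics the Redis set commands SADD and SINTER."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, *members):
        """Add members to the set at key; return how many were new."""
        target = self._sets.setdefault(key, set())
        size_before = len(target)
        target.update(members)
        return len(target) - size_before

    def sinter(self, *keys):
        """Return the intersection of the sets stored at the given keys."""
        sets = [self._sets.get(key, set()) for key in keys]
        return set.intersection(*sets) if sets else set()


def load_period(store, wall, period, actors):
    """Store the actors of one wall and period under a hypothetical key scheme."""
    key = f"actors:{wall}:{period}"
    store.sadd(key, *actors)
    return key


def intersection_size(store, *keys):
    """Cardinality of the intersection, computed by the store rather than the caller."""
    return len(store.sinter(*keys))


# Hypothetical sample data.
store = InMemorySetStore()
key_before = load_period(store, "wall_a", "before", {"ann", "cat"})
key_during = load_period(store, "wall_a", "during", {"ann", "bob"})
shared = intersection_size(store, key_before, key_during)
```
</preformat>
      <p>The helpers rely only on the two method names sadd and sinter, so a client object for a real Redis instance that offers methods with these names could be passed in place of the stand-in. In that setting the intersection is computed inside the store and only the result is sent back to the dashboard.</p>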
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>Version II of the Social Set Visualizer (SoSeVi) project presented in this paper
provides a significantly better interface for Social Set Analysis (SSA) tasks on big
social data originating from Facebook. We showcase our interactive tool for
large-scale real-time set intersection calculations to the research community. SoSeVi
is the first visual analytics tool to visualize migration flows between set
intersections in big social data.</p>
    </sec>
    <sec id="sec-7">
      <title>Future Work</title>
      <p>We strive to add more customization features in order to provide the user with
more viable investigation strategies, such as sorting and filtering of sets. These
features were also demonstrated in the UpSet project, but their real-time implementation
was out of scope for version II of SoSeVi. We also plan to add extension points
for statistical calculations over the set intersections and to improve the
overall measurement of migration flows. Finally, the research tool will be extended to
non-Facebook social media data.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>The authors were partially supported by the project Big Social Data Analytics:
Branding Algorithms, Predictive Models, and Dashboards funded by Industriens
Fond (The Danish Industry Foundation). Any opinions, findings, interpretations,
conclusions or recommendations expressed in this paper are those of its authors
and do not represent the views of the Industriens Fond (The Danish Industry
Foundation).
All links were last followed on July</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>