<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Text Insights: Natural Language Analytics for Understanding Social Media Engagement</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Frank Grimm</string-name>
          <email>fgrimm@techfak.uni-bielefeld.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthias Hartung</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philipp Cimiano</string-name>
          <email>cimianog@cit-ec.uni-bielefeld.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cognitive Interaction Technology Center of Excellence (CIT-EC) Bielefeld University 33615</institution>
          <addr-line>Bielefeld</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>19</fpage>
      <lpage>23</lpage>
      <abstract>
        <p>We present Text Insights, an application for understanding factors of user engagement in Facebook pages. Providing analytics based on natural language processing, Text Insights is complementary to existing tools offering mainly numerical indicators of user engagement. Our system extracts keyphrases from page content in a linguistically motivated manner. Keyphrases are weighted according to their relevance as approximations of the most important topics in the community. We demonstrate that the system provides valuable insights for page owners interested in trend discovery, content evaluation and content planning.</p>
      </abstract>
      <kwd-group>
        <kwd>social media</kwd>
        <kwd>text analytics</kwd>
        <kwd>natural language processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Modern companies use their presence on social media platforms for diverse
business goals. Social media present a new and unique way for direct interaction
between the company and different stakeholders, right down to the customer.</p>
      <p>
        While most social media platforms offer some way to measure user
engagement, many focus on customer conversion rather than content. Tools like
Insights for Facebook Pages<sup>1</sup> or Social Analytics for Google Analytics<sup>2</sup> provide
convenient indices to track user demographics and engagement in numerical
terms (e.g., page impressions, number of likes or referrals). Most traditional
social media metrics rely on a large number of interactions to generate actionable
and meaningful insights. While some brands promote themselves through viral
campaigns, most efforts in social media are based on long-term campaign strategies
rather than short-term or viral approaches [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>We argue that for evaluating and refining such strategies, it might be
beneficial to analyze the textual content of user contributions. Therefore, our Text
Insights web application offers analytics for investigating the publicly available
data on a specific Facebook page on both the numerical and the textual level.</p>
      <p>
        <fn id="fn1"><label>1</label><p>https://www.facebook.com/insights/</p></fn>
        <fn id="fn2"><label>2</label><p>http://www.google.com/analytics/</p></fn>
      </p>
    </sec>
    <sec id="sec-2">
      <title>Text Insights</title>
      <p>Text Insights analyzes data on a social media presence in the Facebook
ecosystem. A Facebook page contains different forms of content contributions by the
maintainer of a page and outside commenters. In the following, we outline how
this data is acquired (Section 2.1) and presented to the user in an aggregated
form (Section 2.2), based on linguistic analysis (Section 2.3).</p>
      <sec id="sec-2-1">
        <title>Data Acquisition Methods</title>
        <p>During page data retrieval, the system recursively queries the Facebook Graph
API<sup>3</sup> for information stored on a Facebook page, comprising all posts and
comments (incl. metrics such as count, creation timestamp, etc.). We also retrieve
limited user data on all contributors associated with the content, which is
anonymized for privacy reasons. The system only retains public data available
from most user profiles, such as gender, ISO language and country code.</p>
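        <p>As an illustration only, the recursive retrieval described above might be sketched as follows. The sketch assumes that responses follow the general shape of paginated Graph API results (a data list plus an optional paging.next link) and replaces the network call with a hypothetical in-memory fetch function. The salted hash used for anonymization is likewise an assumption; the paper does not specify how user data is anonymized.</p>
        <preformat><![CDATA[
```python
import hashlib
from typing import Callable, Iterator, List

# Hypothetical fetch function: takes a path or URL, returns the decoded JSON response.
Fetch = Callable[[str], dict]


def iter_edge(fetch: Fetch, url: str) -> Iterator[dict]:
    """Yield every object of a paginated edge by following 'paging.next' links."""
    while url:
        response = fetch(url)
        yield from response.get("data", [])
        url = response.get("paging", {}).get("next")


def anonymize(user_id: str, salt: str = "text-insights") -> str:
    """Assumed anonymization: a salted hash keeps authors distinguishable but not identifiable."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()[:16]


def crawl(fetch: Fetch, path: str) -> List[dict]:
    """Recursively collect contributions together with their nested comments."""
    items = []
    for obj in iter_edge(fetch, path):
        items.append({
            "id": obj["id"],
            "message": obj.get("message", ""),
            "created_time": obj.get("created_time"),
            "author": anonymize(obj["from"]["id"]) if "from" in obj else None,
            "comments": crawl(fetch, obj["id"] + "/comments"),
        })
    return items


# Made-up in-memory stand-in for the remote API, for demonstration only.
FAKE_API = {
    "page/posts": {
        "data": [{"id": "p1", "message": "Quit smoking today", "from": {"id": "42"}}],
        "paging": {"next": "page/posts?after=1"},
    },
    "page/posts?after=1": {
        "data": [{"id": "p2", "message": "ESC congress", "from": {"id": "7"}}],
    },
    "p1/comments": {
        "data": [{"id": "c1", "message": "My father quit", "from": {"id": "99"}}],
    },
}


def fake_fetch(url: str) -> dict:
    return FAKE_API.get(url, {"data": []})


posts = crawl(fake_fetch, "page/posts")
```
]]></preformat>
        <p>Passing the fetch function as a parameter keeps the traversal logic independent of any particular HTTP client, so the same code could be exercised against recorded responses.</p>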
      </sec>
      <sec id="sec-2-2">
        <title>Data Preparation</title>
        <p>Text Insights aggregates general metadata on posts, comments, user
demographics and post types (status, photo, question, link and video), as well as textual
information derived from page content.</p>
        <p>The system presents the most important topics within all posts and
comments on the page, condensed into a tag cloud. Each topic is denoted as a single
keyword or a keyphrase (i.e., a sequence of keywords; see Fig. 1, left). The size
of a tag corresponds to its relative importance, which is determined by a pipeline
of linguistic analysis steps as described in Section 2.3. Colors
indicate the source of a topic: topics triggered by the page owner that
have not been picked up by the users are presented in light blue, topics triggered
by the page owner and picked up by other contributors in dark blue, and topics that have
been independently triggered by others in brown.</p>
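        <p>The color and size assignment can be pictured with the following minimal sketch. It rests on two assumptions that the text does not confirm: a topic counts as "triggered" by whoever mentions it first, and tag sizes are scaled linearly between a minimum and a maximum font size. The function names are hypothetical.</p>
        <preformat><![CDATA[
```python
from typing import Dict, List, Tuple

# (ISO 8601 timestamp, True if the contribution was written by the page owner)
Mention = Tuple[str, bool]


def topic_color(mentions: List[Mention]) -> str:
    """Pick a tag color from who mentioned the topic first and who followed."""
    # Assumption: the earliest mention decides who triggered the topic.
    _, owner_first = min(mentions, key=lambda m: m[0])
    if not owner_first:
        return "brown"  # independently triggered by other contributors
    picked_up = any(not by_owner for _, by_owner in mentions)
    return "dark blue" if picked_up else "light blue"


def tag_sizes(weights: Dict[str, float], min_pt: int = 10, max_pt: int = 40) -> Dict[str, int]:
    """Assumed linear mapping from relevance weight to font size in points."""
    lo, hi = min(weights.values()), max(weights.values())
    span = (hi - lo) or 1.0  # avoid division by zero if all weights are equal
    return {
        phrase: round(min_pt + (w - lo) / span * (max_pt - min_pt))
        for phrase, w in weights.items()
    }


colors = {
    "owner only": topic_color([("2013-09-01", True)]),
    "picked up": topic_color([("2013-09-01", True), ("2013-09-02", False)]),
    "user first": topic_color([("2013-09-01", False), ("2013-09-03", True)]),
}
sizes = tag_sizes({"a": 1.0, "b": 3.0, "c": 2.0})
```
]]></preformat>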
        <p>All data is available in an all-time overview, in monthly breakdowns, and
through specific queries for a user-defined time frame. The user interface is
interactive: users can, for example, search for contributions containing a certain
keyphrase, navigate to the original contribution(s) mentioning a particular
keyphrase, or investigate related keyphrases.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Linguistic Analysis</title>
        <p>Text Insights generates keyphrases from each contribution on the page using the
following steps of linguistic analysis:</p>
        <p>1. Tokenization. After sentence splitting, a WhitespaceTokenizer is applied to
extract sequences of individual tokens from each posting or comment. Both
steps make use of NLTK<sup>4</sup>. All tokens are normalized to lowercase characters.</p>
        <sec id="sec-2-3-1">
          <p>
            <fn id="fn3"><label>3</label><p>https://developers.facebook.com/docs/graph-api</p></fn>
            <fn id="fn4"><label>4</label><p>Natural Language Toolkit, http://www.nltk.org/</p></fn>
          </p>
          <p>
            2. Part-of-Speech Tagging. The NLTK and Ark-Tweet-NLP [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]<sup>5</sup> part-of-speech
taggers are used to assign word classes to each token.
          </p>
          <p>3. Keyphrase Extraction and Normalization. The tagged words of a
contribution are searched for linguistically meaningful patterns of words (e.g.,
compound noun phrases, verb-noun constructions). Linguistic patterns of
different type and length can be configured. Keyphrases containing special
characters, stop words, numbers or parts of URLs are rejected. All tokens
are normalized using the Lancaster stemmer<sup>6</sup> as implemented in NLTK.</p>
          <p>
            4. Keyphrase Weighting. Each extracted keyphrase is assigned a TF/IDF [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] relevance score as given in equation (1), where p refers to a keyphrase, d
to a specific post or comment within the set D of all contributions on the
page, f(p, d) denotes the frequency of p in d, and N is, as is usual for TF/IDF,
the total number of contributions in D.
          </p>
          <disp-formula id="eq1">
            <label>(1)</label>
            <tex-math><![CDATA[\mathrm{tfidf}(p, d, D) = \log\bigl(f(p, d) + 1\bigr) \cdot \log \frac{N}{\left|\{d' \in D : p \in d'\}\right|}]]></tex-math>
          </disp-formula>
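          <p>The extraction and weighting steps can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the system's implementation: hand-tagged tokens with coarse, made-up tags (N, V, A, O) stand in for the output of the NLTK and Ark-Tweet-NLP taggers, the tag patterns and the stop word list are hypothetical, and stemming is omitted. The weighting function follows equation (1), with N taken as the number of contributions.</p>
          <preformat><![CDATA[
```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Tagged = List[Tuple[str, str]]  # (lowercased token, coarse POS tag)

# Assumed, configurable tag patterns: single nouns, noun compounds, verb-noun constructions.
PATTERNS = [("N",), ("N", "N"), ("V", "N")]
STOP_WORDS = {"the", "a", "is", "my", "to"}  # illustrative stop word list


def extract_keyphrases(tagged: Tagged) -> List[str]:
    """Return all token sequences whose tag sequence matches a configured pattern."""
    phrases = []
    for pattern in PATTERNS:
        n = len(pattern)
        for i in range(len(tagged) - n + 1):
            window = tagged[i:i + n]
            tokens = [tok for tok, _ in window]
            tags = tuple(tag for _, tag in window)
            if tags != pattern:
                continue
            # Reject candidates containing stop words, numbers or special characters.
            if any(tok in STOP_WORDS or not tok.isalpha() for tok in tokens):
                continue
            phrases.append(" ".join(tokens))
    return phrases


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Score each keyphrase per contribution: log(f + 1) * log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(p for doc in docs for p in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            p: math.log(f + 1) * math.log(n_docs / doc_freq[p])
            for p, f in counts.items()
        })
    return scores


# Made-up, hand-tagged contributions for demonstration.
contributions = [
    [("quit", "V"), ("smoking", "N"), ("today", "N")],
    [("my", "O"), ("husband", "N"), ("quit", "V"), ("smoking", "N")],
    [("blood", "N"), ("pressure", "N"), ("is", "V"), ("high", "A")],
]
docs = [extract_keyphrases(c) for c in contributions]
weights = tfidf(docs)
```
]]></preformat>
          <p>Note that under this weighting a keyphrase occurring in every contribution receives a score of zero, since the logarithm of N divided by the document frequency vanishes.</p>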
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Use Cases</title>
      <p>Text Insights addresses several use cases, among them trend discovery, content
evaluation and content planning. The following examples are the results of
applying Text Insights to a Facebook page targeting health care professionals in
the domain of hypertension treatment, the "Hypertension Hub"<sup>7</sup> (HH).</p>
      <p>Trend Discovery. The global tag cloud in Fig. 1 (left) displays specific
hypertension risk factors as an overall prominent topic of interest for HH users. Apart
from discovering such overall trends, monthly breakdowns can be used for
tracking interesting engagement patterns. The unique comment peak for 09/2013 (see
Fig. 1, right), for example, can be attributed to contributions surrounding a
medical congress ("esc"). The content coverage of this event was received so well that
the congress still shows up in the global tag cloud. As an actionable result, it
might be useful to cover similar events in the future.</p>
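      <p>A monthly breakdown of this kind amounts to grouping contribution timestamps by month and looking for the largest bucket. The following sketch illustrates the idea; it assumes ISO 8601 timestamps that begin with the year and month, and the sample timestamps are made up for demonstration rather than taken from the HH page.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from typing import Iterable, Tuple


def monthly_counts(timestamps: Iterable[str]) -> Counter:
    """Count contributions per month, keyed by the 'YYYY-MM' prefix of each timestamp."""
    return Counter(ts[:7] for ts in timestamps)


def peak_month(counts: Counter) -> Tuple[str, int]:
    """Return the month with the most contributions together with its count."""
    return max(counts.items(), key=lambda item: item[1])


# Made-up comment timestamps for demonstration.
comment_times = [
    "2013-08-14T09:30:00",
    "2013-09-01T12:00:00",
    "2013-09-02T08:15:00",
    "2013-09-03T17:45:00",
    "2013-10-20T11:05:00",
]
counts = monthly_counts(comment_times)
peak = peak_month(counts)
```
]]></preformat>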
      <p>
        <fn id="fn5"><label>5</label><p>http://www.ark.cs.cmu.edu/TweetNLP/, version 0.3.2.</p></fn>
        <fn id="fn6"><label>6</label><p>http://www.nltk.org/api/nltk.stem.html</p></fn>
      </p>
      <sec id="sec-3-3">
        <p>
          <fn id="fn7"><label>7</label><p>https://www.facebook.com/thehypertensionhub</p></fn>
        </p>
        <p>Content Evaluation and Planning. Monthly tag clouds and the ability to gather
keywords related to a query term enable the user to evaluate existing content
and reactions to the page owner's content. The monthly tag cloud in Fig. 2
(left) shows the page-triggered keyword "smoking" as the most important topic.
Analyzing frequently co-occurring keyphrases for "smoking" (see Fig. 2, right)
yields (i) semantically related terms like "quit", "cigar", "cvd" (cardiovascular
disease) and (ii) keyphrases indicating a personal connection (e.g., "husband",
"father", "worries"), both mostly user-triggered. Apparently, many users are
focusing on personal, rather than professional, aspects of the domain. As an
actionable conclusion, it might be of use to trigger more treatment-related topics
in order to shift engagement from patients to professionals.</p>
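        <p>One simple way to obtain related keyphrases for a query term is to count how often other keyphrases occur in the same contribution. The sketch below illustrates this under the assumption that co-occurrence is measured per contribution; the paper does not state how relatedness is computed, and the keyphrase lists are made up for demonstration.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from typing import List, Tuple


def related_keyphrases(contributions: List[List[str]], query: str,
                       top_n: int = 5) -> List[Tuple[str, int]]:
    """Rank keyphrases by the number of contributions they share with the query."""
    co_occurrence = Counter()
    for phrases in contributions:
        unique = set(phrases)  # count each keyphrase once per contribution
        if query in unique:
            co_occurrence.update(unique - {query})
    return co_occurrence.most_common(top_n)


# Made-up keyphrase lists, one per contribution.
keyphrases_per_contribution = [
    ["smoking", "quit", "husband"],
    ["smoking", "quit", "cvd"],
    ["smoking", "father"],
    ["esc", "congress"],
]
related = related_keyphrases(keyphrases_per_contribution, "smoking")
```
]]></preformat>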
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Future Work</title>
      <p>One of the biggest benefits social media channels add to modern marketing is the
direct feedback on the content strategy. In order to help refine such a strategy, we
have proposed Text Insights, a tool to analyze social media content on a textual
level. Apart from content evaluation and planning, we have demonstrated that
the system may also enable page owners to gain an understanding of trends in
their communities. Since social media is public by nature, this can also
be applied to a competitor's presence in order to compare content strategies.</p>
      <p>
        As future improvements, we plan to include spam or language detection, since
some contributions have been shown to skew parts of the generated tag clouds. The
linguistic analysis might also benefit from clustering synonymous keyphrases [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>David M.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Jordan</surname>
          </string-name>
          .
          <article-title>Latent Dirichlet Allocation</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>3</volume>
          :
          <fpage>993</fpage>
          –
          <lpage>1022</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Rohan</given-names>
            <surname>Miller</surname>
          </string-name>
          and
          <string-name>
            <given-names>Natalie</given-names>
            <surname>Lammas</surname>
          </string-name>
          .
          <article-title>Social media and its implications for viral marketing</article-title>
          .
          <source>Asia Pacific Public Relations Journal</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          –
          <lpage>9</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Olutobi</given-names>
            <surname>Owoputi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Brendan</given-names>
            <surname>O'Connor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Chris</given-names>
            <surname>Dyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kevin</given-names>
            <surname>Gimpel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nathan</given-names>
            <surname>Schneider</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Noah A.</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <article-title>Improved part-of-speech tagging for online conversational text with word clusters</article-title>
          .
          In
          <source>Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , pages
          <fpage>380</fpage>
          –
          <lpage>390</lpage>
          , Atlanta, Georgia,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Gerard</given-names>
            <surname>Salton</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Buckley</surname>
          </string-name>
          .
          <article-title>Term-weighting approaches in automatic text retrieval</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <volume>24</volume>
          (
          <issue>5</issue>
          ):
          <fpage>513</fpage>
          –
          <lpage>523</lpage>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>