<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Profiling Reputation of Corporate Entities in Semantic Space</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jussi Karlgren</string-name>
          <email>jussi@gavagai.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Magnus Sahlgren</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fredrik Olsson</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fredrik Espinoza</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ola Hamfors</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Gavagai AB</institution>
          ,
          <addr-line>Skånegatan 97, 116 35 Stockholm</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Gavagai entered its first-generation baseline system in the profiling task of the CLEF 2012 evaluation campaign for online reputation management systems. The system builds on large-scale analysis of streaming text and performed excellently on this task with standard settings.</p>
        <sec>
          <title>Profiling corporate reputation in streaming online data</title>
          <p>The profiling task was based on real data: a set of microblog posts from Twitter, filtered to contain a company name. The experimental data consisted of thirty-six sets of microblog post references, each set potentially relevant to a named company. The first six sets were used for training and the remaining thirty for testing. For each post in the test set, the task was first to determine whether it refers to the company whose name is mentioned in it (some names are ambiguous, and sometimes the company name is mentioned only in passing) and then to assess whether the tweet improves the reputation of the company or detracts from it.</p>
          <p>This task is close to, but not identical to, typical negative-positive sentiment analysis. Firstly, a tweet may be factual rather than attitudinal, yet retain implications for a company's reputation: a factual report on the state of the world may impinge negatively or positively on a company name. Secondly, a statement of attitude, however attitudinally loaded its terms, might not have the expected effect on company reputation with respect to polarity: a glowingly positive report on the wrong aspect of a product or service might not be what the company wants to represent itself with. The details of the task are given in the introduction to the Evaluating Online Reputation Management Systems Lab of the 2012 CLEF conference [<xref ref-type="bibr" rid="ref1">1</xref>].</p>
        </sec>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>System description</title>
      <p>
        Through its Ethersource suite of services, Gavagai provides tools for monitoring
targets of commercial interest in streaming data of any scale, editorial quality, and
language, with respect to semantic poles of some permanence. Ethersource
is based on distributional semantics [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] represented in a semantic space [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and realised
through a proprietary implementation of the Random Indexing processing framework [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
as described in our position paper at the recent Online Reputation Management
workshop [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Ethersource is under constant development, and the results from this evaluation
are being fed back into the system quality assurance cycle.
      </p>
      <p>A target in Ethersource is defined through manual entry of a number of representative
terms. In this case, each target was defined through variants of its primary name (e.g. “lufthansa”,
“#lufthansa”, “lufthansa’s”) and a small number of additional or blocking terms obtained
through a support system based on a semantic space model built from previous large-scale
analysis of streaming text (e.g. “blackberry” → “#rim”).</p>
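      <p>As an illustration only, the target filter described above can be sketched as simple lexical matching. The function name, the tokenisation, the term lists, and the rule that a blocking term vetoes a match are all assumptions made for this sketch; they are not taken from the Ethersource implementation.</p>
      <preformat><![CDATA[
```python
import re


def matches_target(post, target_terms, blocking_terms=frozenset()):
    """Return True if the post contains a target term and no blocking term.

    Assumption: a single blocking term is enough to reject the post.
    """
    # Lowercase and tokenise, keeping a leading '#' or '@' and apostrophes.
    tokens = set(re.findall(r"[#@]?[\w'’]+", post.lower()))
    if tokens & set(blocking_terms):
        return False
    return bool(tokens & set(target_terms))


# Hypothetical term lists, for illustration only.
LUFTHANSA = {"lufthansa", "#lufthansa", "lufthansa's", "lufthansa’s"}
BLACKBERRY = {"blackberry", "#blackberry", "#rim"}
BLACKBERRY_BLOCK = {"pie", "jam"}

print(matches_target("Flying #Lufthansa to Madrid", LUFTHANSA))
print(matches_target("blackberry pie recipe", BLACKBERRY, BLACKBERRY_BLOCK))
```
]]></preformat>
      <p>In this sketch the first call matches on the hashtag form of the name, while the second is rejected because a hypothetical blocking term occurs in the post.</p>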
      <p>
        A semantic pole in Ethersource is likewise defined through a larger and more
permanently selected set of terms. This term set can be extensive or limited, depending on
whether recall or precision is crucial for the task at hand and whether typical expression of this pole
is wide-ranging or more exact [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. For typical sentiment analysis purposes, the poles can
be defined through a list of positive and negative terms; for other purposes, other word
lists can be used. In our commercial context we have a large number of poles and
do not generalise to simple positive or negative [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. For this task, we used Gavagai’s
standard poles for customer satisfaction for English and Spanish, each consisting of a few
hundred editorially selected terms, semi-automatically augmented through the semantic
space model built from previous large-scale analysis of streaming text and static textual
collections in each language.
      </p>
      <p>The system took each candidate microblog post as if it were harvested from a live
feed, ran it through a standard language identifier, and filtered it through the entity target
representation. If the target identifier fired, the post was polarised with respect to the
two opposing customer satisfaction poles defined for the identified language. The
polarisation score, normally aggregated by our system over streaming data into a time
series and monitored by our customers for change, varies between 0 and 1 and is not
designed to support decisions on text items in isolation from their context. In this case, if
the score of either pole was over an editorially set threshold, the post was considered to
have an effect on company reputation. If both scores were high, the larger score was used.</p>
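      <p>The decision rule for a single post can be sketched as follows. This is a minimal sketch under stated assumptions: the two pole scores are taken as given inputs in the range 0 to 1, the threshold value of 0.5 is a placeholder (the editorially set value is not reported here), and treating an exact tie as neutral is our assumption for the sketch.</p>
      <preformat><![CDATA[
```python
def reputation_effect(positive_score, negative_score, threshold=0.5):
    """Map two pole scores in [0, 1] to 'positive', 'negative' or 'neutral'.

    threshold is a hypothetical placeholder for the editorially set value.
    """
    # Neither pole exceeds the threshold: no effect on reputation.
    if max(positive_score, negative_score) <= threshold:
        return "neutral"
    # At least one pole exceeds the threshold: the larger score decides.
    if positive_score > negative_score:
        return "positive"
    if negative_score > positive_score:
        return "negative"
    # Assumption: an exact tie above the threshold is treated as neutral.
    return "neutral"


print(reputation_effect(0.8, 0.1))
print(reputation_effect(0.7, 0.9))
print(reputation_effect(0.3, 0.4))
```
]]></preformat>
      <p>A more conservative (higher) threshold sends more posts to the neutral class, which is the trade-off behind the choice of threshold.</p>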
    </sec>
    <sec id="sec-2">
      <title>Results</title>
      <p>Our baseline system as it stands achieved excellent results in the evaluation, with an
overall profiling accuracy of 40.0%, calculated as the proportion of posts for which both
relevance and polarity identification were correct.</p>
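      <p>One literal reading of this measure can be sketched as below. The data layout, with one (relevant, polarity) pair per post, and the convention that posts judged irrelevant carry no polarity label are assumptions for this sketch; the official definition is given in the lab overview.</p>
      <preformat><![CDATA[
```python
def profiling_accuracy(gold, predicted):
    """Fraction of posts whose relevance and polarity labels are both correct.

    gold and predicted are equally long lists of (relevant, polarity) pairs.
    Assumption: irrelevant posts carry polarity None in both lists.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    correct = 0
    for (gold_rel, gold_pol), (pred_rel, pred_pol) in zip(gold, predicted):
        # A post counts only if both labels agree with the gold standard.
        if gold_rel == pred_rel and gold_pol == pred_pol:
            correct += 1
    return correct / len(gold)


# Hypothetical labels, for illustration only.
gold = [(True, "positive"), (True, "negative"), (False, None), (True, "neutral")]
pred = [(True, "positive"), (True, "positive"), (False, None), (False, None)]
print(profiling_accuracy(gold, pred))
```
]]></preformat>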
      <p>In the relevance assessment, or filtering, step, where our system uses simple lexical
term recognition as an indicator of relevance, the settings we chose yielded the highest
reliability score of any system and near-highest sensitivity, achieving a filtering accuracy
of 77.4%. For predicting the effect and direction of effect on reputation, we found
that the threshold for taking a post into account had been set somewhat conservatively, yielding
an accuracy of 37%.</p>
    </sec>
    <sec id="sec-3">
      <title>Analysis</title>
      <p>This sort of analysis is by necessity very subjective. A post such as</p>
      <p>definitivo lo que me choca de armani es su estilismo</p>
      <p>might arguably be interpreted to be neutral, negative, or positive for Armani, and a post
such as</p>
      <p>shit bitch me and @name livin in up at tha Marriott BITCH</p>
      <p>as negative or positive (hardly neutral) for Marriott, depending on one’s perspective; a
post such as</p>
      <p>BA set to by BMI from Lufthansa</p>
      <p>can be interpreted to be neutral, negative, or positive for Lufthansa, depending on what
one knows about BMI. This is why, in commercial applications of our system, we
work on time series and sequences rather than single posts.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Amigó</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corujo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meij</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rijke</surname>
            ,
            <given-names>M. de</given-names>
          </string-name>
          :
          <article-title>Overview of RepLab 2012: Evaluating online reputation management systems</article-title>
          .
          <source>In: CLEF 2012 Labs and Workshop Notebook Papers</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kanerva</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors</article-title>
          .
          <source>Cognitive Computation</source>
          <volume>1</volume>
          (
          <issue>2</issue>
          ),
          <fpage>139</fpage>
          -
          <lpage>159</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Karlgren</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sahlgren</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olsson</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Espinoza</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamfors</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Usefulness of sentiment analysis</article-title>
          .
          <source>In: ECIR 2012, 34th European Conference on Information Retrieval</source>
          . Barcelona
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Olsson</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karlgren</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sahlgren</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Espinoza</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamfors</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Technical requirements for knowledge representation for attitude mining on a realistic scale</article-title>
          .
          <source>In: Proceedings of the Workshop on Reputation Management in Social Media at LREC'12</source>
          .
          Istanbul
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Sahlgren</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces</article-title>
          .
          <source>Ph.D. thesis</source>
          , Stockholm University (
          <year>2006</year>
          ), http://soda.swedish-ict.se/437/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Sahlgren</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>The distributional hypothesis</article-title>
          .
          <source>Rivista di Linguistica (Italian Journal of Linguistics)</source>
          <volume>20</volume>
          (
          <issue>1</issue>
          ),
          <fpage>33</fpage>
          -
          <lpage>53</lpage>
          (
          <year>2008</year>
          ), http://soda.swedish-ict.se/3941/
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Sahlgren</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karlgren</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eriksson</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>SICS: Valence annotation based on seeds in word space</article-title>
          .
          <source>In: Fourth International Workshop on Semantic Evaluations (SemEval-2007)</source>
          . Association for Computational Linguistics (
          <year>2007</year>
          ), http://soda.swedish-ict.se/2593/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>