<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Visualisation of Russian Newspaper Corpus by Means of Reference Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dmitry Ilvovsky</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ekaterina Chernyak</string-name>
          <email>echernyak@hse.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Research University Higher School of Economics</institution>
          , Moscow,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we present some preliminary results on text corpus visualization by means of so-called reference graphs. The nodes of such a graph stand for key words or phrases extracted from the texts, and the edges represent the reference relation. Node A refers to node B if the corresponding key word / phrase B is more likely to co-occur with key word / phrase A than to occur on its own. Since reference graphs are directed graphs, we are able to use graph-theoretic algorithms for further analysis of the text corpus. The visualization technique is tested on our own Web-based corpus of Russian-language newspapers.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The main idea of any text visualisation technique is to plot important elements of the
text (such as key words or key phrases, named entities, or terms). Such pictures can
be seen as a tool for text summarization and information extraction / presentation [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
The best-known text visualization technique is the tag cloud [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. A tag cloud shows the
key words / phrases (i.e., tags) extracted from a text on a plane. The size of a tag
depends on its frequency or some other statistical feature. The majority of text visualization
techniques build on the tag cloud idea. In [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] the tags extracted from tweets were
color-coded according to the political affiliation of the user. Vennclouds, introduced in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], are an
extension of the tag cloud idea. Instead of one tag cloud, a Venncloud presents three tag
clouds, which are used to contrast two texts. In [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] the tag clouds are placed inside the
nodes of the graph and the nodes are connected by an edge if they have a lot in common.
Furthermore, the nodes are sorted along the time axis. Another extension of the
tag cloud idea is the tag graph. To obtain a tag graph one needs to introduce some
sort of relation between the tags. For example, in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] the tags stand for named entities
and the edges between them show whether they co-occur.
      </p>
      <p>Our project of text collection visualization follows this direction. We construct
so-called reference graphs, whose nodes stand for key words / phrases extracted
from the whole collection. There is a directed edge between two nodes A and B if node
A refers to node B: A ⇒ B. The referral relation means that key word or phrase
B occurs with a higher probability in a document where key word or phrase A is present
than its expected frequency over the entire corpus indicates. To extract referral rules we use
an asymmetric association measure based on annotated suffix tree scoring. Hence the reference graph
is a directed graph, which is a very well studied mathematical structure. This gives us
plenty of opportunities for further analysis.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Data</title>
      <p>We chose a number of Russian news portals (“Izvestia”, “Nezavisimaya gazeta”, “Moscow
Komsomolets”, “Kommersant”) and crawled the sections of their web pages devoted to
economics. We processed and aggregated all the articles published in 2014, which gives
us a total of 4061 articles (1109 from “Kommersant”, 1061 from “Izvestia”, 1284 from
“Nezavisimaya gazeta”, and 613 from “Moscow Komsomolets”).</p>
    </sec>
    <sec id="sec-3">
      <title>3 Key word and key phrase extraction</title>
      <p>
        First, following [
        <xref ref-type="bibr" rid="ref6 ref9">6,9</xref>
        ] we form candidate phrases that satisfy certain part-of-speech
tag patterns, such as NOUN + NOUN, ADJECTIVE + NOUN, or NOUN +
PREPOSITION + NOUN. The whole list of patterns was adopted from [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. If we want to
extract key words, we restrict ourselves to nouns only. Second, we set a frequency
threshold for candidate phrases and select only frequent phrases. We calculate the frequency of
each candidate phrase over the whole corpus, not over individual texts. Finally, we obtain
a list of phrases that satisfy the grammar patterns and are frequent enough. We chose the
frequency threshold empirically so that we get the top 250 candidate phrases and the
top 100 candidate words. We manually remove senseless phrases (such as “Izvestia
reporter”) from this list and consider the remaining key phrases. Since all the texts in the
collection belong to the same domain and are written using specific vocabulary, there
is no need for a more complex extraction procedure. Replacing our manual
key phrase processing with a computational technique that takes
newspaper-specific vocabulary into account is an important part of future work.
      </p>
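      <p>The candidate selection described above can be sketched as follows. The toy tagged sentences, the tagger output format, and the two-pattern list are illustrative assumptions, not the actual pipeline; a real system would first tag Russian text with a morphological analyser.</p>

```python
from collections import Counter

# Hypothetical pre-tagged corpus: each sentence is a list of (token, POS) pairs.
tagged = [
    [("consumer", "ADJ"), ("demand", "NOUN"), ("fell", "VERB")],
    [("consumer", "ADJ"), ("demand", "NOUN"), ("and", "CONJ"),
     ("consumer", "ADJ"), ("credit", "NOUN")],
]

# A small subset of the bigram POS patterns mentioned in the text.
PATTERNS = {("NOUN", "NOUN"), ("ADJ", "NOUN")}

def candidate_phrases(sentences, patterns):
    """Corpus-wide frequencies of bigrams whose POS tags match a pattern."""
    counts = Counter()
    for sent in sentences:
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            if (t1, t2) in patterns:
                counts[w1 + " " + w2] += 1
    return counts

counts = candidate_phrases(tagged, PATTERNS)
# Keep only candidates above a frequency threshold (here: at least 2).
frequent = [p for p, c in counts.most_common() if c >= 2]
print(frequent)   # ['consumer demand']
```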
    </sec>
    <sec id="sec-4">
      <title>4 Annotated suffix tree (AST) scoring</title>
      <p>
        According to the annotated suffix tree model [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], a text document is treated not as a set of words
or terms, but as a set of so-called fragments, i.e., sequences of characters arranged in the
same order as they occur in the text. Each fragment is characterized by a real number:
the greater the number, the more important the fragment is for the text. An annotated
suffix tree (see Fig. 1) is a data structure used for computing and storing all fragments
of the text and their frequencies. It is a rooted tree in which:
– every node corresponds to one character;
– every node is labeled by the frequency of the text fragment encoded by the path
from the root to the node.
      </p>
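      <p>As a rough sketch, the AST described above can be stored as a nested-dictionary trie in which every edge carries one character and every node carries a counter; this representation is our illustrative assumption, not the authors’ implementation.</p>

```python
def build_ast(fragments):
    """Build an annotated suffix tree as nested dicts.

    Each node is {"freq": int, "children": {char: node}}; the frequency of a
    node counts how often the fragment encoded by the root-to-node path
    occurs in the input.
    """
    root = {"freq": 0, "children": {}}
    for frag in fragments:
        for start in range(len(frag)):          # every suffix of the fragment
            node = root
            for ch in frag[start:]:
                child = node["children"].setdefault(ch, {"freq": 0, "children": {}})
                child["freq"] += 1
                node = child
    return root

ast = build_ast(["aba"])
# Suffixes of "aba" are "aba", "ba", "a": the node for "a" is visited twice.
print(ast["children"]["a"]["freq"])   # 2
print(ast["children"]["b"]["freq"])   # 1
```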
      <p>
        To build an AST, we split the text into relatively short fragments, strings of two
to four words, and apply them consecutively to ensure that the resulting AST has a
relatively modest size. Our algorithm for constructing an AST [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is a light modification
of the well-known algorithms for constructing suffix trees [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>To use an AST to score string-to-text relevance, we do the following.
First we build an AST for every text. Next we match the strings against the AST to estimate
the relevance. This is done in several steps:
1. Every string is split into suffixes;
2. Every suffix is matched against the AST. A match is a path from the root of the AST
that coincides with the beginning of the current suffix. To estimate the match we
use the scoring function:
score(match(suffix, ast)) = Σ_{node ∈ match} ψ( f(node) / f(parent(node)) ),
where f(node) is the frequency of the matching node and f(parent(node)) is its
parent’s frequency;
3. Then the relevance is estimated by averaging the score per symbol:
relevance(string, text) = SCORE(string, ast) = ( Σ_{suffix} score(match(suffix, ast)) / |suffix| ) / |string|,
where |suffix| and |string| are the lengths of the suffix and the string.</p>
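      <p>A minimal end-to-end sketch of this scoring procedure, using the linear scaling ψ(x) = x, might look as follows. Treating f(root) as the total frequency of the root’s children is our assumption; the text does not fix it.</p>

```python
def build_ast(fragments):
    """Nested-dict annotated suffix tree: add every suffix of every fragment."""
    root = {"freq": 0, "children": {}}
    for frag in fragments:
        for start in range(len(frag)):
            node = root
            for ch in frag[start:]:
                child = node["children"].setdefault(ch, {"freq": 0, "children": {}})
                child["freq"] += 1
                node = child
    # Assumption: f(root) is the total frequency of its children.
    root["freq"] = sum(c["freq"] for c in root["children"].values())
    return root

def match_score(suffix, ast):
    """Sum f(node) / f(parent(node)) along the path matching the suffix."""
    score, node = 0.0, ast
    for ch in suffix:
        child = node["children"].get(ch)
        if child is None:
            break                      # the match ends here
        score += child["freq"] / node["freq"]
        node = child
    return score

def relevance(string, ast):
    """Average the per-suffix scores over suffix length and string length."""
    total = sum(match_score(string[i:], ast) / (len(string) - i)
                for i in range(len(string)))
    return total / len(string)

ast = build_ast(["ab"])
print(relevance("ab", ast))   # 0.625
```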
      <p>
        Note that ψ is a scaling function that converts a match score into a relevance
estimate. Following [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], where
the AST method was used to categorize e-mails, we consider three types of scaling functions:
– the linear function: ψ(x) = x;
– the logit function: ψ(x) = log(x / (1 − x)) = log x − log(1 − x);
– the root function: ψ(x) = √x.
Of them, only the linear scaling function has an obvious meaning: it stands for the
conditional probability of characters averaged over matching fragments (CPAMF).
      </p>
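      <p>The three scaling functions are straightforward to express in code (the logit is defined only for x strictly between 0 and 1):</p>

```python
import math

# The three scaling functions considered above; each maps a match score x
# to a relevance contribution.
scalings = {
    "linear": lambda x: x,
    "logit":  lambda x: math.log(x / (1.0 - x)),  # log x - log(1 - x)
    "root":   math.sqrt,
}

for name, psi in sorted(scalings.items()):
    print(name, round(psi(0.25), 4))
```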
      <p>
        Let us calculate the relevance of the string “dining” to the AST in Fig. 1. There are
six suffixes of the string “dining”: “dining”, “ining”, “ning”, “ing”, “ng”, and “g”. The
scorings of these suffixes are presented in Table 1.
      </p>
    </sec>
    <sec id="sec-4b">
      <title>5 Reference graph construction</title>
      <p>The reference graph construction method is based on the procedure of scoring the
relevance of a key phrase to a text. Because the key phrases are extracted from the whole
collection, we do not know how relevant they are to individual texts. We use annotated
suffix tree (AST) scoring to compute key-phrase-to-text relevance in the same fashion
as presented in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. This scoring takes all fuzzy matches between the key phrase
and the text into account. It helps to cope with some typos and, to an extent, replaces stemming.
Using AST scoring we estimate the relevance of every key phrase to every text.
If the relevance value is lower than a given threshold, we suppose the text is not about
this particular key phrase. We usually set the relevance threshold to 0.2,
which is around a third of the maximum AST relevance value observed in our experiments.
Given the relevance threshold, we define for every key phrase the set of texts relevant
to it. Let us denote the key phrases by ki, i = 1 : n, and let F(ki) be the set of texts
relevant to key phrase ki. We say that key phrase ki refers to key phrase kj
(ki ⇒ kj) if the texts relevant to both make up a significant part of F(ki):
|F(ki) ∩ F(kj)| / |F(ki)| &gt; r, where r is the confidence threshold and
belongs to the interval (0.5, 1). This gives us the structure of the referrals between key
phrases, which can be represented as a graph whose nodes are key phrases and whose edges
are referrals. We also introduce the support threshold in a way similar to the association rule
framework [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]: Support(F(ki)) = |F(ki)|, and we use for further analysis only those key
phrases whose support values are higher than a given threshold. From the association
rule framework we inherit the problem of selecting the confidence and support
thresholds. Both are very important, but there is no technique to set them automatically:
association rules are mined with user-defined minimum support and confidence values,
and we do the same. We set the relevance threshold to 0.2, the confidence threshold to
0.7, and the support threshold to 5.
      </p>
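      <p>Under given thresholds, referral extraction reduces to a few set operations. The key phrases and their document sets F(k) below are made-up examples.</p>

```python
# Hypothetical document sets F(k) for a few key phrases (document ids).
F = {
    "consumer demand":  {1, 2, 3, 4, 5, 6},
    "consumer price":   {1, 2, 3, 4, 5},
    "economic growth":  {7, 8},
}

R, SUPPORT = 0.7, 5   # confidence and support thresholds from the text

def referrals(F, r, min_support):
    """Return directed edges ki -> kj with conf(ki => kj) > r."""
    keys = [k for k in F if len(F[k]) >= min_support]   # support filter
    edges = []
    for ki in keys:
        for kj in keys:
            if ki == kj:
                continue
            conf = len(F[ki] & F[kj]) / len(F[ki])
            if conf > r:
                edges.append((ki, kj, conf))
    return edges

print(referrals(F, R, SUPPORT))
```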
    </sec>
    <sec id="sec-5">
      <title>6 Reference graph visualization</title>
      <p>As soon as we obtain the set of referrals ki ⇒ kj and their confidence and support
values, we can plot the reference graphs. For the sake of space we replace key words /
phrases with their index numbers. The size of a node depends on its support value.
The nodes are color-coded in the following way: green nodes only refer to other
nodes, violet nodes are only referred to by other nodes, and the rest of the nodes are blue.</p>
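      <p>The color-coding rule can be written down directly from the in- and out-degrees; the edge list below is a made-up example.</p>

```python
def node_colors(edges, nodes):
    """Green: out-edges only; violet: in-edges only; blue: the rest."""
    out_deg = {n: 0 for n in nodes}
    in_deg = {n: 0 for n in nodes}
    for a, b in edges:
        out_deg[a] += 1
        in_deg[b] += 1
    colors = {}
    for n in nodes:
        if out_deg[n] and not in_deg[n]:
            colors[n] = "green"
        elif in_deg[n] and not out_deg[n]:
            colors[n] = "violet"
        else:
            colors[n] = "blue"
    return colors

edges = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "B")]
print(node_colors(edges, "ABCD"))
```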
      <p>Let us consider two reference graphs constructed for two newspapers, “Moscow
Komsomolets” (upper part of Fig. 2) and “Nezavisimaya gazeta” (lower part of Fig. 2), based on
articles published in December 2014. First of all, the graphs are of similar size: they have
88 nodes and 85 nodes, respectively. The graphs are of different shapes: the first
one is centered around node 256 (“Russian Government”), which has the highest
support. The second graph is sparse and has no obvious center; the highest support is held by
nodes 256 (“Russian Government”) and 325 (“Economic growth”). Both graphs
share a strongly connected component of four nodes, 215, 216, 217, and 218,
that describes consumer behavior (“Consumer price”, “Consumer credit”, “Consumer
demand”, “Consumer lending”), which is no surprise. However, there is little
intersection in the content of the graphs. The nodes “Vladimir Putin” and “Dmitry Medvedev”
are absent in the second graph. There are four nodes that deal with Ukraine in the
first graph, and only two of them appear in the second. At the same time, there is a node
“Saudi Arabia” in the second graph. The majority of nodes in the second graph are
related to the Russian Government and the Ministry of Finance, while in the first graph the majority
of nodes relate to rouble devaluation and business in Russia. These two graphs clearly
show the difference between the two newspapers. “Nezavisimaya gazeta”, being more
politics- and business-oriented, presents the year-end situation in Russia as a crisis, while
“Moscow Komsomolets” is more oriented towards Russia’s international relations and
consumer needs.</p>
    </sec>
    <sec id="sec-6">
      <title>7 Analysis of reference graphs</title>
      <p>
        There are several ways to analyse reference graphs, such as link analysis and the extension
of reference graphs to the time-dependent case. The most straightforward way to analyse the
structure of the directed edges and to measure the centrality of the nodes of a reference
graph is to apply the PageRank algorithm [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The lists of the top 5 nodes according to
PageRank for the “Nezavisimaya gazeta” and “Moscow komsomolets” reference graphs are
presented in Tables 2 and 3.
      </p>
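      <p>PageRank itself needs no more than a short power iteration. The sketch below is a generic textbook version, not tied to any particular graph library, and the tiny edge list is illustrative.</p>

```python
def pagerank(edges, nodes, d=0.85, iters=50):
    """Plain power-iteration PageRank; dangling nodes spread rank uniformly."""
    out = {n: [] for n in nodes}
    for a, b in edges:
        out[a].append(b)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - d) / n for v in nodes}
        for v in nodes:
            targets = out[v] or list(nodes)   # dangling: distribute uniformly
            share = d * rank[v] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

edges = [("A", "C"), ("B", "C"), ("C", "A")]
r = pagerank(edges, ["A", "B", "C"])
print(max(r, key=r.get))   # C collects links from both A and B
```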
      <p>Node (key word / phrase): PageRank value
“Russian market”: 0.107
“Economic sanctions”: 0.063
“Consumer demand”: 0.047
“Consumer credit”: 0.042
“Economic policy”: 0.035</p>
      <p>
“Economic sanctions” seems to be the most important topic of December 2014,
widely discussed in both newspapers under consideration. Although attitudes to the
economic sanctions might differ, consumer demand is usually discussed in the
context of their possible consequences. The link analysis confirms that “Nezavisimaya
gazeta” is more oriented towards the market situation, while “Moscow komsomolets” looks at the
situation from a world-wide perspective. To analyse how the reference graphs change
over time we need to construct a series of reference graphs. Let us split the year 2014 into
25 periods, so that every period lasts two weeks. Now we can construct 25 reference
graphs, using the same key words / phrases and the subcorpus of the articles published in
each period. Let us focus on the “Izvestia” newspaper. First of all we check which
key words / phrases are relevant (i.e., have high support, that is, document frequency)
to each period. To do this we merge all the articles of a period into one text
and apply the AST relevance measure. Surprisingly, this gives us a simple typology of
the key words / phrases. There are key words / phrases that are relevant to all periods,
such as “Gas supply”. There are key words / phrases that are relevant for several
consecutive periods: for example, “Tax concession” is relevant to periods 8-13, that is, to
the middle of the year (after the annexation of Crimea), and “Weakening rouble” is relevant to
periods 16-20. Finally, there are key words / phrases that are relevant to the periods at the beginning
and the end of the year: “Rate increase”, “the Customs Union”, “Oil price”,
“Gasprom”. This might be explained by planning and wrapping up the year using the
same vocabulary.</p>
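      <p>The typology of key words / phrases by period relevance can be expressed as a small classifier over per-period presence flags; the flag vectors below are made-up illustrations of the three patterns.</p>

```python
def typology(presence):
    """Classify a key phrase by the periods (0/1 flags) in which it is relevant."""
    if all(presence):
        return "all periods"
    runs, run = [], 0                    # lengths of consecutive relevant stretches
    for p in presence:
        if p:
            run += 1
        elif run:
            runs.append(run)
            run = 0
    if run:
        runs.append(run)
    if len(runs) == 1:
        return "one consecutive stretch"
    if presence[0] and presence[-1] and len(runs) == 2:
        return "beginning and end of year"
    return "scattered"

print(typology([1] * 25))                   # all periods, like "Gas supply"
print(typology([0] * 7 + [1] * 6 + [0] * 12))   # a stretch, like "Tax concession"
print(typology([1] * 3 + [0] * 19 + [1] * 3))   # year edges, like "Oil price"
```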
      <p>The next step is to check which referrals are present during the whole year and
which appear / disappear in some periods. Referrals such as “State Duma” ⇒ “Bill”,
“Vladimir Putin” ⇒ “Russian rouble”, and “Russian gas” ⇒ “Ukraine government”
are present in almost all 25 reference graphs. Closer to the end of the year, referrals
such as “Inflation” ⇒ “Russian currency” appear. There are some referrals that
appear in disjoint periods: for example, the referral “Retail” ⇒ “Criminal
responsibility” appears in periods 3-4, 8, 11-12, 17, 19, and 23. Since we plan to animate
how reference graphs change over time, such referrals might cause a lot of
difficulties: we cannot animate the reference graphs straightforwardly and need some sort of
smoothing.</p>
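      <p>One simple smoothing option, offered here purely as an assumption about what such smoothing could look like, is a sliding-window majority vote over the per-period presence of a referral:</p>

```python
def smooth(presence, window=3):
    """Majority-vote smoothing of a 0/1 presence series over a sliding window."""
    half = window // 2
    out = []
    for i in range(len(presence)):
        lo, hi = max(0, i - half), min(len(presence), i + half + 1)
        votes = presence[lo:hi]
        out.append(1 if sum(votes) * 2 > len(votes) else 0)
    return out

# An on/off referral like "Retail" => "Criminal responsibility":
print(smooth([0, 1, 1, 0, 1, 0, 0, 1, 0]))   # [0, 1, 1, 1, 0, 0, 0, 0, 0]
```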
    </sec>
    <sec id="sec-7">
      <title>8 Future work</title>
      <p>Analysis of reference graphs. It is necessary to test further methods of graph analysis,
such as clustering nodes, measuring centrality (including PageRank and HITS), and finding
cycles of minimal length, bridges, and connected components.</p>
      <p>Temporal analysis. Extending reference graphs to the time-dependent case will
allow us to detect trends and / or events in newspapers by finding temporal references
between key words or phrases that occurred yesterday and today’s key words or phrases.</p>
      <p>
        Coloring the nodes. We plan to use LDA [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] or LDA-like methods to group key
phrases into latent topics and color the nodes according to the topic mixture.
      </p>
      <p>Text preprocessing improvement. Further directions of text processing module
development include word sense disambiguation and disambiguation of morphological
analysis. We need to develop filters that distinguish newspaper-specific
vocabulary from general vocabulary and take synonyms such as “Russian currency” and
“Rouble” into account.</p>
      <p>Quantitative evaluation. Since there is no gold standard for text visualisation
problems, it is common to conduct a user study for quantitative evaluation. There
are three possible designs for the study. To test the information discovery power of
a referral relation A ⇒ B, we ask the user “What is the context of concept B?” and provide
four possible answers: concept A (the right answer); a concept C such that B ⇒ C; a concept
D such that D ⇒ A ⇒ B; and a random concept. The proportion of right answers
will show the validity of the referral relation. To test the reference graphs as a search tool,
we provide the user with a search engine over our text collection, a
series of tag clouds for every time period and every source, and the corresponding series of
referral pairs. We ask the user to find concepts associated with his/her own query. Using
only the search engine, he/she will hardly find any associated concepts. By looking at
tag clouds the user might discover some concepts that are related to the search. But the
reference graphs will provide not only the concepts the query refers to or is referred to
by, but also the concepts reached via the transitivity of the referral relation. To
validate this we a) record the average time needed to determine the associated concepts, and b) ask
the user to rate the ease of finding associated concepts with each tool. To
test the temporal reference graphs, we ask the user to search for his/her own query and
to determine since when the query has been important. To answer the question the user may
read all the texts, rely on the size of the tags in the tag clouds, or check whether the
query appears in a certain reference graph. Since a key word / phrase appears in a
reference graph if and only if it has high support and is part of a reference rule with
high confidence, using the reference graphs might help the user a lot. We use the same
validation techniques as for the previous test.</p>
    </sec>
    <sec id="sec-8">
      <title>9 Conclusion</title>
      <p>In this paper we analysed a corpus of articles published in the most popular
Russian newspapers in 2014 by means of so-called reference graphs. Every node of a
reference graph is a key word or key phrase. The edges of the graph represent the reference
relation: if node A refers to node B, then B is more likely to co-occur with
A than to occur on its own. The reference graphs can serve not only as a visualization tool,
but also as a tool for further text analysis.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Imieliński</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swami</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Mining association rules between sets of items in large databases</article-title>
          .
          <source>ACM SIGMOD Record</source>
          <volume>22</volume>
          (
          <issue>2</issue>
          ) (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Latent dirichlet allocation</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>3</volume>
          , pp.
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Coppersmith</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Dynamic wordclouds and vennclouds for exploratory data analysis</article-title>
          .
          <source>Association for Computational Linguistics</source>
          , pp.
          <fpage>22</fpage>
          -
          <lpage>29</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dubov</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Text analysis with enhanced annotated suffix trees: Algorithms and implementation</article-title>
          .
          <source>In: Analysis of Images, Social Networks and Texts, Communications in Computer and Information Science</source>
          , vol.
          <volume>542</volume>
          , pp.
          <fpage>308</fpage>
          -
          <lpage>319</lpage>
          . Springer International Publishing (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gusfield</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <source>Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology</source>
          . Cambridge University Press, New York, NY, USA (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hulth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Improved automatic keyword extraction given more linguistic knowledge</article-title>
          .
          <source>In: Proceedings of the 2003 conference on Empirical methods in natural language processing</source>
          . pp.
          <fpage>216</fpage>
          -
          <lpage>223</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hulth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Tag clouds: Data analysis tool or social signaller?</article-title>
          <source>In: Proceedings of the Hawaii International Conference on System Sciences</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lloyd</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kechagias</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skiena</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Lydia: A system for large-scale news analysis</article-title>
          . Springer Berlin Heidelberg (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Mitrofanova</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaharov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Automatic analysis of terminology in the russian text corpus on corpus linguistics (in russian)</article-title>
          .
          <source>In: Conference Proceedings Computational Linguistics and Intellectual Technologies</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Page</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motwani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Winograd</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>The pagerank citation ranking: Bringing order to the web</article-title>
          .
          <source>Stanford InfoLab</source>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pampapathi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mirkin</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levene</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>A suffix tree approach to anti-spam email filtering</article-title>
          .
          <source>Machine Learning</source>
          <volume>65</volume>
          (
          <issue>1</issue>
          ),
          <fpage>309</fpage>
          -
          <lpage>338</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Shahaf</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suen</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jacobs</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leskovec</surname>
          </string-name>
          , J.:
          <article-title>Information cartography: Creating zoomable, large-scale maps of information</article-title>
          .
          <source>In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          . pp.
          <fpage>1097</fpage>
          -
          <lpage>1105</lpage>
          . KDD '13, ACM, New York, NY, USA (
          <year>2013</year>
          ), http://doi.acm.org/10.1145/2487575.2487690
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Can</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kazemzadeh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bar</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Narayanan</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>A system for real-time twitter sentiment analysis of 2012 u.s. presidential election cycle</article-title>
          .
          <source>In: Proceedings of the ACL 2012 System Demonstrations</source>
          . pp.
          <fpage>115</fpage>
          -
          <lpage>120</lpage>
          . ACL '12, Association for Computational Linguistics, Stroudsburg, PA, USA (
          <year>2012</year>
          ), http://dl.acm.org/citation.cfm?id=2390470.2390490
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>