=Paper=
{{Paper
|id=Vol-1637/paper_3
|storemode=property
|title=Visualising Text Co-occurrence Networks
|pdfUrl=https://ceur-ws.org/Vol-1637/paper_3.pdf
|volume=Vol-1637
|authors=Laurie Hirsch,Simon Andrews
|dblpUrl=https://dblp.org/rec/conf/iccs/HirschA16
}}
==Visualising Text Co-occurrence Networks==
Laurie Hirsch
Simon Andrews
Sheffield Hallam University
Abstract. We present a tool for automatically generating a visual summary of
unstructured text data retrieved from documents, web sites or social media feeds.
Unlike tools such as word clouds, we are able to visualise structures and topic
relationships occurring in a document. These relationships are determined by a
unique approach to co-occurrence analysis. The algorithm applies a decaying
function to the distance between word pairs found in the original text such that
words regularly occurring close to each other score highly, but even words occurring
some distance apart will make a small contribution to the overall co-occurrence
score. This is in contrast to other algorithms which simply count adjacent words
or use a sliding window of fixed size. We show, with examples, how
the network generated can be presented in tree or graph format. The tree format
allows for the user to interact with the visualisation and expand or contract the
data to a preferred level of detail. The tool is available as a web application and
can be viewed using any modern web browser.
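The distance-decayed scoring described in the abstract can be illustrated with a minimal sketch. It is not the tool's implementation: the exponential form of the decay, the value `base=0.5`, the `max_distance` cut-off and the function name `cooccurrence_scores` are all assumptions made for illustration. Distance is taken to be the number of words between the two words of a pair, so adjacent words have distance 0.

```python
from collections import defaultdict


def cooccurrence_scores(tokens, base=0.5, max_distance=10):
    """Score word pairs so that closer occurrences contribute more.

    base and max_distance are illustrative assumptions, not values
    taken from the paper.
    """
    scores = defaultdict(float)
    for i, first in enumerate(tokens):
        # Look ahead at every later word within the cut-off.
        for j in range(i + 1, min(i + 2 + max_distance, len(tokens))):
            second = tokens[j]
            if first == second:
                continue  # a word does not co-occur with itself
            distance = j - i - 1  # number of words between the pair
            pair = tuple(sorted((first, second)))  # unordered pair
            scores[pair] += base ** distance  # decaying contribution
    return dict(scores)


tokens = ["concept", "graph", "logic", "concept", "graph"]
scores = cooccurrence_scores(tokens)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, score)
```

Unlike a fixed sliding window, every pair within the cut-off contributes something, but the contribution shrinks quickly as the words move apart.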
1 Background
Visual representations have proved to be useful alternatives to linear text documents.
The mind mapping technique was introduced in the 1960s and is thought to encourage
learning. However, creating mind maps can be a complex and time-consuming undertaking,
and the ability to automatically produce text visualisations has attracted significant
research in recent decades. A number of possible benefits have been attributed to
such tools, including managing information overload, providing summaries and ‘impression
formation’. Tools have been developed for identifying topics and topic correlations,
displaying knowledge and generating concept clouds [1][2]. Here we will
briefly outline a number of existing techniques and then show how we have developed
a method based on word co-occurrence which can be used for generating both graphs
and trees in various types of diagram. We include a number of example visualisations,
all of which are based on the text of a paper concerning conceptual structures [3].¹
¹ Available at http://www.jfsowa.com/pubs/ca4cs.pdf. It may help the reader to briefly read the
article before viewing the visualisations.
1.1 Word Clouds
Although many systems are formed using user-provided tags, there has been significant
interest in ‘word tags’ or ‘text tags’ which are automatically generated using the text
found in documents or web sites. The popular tool Wordle [4] has seen a steady increase
in usage and many variations have been made available. Word clouds are based on the
frequency of individual words found in the available text after stop word removal. The
most frequent words are selected and then presented using various techniques to adjust
font, colour, size and position, in a way that is pleasing and useful to the user. The
words are often sorted alphabetically, although various systems of arrangement have
been proposed and attempts have been made to place similar words together. Word
clouds are simple and are commonly presented on web sites with little or no explanation
of how they should be used or interpreted. A word cloud of the Sowa text can be seen
in Figure 1.²
Fig. 1. Word Cloud
A commonly cited issue with word clouds is that they can hinder understanding because
they lack information about the relationships between words.
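The frequency counting that underlies a word cloud can be sketched as follows. This is a minimal illustration rather than the method of any particular tool: the stop-word list is a tiny hypothetical sample and `top_words` is a name chosen here. The output is only a ranked list of individual words, which shows why such a display carries no information about how the words relate to each other.

```python
from collections import Counter
import re

# Tiny illustrative stop-word list (an assumption); real tools use longer lists.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def top_words(text, n=5):
    """Return the n most frequent non-stop words with their counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


sample = "The cat and the dog. The cat sat in the sun."
print(top_words(sample, 2))
```

A renderer would then map each count to a font size; the layout step is omitted here.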
² Created using the tool at https://www.jasondavies.com/wordcloud/
1.2 Tree Clouds
Trees have been presented as an easy-to-read and meaningful format, and the term
'tree cloud' has been proposed [5]. A freely available system which generates trees
based on the semantic distance between words derived from the original text is also
available and gives the user an indication of the relationship between the key terms in
the visualisation. The Sowa text produces the tree cloud shown in Figure 2.³
Fig. 2. Tree Cloud
The tree cloud includes colouring, font sizes and arcs to indicate relationships between
topics.
2 Description of the System
In this section we describe how our system, txt2vz (http://txt2vz.appspot.com/), works
and compare the visualisations it produces with those of other text visualisation
tools.
2.1 Pre-processing
To reduce the dimensionality of the document(s), all words are converted to lower case,
stop words are removed and stemming is applied, such that only the most frequent form
of a word is preserved.
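The pre-processing steps can be sketched as below. This is an illustration under stated assumptions, not the txt2vz code: the stop-word list is a small hypothetical sample, and `crude_stem` is a deliberately naive stand-in for a real stemmer (it only strips a trailing "s"). Words sharing a stem are replaced by whichever surface form occurs most often.

```python
from collections import Counter, defaultdict
import re

# Small illustrative stop-word list (an assumption).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def crude_stem(word):
    """Naive stand-in for a real stemmer: strip one trailing 's'."""
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def preprocess(text):
    """Lower-case, drop stop words, keep the most frequent form per stem."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOP_WORDS]
    # Count how often each surface form appears for every stem.
    forms = defaultdict(Counter)
    for token in tokens:
        forms[crude_stem(token)][token] += 1
    # Pick the most frequent surface form as the representative.
    preferred = {stem: counts.most_common(1)[0][0]
                 for stem, counts in forms.items()}
    return [preferred[crude_stem(token)] for token in tokens]


print(preprocess("Graphs and trees: a graph shows a tree, graphs and trees"))
```

In this sample "graph" is replaced by "graphs" and "tree" by "trees", because the plural forms are the more frequent ones.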
³ Using the tool at http://treecloud.univ-mlv.fr/cgi-bin/NuageArbore_EN.cgi#
2.2 Significance Measure
We define a measure of significance for a pair (P, Q) of words, based on the number of
co-occurrences of (P, Q) and the distance between P and Q in each co-occurrence, where
the distance between P and Q is defined to be the number of words between P and Q:
<math>\mathrm{significance}(P, Q) = \sum_{i=1}^{M} B^{\mathrm{distance}(PQ_i)} \qquad (1)</math>
where M is the number of co-occurrences of P and Q; distance(PQ_i) is the distance between P and Q
in the ith co-occurrence; 0