<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Topic-aware Network Visualisation to Explore Large Email Corpora</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tim Repke</string-name>
          <email>tim.repke@hpi.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ralf Krestel</string-name>
          <email>ralf.krestel@hpi.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hasso Plattner Institute</institution>
          ,
          <addr-line>Potsdam</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>104</fpage>
      <lpage>107</lpage>
      <abstract>
        <p>Nowadays, more and more large datasets exhibit an intrinsic graph structure. While there exist special graph databases to handle ever increasing amounts of nodes and edges, visualising this data quickly becomes infeasible as it grows. In addition, looking at its structure is not sufficient to get an overview of a graph dataset. Indeed, visualising additional information about nodes or edges without cluttering the screen is essential. In this paper, we propose an interactive visualisation for social networks that positions individuals (nodes) on a two-dimensional canvas such that communities defined by social links (edges) are easily recognisable. Furthermore, we visualise topical relatedness between individuals by analysing information about social links, in our case email communication. To this end, we utilise document embeddings, which project the content of an email message into a high-dimensional semantic space, and graph embeddings, which project nodes in a network graph into a latent space reflecting their relatedness.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        In our modern information society we produce substantial amounts
of data each day. A large portion of it comes from
communication on social media platforms or through emails. Special graph
databases enable the efficient storage of these large
communication networks and provide interfaces to query or analyse the data.
Visualising networks in their entirety, on the other hand, is a very
challenging task. Users investigating a communication network
want to find out who communicates with whom, about what, and when.
Networks of this kind come in
many different shapes. Modern social networks, such as Twitter
or Facebook, exhibit structures similar to those of classic, offline social
networks [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. We investigate another type of social network: a
collection of emails.
      </p>
      <p>
        Given the communication data over a year or more, it is
practically impossible to gain an overview or quick insights into
the latent network structure with a basic approach as shown in
Figure 1. Also, in such a traditional network visualisation, information
about the content of messages sent between individuals is lost.
Besides these traditional systems, more exotic approaches use
the metaphor of geographical maps [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] to visualise networks,
for example using topology to reflect connectivity of densely
connected social communities. The map analogy can also be used
to visualise the contents of documents by embedding them into
a high dimensional semantic space [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and projecting it on the
map as a document landscape. In order to highlight how
relationships form and change based on the interactions, the metaphor
of a growing tree can be used (ContactTrees [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]). Although this
reflects temporal aspects of dynamic networks well, it focuses on
one person as the root, thus an overview of the entire network
is lost. CactusTrees [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] on the other hand represent hierarchical
structures with the goal of untangling overlaid bundles of
intersecting edges, making distant connections more apparent. As
higher order dependencies may get lost in traditional
visualisations, HoNVis [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] adds nodes to encode dependencies in chains
of interactions. Usually, a communication network has many
nodes and overlapping connections already, so Yang et al. [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]
rather focus on discovering overlapping cores to improve the
identification of community boundaries to highlight global latent
structures. Similarly, Gronemann et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] use the metaphor of
islands and hills to visualise clustered graphs, making densely
connected communities clearly noticeable. The edges are bundled
and follow valleys of the resulting topology, thus making
relationships between other communities hard to follow. MapSets [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
assume a graph that was laid out using embeddings reflecting
communities. An algorithm then draws regions around clusters
of nodes, such that the bounding shapes are contiguous and
non-overlapping, yet abstract. Another approach to visualising
networks at full scale is to aggregate nodes based on their
spatial distribution, thereby allowing for simple exploration
with contour lines and heatmap overlays that emphasise latent
structures, as proposed by Hildenbrand et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        Document visualisation aims to visualise the content, such that
users gain quick insights into topics, latent phrases, or trends.
Tiara [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] extracts topics and derives time-sensitive keywords
to depict evolving subjects over time as stacked plots. Other
approaches project documents into a latent space, using topic
models or embeddings. Creating scatter-plots of embedded
documents of a large corpus may result in a very dense and unclear
layout, so Chen et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] developed an algorithm to reduce
overfull visualisations by picking representative documents. A
different approach is taken by Fortuna et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], who do not show
documents directly, but generate a heatmap of the populated
canvas and overlay it with salient phrases at more densely
populated areas from the underlying documents in that region. Fried
and Kobourov [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] extend that concept by drawing clear lines between
regions and colouring them. They also add edges between salient
phrases based on co-occurrences in the texts. Most recently
Cartograph [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] was proposed, which is visually very similar to
previous approaches, but uses pre-rendered information of
different resolution and map technology to enable a responsive
interactive visualisation. Regions are coloured based on
underlying ontologies from a knowledge-base.
      </p>
      <p>Our goal is to merge approaches for network and document
visualisations in one interactive user interface. This means to
integrate multiple dimensions of email datasets including time,
interactions, users and topics into a 2D map representation.
Giving an overview over latent structures and topics in one map
may significantly improve the exploration of a corpus by users
unfamiliar with the domain and terminology. Also domain
experts could benefit from such an overview, e.g. by easily being
able to identify global patterns in the data.</p>
      <p>
        A specific application scenario that could benefit from such
integrated, interactive visualisations is the analysis of large,
unstructured, heterogeneous data collections. Data-driven
journalism [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] often has to deal with leaked, unstructured, very
heterogeneous data, e.g. in the context of the Panama Papers, where
journalists needed to untangle and order huge amounts of
information, search entities, and visualise found patterns [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Similar
datasets are of interest in the context of computational
forensics [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Auditing firms and law enforcement need to sift through
huge amounts of data to gather evidence of criminal activity,
often involving communication networks and documents [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>INTERACTIVE VISUALISATION</title>
      <p>Systems for document exploration largely vary in what they
display and how users interact with them. This depends partly
on the available raw data, but also on information extracted from
pre-processing or enrichment with external sources. Figure 1 shows
a basic visualisation of the network graph extracted from an
email corpus. Although it is an improvement over only listing
connections, large densely connected graphs quickly become
hard to read and information about the email contents is lost.</p>
      <p>Exploring document collections can be seen as a top-down
approach, where the system provides abstract overviews of the
entire document collection and users incrementally refine the
search, narrowing the results to just a few documents of interest.
Such a top-down approach may help users without prior
knowledge to get a sense for the data by visualising high level latent
structures of communication networks or the topical
distributions.</p>
      <p>In the scope of this work we primarily consider documents to
be emails or data attached to them. The sender, recipients, time,
and content can directly be extracted from the raw data. We call
these – and results from further processing – dimensions that can
be visualised. From the contents one may infer named entities,
topics, embeddings, or salient phrases, while the communication
network spanned by sender-recipient pairs can be used to detect
salient structures and hierarchies. The temporal information
enables the previously mentioned data to be analysed over time
to detect evolving or changing patterns.</p>
      <p>There are numerous ways to visualise each dimension on
its own or in combination with others. The requirement of a
dimension and its priority in a visualisation is dictated by the
system objective. From the wide range of possibilities, we strive
for a system which supports the exploration of a large collection
of documents without any prior knowledge about its content and
individuals involved.</p>
      <p>Our system uses the names and email addresses of senders
and recipients (individuals), the communication network, semantic
vector representations of email contents and, as part of an overlay,
the timestamps of emails. From these we propose a graph layout over a
document landscape that visually describes who talks with whom
about what in a given time period.</p>
    </sec>
    <sec id="sec-3">
      <title>SYSTEM ARCHITECTURE</title>
      <p>
        Visualising communication networks in a topic-aware fashion to
explore documents and salient structures is not straightforward.
Different layout objectives may produce contradicting results, and
the challenges of processing big data need to be addressed [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In
this section, we describe algorithmic approaches behind the
system we are working on. For a discussion of engineering aspects
on how to store, serve, and render the map-like data, we refer to
the Cartograph stack [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], as we focus on the process of deriving
the information from which the map is generated.
      </p>
      <p>We visualise the embedded emails as dots in a two-dimensional
landscape in which individuals are placed as nodes connected by
edges. All emails between two individuals are reduced to one
edge, reducing visual complexity and making it easier to detect
salient structures. However, this comes with the trade-off that
nodes and edges cannot be placed in the landscape so as to
cover all semantic aspects of the communication between them;
their position is only an estimate. Our very early prototype placed some
individuals with no dominant topic in a crowded area in the
centre of the landscape, as shown in Figure 2, where the colour of each opaque
email dot corresponds to that of its sender. Although the
network visualisation at this point does not make connections
clearer than in Figure 1, users can already distinguish individuals
with similar or unrelated topics.</p>
      <p>Our proposed algorithm to find a stable network layout has
three stages, namely (i) an initialisation phase, which creates
the landscape and roughly places nodes and connections, (ii) an
update phase, which iteratively updates the node placement
towards a better fit, and finally (iii) a post-processing phase, in which
edges become splines to make latent structures clearer and a
map topology is added.</p>
      <p>
        Initialising the Landscape. To generate the document
landscape, we first process the network graph to roughly determine
the regions where documents will be placed. To this end, we apply
node2vec [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] to the communication network and embed each
individual’s node. We separate the graph into communities P<sub>i</sub> ⊂ P
using the kernel density of the resulting populated space at
threshold κ, where a higher κ results in more, but smaller,
communities. For each community P<sub>i</sub>, pairwise neighbourhood
similarities are calculated using the Euclidean distance between nodes,
forming the triangular matrix S<sub>i</sub>, where s<sub>kl</sub> is the similarity
between p<sub>k</sub>, p<sub>l</sub> ∈ P<sub>i</sub>.
      </p>
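      <p>As a minimal sketch of the similarity step described above, the following Python snippet fills an upper-triangular matrix with the pairwise Euclidean distances between the embedded nodes of one community. The function name and the toy coordinates are illustrative assumptions. The node embedding itself is taken as given, and the snippet stores the raw distance because the text does not specify how a distance is turned into a similarity.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def pairwise_distance_matrix(points):
    """Return an upper-triangular matrix S where S[k][l] is the Euclidean
    distance between points[k] and points[l] for k < l, and 0.0 elsewhere."""
    n = len(points)
    S = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(k + 1, n):
            S[k][l] = math.dist(points[k], points[l])
    return S


# Hypothetical 2D embeddings of three individuals in one community.
community = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
S = pairwise_distance_matrix(community)
```
]]></preformat>
      <p>Only the upper triangle is filled, since the distance is symmetric and the diagonal is zero.</p>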
      <p>
        Furthermore, we train document embeddings [
        <xref ref-type="bibr" rid="ref15 ref25">15, 25</xref>
        ] on all
emails and use them to infer high-dimensional semantic
vector representations. Let M<sub>i</sub> be the set of emails that originated
in community P<sub>i</sub>. For each email m ∈ M<sub>i</sub>, the dimensionality
is reduced using t-SNE [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], which retains possible semantic
clusterings of documents in the higher dimensional space. The
resulting two-dimensional vectors are then placed as dots on the
map using the centre of embedded network communities as the
respective origin, whereas the size is determined by the number
of related individuals.
      </p>
      <p>We also initialise the communication network’s layout. The
starting position of a node representing an individual is
determined by the normalised sum of the two-dimensional vectors of
all emails he or she has sent or received. This way, we
implicitly group semantically related individuals into communities, as
frequent communication biases this normalised sum. Straight
edges are added between nodes if the respective individuals
exchanged emails. Note that many edges may represent only
a small number of emails. Applying a variable threshold σ to drop such edges can
reduce the computational load in later stages, as these edges will
not impact the overall layout very much. They can be added
again when the user requests a detailed visualisation by zooming in
or through other interactions.</p>
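      <p>The initial placement can be sketched as follows. This is an illustration under stated assumptions, not the implementation behind the system: the "normalised sum" is interpreted here as the mean of the email vectors, the threshold σ is interpreted as a minimum number of emails per edge, and the function name, the input format, and the example data are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def initial_layout(emails, sigma=1):
    """emails: iterable of (sender, recipient, (x, y)) tuples, where (x, y)
    is the 2D position of the email in the document landscape.

    Returns (positions, edges). positions maps each individual to the mean
    of the vectors of all emails he or she sent or received (assumed reading
    of "normalised sum"). edges maps each unordered pair of individuals to
    its email count, keeping only pairs with at least sigma emails."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    counts = defaultdict(int)
    for sender, recipient, (x, y) in emails:
        for person in (sender, recipient):
            acc = sums[person]
            acc[0] += x
            acc[1] += y
            acc[2] += 1
        counts[frozenset((sender, recipient))] += 1
    positions = {p: (sx / n, sy / n) for p, (sx, sy, n) in sums.items()}
    edges = {pair: c for pair, c in counts.items() if c >= sigma}
    return positions, edges


# Hypothetical toy corpus: three emails between three individuals.
emails = [
    ("alice", "bob", (0.0, 0.0)),
    ("alice", "bob", (2.0, 2.0)),
    ("alice", "carol", (4.0, 1.0)),
]
positions, edges = initial_layout(emails, sigma=2)
```
]]></preformat>
      <p>With σ = 2, the single email between alice and carol no longer produces an edge, while both individuals still receive a starting position.</p>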
      <p>In the algorithm’s second stage, we iteratively try to improve
the layout of the communication network by finding a balance
between the closeness of a node to its semantic context and to the densely
connected neighbourhoods it belongs to. To this end, for
each individual p<sub>j</sub> ∈ P we use linear regression to fit a line m<sub>M<sub>p<sub>j</sub></sub></sub>
through the two-dimensional vectors of all emails he or she has sent
or received. As long as a node is placed near this line, it remains in a
semantically good position.</p>
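      <p>A small sketch of the line fit may help here. It assumes ordinary least squares of y on x, which is one common reading of "linear regression", and it computes the closest point on the fitted line by orthogonal projection. The function names and the sample vectors are illustrative assumptions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def fit_line(points):
    """Ordinary least-squares fit of y = a * x + b through 2D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all x values are equal; the line is vertical")
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def closest_point_on_line(a, b, p):
    """Orthogonal projection of point p onto the line y = a * x + b."""
    px, py = p
    x = (px + a * (py - b)) / (1 + a * a)
    return x, a * x + b


# Hypothetical 2D vectors of the emails one individual sent or received.
email_vectors = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
a, b = fit_line(email_vectors)
nearest = closest_point_on_line(a, b, (5.0, 1.0))
```
]]></preformat>
      <p>The vector from a node to its projection on the line gives the direction in which the node would have to move to improve its semantic fit.</p>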
      <p>Adjusting the Network Layout. The first stage of our proposed
algorithm produces a fixed document landscape and roughly fits
the communication network on top. We now aim to incrementally
adapt the layout of the graph to better reflect salient structures
in the network while keeping each individual’s node close to the
respective semantic area in the landscape.</p>
      <p>Therefore we define a score quantifying how well the current
layout fits these objectives:</p>
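      <p>
        <disp-formula id="eq1">
          <label>(1)</label>
          <tex-math>\sum_{p_i \in P} \left[ \eta \, d(p_i, m_{M_{p_i}}) + \sum_{p_j \in P} \theta \left( s_{ij} - d(p_i, p_j) \right) \right]</tex-math>
        </disp-formula>
      </p>
      <p>
        <disp-formula id="eq2">
          <label>(2)</label>
          <tex-math>\vec{\delta}_j^{\,s} := (p_{m_j} - p_j) \, \lVert p_{m_j} - p_j \rVert</tex-math>
        </disp-formula>
        where p<sub>m<sub>j</sub></sub> is the closest point on m<sub>M<sub>p<sub>j</sub></sub></sub> to p<sub>j</sub> and ∥·∥ denotes the
Euclidean norm, while the neighbourhood gradient is defined by
        <disp-formula id="eq3">
          <label>(3)</label>
          <tex-math>\vec{\delta}_j^{\,n} := \sum_{p_k \in P,\; p_j \sim p_k} (p_j - p_k) \left( \lVert p_j - p_k \rVert - s_{jk} \right)</tex-math>
        </disp-formula>
        where p<sub>j</sub> ∼ p<sub>k</sub> denotes that an edge exists between p<sub>j</sub> and p<sub>k</sub>.
      </p>
      <p>With the definitions in Equations (2) and (3), we can formulate the update
vector for node p<sub>j</sub> ∈ P as
        <disp-formula id="eq4">
          <label>(4)</label>
          <tex-math>\vec{\delta}_j := \xi \left( \theta \, \vec{\delta}_j^{\,n} + \eta \, \vec{\delta}_j^{\,s} \right)</tex-math>
        </disp-formula>
        where ξ is the learning rate and θ and η are, as before, parameters that
weight between a better semantic or neighbourhood fit.</p>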
      <p>Complex network structures may prevent
stochastic gradient descent from finding a stable minimum, so the score
of the objective function should be monitored, or intermediate
layouts visually evaluated, to determine a satisfactory result.</p>
      <p>
        Post Processing. Lastly, we use the post processing stage to
enhance the readability of our visualisation. Densely connected
communities in the graph are potentially hard to read, thus we
apply edge bundling [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to visually clarify latent structures. We
also apply MapSets [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] to separate the regions for each
community. Since semantically similar emails may appear in different
communities, we apply colouring based on clusters in the original
global document embedding space to retain this aspect. The choice of
colours depends on the number of latent topics that should
be depicted [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. If the number of topics exceeds 25-30,
grouping topics and allowing for zooming within a two-level
topic hierarchy keeps colours distinguishable for up to 10 subtopics per group
(25 × 10 = 250 topics). In order to represent temporal aspects of the data,
we calculate the kernel density of the document landscape for
fixed time intervals, which can be used to add heatmap overlays
that users can select later on.
      </p>
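      <p>The temporal overlay can be illustrated with a short sketch. It assumes a Gaussian kernel with a fixed bandwidth and buckets emails by integer division of their timestamps. The kernel choice, the bandwidth, the grid of evaluation points, and all names below are assumptions made for illustration only.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import defaultdict


def density_grids(emails, interval, grid, bandwidth=1.0):
    """emails: iterable of (timestamp, (x, y)) pairs; interval: bucket width
    in the same unit as the timestamps; grid: list of (x, y) points at which
    the density is evaluated.

    Returns {bucket_index: [density at each grid point]} using a 2D Gaussian
    kernel (an assumed choice) normalised per time bucket."""
    buckets = defaultdict(list)
    for t, pos in emails:
        buckets[int(t // interval)].append(pos)
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    result = {}
    for bucket, positions in buckets.items():
        densities = []
        for gx, gy in grid:
            total = 0.0
            for x, y in positions:
                sq = (gx - x) ** 2 + (gy - y) ** 2
                total += math.exp(-sq / (2.0 * bandwidth ** 2))
            densities.append(norm * total / len(positions))
        result[bucket] = densities
    return result


# Hypothetical data: timestamps in days, buckets of ten days.
emails = [(0, (0.0, 0.0)), (5, (0.0, 0.0)), (12, (3.0, 0.0))]
grid = [(0.0, 0.0), (3.0, 0.0)]
grids = density_grids(emails, interval=10, grid=grid)
```
]]></preformat>
      <p>Each bucket yields one density grid, so a time slider only has to switch between precomputed overlays.</p>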
    </sec>
    <sec id="sec-4">
      <title>CONCLUSION AND VISION</title>
      <p>In this paper, we described an algorithm to lay out a
communication network on top of a landscape of semantically
embedded emails. This is still work in progress, thus Figure 3 shows only a
manually drawn mock-up of the visualisation we envision. In it,
individuals are represented as nodes positioned such that densely
connected communities are visually clustered. Edges describe the
email traffic, where opacity and thickness indicate
the frequency of messages between the nodes they connect.</p>
      <p>The semantic representations of emails are used to place dots
on a background layer which we call the document landscape.
This landscape is used as additional input to the graph layout
algorithm, aiming to place a node within corresponding semantic
regions. The colouring of regions in the landscape is derived from
densely connected communities in the communication graph.
Optionally, representative words are selected for densely populated
areas in the landscape, so that users get a rough idea about
subjects in that area. The aforementioned timestamps of emails can
be used to generate a heatmap overlay to show the activity in a
certain time interval which is controlled by a slider. Similar to
modern geographical maps, zooming into a region reveals more
details. In our case, less prominent individuals and their
connections are shown along with additional salient phrases from the
document landscape. Selecting a node will not only highlight
connected edges but may also temporarily show more edges which
were previously hidden at that zoom level. The user will also be
able to retrieve documents with the help of a selection rectangle
or clicking dots in the document landscape.</p>
      <p>In future work, we hope to evaluate this system using
full-scale real-world data and together with practitioners from journalism and
auditing. It may also be interesting to experiment with embedding
methods, which take both the emails and the network graph as
input and directly project the inferred representations into the
two-dimensional landscape to simplify the proposed algorithm.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Bach</surname>
          </string-name>
          , Nathalie Henry Riche, Christophe Hurter, Kim Marriott, and
          <string-name>
            <given-names>Tim</given-names>
            <surname>Dwyer</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Towards unambiguous edge bundling: Investigating confluent drawings for network visualization</article-title>
          .
          <source>Transactions on Visualization and Computer Graphics</source>
          <volume>23</volume>
          ,
          <issue>1</issue>
          (
          <year>2017</year>
          ),
          <fpage>541</fpage>
          -
          <lpage>550</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Nikos</given-names>
            <surname>Bikakis</surname>
          </string-name>
          and
          <string-name>
            <given-names>Timos</given-names>
            <surname>Sellis</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Exploration and visualization in the web of big linked data: A survey of the state of the art</article-title>
          .
          <source>In Proceedings of the Workshops of the EDBT/ICDT 2016 Joint Conference</source>
          , Vol.
          <volume>1558</volume>
          . CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Marie-Anne</given-names>
            <surname>Chabin</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Panama papers: a case study for records management?</article-title>
          <source>Brazilian Journal of Information Science: Research Trends</source>
          <volume>11</volume>
          ,
          <issue>4</issue>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Yanhua</given-names>
            <surname>Chen</surname>
          </string-name>
          , Lijun Wang,
          <string-name>
            <given-names>Ming</given-names>
            <surname>Dong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jing</given-names>
            <surname>Hua</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Exemplar-based visualization of large document corpus</article-title>
          .
          <source>Transactions on Visualization and Computer Graphics</source>
          <volume>15</volume>
          ,
          <issue>6</issue>
          (
          <year>2009</year>
          ),
          <fpage>1161</fpage>
          -
          <lpage>1168</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Mark</given-names>
            <surname>Coddington</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Clarifying journalism’s quantitative turn: A typology for evaluating data journalism, computational journalism, and computer-assisted reporting</article-title>
          .
          <source>Digital Journalism</source>
          <volume>3</volume>
          ,
          <issue>3</issue>
          (
          <year>2015</year>
          ),
          <fpage>331</fpage>
          -
          <lpage>348</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Tommy</given-names>
            <surname>Dang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Angus</given-names>
            <surname>Forbes</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>CactusTree: A tree drawing approach for hierarchical edge bundling</article-title>
          .
          <source>In Proc. of the Pacific Visualization Symposium . IEEE</source>
          ,
          <fpage>210</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Alon</given-names>
            <surname>Efrat</surname>
          </string-name>
          , Yifan Hu, Stephen G Kobourov,
          and
          <string-name>
            <given-names>Sergey</given-names>
            <surname>Pupyrev</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>MapSets: Visualizing Embedded and Clustered Graphs</article-title>
          .
          <source>Journal of Graph Algorithms and Applications</source>
          <volume>19</volume>
          ,
          <issue>2</issue>
          (
          <year>2015</year>
          ),
          <fpage>571</fpage>
          -
          <lpage>593</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Blaz</given-names>
            <surname>Fortuna</surname>
          </string-name>
          , Marko Grobelnik, and
          <string-name>
            <given-names>Dunja</given-names>
            <surname>Mladenic</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Visualization of text document corpus</article-title>
          .
          <source>Informatica</source>
          <volume>29</volume>
          ,
          <issue>4</issue>
          (
          <year>2005</year>
          ),
          <fpage>497</fpage>
          -
          <lpage>502</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Katrin</given-names>
            <surname>Franke</surname>
          </string-name>
          and Sargur N Srihari
          .
          <year>2007</year>
          .
          <article-title>Computational forensics: Towards hybrid-intelligent crime investigation</article-title>
          .
          <source>In International Symposium on Information Assurance and Security. IEEE</source>
          ,
          <fpage>383</fpage>
          -
          <lpage>386</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Fried</surname>
          </string-name>
          and Stephen G Kobourov.
          <year>2014</year>
          .
          <article-title>Maps of computer science</article-title>
          .
          <source>In Proc. of the Pacific Visualization Symposium . IEEE</source>
          ,
          <fpage>113</fpage>
          -
          <lpage>120</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Martin</given-names>
            <surname>Gronemann</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Jünger</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Drawing clustered graphs as topographic maps</article-title>
          .
          <source>In Proc. of the Symposium on Graph Drawing and Network Visualization</source>
          . Springer,
          <fpage>426</fpage>
          -
          <lpage>438</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Aditya</given-names>
            <surname>Grover</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jure</given-names>
            <surname>Leskovec</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>node2vec: Scalable feature learning for networks</article-title>
          .
          <source>In Proc. of the Conference on Knowledge Discovery and Data Mining. ACM</source>
          ,
          <fpage>855</fpage>
          -
          <lpage>864</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Jan</given-names>
            <surname>Hildenbrand</surname>
          </string-name>
          , Arlind Nocaj, and
          <string-name>
            <given-names>Ulrik</given-names>
            <surname>Brandes</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Flexible Level-of-Detail Rendering for Large Graphs</article-title>
          .
          <source>In Proc. of the Symposium on Graph Drawing and Network Visualization</source>
          . Springer,
          <fpage>625</fpage>
          -
          <lpage>627</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Mukundan</given-names>
            <surname>Karthik</surname>
          </string-name>
          , Mariappan Marikkannan, and
          <string-name>
            <given-names>Arputharaj</given-names>
            <surname>Kannan</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>An intelligent system for semantic information retrieval information from textual web documents</article-title>
          .
          <source>In International Workshop on Computational Forensics</source>
          . Springer,
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Quoc</given-names>
            <surname>Le</surname>
          </string-name>
          and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Distributed representations of sentences and documents</article-title>
          .
          <source>In Proc. of the International Conference on Machine Learning. PMLR</source>
          ,
          <fpage>1188</fpage>
          -
          <lpage>1196</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Laurens</given-names>
            <surname>van der Maaten</surname>
          </string-name>
          and
          <string-name>
            <given-names>Geoffrey</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Visualizing data using t-SNE</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>9</volume>
          ,
          <issue>11</issue>
          (
          <year>2008</year>
          ),
          <fpage>2579</fpage>
          -
          <lpage>2605</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Patrick Cheong-Iao</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Robert P</given-names>
            <surname>Biuk-Aghai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Muye</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bin</given-names>
            <surname>Pang</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Creating realistic map-like visualisations: Results from user studies</article-title>
          .
          <source>Journal of Visual Languages and Computing</source>
          <volume>43</volume>
          (
          <year>2017</year>
          ),
          <fpage>60</fpage>
          -
          <lpage>70</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Arnaud</given-names>
            <surname>Sallaberry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yang-chih</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hwai-Chung</given-names>
            <surname>Ho</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kwan-Liu</given-names>
            <surname>Ma</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Contact trees: Network visualization beyond nodes and edges</article-title>
          .
          <source>PLOS ONE</source>
          <volume>11</volume>
          ,
          <issue>1</issue>
          (
          <year>2016</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Shilad</given-names>
            <surname>Sen</surname>
          </string-name>
          , Anja Beth Swoap,
          <string-name>
            <given-names>Qisheng</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Brooke</given-names>
            <surname>Boatman</surname>
          </string-name>
          , Ilse Dippenaar, Rebecca Gold, Monica Ngo, Sarah Pujol,
          <string-name>
            <given-names>Bret</given-names>
            <surname>Jackson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Brent</given-names>
            <surname>Hecht</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Cartograph: Unlocking Spatial Visualization Through Semantic Enhancement</article-title>
          .
          <source>In Proc. of Conference on Intelligent User Interfaces. ACM</source>
          ,
          <fpage>179</fpage>
          -
          <lpage>190</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Kaveri</given-names>
            <surname>Subrahmanyam</surname>
          </string-name>
          , Stephanie M Reich, Natalia Waechter, and
          <string-name>
            <given-names>Guadalupe</given-names>
            <surname>Espinoza</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Online and offline social networks: Use of social networking sites by emerging adults</article-title>
          .
          <source>Journal of Applied Developmental Psychology</source>
          <volume>29</volume>
          ,
          <issue>6</issue>
          (
          <year>2008</year>
          ),
          <fpage>420</fpage>
          -
          <lpage>433</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Jun</given-names>
            <surname>Tao</surname>
          </string-name>
          , Jian Xu,
          <string-name>
            <given-names>Chaoli</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Nitesh V</given-names>
            <surname>Chawla</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>HoNVis: Visualizing and Exploring Higher-Order Networks</article-title>
          .
          <source>In Proc. of the Pacific Visualization Symposium. IEEE</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Furu</given-names>
            <surname>Wei</surname>
          </string-name>
          , Shixia Liu, Yangqiu Song, Shimei Pan,
          <string-name>
            <given-names>Michelle X</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Weihong Qian, Lei Shi,
          <string-name>
            <given-names>Li</given-names>
            <surname>Tan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Qiang</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>TIARA: a visual exploratory text analytic system</article-title>
          .
          <source>In Proc. of the Conference on Knowledge Discovery and Data Mining. ACM</source>
          ,
          <fpage>153</fpage>
          -
          <lpage>162</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Jaewon</given-names>
            <surname>Yang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jure</given-names>
            <surname>Leskovec</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Overlapping communities explain core-periphery organization of networks</article-title>
          .
          <source>Proc. IEEE</source>
          <volume>102</volume>
          ,
          <issue>12</issue>
          (
          <year>2014</year>
          ),
          <fpage>1892</fpage>
          -
          <lpage>1902</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Achim</given-names>
            <surname>Zeileis</surname>
          </string-name>
          , Kurt Hornik, and
          <string-name>
            <given-names>Paul</given-names>
            <surname>Murrell</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Escaping RGBland: selecting colors for statistical graphics</article-title>
          .
          <source>Computational Statistics &amp; Data Analysis</source>
          <volume>53</volume>
          ,
          <issue>9</issue>
          (
          <year>2009</year>
          ),
          <fpage>3259</fpage>
          -
          <lpage>3270</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Zhaocheng</given-names>
            <surname>Zhu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Junfeng</given-names>
            <surname>Hu</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Context Aware Document Embedding</article-title>
          .
          <source>CoRR</source>
          abs/1707.01521 (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>