=Paper= {{Paper |id=Vol-1137/LAK14CLA_submission_6 |storemode=property |title=A Web-based Tool for Communication Flow Analysis of Online Chats |pdfUrl=https://ceur-ws.org/Vol-1137/LAK14CLA_submission_6.pdf |volume=Vol-1137 |dblpUrl=https://dblp.org/rec/conf/lak/HoppeGSC14 }} ==A Web-based Tool for Communication Flow Analysis of Online Chats== https://ceur-ws.org/Vol-1137/LAK14CLA_submission_6.pdf
A Web-based Tool for Communication Flow Analysis of Online Chats

H. Ulrich Hoppe, Tilman Göhnert, Laura Steinert, Christopher Charles
University of Duisburg-Essen
Lotharstr. 63/65
47048 Duisburg, Germany
{hoppe, goehnert, charles, steinert}@collide.info

GOALS AND PREMISES
Chat as a communication medium has its own characteristics that need to be considered, especially regarding turn taking and interactional coherence. Following suggestions by Suthers et al. [8; 9], operational rules are used as a basis to detect general dependencies or “contingencies”. Indicators may use lexical-semantic features, but time lapses between contributions also play a crucial role in determining which utterances are to be linked with each other. This paper introduces a web-based system that automatically analyzes text chat logs for dependencies between posts and constructs a contingency graph from them. This graph is then used to further analyze the communication flow in the underlying text chat.

Our approach reconstructs and extends the above-mentioned approach of “contingency analysis”. First, the approach is refined by incorporating the concept of dialogue act tagging [6; 11] to enrich the basic set of indicators and to exploit existing techniques of linguistic processing. Second, in order to analyze the information flow in the graph, methods such as main path analysis (MPA) [3], enriched by information gathered from the web search algorithms PageRank [7] and HITS [4], are applied. While the latter two have been used frequently outside of their original domain, MPA has not been applied to chat networks before.

IMPLEMENTATION
The implementation uses a network analytics workbench that combines a web interface for easily defining analysis workflows in a visual language with a multi-agent system as the computational backend [1]. The communication platform is based on SQLSpaces [10] and implements a blackboard architecture, mediating between the user interface, the computational backend, and the analysis agents. The underlying communication protocol is based on exchanging information through tuples placed on the blackboard (i.e., an SQLSpace). Each analysis step is performed by an individual agent. This architecture allows for an easy extension of the workbench by adding further processing agents that can be programmed in several different languages, including Java, R, Python, and Prolog.

In previous applications the workbench had already been used to analyze the evolution of knowledge in wiki environments [2] by incorporating “main path analysis” [3] as an analytic method. In order to analyze chat logs, a number of additional agents have been added. Among these are the Chat-ECGBuilder agent, which constructs the extended contingency graph (ECG) based on a chat log, the Chat-PageRank agent, which applies a PageRank calculation to an ECG, the Chat-MainPathAnalysis agent for performing a main path analysis on an ECG, and the Chat-Visualization agent, which gives a visual representation of an ECG and of analysis results connected to that ECG. The agents were written in Python, using the NLTK library (http://nltk.org/) for natural language processing and the igraph network analysis library (http://igraph.sourceforge.net/) for analyzing and visualizing graphs. Figure 1 shows an example workflow that is based on the modules described above.

EMPIRICAL RESULTS
So far, the automatically generated ECGs were compared to manually constructed graphs using results reported in [8] as a reference. This comparison yielded an F-score based similarity of 83 percent, compared to a 97 percent F-score similarity between two manually generated graphs. Although this leaves room for improvement, the similarity values show that the automatically generated ECGs agree to a reasonable degree with contingencies detected by humans. This is further backed up by the inter-network comparison, where the majority of metrics show highly positive correlations for the different graphs based on the same chat log. Looking at the inter-network correlations between the individual metrics, it becomes clear that the rankings have different informative values based on their concepts of unidirectional influence (PageRank and input domain), bidirectional centrality (main paths) and mutual reinforcement between two classes of nodes (hubs and authorities).

THE VISUAL REPRESENTATION OF WORKFLOWS
Our workbench facilitates the interactive construction of analysis workflows in a kind of visual programming approach: the “analyst” users may pull together data sources, processing units (“filters”), and export modules for visual rendering or download to form a workflow. Workflows can be shared between analysts and can be re-used with different data sets and/or modified. We believe that the level of visual representation of these workflows also provides an adequate reference for discussing the underlying processing schemes without entering into too much technical detail.
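
The contingency rules themselves are not spelled out in this paper, so the following is only a minimal sketch of how an ECG-building step could combine the two kinds of indicators named above: time lapses between contributions and lexical overlap after stop-word removal and stemming. The names `Post`, `build_ecg`, `max_gap` and `min_overlap`, the threshold values, the tiny stop-word list and the toy suffix stripper are all assumptions made for illustration. The agents described here rely on NLTK and on further indicators such as dialogue act tags, which this sketch leaves out.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately tiny stop-word list (a real setup might use NLTK's).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "i", "it",
              "we", "do", "in", "on", "for", "you", "that", "this"}


@dataclass
class Post:
    idx: int
    author: str
    timestamp: float  # seconds since the start of the chat
    text: str


def crude_stem(word):
    """Toy suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def content_stems(text):
    """Lower-case the text, drop stop words, and stem what remains."""
    words = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(w) for w in words if w not in STOP_WORDS}


def build_ecg(posts, max_gap=60.0, min_overlap=1):
    """Return contingency edges as (later_idx, earlier_idx) pairs.

    Two posts are linked when the later one follows within max_gap seconds
    and both share at least min_overlap stemmed content words.
    """
    stems = [content_stems(p.text) for p in posts]
    edges = []
    for j, later in enumerate(posts):
        for i in range(j):
            gap = later.timestamp - posts[i].timestamp
            if gap < 0 or gap > max_gap:
                continue
            if len(stems[i] & stems[j]) >= min_overlap:
                edges.append((later.idx, posts[i].idx))
    return edges


posts = [
    Post(0, "anna", 0.0, "Should we use graphs for the analysis?"),
    Post(1, "ben", 12.0, "Graphs are fine, analysing chats needs them"),
    Post(2, "anna", 300.0, "Lunch, anyone?"),
]
edges = build_ecg(posts)
print(edges)
```

Edges point from the later post to the earlier post it appears to take up. In this example the second post is linked to the first through the shared stem of "graphs", while the third post falls outside the time window and stays unlinked.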
                                                 Figure 1. Workbench with ECG filters
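
As a rough illustration of the two analysis steps in such a workflow, the sketch below computes PageRank and a main path on a small hand-made graph. It is not the workbench's code: the paper reports using igraph, whereas this version uses only the Python standard library, and the paper does not state which main path variant is applied. Here the path with the largest sum of search path counts (SPC) is extracted, which is one common choice. The function names, the example edges, and the convention that edges are (later post, earlier post) pairs are assumptions.

```python
from collections import defaultdict


def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank; edges are (source, target) pairs."""
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def main_path(nodes, edges):
    """Path with the largest summed SPC weight.

    nodes: post ids in chronological order.
    edges: (later, earlier) pairs, so the graph is acyclic.
    """
    succ = defaultdict(list)  # earlier post -> later posts taking it up
    pred = defaultdict(list)  # later post -> earlier posts it takes up
    for later, earlier in edges:
        succ[earlier].append(later)
        pred[later].append(earlier)

    # Number of paths arriving at each node from any start node.
    paths_in = {}
    for v in nodes:
        paths_in[v] = sum(paths_in[p] for p in pred[v]) if pred[v] else 1
    # Number of paths leaving each node towards any end node.
    paths_out = {}
    for v in reversed(nodes):
        paths_out[v] = sum(paths_out[s] for s in succ[v]) if succ[v] else 1

    # Dynamic programming: best[v] is the heaviest path ending in v,
    # where an edge (p, v) weighs paths_in[p] * paths_out[v] (its SPC).
    best, came_from = {}, {}
    for v in nodes:
        best[v], came_from[v] = 0, None
        for p in pred[v]:
            weight = best[p] + paths_in[p] * paths_out[v]
            if weight > best[v]:
                best[v], came_from[v] = weight, p

    end = max(nodes, key=best.get)
    path = []
    while end is not None:
        path.append(end)
        end = came_from[end]
    return path[::-1]


nodes = [0, 1, 2, 3, 4, 5]
uptake = [(1, 0), (2, 0), (3, 1), (3, 2), (4, 3), (5, 2)]
ranks = pagerank(nodes, uptake)
path = main_path(nodes, uptake)
print(max(ranks, key=ranks.get), path)
```

With edges pointing from a post to the post it takes up, PageRank favours contributions that are taken up often, directly or indirectly, while the main path traces one chronological chain of contributions through the discussion.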
Figure 1 shows a workflow in which six different agents are used. In this example the first agent loads a chat log and relays it to a second agent, which transforms the log into an ECG. The third component uses that graph to perform a main path analysis. In the next step the PageRank of each node is calculated. The final results are then visualized in another component and made available to the user.

SUMMARY AND OUTLOOK
The current version’s rules for detecting contingencies try to strike a balance between sophistication and simplicity. Typing mistakes are quite common in text chats, yet they are not corrected. Additionally, in order to measure similarity between posts, only stop-word removal and a stemmer are applied. Further lemmatization, e.g. by WordNet [5], might improve the detection of semantic cohesion, yet could also increase the risk of erroneously detected contingencies. In order to avoid such false contingencies, simplicity was chosen over sophistication in this case.

As workflows in the analysis workbench are based on a modular concept and the technical platform makes it easy to add further features, variations of workflows are encouraged. The workflow presented here could be modified by adding new input components (e.g. for newsgroup dumps), analysis components (e.g. for pattern detection), or new output components (e.g. alternative visualizations, reports or graph formats).

In future work, linguistic features such as coreference resolution could help in detecting and filtering contingencies. So far, only text chats without any explicit threading information have been analyzed. However, it could be interesting to incorporate user-generated threading information as provided in forums or newsgroups. In these systems a message can often explicitly reference only one other message, but by using lexical coherence, author name referencing and syntactical patterns, further dependencies might be detected.

REFERENCES
[1] Göhnert, T., Harrer, A., Hecking, T., and Hoppe, H. U. 2013. A Workbench to Construct and Re-use Network Analysis Workflows - Concept, Implementation, and Example Case. The 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013).
[2] Halatchliyski, I., Hecking, T., Göhnert, T., and Hoppe, H. U. 2013. Analyzing the flow of ideas and profiles of contributors in an open learning community. In Proceedings of the Third International Conference on Learning Analytics and Knowledge (LAK '13). ACM, New York, NY, 66-74.
[3] Hummon, N. P. and Doreian, P. 1989. Connectivity in a citation network: The development of DNA theory. Social Networks, 11:39-63.
[4] Kleinberg, J. M. 1999. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604-632.
[5] Miller, G. A. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11):39-41.
[6] Stolcke, A., Ries, K., Coccaro, N., Shriberg, E., Bates, R., Jurafsky, D., Taylor, P., Martin, R., Ess-Dykema, C. V., and Meteer, M. 2000. Dialogue act modeling for automatic tagging and recognition of conversational speech. Computational Linguistics, 26:339-373.
[7] Page, L., Brin, S., Motwani, R., and Winograd, T. 1998. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, Stanford University, Stanford, CA, USA.
[8] Suthers, D. D. and Desiato, C. 2012. Exposing chat features through analysis of uptake between contributions. In Proceedings of HICSS 2012. IEEE Computer Society, 3368-3377.
[9] Suthers, D. D., Dwyer, N., Medina, R., and Vatrapu, R. 2010. A framework for conceptualizing, representing, and analyzing distributed interaction. Int. Journal of Computer-Supported Collaborative Learning, 5(1):5-42.
[10] Weinbrenner, S. 2012. SQLSpaces - a platform for flexible language-heterogeneous multi-agent systems. Dr. Hut.
[11] Wu, T., Khan, F. M., Fisher, T. A., Shuler, L. A., and Pottenger, W. M. 2005. Posting act tagging using transformation-based learning. Foundations of Data Mining and Knowledge Discovery, 6:319-331.