=Paper=
{{Paper
|id=Vol-1137/LAK14CLA_submission_6
|storemode=property
|title=A Web-based Tool for Communication Flow Analysis of Online Chats
|pdfUrl=https://ceur-ws.org/Vol-1137/LAK14CLA_submission_6.pdf
|volume=Vol-1137
|dblpUrl=https://dblp.org/rec/conf/lak/HoppeGSC14
}}
==A Web-based Tool for Communication Flow Analysis of Online Chats==
H. Ulrich Hoppe, Tilman Göhnert, Christopher Charles, Laura Steinert

University of Duisburg-Essen, Lotharstr. 63/65, 47048 Duisburg, Germany

{hoppe, goehnert, charles, steinert}@collide.info

===Goals and Premises===
Chat as a communication medium has its own characteristics that need to be considered, especially regarding turn taking and interactional coherence. Following suggestions by Suthers et al. [8; 9], operational rules are used as a basis to detect general dependencies or “contingencies” between contributions. Indicators may draw on lexical-semantic features, but time lapses between contributions also play a crucial role in determining which utterances are to be linked with each other. This paper introduces a web-based system that automatically analyzes text chat logs for dependencies between posts and constructs a contingency graph from them. This graph is then used to further analyze the communication flow in the underlying text chat.

Our approach reconstructs and extends this approach of “contingency analysis”. First, it is refined by incorporating the concept of dialogue act tagging [6; 11], which enriches the basic set of indicators and exploits existing techniques of linguistic processing. Second, to analyze the information flow in the graph, methods such as main path analysis (MPA) [3] are applied, enriched by information gathered from the web search algorithms PageRank [7] and HITS [4]. While the latter two have frequently been used outside their original domain, MPA has not been applied to chat networks before.

===Implementation===
The implementation uses a network analytics workbench that combines a web interface for defining analysis workflows in a visual language with a multi-agent system as the computational backend [1]. The communication platform is based on SQLSpaces [10] and implements a blackboard architecture that mediates between the user interface, the computational backend, and the analysis agents. The underlying communication protocol exchanges information through tuples placed on the blackboard (i.e., an SQLSpace). Each analysis step is performed by an individual agent. This architecture makes it easy to extend the workbench with further processing agents, which can be programmed in several languages, including Java, R, Python, and Prolog.

The workbench had previously been used to analyze the evolution of knowledge in wiki environments [2], incorporating main path analysis [3] as an analytic method. To analyze chat logs, a number of additional agents have been added. Among these are the Chat-ECGBuilder agent, which constructs the extended contingency graph (ECG) from a chat log; the Chat-PageRank agent, which applies a PageRank calculation to an ECG; the Chat-MainPathAnalysis agent, which performs a main path analysis on an ECG; and the Chat-Visualization agent, which gives a visual representation of an ECG and of the analysis results connected to it. These agents were implemented in Python, using the NLTK library (http://nltk.org/) for natural language processing and the igraph network analysis library (http://igraph.sourceforge.net/) for analyzing and visualizing graphs. Figure 1 shows an example workflow based on these modules.

===Empirical Results===
So far, the automatically generated ECGs have been compared to manually constructed graphs, using the results reported in [8] as a reference. This comparison yielded an F-score-based similarity of 83 percent, compared to a 97 percent F-score similarity between two manually generated graphs. Although this leaves room for improvement, the values show that the automatically generated ECGs agree to a reasonable degree with the contingencies detected by humans. This is further supported by the inter-network comparison, in which the majority of metrics show highly positive correlations between the different graphs based on the same chat log. The inter-network correlations between the individual metrics also show that the rankings carry different information, reflecting their underlying concepts: unidirectional influence (PageRank and input domain), bidirectional centrality (main paths), and mutual reinforcement between two classes of nodes (hubs and authorities).

===The Visual Representation of Workflows===
Our workbench supports the interactive construction of analysis workflows through a form of visual programming: “analyst” users combine data sources, processing units (“filters”), and export modules for visual rendering or download into a workflow. Workflows can be shared between analysts, re-used with different data sets, and modified. We believe that this level of visual representation also provides an adequate reference for discussing the underlying processing schemes without going into too much technical detail.

Figure 1. Workbench with ECG filters

Figure 1 shows a workflow in which six different agents are used. In this example, the first agent loads a chat log and relays it to a second agent, which transforms the log into an ECG. The third component uses that graph to perform a main path analysis. In the next step, the PageRank of each node is calculated. The final results are then visualized in another component and made available to the user.

===Summary and Outlook===
The current rules for detecting contingencies try to strike a balance between sophistication and simplicity. Typing mistakes are quite common in text chats, yet the current version does not correct them. Likewise, only stop-word removal and stemming are applied when measuring the similarity between posts. Further lemmatization, e.g. with WordNet [5], might improve the detection of semantic cohesion, but could also increase the risk of erroneously detected contingencies. To avoid such false contingencies, simplicity was chosen over sophistication in this case.

Since workflows in the analysis workbench follow a modular concept and the technical platform makes it easy to add features, variations of workflows are encouraged. The workflow presented here could be modified by adding new input components (e.g. for newsgroup dumps), analysis components (e.g. for pattern detection), or output components (e.g. alternative visualizations, reports, or graph formats).

In future work, linguistic features such as coreference resolution could help to detect and filter contingencies. So far, only text chats without any explicit threading information have been analyzed. However, it could be interesting to incorporate user-generated threading information such as that found in forums or newsgroups. These systems often allow a message to explicitly reference only one other message, but further dependencies might be detected through lexical coherence, references to author names, and syntactic patterns.

===References===
[1] Göhnert, T., Harrer, A., Hecking, T., and Hoppe, H. U. 2013. A Workbench to Construct and Re-use Network Analysis Workflows: Concept, Implementation, and Example Case. In Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013).

[2] Halatchliyski, I., Hecking, T., Göhnert, T., and Hoppe, H. U. 2013. Analyzing the flow of ideas and profiles of contributors in an open learning community. In Proceedings of the Third International Conference on Learning Analytics and Knowledge (LAK '13). ACM, New York, NY, 66-74.

[3] Hummon, N. P. and Doreian, P. 1989. Connectivity in a citation network: The development of DNA theory. Social Networks, 11:39-63.

[4] Kleinberg, J. M. 1999. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604-632.

[5] Miller, G. A. 1995. WordNet: A lexical database for English. Communications of the ACM, 38(11):39-41.

[6] Stolcke, A., Ries, K., Coccaro, N., Shriberg, E., Bates, R., Jurafsky, D., Taylor, P., Martin, R., Van Ess-Dykema, C., and Meteer, M. 2000. Dialogue act modeling for automatic tagging and recognition of conversational speech. Computational Linguistics, 26:339-373.

[7] Page, L., Brin, S., Motwani, R., and Winograd, T. 1998. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, Stanford University, Stanford, CA, USA.

[8] Suthers, D. D. and Desiato, C. 2012. Exposing chat features through analysis of uptake between contributions. In Proceedings of HICSS 2012. IEEE Computer Society, 3368-3377.

[9] Suthers, D. D., Dwyer, N., Medina, R., and Vatrapu, R. 2010. A framework for conceptualizing, representing, and analyzing distributed interaction. International Journal of Computer-Supported Collaborative Learning, 5(1):5-42.

[10] Weinbrenner, S. 2012. SQLSpaces: A platform for flexible language-heterogeneous multi-agent systems. Dr. Hut.

[11] Wu, T., Khan, F. M., Fisher, T. A., Shuler, L. A., and Pottenger, W. M. 2005. Posting act tagging using transformation-based learning. Foundations of Data Mining and Knowledge Discovery, 6:319-331.
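The paper does not spell out its contingency rules; it only states that time lapses and lexical-semantic features act as indicators, and that similarity between posts uses stop-word removal plus a stemmer. The following is a minimal sketch under those premises, not the Chat-ECGBuilder implementation. The stop-word list, the crude `stem()` suffix stripper (standing in for NLTK's tools), the `Post` record, the Jaccard overlap measure, and the thresholds `window` and `min_overlap` are all hypothetical choices; dialogue act tagging is left out.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-ins for an NLTK stop-word list and stemmer.
STOP_WORDS = {"a", "an", "and", "are", "for", "has", "i", "is", "it",
              "of", "on", "the", "to", "up", "we", "who", "you"}


def stem(token: str) -> str:
    """Crude suffix stripping; a placeholder for a real stemmer."""
    for suffix in ("ing", "es", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def terms(text: str) -> set[str]:
    """Lower-case, tokenize, drop stop words, and stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {stem(t) for t in tokens if t not in STOP_WORDS}


@dataclass
class Post:
    author: str
    time: float  # seconds since the start of the chat
    text: str


def contingencies(posts, window=60.0, min_overlap=0.2):
    """Link each post to earlier posts that are recent and lexically similar.

    Returns (later, earlier) index pairs. Both indicators and their
    thresholds are illustrative assumptions, not the paper's rule set.
    """
    bags = [terms(p.text) for p in posts]
    edges = []
    for j, later in enumerate(posts):
        for i in range(j):
            if later.time - posts[i].time > window:
                continue  # time lapse too large to count as uptake
            union = bags[i] | bags[j]
            # Jaccard overlap of the stemmed, stop-word-free term sets
            if union and len(bags[i] & bags[j]) / len(union) >= min_overlap:
                edges.append((j, i))
    return edges


posts = [
    Post("ann", 0, "Who has the slides for the lecture?"),
    Post("bob", 15, "I uploaded the lecture slides."),
    Post("cid", 25, "Anyone up for lunch?"),
    Post("ann", 400, "Thanks for the slides!"),
]
print(contingencies(posts))              # [(1, 0)]
print(contingencies(posts, window=600))  # [(1, 0), (3, 0), (3, 1)]
```

The two calls show why the time lapse matters: the late "Thanks for the slides!" shares vocabulary with the first two posts, but it is only linked to them once the time window is widened.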
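Main path analysis is only named in the paper, with a pointer to [3]. As a hedged sketch, the code below computes search path count (SPC) traversal weights, one of several weighting schemes used in main path analysis and not necessarily the variant of [3] or of the Chat-MainPathAnalysis agent, and then extracts a greedy local forward main path. It assumes posts are numbered chronologically and that an edge (u, v) points from an earlier post u to a later post v that takes it up; reversing every edge would leave the SPC weights unchanged. The function names are hypothetical.

```python
from collections import defaultdict


def spc_weights(n, edges):
    """Search path count (SPC) for each edge of a chat DAG.

    Nodes are posts 0..n-1 in chronological order; an edge (u, v) with u < v
    means post v takes up post u. The weight of (u, v) is the number of
    source-to-sink paths through it: paths(source -> u) * paths(v -> sink).
    """
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        if not u < v:
            raise ValueError("edges must point forward in time")
        succ[u].append(v)
        pred[v].append(u)
    from_source = [0] * n
    for v in range(n):  # chronological order is a topological order
        from_source[v] = sum(from_source[u] for u in pred[v]) if pred[v] else 1
    to_sink = [0] * n
    for v in reversed(range(n)):
        to_sink[v] = sum(to_sink[w] for w in succ[v]) if succ[v] else 1
    return {(u, v): from_source[u] * to_sink[v] for u, v in edges}


def local_main_path(n, edges):
    """Greedy forward main path: start at the heaviest edge leaving a source,
    then keep following the heaviest outgoing edge until a sink is reached.
    Ties are resolved by the order in which edges were given."""
    if not edges:
        return []
    weights = spc_weights(n, edges)
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    has_pred = {v for _, v in edges}
    sources = [u for u in list(succ) if u not in has_pred]
    node = max(sources, key=lambda u: max(weights[(u, v)] for v in succ[u]))
    path = [node]
    while succ[node]:
        current = node
        node = max(succ[current], key=lambda v: weights[(current, v)])
        path.append(node)
    return path


edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (1, 5)]
print(spc_weights(6, edges)[(0, 1)])  # 2
print(local_main_path(6, edges))      # [0, 1, 3, 4]
```

Because chat posts arrive in time order, the chronological numbering doubles as a topological order, so both path counts need only one linear pass each.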
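For the two rankings borrowed from web search, here is a plain-Python sketch of PageRank (power iteration with an assumed damping factor of 0.85) and of HITS hub and authority scores. The paper's agents use igraph, where `Graph.pagerank()`, `Graph.hub_score()`, and `Graph.authority_score()` would be the library route; the functions below are illustrative re-implementations. Here, edges are assumed to point from a post to the earlier post it takes up. Under that convention, PageRank and authority scores accumulate on posts that receive uptake, while hub scores favor posts that take up many well-received posts.

```python
import math


def pagerank(n, edges, damping=0.85, iterations=100, tol=1e-12):
    """Power-iteration PageRank; rank held by dangling nodes is spread evenly."""
    out = [[] for _ in range(n)]
    for src, dst in edges:
        out[src].append(dst)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        dangling = sum(rank[v] for v in range(n) if not out[v])
        new = [(1.0 - damping) / n + damping * dangling / n] * n
        for v in range(n):
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        converged = sum(abs(a - b) for a, b in zip(new, rank)) < tol
        rank = new
        if converged:
            break
    return rank


def hits(n, edges, iterations=100):
    """Kleinberg's HITS: good hubs point to good authorities and vice versa."""
    hubs = [1.0] * n
    auths = [1.0] * n
    for _ in range(iterations):
        # Authority update: sum of hub scores of posts pointing here.
        auths = [0.0] * n
        for src, dst in edges:
            auths[dst] += hubs[src]
        norm = math.sqrt(sum(a * a for a in auths)) or 1.0
        auths = [a / norm for a in auths]
        # Hub update: sum of authority scores of posts pointed to.
        hubs = [0.0] * n
        for src, dst in edges:
            hubs[src] += auths[dst]
        norm = math.sqrt(sum(h * h for h in hubs)) or 1.0
        hubs = [h / norm for h in hubs]
    return hubs, auths


# Posts 1 and 2 take up post 0; post 3 takes up posts 0 and 1.
ecg = [(1, 0), (2, 0), (3, 0), (3, 1)]
pr = pagerank(4, ecg)
hub, auth = hits(4, ecg)
print(max(range(4), key=pr.__getitem__))    # 0
print(max(range(4), key=auth.__getitem__))  # 0
print(max(range(4), key=hub.__getitem__))   # 3
```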
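The blackboard protocol described in the Implementation section can be illustrated with a toy in-memory tuple space. `TupleSpace`, `Agent`, and `run` are hypothetical names, and the `write`/`take` operations only mimic the general tuple-space idea. This is not the SQLSpaces API. Each agent takes a tuple of the type it consumes and writes its result back as a new tuple, so agents never call each other directly and can be listed in any order.

```python
from collections import Counter
from typing import Callable, Optional


class TupleSpace:
    """Toy in-memory blackboard (hypothetical; not the SQLSpaces API)."""

    def __init__(self):
        self._tuples = []

    def write(self, tup: tuple) -> None:
        self._tuples.append(tup)

    def take(self, template: tuple) -> Optional[tuple]:
        """Remove and return the first tuple matching template (None matches anything)."""
        for i, tup in enumerate(self._tuples):
            if len(tup) == len(template) and all(
                t is None or t == field for t, field in zip(template, tup)
            ):
                return self._tuples.pop(i)
        return None


class Agent:
    """Consumes (consumes, run_id, payload) tuples and writes (produces, run_id, result)."""

    def __init__(self, consumes: str, produces: str, fn: Callable):
        self.consumes, self.produces, self.fn = consumes, produces, fn

    def step(self, space: TupleSpace) -> bool:
        tup = space.take((self.consumes, None, None))
        if tup is None:
            return False
        _, run_id, payload = tup
        space.write((self.produces, run_id, self.fn(payload)))
        return True


def run(space: TupleSpace, agents) -> None:
    """Let agents react to the blackboard until none of them can fire."""
    while any(agent.step(space) for agent in agents):
        pass


# A two-agent chain in the spirit of Figure 1; the payload logic is a toy.
space = TupleSpace()
agents = [
    # Counts how often each post is taken up.
    Agent("ecg", "ranking", lambda edges: dict(Counter(d for _, d in edges))),
    # Toy "ECG builder": links every post to its direct predecessor.
    Agent("chatlog", "ecg", lambda posts: [(j, j - 1) for j in range(1, len(posts))]),
]
space.write(("chatlog", "run-1", ["hi", "hello", "how are you?"]))
run(space, agents)
result = space.take(("ranking", "run-1", None))
print(result)  # ('ranking', 'run-1', {0: 1, 1: 1})
```

Listing the ranking agent before the builder does not matter: it simply finds nothing to take until the builder has written its ECG tuple.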
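The Empirical Results section reports F-score-based similarities (83 percent for automatic vs. manual graphs, 97 percent for manual vs. manual) without giving the formula. A minimal sketch follows. It assumes each ECG is reduced to its set of directed edges and that the F1 measure is taken over those sets. The function name and the example edge lists are made up and unrelated to the reported values.

```python
def edge_f_score(auto_edges, manual_edges):
    """F1 over directed edges, treating manual_edges as the reference.

    Since F1 = 2*TP / (|auto| + |manual|), the score is symmetric and can
    equally compare two manually built graphs.
    """
    auto, manual = set(auto_edges), set(manual_edges)
    tp = len(auto & manual)  # edges found in both graphs
    if tp == 0:
        return 0.0
    precision = tp / len(auto)
    recall = tp / len(manual)
    return 2 * precision * recall / (precision + recall)


auto = [(1, 0), (3, 1), (4, 2)]
manual = [(1, 0), (3, 1), (4, 3), (5, 4)]
print(round(edge_f_score(auto, manual), 3))  # 0.571
```

Here two of three automatic edges are correct (precision 2/3) and two of four manual edges are recovered (recall 1/2), giving F1 = 4/7.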