Ignet: A centrality and INO-based web system for analyzing and visualizing literature-mined networks

Arzucan Özgür1*, Junguk Hur2*, Zuoshuang Xiang3*, Edison Ong3, Dragomir R. Radev3, Yongqun He3§

1 Bogazici University, 34342 Istanbul, Turkey; 2 University of North Dakota, US; 3 University of Michigan, Ann Arbor, USA.

* These authors contributed equally.
§ To whom correspondence should be addressed: yongqunh@umich.edu

ABSTRACT

Ignet (Integrative Gene Network) is a web-based system for dynamically updating and analyzing gene interaction networks mined from all PubMed abstracts. Four centrality metrics, namely degree, eigenvector, betweenness, and closeness, are used to determine the importance of genes in the networks. Interaction types between genes are classified using the Interaction Network Ontology (INO), which organizes interaction types in an ontological hierarchy along with individual keywords listed for each interaction type. An interactive user interface is designed to explore the interaction network as well as the centrality and ontology-based network analysis.
Availability: http://ignet.hegroup.org.

1 INTRODUCTION

Many web systems exist for literature mining of gene interactions, e.g., Chilibot (http://www.chilibot.net/) and iHOP (http://www.ihop-net.org/UniPub/iHOP/). Some of these tools mark the interaction keywords in the sentences. One common obstacle is that these interaction keywords are not classified, so detailed interaction types cannot be studied.

Ontology-based literature mining is an emerging research field that applies ontology to support literature mining. The Interaction Network Ontology (INO) is a newly developed interaction ontology that supports biomedical literature mining (Hur et al., 2015). INO was initially developed to represent over 800 interaction keywords and their hierarchical structure in an ontological format (Ozgur et al., 2011); more interaction terms were later added to INO with well-defined axioms (Hur et al., 2015).

In our previous studies, we also ranked the genes in the literature-mined gene networks using different types of centralities: degree centrality, eigenvector centrality, closeness centrality, and betweenness centrality (Ozgur et al., 2011). These centralities measure different levels of importance. For example, in betweenness centrality a node is considered important if it occurs on many shortest paths between other nodes, whereas in degree centrality a node is considered important if it is connected to many other nodes.

We have named our literature mining strategy Centrality and Ontology-based Network Discovery using Literature data (CONDL) (Ozgur et al., 2011). CONDL was successfully applied to extract and analyze IFN-γ and vaccine-related gene interaction networks as well as vaccine and fever-related gene interaction networks (Hur et al., 2012).

Based on the CONDL strategy, we have developed Ignet (http://ignet.hegroup.org), a web-based literature mining database system that stores gene-gene interactions extracted from PubMed abstracts. A gene-gene interaction in this study corresponds to an interaction between genes and/or gene products such as proteins.

2 FEATURES AND USAGE

Briefly, all article abstracts available in PubMed are retrieved. The sentences, split by Java's internal splitter (BreakIterator), were examined using SciMiner (Hur et al., 2009) to identify gene names and interaction keyword(s) (e.g., interacts, binds, activates) represented in INO. We obtained the dependency parse trees of the sentences using the Stanford Parser (http://nlp.stanford.edu/software/lex-parser.shtml) and extracted the shortest dependency path between each pair of genes in a sentence. Our assumption is that the shortest path between two gene names in a dependency tree is a good description of the semantic relation between them in the corresponding sentence. We defined an edit distance-based kernel function among these dependency paths and used support vector machines (SVM) in the SVMlight package (Joachims, 1999) to classify each path as describing an interaction between the gene pair or not (Erkan et al., 2007).

The value output by the decision function of the SVM classifier (i.e., the score field in Fig. 1B) can be used as a confidence score for the association between two genes in a sentence. A positive score means that the SVM classifier predicts an "interaction", whereas a negative score corresponds to a prediction of "not interaction". The larger the absolute value of a score, the more confident the classifier is in the classification decision. The higher the score of a sentence is, the more likely it is that the sentence describes an interaction between the pair of genes. The current database contains only those interactions with a positive SVM score.

3 IGNET USE CASE DEMONSTRATION

Ignet provides a user-friendly web query interface (Fig. 1). A user can query one gene or two genes. Each gene has its own centrality scores, which indicate its degree of importance in a network. All sentences associated with the queries are retrieved, with gene names and INO interaction verbs highlighted. Clicking on a specific INO interaction verb links to a page that shows the hierarchy of the INO verbs (Fig. 1C).

In addition, Ignet includes a subprogram called Dignet (http://ignet.hegroup.org/dignet), which applies a PubMed search to define the scope of papers for generating the network, and uses the Ignet execution pipeline to generate gene-gene interactions and networks and to calculate centrality scores for genes in the networks.

Fig. 1. Ignet web query of literature mined gene interaction network. (A) The list of genes as the neighbors associated with IL2 in the vaccine context, and the automatically generated visualization graphic displaying the interactions among IL2 and its associated genes. Circled in red are centrality scores and a confidence score (e.g., 4.5018) for ranking gene-gene interactions. (B) Once the edge between IL2 and TNF is clicked, the publication records that support the interaction are shown in another page. In addition to the two gene/protein names, the interaction keywords (e.g., increase) are also shown. (C) Once the interaction word "increase" is clicked, the ontology hierarchy of this term in INO is displayed in an Ontobee web page (Xiang et al., 2011).

4 SUMMARY

Ignet is a web-based literature mining system that integrates the centrality-based literature mining approach with INO-based ontology analysis of interaction types. The gene-gene relationships are extracted using machine learning methods with the syntactic and semantic structures of the sentences. To the best of our knowledge, Ignet is the first web system that provides centrality analysis for literature-mined gene interaction networks and ontology representation of interaction types. Ignet not only provides access to automatically extracted gene interactions, but it also enables the generation of new hypotheses (Özgür et al., 2010; Özgür et al., 2011; Hur et al., 2012).

ACKNOWLEDGEMENTS

This work was supported by grant R01AI081062 from the US NIH National Institute of Allergy and Infectious Diseases and by Marie Curie FP7-Reintegration-Grants within the 7th European Community Framework Programme.

REFERENCES

Erkan, G., Ozgur, A., and Radev, D. (2007). "Semi-supervised classification for extracting protein interaction sentences using dependency parsing," in Proceedings EMNLP-CoNLL, 228-237.
Hur, J., Ozgur, A., Xiang, Z., and He, Y. (2012). Identification of fever and vaccine-associated gene interaction networks using ontology-based literature mining. J Biomed Semantics 3, 18.
Hur, J., Ozgur, A., Xiang, Z., and He, Y. (2015). Development and application of an interaction network ontology for literature mining of vaccine-associated gene-gene interactions. J Biomed Semantics 6, 2.
Hur, J., Schuyler, A.D., States, D.J., and Feldman, E.L. (2009). SciMiner: web-based literature mining tool for target identification and functional enrichment analysis. Bioinformatics 25, 838-840.
Joachims, T. (1999). "Making large-scale support vector machine learning practical," in Advances in Kernel Methods: Support Vector Learning, eds. B. Schölkopf, C.J.C. Burges, and A.J. Smola (Cambridge, MA: MIT Press), 169-184.
Ozgur, A., Xiang, Z., Radev, D.R., and He, Y. (2011). Mining of vaccine-associated IFN-gamma gene interaction networks using the Vaccine Ontology. J Biomed Semantics 2 Suppl 2, S8.
Xiang, Z., Mungall, C., Ruttenberg, A., and He, Y. (2011). "Ontobee: A linked data server and browser for ontology terms," in The 2nd International Conference on Biomedical Ontologies (ICBO), CEUR Workshop Proceedings, 279-281.
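
As a supplementary illustration of two of the centrality measures named in the Introduction, the following Python sketch computes degree and closeness centrality on a small undirected gene network after discarding edges with a non-positive SVM score, mirroring the database's positive-score filter. It is a minimal sketch under stated assumptions, not the Ignet implementation: the edge list, the score values, and all function names are hypothetical, and the normalizations (degree divided by n - 1; closeness as reachable nodes divided by total shortest-path distance) are common textbook choices that may differ from those used by Ignet.

```python
from collections import deque

# Hypothetical toy edges: (gene_a, gene_b, svm_score). All values are invented.
TOY_EDGES = [
    ("IL2", "TNF", 4.5),
    ("IL2", "IFNG", 2.1),
    ("TNF", "IFNG", 0.8),
    ("IFNG", "IL10", 1.3),
    ("IL2", "IL10", -0.4),  # negative score: predicted "not interaction"
]


def build_graph(edges):
    """Keep only edges with a positive score; return an undirected adjacency map."""
    graph = {}
    for a, b, score in edges:
        if score > 0:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Number of neighbors of each node, divided by (n - 1)."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closeness_centrality(graph):
    """Reachable nodes (excluding self) divided by the sum of their distances."""
    result = {}
    for node in graph:
        dist = bfs_distances(graph, node)
        total = sum(dist.values())
        result[node] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


if __name__ == "__main__":
    g = build_graph(TOY_EDGES)
    print("degree:", degree_centrality(g))
    print("closeness:", closeness_centrality(g))
```

In this toy network IFNG is directly connected to every other gene, so it ranks highest on both measures, while the IL2-IL10 pair is absent from the graph because its score is negative.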