The AIDA Dashboard: Analysing Conferences with Semantic Technologies Simone Angioni1 , Angelo Salatino2 , Francesco Osborne2 , Diego Reforgiato Recupero1 , and Enrico Motta2 1 Department of Mathematics and Computer Science, University of Cagliari (Italy) {simone.angioni, diego.reforgiato}@unica.it 2 Knowledge Media Institute, The Open University, Milton Keynes (UK) {angelo.salatino, francesco.osborne, enrico.motta}@open.ac.uk Abstract. Scientific conferences play a crucial role in the field of Com- puter Science by promoting the cross-pollination of ideas and technolo- gies, fostering new collaborations, shaping scientific communities, and connecting research efforts from academia and industry. However, current systems for analysing research data do not provide a good representation of conferences. Specifically, these solutions do not allow to track research trends, to compare conferences in similar fields, and to analyse the in- volvement of industrial sectors. In order to address these limitations, we developed the AIDA Dashboard, a tool for exploring and making sense of scientific conferences which integrates statistical analysis, semantic technologies, and visual analytics. Keywords: Scholarly Data · Knowledge Graphs · Topic Detection · Bib- liographic Data · Scholarly Ontologies · Research Dynamics 1 Introduction Scientific conferences play a crucial role in the field of Computer Science by promoting the cross-pollination of ideas and technologies, fostering new collab- orations, shaping scientific communities, and connecting research efforts from academia and industry. For this reason, every significant field is usually associ- ated with multiple conferences that help defining its challenges and paradigms and to coordinate the effort of all the interested stakeholders. Therefore, understanding and monitoring Computer Science conferences is an important task for editors, researchers, companies, research policy makers and other users working in this space. Several applications and services already provide a wide variety of functionalities to support the exploration of research data and produce various kinds of analytics. These include Microsoft Academic Graph, Semantic Scholar, Scopus, Web of Science, OpenCitations, and many others. However, these systems tend to neglect conferences and offer only a very Copyright c 2020 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0). limited set of relevant analytics, such as the number of papers or citations. In the first instance, they do not allow users to examine the trends of the relevant research topics. It is thus difficult to assess what are the research challenges that a conference is actually addressing and how its focus changed in time. Secondly, there is poor support in comparing conferences to determine the best performing ones in specific fields. For instance, we would like to know which are the main conferences in Semantic Web, how they compare in terms of average citations or other metrics, and how their performance changes in the last few years. A third limitation is that current systems do not report any analytics about the industry involvement. Conversely, it can be argued that conferences are the premium public venues in which industry and academia interact, hence monitoring these dynamics is critical for assessing a conference. In order to address these limitations, we developed the AIDA Dashboard, a tool for exploring and making sense of scientific conferences which integrates statistical analysis, semantic technologies, and visual analytics. The AIDA Dash- board was developed in collaboration with Springer Nature for assisting editors in assessing conferences, but it also supports several other use cases. It introduces three novel features that state-of-the-art systems are currently lacking. First, it associates to conferences a very granular representation of their topics from the Computer Science Ontology (CSO)[7] 3 and uses it to produce several analytics about its research trends over time. Second, it enables to easily compare and rank conferences according to several metrics within specific fields (e.g., Seman- tic Web) and time-frames (e.g., last five years). Finally, the AIDA Dashboard offers several features for assessing the involvement of industry in a conference. This includes the ability to focus on companies and their performance when assessing organizations, to report the ratio of publications and citations from academia, industry, collaborative efforts, and to distinguish industrial contri- butions according to 66 industrial sectors (e.g., automotive, financial, energy, electronics) from the Industrial Sectors Ontology (INDUSO)4 . A demo of AIDA Dashboard is currently available at http://w3id.org/aida/dashboard. 2 The AIDA Dashboard The AIDA Dashboard is a web application that allows users to visualize several kind of analytics about a specific conference (see Figure 1). The backend is developed in Python, while the frontend is in HTML5 and Javascript. The AIDA Dashboard builds on the Academia/Industry DynAmics [2,1] knowledge graph (AIDA)5 , a large knowledge base describing 14M articles and 8M patents in the field of Computer Science according to the research topics drawn from CSO. 4M articles and 5M patents are also classified according to the type of the author’s affiliations (academy, industry, or collaborative) and 66 industrial sectors drawn from INDUSO, which was specifically designed to 3 CSO - https://cso.kmi.open.ac.uk/ 4 INDUSO - http://w3id.org/aida/downloads/induso.ttl 5 AIDA - http://w3id.org/aida Fig. 1. The Overview of ISWC according to the AIDA Dashboard. support AIDA. AIDA was generated by integrating several knowledge graphs and bibliographic corpora, including Microsoft Academic Graph (MAG), Di- mensions, DBpedia, CSO, and the Global Research Identifier Database (GRID). The research papers were annotated with CSO topics using the CSO Clas- sifier [6]6 , which is a tool that uses part-of-speech tagging to identify promising terms and then exploits word embeddings to infer semantically related topics from CSO. In addition, to extract further relevant topics, the classifier includes also all their super topics according to the CSO. For instance, a paper tagged with Neural Networks would be assigned the topic Artificial Intelligence. This solution enables identifying high level topics that are not typically mentioned in the documents. The CSO Classifier powers the current version of the Smart Topic Miner [5], which is the application used by Springer Nature to semi-automatically annotate Proceedings books in the field of Computer Science. Since CSO is often updated, the set of topics used by AIDA is also evolving, constantly including new emerging topics. As an example, topics can be extended by using hyperlinks present in papers that might become Semantic Web entities or properties [3]. Each research article was also linked to the industrial sectors described in INDUSO by mapping the affiliations of the authors to their DBpedia entities, which in turn are mapped to INDUSO. For instance, an article that was written by authors who have Toyota as affiliation would be associated to the industrial sector Automotive. AIDA is available at http://w3id.org/aida under the CC-BY 4.0 license. It was recently used for supporting the generation of adavanced analytics about research dynamics and forecasting the impact of research topics on industry [4]. However, using these data was not easy for less technical-savvy users. AIDA 6 CSO Classifier - https://pypi.org/project/cso-classifier/ Dashboard is the first step in allowing users to access AIDA through a user- friendly but comprehensive interface. In order to support the AIDA Dashboard, we pre-computed a full set of analytics for each conference from AIDA-KG and store it in a JSON file that will be loaded by the web interface. This solution allows AIDA Dashboard to be extremely scalable, since for a given conference it needs to query the server only once, to retrieve its associated file. Every other operation is handled by the front-end. AIDA Dashboard is highly scalable and allows to browse the different facets of a conference according to seven tabs: Overview, Citation Analysis, Organizations, Authors, Topics, Similar Conferences, and Industry. Figure 1 shows the Overview tab. This is the main view of a conference that provides introductory information about its performance, the main authors and organization, and the conference rank in its main fields in terms of average citations for paper during the last five years. The Citation Analysis tab reports the evolution in time of several citation- based metrics such as the impact factor and the average citations for paper. It also shows the evolution of the rank and the percentile of the conference in different fields. For instance, the Conference on Neural Information Processing Systems (NeurIPS) is currently the second conference in terms of average ci- tations in Neural Network, the third in Machine Learning, and the twelfth in Artificial Intelligence. This visualization is typically used by Springer Nature ed- itors to assess the performance of conferences within different communities and to identify emerging conferences. The Organizations and Authors tabs show several analytics about the main institutions and researchers active in the conference. Organizations can be filtered according to their type (academia or industry) and are associated with their number of publications, citations, and average citations for paper. The researchers are associated with similar analytics, but also with their H-index and H5-index, in order to quickly identify high impact researchers. Editors use this information to understand the quality of researchers and organizations attracted by the conferences. This is particularly important for assessing relatively young conferences that may not have developed yet a strong citation record. The Topic tab allows users to analyse the topic trends in time. Specifically it shows two selections of topics: frequent topics and fingerprint topics. The first is the set of topics which appear more frequently in the conference. The second is the set of most distinctive topics of the conference. It is obtained by computing the difference between the topic distribution of the conference and the one of the full dataset. Preliminary analyses revealed that this second set is usually able to better represent the topics considered central to the conference. The Similar Conferences tab compares the conference under analysis with all the other conferences in the same fields according to their number of publica- tions, citations, and average citations for paper. The user can contextualise the comparison to different fields. For example, ISWC can be compared with all the other conferences in the fields of Semantic Web, Internet, or Computer Science. Finally, the Industry tab reports the percentage of publications and cita- tions from academia, industry, and collaborative efforts as well as the industrial sectors analysis. The latter shows the percentage of produced publications and citations received by companies in different sectors. For instance, the main in- dustrial sectors of ISWC are Computing and IT, Information Technology, Man- agement, Telecommunication, and Health Care. 3 Conclusions The current version of AIDA Dashboard already provides an array of interesting functionalities, many of which go beyond what is available in other current tools. Nevertheless, we are still at a relatively early stage and we are planning to introduce new ones. As first step, we plan to add a geographical tab for analysing the distribution of countries active in a conference. We also want to expand the set of entities that could be analysed by the dashboard, producing similar analytics also for journals, organizations, and scientific communities. Finally, we plan to perform a comprehensive user study with editors and researchers from different communities in order to assess the system and collect useful feedback. For such a purpose, to generalize the presented dashboard we only need to replace the CSO ontology with others within the domain under study. As such, we have already started working with the MeSH ontology within the bio-informatics domain to have our dashboard working in that domain as well. References 1. Angioni, S., Osborne, F., Salatino, A.A., Recupero, D.R., Motta, E.: Integrating knowledge graphs for comparing the scientific output of academia and industry. In: Proc. of the ISWC 2019 Satellite Tracks. CEUR Workshop Proceedings, vol. 2456, pp. 85–88 (2019) 2. Angioni, S., Salatino, A., Osborne, F., Reforgiato Recupero, D., Motta, E.: Integrat- ing knowledge graphs for analysing academia and industry dynamics. In: ADBIS, TPDL and EDA 2020 Common Workshops and Doctoral Consortium. Springer In- ternational Publishing, Cham (2020) 3. Presutti, V., Nuzzolese, A.G., Consoli, S., Gangemi, A., Recupero, D.R.: From hy- perlinks to semantic web properties using open knowledge extraction. Semantic Web 7(4), 351–378 (2016). https://doi.org/10.3233/SW-160221 4. Salatino, A., Osborne, F., Motta, E.: Researchflow: Understanding the knowledge flow between academia and industry. In: Knowledge Engineering and Knowledge Management. Springer International Publishing (2020) 5. Salatino, A.A., Osborne, F., Birukou, A., Motta, E.: Improving editorial workflow and metadata quality at springer nature. In: The Semantic Web – ISWC 2019. pp. 507–525. Springer International Publishing, Cham (2019) 6. Salatino, A.A., Osborne, F., Thanapalasingam, T., Motta, E.: The cso classifier: Ontology-driven detection of research topics in scholarly articles. In: Digital Li- braries for Open Knowledge. pp. 296–311. Springer International Publishing, Cham (2019) 7. Salatino, A.A., Thanapalasingam, T., Mannocci, A., Osborne, F., Motta, E.: The computer science ontology: a large-scale taxonomy of research areas. In: Interna- tional Semantic Web Conference. pp. 187–205. Springer (2018)