SciNoon: Exploratory Search System for Scientific Groups

Yaroslav Nedumov, ISP RAS, Moscow, Russia, yaroslav.nedumov@ispras.ru
Anton Babichev, ISP RAS, Moscow, Russia, babichev@ispras.ru
Ivan Mashonsky, ISP RAS, Moscow, Russia, ivan2110@ispras.ru
Natalia Semina, ISP RAS, Moscow, Russia, semina@ispras.ru

ABSTRACT

Exploratory search tasks pose three challenges to search engines: low specificity of the search goal, long duration of the search, and search results that are hard to consume. Exploratory searches are iterative, multi-tactical and better performed by groups.

We present a demonstration of the first prototype of an exploratory search system for scientific groups. Its main goal is to help with the collection of scientific articles related to a scientific group's current project. We tried to address all exploratory search challenges, with a focus on supporting team work.

SciNoon provides a shared workspace where articles can be collected and annotated. The workspace can be visualized either as an interactive graphical research map or as a table. The research map shows citation relations between articles and can be used for a better understanding of the structure of the field. Search progress can be estimated by coloring articles according to the values of their attributes. SciNoon also simplifies keyword search by extracting possible keywords from already collected articles and integrating them into existing search engines through a browser plugin.

Using SciNoon, the members of a scientific group can search, collect, and process articles and get notifications about each other's progress from a chat bot.

CCS CONCEPTS

• Information systems → Collaborative search; Digital libraries and archives; • Human-centered computing → Computer supported cooperative work.

KEYWORDS

Exploratory search, collaborative search, academic search engines

ACM Reference Format:
Yaroslav Nedumov, Anton Babichev, Ivan Mashonsky, and Natalia Semina. 2019. SciNoon: Exploratory Search System for Scientific Groups. In Joint Proceedings of the ACM IUI 2019 Workshops, Los Angeles, USA, March 20, 2019, 6 pages.

IUI Workshops'19, March 20, 2019, Los Angeles, USA. Copyright ©2019 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1 INTRODUCTION

Exploration of a new topic is an important task for many people. Students and postgraduates have to learn the state of the art while working on their first research project. R&D department researchers need to understand available task definitions and solutions while solving a customer problem. Reviewers often have to review a paper that doesn't perfectly fit their main scope of work, and so they need to refresh their understanding of the adjacent field of study.

Exploration of a new topic requires a large time investment. In the beginning you often don't fully understand the task and don't know the right keywords to use. You have to ask someone for help or use general purpose information sources like Wikipedia [2]. You cannot quickly understand the results of a search engine and you have to repeat the search while improving your understanding of the domain.

Such search tasks, with an open-ended, persistent, and multi-faceted problem context and an opportunistic, iterative, and multi-tactical search process, are called exploratory search tasks [12]. Exploratory search tasks are hard and challenging for search engines, which are mostly intended for lookup search. Lookup search is focused on high precision but exploratory search needs high recall. Lookup search lasts seconds but exploratory search can take weeks. Results of lookup search are easy to consume, but for exploratory search you need time to estimate their relevance. All of this is particularly relevant for the academic search domain.

There are specialized search engines for searching scientific articles such as Google Scholar, Microsoft Academic or Semantic Scholar. They have big databases and good text search engines but their support for exploration is quite limited. While query formulation support is good enough, exploration of search results, team work and long-lasting searches are supported poorly. In our demo system (https://scinoon.com/research/esida-demo) we try to augment existing systems and provide a user with a shared workspace which may be used by a team for collection and exploration of intermediate results.

2 SCINOON

The main use case supported by the current SciNoon prototype is the exploration of a new topic. According to the study [2] and our own examination of existing academic search systems, one of the most missing features is integrated support for collaboration.

There are three key features for collaborative search: awareness, division of labor, and persistence [8]. We provide users with a shared workspace (see Figure 1) where articles can be collected and processed, and with a set of corresponding tools. For maintaining awareness we have implemented a chat bot which can be added to a research group chat and will then report each team member's activity.

Figure 1: SciNoon user interface elements. a1-3: time-based orbits of the radial layout (from the old ones to the new ones), b: article's node with navigational controls, c: layout managing dialog, d: clustering managing dialog, e: PDF upload drop zone, f: visualization of a cluster, g: research questionnaire managing dialog.

Exploration of a new topic is a long-lasting iterative process including several activities: collection of potentially relevant articles, selection and reading of the most interesting ones, and summarization of the read articles according to research-specific aspects.

In the following subsections we present the SciNoon tools supporting each of these activities.

2.1 Collecting articles

Collecting data about an unknown domain is challenging because the search goal is quite unspecific. A user doesn't know what to search for and needs help getting started. This is a typical situation for exploratory search tasks and there are well known partial solutions: query suggestion, dynamic queries, recommendations.

In the case of scientific research one probably already has either a couple of relevant articles obtained from a scientific adviser or some keywords to start the search from.

If the user already has several relevant articles he or she can upload them to the workspace. SciNoon will parse them, extract metadata and full texts and display them in the workspace. In the background it will extract keywords which may be used later.

If the user doesn't have PDFs he or she will probably use a search engine in order to find articles. SciNoon doesn't maintain its own full text index of articles and instead integrates with Google Scholar. We implemented a plugin for Chromium-based browsers (checked on Chromium and Opera) which is able to grab data automatically from Google Scholar pages for the user. With the plugin installed the user can click the "Add to research map" button on a search results page and the selected articles will be added to the workspace.

Google Scholar provides query suggestions and "Related searches" based on the current query. We augment this functionality by providing research-specific terms. We assume that the collected articles, not the current query, explain what a user wants to find. SciNoon extracts terms from the collected articles using the ComboBasic algorithm [1]. With the SciNoon browser plugin the extracted terms are integrated directly into Google Scholar pages and the user can either use them for search alone or append them to the current query.

An alternative strategy for collecting articles is snowballing. Using data extracted from uploaded PDFs and cached data from Google Scholar, we make it possible to "expand" an article node, adding either citing articles or cited articles.

The last tool for collecting articles is content-based recommendations. They require some amount of already collected articles. There are four types currently implemented:

(1) Most cited locally. Recommends articles which haven't been added yet but are highly cited by the articles from the map. It could be hard to spot such articles manually but it is a trivial task for the system.
(2) Cutting edge. Aggregated "cited by" for all novel articles in the workspace. Could be used for finding novel research.
(3) Old surveys. Recommends survey articles cited by the articles from the research map. Could be used as a general domain knowledge source.
(4) New surveys. Recommends survey articles citing articles on the map.

After the user has collected enough articles he or she will need to further process them. The first task is selecting the article to start from, and here we provide the user with an interactive graphical interface called the research map.

2.2 Selecting articles

The home page of a research is called the research map view and displays the already collected articles with citation links between them, that is, a subgraph of the citation graph.

There are three possible layouts: manual layout, force-based layout and radial layout. Using the manual layout the user can place articles as he or she wishes. The force-based layout takes into account links between articles and moves connected articles close to each other.

Our novel radial layout is similar to the time layout by Chen [4], but we use polar coordinates in order to better deal with the larger number of newer articles. The oldest articles are placed in the center and the newer ones are placed on concentric orbits depending on the publication year and citations. The positions of the articles inside an orbit are determined by citation links as in the force-based layout, so a research field tends to form a sector.

Each article is represented by a rounded rectangle with several text compartments depending on the zoom level. There are four zoom levels and corresponding views:

(1) The 10000ft view can be used for understanding the general structure of the field and the processing progress. Each article is represented as a rounded-corner square without text and with a size depending on the number of citations. The whole research graph can be seen at once.
(2) At the 1000ft view (Figure 2) there is a single-line text compartment with the first author's name and the year of publication of the article. A couple of dozen articles can be seen at once.
(3) At the 100ft view there are compartments for the full list of authors (up to 10), the article title and controls for graph expansion and search in Google Scholar. Only a couple of articles can be seen at once.
(4) At the 10ft view there is also a compartment for the article abstract. We mostly see a single article with only parts of neighbouring articles.

Each particular node at any zoom level can be manually expanded by double-clicking.

Figure 2: A fragment of the research map at the 1000ft view with one expanded article node and several unexpanded ones connected by citation links. Articles are colored according to their relevance from yellow to blue. The radial layout is used.

Using the questionnaire described in the next section the user can color article nodes depending on the answer to the selected question and then easily find articles with a particular answer on the research map. Using coloring by the "familiarity" question the whole research progress can be estimated.

2.3 Processing articles

Collection of articles assumes some further summarization. In SciNoon we provide the users with a customizable questionnaire which can be answered by each research group member for each article in the research. By default there are seven questions:

(1) Familiarity. Assumes the following reading order: abstract, conclusion, brief reading, full text reading, complete understanding of the whole article text.
(2) Relevance. Ranges from 1 (not relevant) to 5 (very relevant). Irrelevant articles should probably be deleted from the map, but they could be kept in order to save the other group members time when an article looks relevant until its full text is read.
(3) Main contribution. Highlights the contribution most important for the research: survey, original method or experiments. Most probably this question should be adapted to the field of study.
(4) Readability. Ranges from 1 (hard to read) to 5 (easy to read). Particularly useful for recommending articles to students.
(5) Reproducibility. Subjective estimate of the possibility of reproducing the research, ranges from 1 (hard to reproduce) to 5 (easy to reproduce).
(6) Reliability. Subjective estimate of the reliability of the article's results.
(7) Notes. Free text notes.

The user is free to use or drop them and can add additional questions if needed.

Users can collect, select and process articles iteratively, populating the common workspace. They can work simultaneously and independently, but can maintain awareness of each other's work using SciNoon's Telegram chat bot. SciNoon's chat bot (@ScigraphLoggerBot) allows subscription to various events from the research: adding a new article, answering a question and so on. In the next section we provide examples of how all these features could be used together.

3 USAGE SCENARIOS EXAMPLES

In this section we show how different SciNoon features could be used together in order to deal with two important tasks: directing a student's work and writing a review.

3.1 Maintaining students work

A typical situation for exploratory search is giving a student a research task that is interesting both to the student and to his scientific adviser. The field of study is completely new to the student and the scientific adviser is not so familiar with the given problem either. Moreover, he is too busy to dive into details and do the research together with the student, so he gives him a basic understanding of the task and two or three articles to start from. The adviser prepares a new research map in SciNoon, creates an initial list of questions for the questionnaire and sets up a group chat with the student and the SciNoon chat bot for getting updates.

The student starts by studying the articles that his scientific adviser gave him and adds them to the research map. While reading articles he answers questions from the questionnaire. For example, the "familiarity" question shows his progress in studying the articles, the "relevance" question represents the articles' relevance to the given task, and "notes" is where the student puts his thoughts and summary about the research. The adviser is notified by the chat bot about each such update and is able to correct his student when needed.

As the student continues his research he needs some more articles. SciNoon helps him with search queries in Google Scholar by showing keywords extracted from his research map and personal recommendations. It also provides him with forward citation chaining, either from SciNoon's internal database or from Google Scholar, for recent advances in the field, along with backwards citation chaining for a better understanding of the field's roots. Backwards citation chaining is also available for articles whose full text is available. The other features for article collection could also be used. Since the adviser is notified about his student's progress he can easily correct the student's article selection mistakes.

So, after working together in this manner for some time, the team will have a list of relevant articles and answers to the questions important for their research. As the last step all work can be exported to a CSV file.

3.2 Writing review

The second frequent case where fast exploration of a new topic may be needed is writing a critical review. In this case preliminary familiarity with the topic is much higher, but you still need to recall the exact topic of the article and refresh your knowledge regarding recent advances in the field. SciNoon can probably help. A reviewer can add the article's PDF to a fresh research map and then do one backwards citation chaining step using the "Cites" button on the added article. Following this with the "Cutting edge" recommendation, all competing articles can be easily found.

In the next section we briefly explain SciNoon's internals.

4 SYSTEM DESIGN

SciNoon is a client-server web-based application. The server part has a modular design and is based on the Play framework (https://www.playframework.com/) and Akka (https://akka.io/). The client part consists of a rich web application and a browser plugin. Client and server communicate via an HTTP API.

Figure 3: Architecture of SciNoon

4.1 SciNoon server

The Play framework assumes an MVC architecture. Since we use a rich client, the view component is reduced and consists of a trivial HTML template and the JavaScript code of the client, which builds the main part of the HTML in the browser.

We use a graph-based data model, so all data is represented either as nodes or as edges. The main types of nodes are articles, scientists, answers and researches. Nodes have properties depending on their type. For example, an article node has a title, a scientist node has a first name and so on. There are several types of edges: a directed cites edge starts from the citing article node and ends in the cited article node, an author edge connects an article with its author and so on. We use the JanusGraph (http://janusgraph.org/) graph database with a Cassandra backend for persistence.

The HTTP API is divided into several controllers (see Figure 3) that do data conversion and pass data into the Akka actor system for asynchronous processing.

We use external systems for extracting metadata and bibliography from PDFs: CERMINE [11] and GROBID [6].

4.2 SciNoon client

SciNoon's client part is written in TypeScript, which is translated into JavaScript. We use the d3js and Bootstrap frameworks. There are several interconnected modules for getting data from the server, drawing the research map, processing user input and so on.

SciNoon's browser plugin is written in JavaScript using the WebExtensions API. It integrates with the SciNoon and Google Scholar sites using content scripts.

5 RELATED WORK

There are several well-known search engines for scientists such as Google Scholar, Microsoft Academic, Semantic Scholar, aMiner, CiteSeerX, PubMed and others. There are specialized social networks like ResearchGate or Academia.edu that also provide some search possibilities. Finally, many digital libraries provide search tools on their sites. But their support for exploratory search tasks is quite limited.

However, there are several systems and approaches that are less known but better suited for exploratory search (in general or for scientific articles). In this section we describe some of them.

SearchTogether [8] was aimed at a task very similar to ours: small-group collaborative searching. This system was general purpose and was built on top of classic search engines, integrating them with instant messaging and recommendations, and providing a common workspace. The authors showed that searching with SearchTogether is more efficient than searching without it. Unfortunately the project didn't evolve and after some time, with the rise of social networks, was claimed by the authors to be outdated [7]. Despite this fact we believe that its focus on awareness, division of labor and persistence will be much more useful for professional users such as scientists and R&D specialists.

An interesting approach to the organization of a scientist's workspace was proposed by Beel et al. [3]. Their tool, Docear, was developed as "Microsoft Office for scientists". It contains several modules: a digital library module providing access to research articles, a research module providing keyword search, a PDF viewer for reading and annotating articles, a mind mapping module for managing all information, a word processing module for writing articles and a reference manager for managing bibliography. Docear is an amazing demo, but it is not built for search and it is a single-user application lacking any collaboration features, which makes it difficult to use in research groups.

In the works [9, 10] the authors describe the IntentRadar search system, specifically designed for exploratory search for scientific articles. The main idea is to model the user's search intents by keywords and interactively evolve them, getting relevance information from a specially designed user interface. The authors demonstrate the efficiency of the proposed technique in a series of experiments. The developed interactive user interface covers both query formulation and search results exploration tasks and so could be very helpful for the exploration of a new topic. However, like Docear, IntentRadar is a single-user application.

There is some research regarding the exploration of particular fields of study: science mapping. One of the best known science mapping tools is CiteSpace [5]. It is a standalone Java application implementing methods for co-citation analysis that enable users to reveal the structure of the field and emerging trends. CiteSpace doesn't maintain its own database of articles and must be provided with data exported from Web of Science. This complicates the usage of CiteSpace for exploratory search, but we are going to implement some methods of co-citation analysis in future versions of SciNoon in order to make them available for interactive use by a team of scientists.

6 CONCLUSION

Exploratory search is a complex task challenging search systems in many ways.

We developed SciNoon, an exploratory search system for scientific groups providing a unique combination of tools that help in exploring new domains.

To collect articles, we provide the user with three tools: navigation on the citation graph, which makes snowballing possible, integration with the Google Scholar search engine (with custom keyword generation) and recommendations based on already collected articles.

To help the user select the most interesting articles we implemented an interactive graphical interface visualizing the citation graph of the collected articles. It supports several graph layouts and different levels of detail. We also proposed a new radial layout for an easier overview of a research.

The user can create a questionnaire specific to his or her research and then fill it in for each collected article. Article nodes will be colored depending on the given answers.

All research team members can work simultaneously and independently and they will be notified of each other's activity by SciNoon's chat bot.

All this makes our system a helpful companion to existing full text academic search engines, making it possible for several team members to do long-lasting research together.

7 ACKNOWLEDGMENTS

We thank Denis Turdakov, Nikita Astrakhantsev and Anna Loik for reading early versions of the paper. The reported study was partially funded by RFBR according to the research project 17-07-00978 A.

REFERENCES

[1] Nikita Astrakhantsev. 2015. Methods and software for terminology extraction from domain-specific text collection. Ph.D. Dissertation. Institute for System Programming of Russian Academy of Sciences.
[2] Kumaripaba Athukorala, Eve Hoggan, Anu Lehtiö, Tuukka Ruotsalo, and Giulio Jacucci. 2013. Information-seeking behaviors of computer scientists: Challenges for electronic literature search tools. Proceedings of the Association for Information Science and Technology 50, 1 (2013), 1–11.
[3] Joeran Beel, Bela Gipp, Stefan Langer, and Marcel Genzmehr. 2011. Docear: An Academic Literature Suite for Searching, Organizing and Creating Academic Literature. In Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL '11). ACM, New York, NY, USA, 465–466. https://doi.org/10.1145/1998076.1998188
[4] Chaomei Chen. 2006. CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the Association for Information Science and Technology 57, 3 (2006), 359–377.
[5] Chaomei Chen, Fidelia Ibekwe-SanJuan, and Jianhua Hou. 2010. The structure and dynamics of cocitation clusters: A multiple-perspective cocitation analysis. Journal of the American Society for Information Science and Technology 61, 7 (2010), 1386–1409. https://doi.org/10.1002/asi.21309
[6] Patrice Lopez. 2009. GROBID: Combining Automatic Bibliographic Data Recognition and Term Extraction for Scholarship Publications. In Research and Advanced Technology for Digital Libraries, Maristella Agosti, José Borbinha, Sarantos Kapidakis, Christos Papatheodorou, and Giannis Tsakonas (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 473–474.
[7] Meredith Ringel Morris. 2013. Collaborative Search Revisited. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work (CSCW '13). ACM, New York, NY, USA, 1181–1192. https://doi.org/10.1145/2441776.2441910
[8] Meredith Ringel Morris and Eric Horvitz. 2007. SearchTogether: An Interface for Collaborative Web Search. In Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology (UIST '07). ACM, New York, NY, USA, 3–12. https://doi.org/10.1145/1294211.1294215
[9] Tuukka Ruotsalo, Giulio Jacucci, Petri Myllymäki, and Samuel Kaski. 2015. Interactive intent modeling: Information discovery beyond search. Commun. ACM 58, 1 (2015), 86–92.
[10] Tuukka Ruotsalo, Jaakko Peltonen, Manuel JA Eugster, Dorota Głowacka, Patrik Floréen, Petri Myllymäki, Giulio Jacucci, and Samuel Kaski. 2018. Interactive Intent Modeling for Exploratory Search. ACM Transactions on Information Systems (TOIS) 36, 4 (2018), 44.
[11] Dominika Tkaczyk, Paweł Szostek, Mateusz Fedoryszak, Piotr Jan Dendek, and Łukasz Bolikowski. 2015. CERMINE: automatic extraction of structured metadata from scientific literature. International Journal on Document Analysis and Recognition (IJDAR) 18, 4 (01 Dec 2015), 317–335. https://doi.org/10.1007/s10032-015-0249-8
[12] Ryen W White and Resa A Roth. 2009. Exploratory search: Beyond the query-response paradigm. Synthesis lectures on information concepts, retrieval, and services 1, 1 (2009), 1–98.
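
As an illustration of the "Most cited locally" recommendation from Section 2.1, the following is a minimal sketch, not SciNoon's actual implementation. It assumes the citation data is available as plain (citing, cited) pairs, mirroring the directed cites edge of the data model in Section 4.1. The function name `most_cited_locally`, the `top_n` parameter and the article ids are hypothetical and chosen only for this example.

```python
from collections import Counter


def most_cited_locally(map_articles, cites, top_n=5):
    """Rank articles that are not on the map by how many map articles cite them.

    map_articles: ids of articles already in the workspace (hypothetical ids).
    cites: iterable of (citing_id, cited_id) pairs, an assumed flat form of
           the directed "cites" edges.
    """
    on_map = set(map_articles)
    counts = Counter()
    for citing, cited in set(cites):  # drop duplicate edges before counting
        # Only citations coming from the map and pointing outside it count.
        if citing in on_map and cited not in on_map:
            counts[cited] += 1
    # Highest local citation count first; ties broken by id for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


example_edges = [
    ("a", "x"), ("b", "x"), ("c", "x"),
    ("a", "y"), ("b", "y"),
    ("a", "b"),  # both ends are on the map: ignored
    ("z", "x"),  # citing article is not on the map: ignored
]
print(most_cited_locally({"a", "b", "c"}, example_edges))
# prints [('x', 3), ('y', 2)]
```

Counting only edges that start on the map is what makes the ranking "local": an article that is highly cited globally but rarely cited by the collected articles would not be recommended by this sketch.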