=Paper=
{{Paper
|id=Vol-2327/ESIDA3
|storemode=property
|title=SciNoon: Exploratory Search System for Scientific Groups
|pdfUrl=https://ceur-ws.org/Vol-2327/IUI19WS-ESIDA-3.pdf
|volume=Vol-2327
|authors=Yaroslav Nedumov,Anton Babichev,Ivan Mashonsky,Natalia Semina
|dblpUrl=https://dblp.org/rec/conf/iui/NedumovBMS19
}}
==SciNoon: Exploratory Search System for Scientific Groups==
Yaroslav Nedumov, ISP RAS, Moscow, Russia, yaroslav.nedumov@ispras.ru

Anton Babichev, ISP RAS, Moscow, Russia, babichev@ispras.ru

Ivan Mashonsky, ISP RAS, Moscow, Russia, ivan2110@ispras.ru

Natalia Semina, ISP RAS, Moscow, Russia, semina@ispras.ru

ABSTRACT

Exploratory search poses three challenges to search engines: low specificity of the search goal, long duration of the search, and search results that are hard to consume. Exploratory searches are iterative, multi-tactical, and better performed by groups.

We present a demonstration of the first prototype of an exploratory search system for scientific groups. Its main goal is to help with collecting scientific articles related to a scientific group’s current project. We tried to address all exploratory search challenges, with a focus on supporting teamwork.

SciNoon provides a shared workspace where articles can be collected and annotated. The workspace can be visualized either as an interactive graphical research map or as a table. The research map shows citation relations between articles and can be used to better understand the structure of the field. Search progress can be estimated by coloring articles according to the values of their attributes. SciNoon also simplifies keyword search by extracting candidate keywords from already collected articles and integrating them into existing search engines through a browser plugin.

Using SciNoon, the members of a scientific group can search for, collect, and process articles and receive notifications about each other’s progress from a chat bot.

CCS CONCEPTS

• Information systems → Collaborative search; Digital libraries and archives; • Human-centered computing → Computer supported cooperative work.

KEYWORDS

Exploratory search, collaborative search, academic search engines

ACM Reference Format:
Yaroslav Nedumov, Anton Babichev, Ivan Mashonsky, and Natalia Semina. 2019. SciNoon: Exploratory Search System for Scientific Groups. In Joint Proceedings of the ACM IUI 2019 Workshops, Los Angeles, USA, March 20, 2019, 6 pages.

IUI Workshops’19, March 20, 2019, Los Angeles, USA
Copyright ©2019 for the individual papers by the papers’ authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1 INTRODUCTION

Exploring a new topic is an important task for many people. Students and postgraduates have to learn the state of the art while working on their first research project. R&D researchers need to understand the available task definitions and solutions while solving a customer problem. Reviewers often have to review a paper that doesn’t perfectly fit their main scope of work, so they need to refresh their understanding of an adjacent field of study.

Exploring a new topic requires a large time investment. In the beginning you often don’t fully understand the task and don’t know the right keywords to use. You have to ask someone for help or use general-purpose information sources like Wikipedia [2]. You cannot quickly understand the results of a search engine, and you have to repeat the search as your understanding of the domain improves.

Search tasks with an open-ended, persistent, and multi-faceted problem context and an opportunistic, iterative, and multi-tactical search process are called exploratory search tasks [12]. They are challenging for search engines, which are mostly designed for lookup search. Lookup search focuses on high precision, but exploratory search needs high recall. Lookup search lasts seconds, but exploratory search can take weeks. The results of lookup search are easy to consume, but in exploratory search you need time to assess their relevance. All of this is particularly relevant to the academic search domain.

There are specialized search engines for scientific articles, such as Google Scholar, Microsoft Academic, and Semantic Scholar. They have large databases and good text search engines, but their support for exploration is quite limited. While query formulation is supported well enough, exploration of search results, teamwork, and long-lasting searches are supported poorly.

In our demo system¹ we try to augment existing systems and provide users with a shared workspace that a team can use to collect and explore intermediate results.

¹ https://scinoon.com/research/esida-demo

2 SCINOON

The main use case supported by the current SciNoon prototype is the exploration of a new topic. According to the study [2] and our own examination of existing academic search systems, one of the most notable missing features is integrated support for collaboration.

There are three key features for collaborative search: awareness, division of labor, and persistence [8]. We are providing users with
the shared workspace (see Figure 1), where articles can be collected and processed, and with a set of corresponding tools. To maintain awareness, we have implemented a chat bot that can be added to a research group chat and then reports each team member’s activity.

Figure 1: SciNoon user interface elements. a1−3 – time-based orbits of the radial layout (from the oldest to the newest), b – article node with navigational controls, c – layout management dialog, d – clustering management dialog, e – PDF upload drop zone, f – visualization of a cluster, g – research questionnaire management dialog.

Exploring a new topic is a long-lasting iterative process that includes several activities: collecting potentially relevant articles, selecting and reading the most interesting ones, and summarizing the articles read according to research-specific aspects.

In the following subsections we present the SciNoon tools that support each of these activities.

2.1 Collecting articles

Collecting data about an unknown domain is challenging because the search goal is not specific. The user doesn’t know what to search for and needs help getting started. This is a typical situation for an exploratory search task, and there are well-known partial solutions: query suggestion, dynamic queries, and recommendations.

In scientific research, one probably already has either a couple of relevant articles obtained from a scientific adviser or some keywords to start the search from.

If the user already has several relevant articles, he or she can upload them to the workspace. SciNoon will parse them, extract metadata and full texts, and display them in the workspace. In the background it will extract keywords that may be used later.

If the user doesn’t have PDFs, he or she will probably use a search engine to find articles. SciNoon doesn’t maintain its own full-text index of articles and instead integrates with Google Scholar. We implemented a plugin for Chromium-based browsers (checked on Chromium and Opera) that automatically grabs data from Google Scholar pages for the user. With the plugin installed, the user can click the "Add to research map" button on a search results page, and the selected articles will be added to the workspace.

Google Scholar provides query suggestions and "Related searches" based on the current query. We augment this functionality by providing research-specific terms. We assume that the collected articles, not the current query, explain what the user wants to find. SciNoon extracts terms from the collected articles using the ComboBasic algorithm [1]. With the SciNoon browser plugin, the extracted terms are integrated directly into Google Scholar pages, and the user can either search with them alone or append them to the current query.

An alternative strategy for collecting articles is snowballing. Using data extracted from uploaded PDFs and cached data from Google Scholar, we make it possible to "expand" an article node by adding either citing articles or cited articles.

The last tool for collecting articles is content-based recommendations. They require a certain number of already collected articles. Four types are currently implemented:

(1) Most cited locally. Recommends articles that haven’t been added yet but are highly cited by the articles on the map. Such articles can be hard to spot manually, but finding them is a trivial task for the system.

(2) Cutting edge. Aggregated "cited by" lists for all novel articles in the workspace. Can be used for finding novel research.

(3) Old surveys. Recommends survey articles cited by the articles on the research map. Can be used as a source of general domain knowledge.

(4) New surveys. Recommends survey articles citing articles on the map.

After the user has collected enough articles, he or she will need to process them further. The first task is selecting the article to
start from, and here we provide the user with an interactive graphical interface called the research map.

2.2 Selecting articles

The home page of a research is called the research map view. It displays the already collected articles with the citation links between them, that is, a subgraph of the citation graph.

There are three possible layouts: manual, force-based, and radial. With the manual layout the user can place articles as he or she wishes. The force-based layout takes into account the links between articles and moves connected articles close to each other.

Our novel radial layout is similar to the time layout by Chen [4], but we use polar coordinates in order to deal better with the larger number of newer articles. The oldest articles are placed in the center, and newer ones are placed on concentric orbits depending on the publication year and citations. The position of an article within its orbit is determined by citation links, as in the force-based layout, so a research field tends to form a sector.

Each article is represented by a rounded rectangle with several text compartments depending on the zoom level. There are four zoom levels and corresponding views:

(1) The 10000ft view can be used for understanding the general structure of the field and the processing progress. Each article is represented as a rounded square without text, with its size depending on the number of citations. The whole research graph can be seen at once.

(2) At the 1000ft view (Figure 2) there is a single-line text compartment with the first author’s name and the year of publication. A couple of dozen articles can be seen at once.

(3) At the 100ft view there are compartments for the full list of authors (up to 10), the article title, and controls for graph expansion and search in Google Scholar. Only a couple of articles can be seen at once.

(4) At the 10ft view there is also a compartment for the article abstract. Mostly a single article is visible, with only parts of neighboring articles.

Figure 2: A fragment of the research map at the 1000ft view with one expanded article node and several unexpanded ones connected by citation links. Articles are colored according to their relevance, from yellow to blue. The radial layout is used.

Any node can be manually expanded at any zoom level by double-clicking.

Using the questionnaire described in the next section, the user can color article nodes depending on the answer to a selected question and then easily find articles with a particular answer on the research map. Coloring by the “familiarity” question makes it possible to estimate the overall research progress.

2.3 Processing articles

Collecting articles implies some further summarization. In SciNoon we provide users with a customizable questionnaire that each research group member can answer for each article in the research. By default there are seven questions:

(1) Familiarity. Assumes the following reading order: abstract, conclusion, brief reading, full text reading, complete understanding of the whole article text.

(2) Relevance. Ranges from 1 (not relevant) to 5 (very relevant). Irrelevant articles should probably be deleted from the map, but they can be kept in order to save other group members’ time when an article looks relevant until its full text is read.

(3) Main contribution. Highlights the contribution most important for the research: survey, original method, or experiments. Most probably this question should be adapted to the field of study.

(4) Readability. Ranges from 1 (hard to read) to 5 (easy to read). Particularly useful for recommending articles to students.

(5) Reproducibility. Subjective estimate of how easy the research is to reproduce, ranging from 1 (hard to reproduce) to 5 (easy to reproduce).

(6) Reliability. Subjective estimate of the reliability of the article’s results.

(7) Notes. Free text notes.

The user is free to use or drop them and can add more questions if needed.

Users can collect, select, and process articles iteratively, populating the common workspace. They can work simultaneously and independently while maintaining awareness of each other’s work through SciNoon’s Telegram chat bot. The chat bot (@ScigraphLoggerBot) allows subscription to various events in a research: adding a new article, answering a question, and so on. In the next section we give examples of how all these features can be used together.

3 USAGE SCENARIOS EXAMPLES

In this section we show how different SciNoon features can be used together to deal with two important tasks: directing a student’s work and writing a review.

3.1 Maintaining a student’s work

A typical exploratory search situation is giving a student a research task that is interesting both to the student and to his scientific adviser. The field of study is completely new to the student, and the adviser is not very familiar with the given problem either. Moreover, he is too busy to dive into the details and do the research together with the student, so he gives the student a basic understanding of the task and two or three articles to start from. The adviser prepares a new research map in SciNoon, creates an initial list of questions for the questionnaire, and sets up a group chat with the student and the SciNoon chat bot for getting updates.

The student starts by studying the articles that his adviser gave him and adds them to the research map. While reading the articles he answers the questions from the questionnaire. For example, the "familiarity" question shows his progress in studying the articles, the "relevance" question represents each article’s relevance to the given task, and "notes" is where the student puts his thoughts and a summary of the research. The adviser is notified by the chat bot about each such update and can correct his student when needed.

As the student continues his research, he needs more articles. SciNoon helps him with search queries in Google Scholar by showing keywords extracted from his research map and personal recommendations. It also provides him with forward citation chaining, either from the SciNoon internal database or from Google Scholar, for recent advances in the field, along with backward citation chaining for a better understanding of the field’s roots. For articles with full text available, backward citation chaining is also available. The other article collection features can be used as well.
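
Forward and backward citation chaining, like the node "expand" operation of Section 2.1, amounts to following cites edges in one direction or the other. The following is a minimal Python sketch, assuming the citation data is available as a list of (citing, cited) pairs. The function names and the sample identifiers are hypothetical and are not taken from SciNoon.

```python
from collections import defaultdict


def build_index(cites_edges):
    """Index (citing, cited) pairs for lookups in both directions."""
    references = defaultdict(set)  # article -> articles it cites
    cited_by = defaultdict(set)    # article -> articles that cite it
    for citing, cited in cites_edges:
        references[citing].add(cited)
        cited_by[cited].add(citing)
    return references, cited_by


def expand(article, workspace, references, cited_by, direction):
    """Return articles one citation step away that are not in the workspace.

    direction="backward" follows the article's reference list (older work);
    direction="forward" follows the articles citing it (newer work).
    """
    if direction == "backward":
        neighbours = references[article]
    elif direction == "forward":
        neighbours = cited_by[article]
    else:
        raise ValueError("direction must be 'backward' or 'forward'")
    return sorted(neighbours - set(workspace))


# Hypothetical toy data: B and C cite A, D cites B.
edges = [("B", "A"), ("C", "A"), ("D", "B")]
references, cited_by = build_index(edges)
workspace = {"B"}
print(expand("B", workspace, references, cited_by, "backward"))  # ['A']
print(expand("B", workspace, references, cited_by, "forward"))   # ['D']
```

Keeping both indexes makes a single chaining step a set lookup, at the cost of storing every edge twice.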
Since the adviser is notified about his student’s progress, he can easily correct the student’s article selection mistakes. After working together in this manner for some time, the team will have a list of relevant articles and answers to the questions important for their research. As the last step, all the work can be exported to a CSV file.
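
The export step can be pictured as flattening each article, together with its questionnaire answers, into one row. The following is a minimal sketch using Python's standard csv module. The column names and the record layout are assumptions made for illustration and do not describe SciNoon's actual export format.

```python
import csv
import io


def export_research(articles, questions):
    """Serialize articles and their questionnaire answers as CSV text.

    `articles` is a list of dicts with "title", "year" and an "answers"
    dict keyed by question name; unanswered questions become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "year"] + list(questions))
    for article in articles:
        answers = article.get("answers", {})
        writer.writerow(
            [article["title"], article["year"]]
            + [answers.get(q, "") for q in questions]
        )
    return buffer.getvalue()


# Hypothetical research map with two articles.
articles = [
    {"title": "Paper A", "year": 2015,
     "answers": {"familiarity": "abstract", "relevance": 4}},
    {"title": "Paper B", "year": 2018, "answers": {"relevance": 2}},
]
csv_text = export_research(articles, ["familiarity", "relevance", "notes"])
print(csv_text)
```

Passing the question list explicitly keeps the column order stable even when group members have answered different subsets of questions.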
3.2 Writing a review

The second frequent case where fast exploration of a new topic may be needed is writing a critical review. In this case the reviewer’s preliminary familiarity with the topic is much higher, but you still need to recall the exact topic of the article and refresh your knowledge of recent advances in the field. SciNoon may be able to help here.

A reviewer can add the article’s PDF to a fresh research map and then do one backward citation chaining step using the "Cites" button on the added article. Following this with the "Cutting edge" recommendation makes competing articles easy to find.
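
The two steps of this workflow, together with the "Cutting edge" recommendation from Section 2.1, can be sketched in Python as follows. The paper does not define which articles count as novel, so the sketch assumes a simple publication-year threshold. The function name, the `min_year` parameter and the toy data are hypothetical.

```python
from collections import Counter


def cutting_edge(workspace, years, cites_edges, min_year):
    """Rank articles outside the workspace by how many novel workspace
    articles they cite ("cited by" lists aggregated over novel articles).

    An article is treated as novel when years[article] >= min_year,
    which is an assumption of this sketch.
    """
    novel = {a for a in workspace if years.get(a, 0) >= min_year}
    counts = Counter(
        citing
        for citing, cited in cites_edges
        if cited in novel and citing not in workspace
    )
    # Most frequently citing candidates first; ties broken by identifier.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical review: R is the article under review and cites P1 and P2.
edges = [("R", "P1"), ("R", "P2"), ("X", "P1"),
         ("X", "P2"), ("Y", "P2"), ("Z", "P0")]
years = {"R": 2019, "P1": 2017, "P2": 2018, "P0": 2001}

# Step 1: one backward chaining step from R adds its references to the map.
workspace = {"R"} | {cited for citing, cited in edges if citing == "R"}

# Step 2: aggregate "cited by" over the novel articles on the map.
print(cutting_edge(workspace, years, edges, min_year=2017))
# [('X', 2), ('Y', 1)]
```

In this toy example X cites both references of the reviewed article and therefore ranks first as a likely competitor, while Z is ignored because it only cites an article that is not on the map.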
In the next section we will briefly explain SciNoon’s internals.
4 SYSTEM DESIGN
SciNoon is a client-server web application. The server part has a modular design and is based on the Play framework² and Akka³. The client part consists of a rich web application and a browser plugin. The client and the server communicate via an HTTP API.

² https://www.playframework.com/

³ https://akka.io/

Figure 3: Architecture of SciNoon

4.1 SciNoon server

The Play framework assumes an MVC architecture. Since we use a rich client, the view component is reduced to a trivial HTML template and the client’s JavaScript code, which builds the main part of the HTML in the browser.

We use a graph-based data model, so all data is represented either as nodes or as edges. The main types of nodes are articles, scientists, answers, and researches. Nodes have properties depending on their type: for example, an article node has a title, a scientist node has a first name, and so on. There are several types of edges: a cites directed edge starts at the citing article node and ends at the cited article node, an author edge connects an article with its author, and so on. We use the JanusGraph⁴ graph database with a Cassandra backend for persistence.

⁴ http://janusgraph.org/

The HTTP API is divided into several controllers (see Figure 3), which convert data and pass it to the Akka actor system for asynchronous processing.

We use external systems for extracting metadata and bibliography from PDFs: CERMINE [11] and GROBID [6].

4.2 SciNoon client

SciNoon’s client part is written in TypeScript, which is compiled to JavaScript. We use the d3js and Bootstrap frameworks. There are several interconnected modules for getting data from the server, drawing the research map, processing user input, and so on.

SciNoon’s browser plugin is written in JavaScript using the WebExtensions API. It integrates with the SciNoon and Google Scholar sites using content scripts.

5 RELATED WORK

There are several well-known search engines for scientists, such as Google Scholar, Microsoft Academic, Semantic Scholar, aMiner, CiteSeerX, PubMed, and others. There are also specialized social networks, like ResearchGate and Academia.edu, that provide some search capabilities. Finally, many digital libraries provide search tools on their sites. But their support for exploratory search tasks is quite limited.

However, there are several lesser-known systems and approaches that are better suited for exploratory search (in general or for scientific articles). In this section we describe some of them.

SearchTogether [8] was aimed at a task very similar to ours: small-group collaborative searching. This system was general-purpose and was built on top of classic search engines, integrating them with instant messaging and recommendations and providing a common workspace. The authors showed that searching with SearchTogether is more efficient than searching without it. Unfortunately, the project didn’t evolve, and after some time, with the rise of social networks, its authors declared it outdated [7]. Despite this, we believe that its focus on awareness, division of labor, and persistence will be much more useful for professional users such as scientists and R&D specialists.

An interesting approach to organizing a scientist’s workspace was proposed by Beel et al. [3]. Their tool, Docear, was developed as a "Microsoft Office for scientists". It contains several modules: a digital library module providing access to research articles, a research module providing keyword search, a PDF viewer for reading and annotating articles, a mind mapping module for managing all information, a word processing module for writing articles, and a reference manager for managing bibliography. Docear is an impressive demo, but it is not built for search, and it is a single-user application without any collaboration features, which makes it difficult to use in research groups.

In [9, 10] the authors describe IntentRadar, a search system specifically designed for exploratory search for scientific articles. The main idea is to model the user’s search intents with keywords and to evolve them interactively using relevance feedback obtained through a specially designed user interface. The authors show the efficiency of the proposed technique in a series of experiments. The interactive user interface covers both query formulation and search results exploration, and so could be very helpful for exploring a new topic. However, like Docear, IntentRadar is a single-user application.

There is also research on the exploration of particular fields of study: science mapping. One of the best-known science mapping tools is CiteSpace [5]. It is a standalone Java application that implements methods for co-citation analysis, allowing users to reveal the structure of a field and emerging trends. CiteSpace doesn’t maintain its own database of articles and must be provided with data exported from Web of Science. This complicates the use of CiteSpace for exploratory search, but we are going to implement some methods of co-citation analysis in future versions of SciNoon in order to make them available for interactive use by a team of scientists.

6 CONCLUSION

Exploratory search is a complex task that challenges search systems in many ways.

We developed SciNoon, an exploratory search system for scientific groups that provides a unique combination of tools for exploring new domains.

To collect articles, we provide the user with three tools: navigation on the citation graph, which enables snowballing; integration with the Google Scholar search engine (with custom keyword generation); and recommendations based on already collected articles.

To help the user select the most interesting articles, we implemented an interactive graphical interface that visualizes the citation graph of the collected articles. It supports several graph layouts and different levels of detail. We also proposed a new radial layout for an easier overview of a research.

The user can create a questionnaire specific to his or her research and then fill it in for each collected article. Article nodes are colored depending on the given answers.

All research team members can work simultaneously and independently, and they are notified of each other’s activity by SciNoon’s chat bot.

All this makes our system a helpful companion to existing full-text academic search engines, enabling several team members to carry out long-lasting research together.

7 ACKNOWLEDGMENTS

We thank Denis Turdakov, Nikita Astrakhantsev and Anna Loik for reading early versions of the paper. The reported study was partially funded by RFBR according to the research project 17-07-00978 A.

REFERENCES

[1] Nikita Astrakhantsev. 2015. Methods and software for terminology extraction from domain-specific text collection. Ph.D. Dissertation. Institute for System Programming of Russian Academy of Sciences.

[2] Kumaripaba Athukorala, Eve Hoggan, Anu Lehtiö, Tuukka Ruotsalo, and Giulio Jacucci. 2013. Information-seeking behaviors of computer scientists: Challenges for electronic literature search tools. Proceedings of the Association for Information Science and Technology 50, 1 (2013), 1–11.

[3] Joeran Beel, Bela Gipp, Stefan Langer, and Marcel Genzmehr. 2011. Docear: An Academic Literature Suite for Searching, Organizing and Creating Academic Literature. In Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL ’11). ACM, New York, NY, USA, 465–466. https://doi.org/10.1145/1998076.1998188

[4] Chaomei Chen. 2006. CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the Association for Information Science and Technology 57, 3 (2006), 359–377.

[5] Chaomei Chen, Fidelia Ibekwe-SanJuan, and Jianhua Hou. 2010. The structure and dynamics of cocitation clusters: A multiple-perspective cocitation analysis. Journal of the American Society for Information Science and Technology 61, 7 (2010), 1386–1409. https://doi.org/10.1002/asi.21309

[6] Patrice Lopez. 2009. GROBID: Combining Automatic Bibliographic Data Recognition and Term Extraction for Scholarship Publications. In Research and Advanced Technology for Digital Libraries, Maristella Agosti, José Borbinha, Sarantos Kapidakis, Christos Papatheodorou, and Giannis Tsakonas (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 473–474.

[7] Meredith Ringel Morris. 2013. Collaborative Search Revisited. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work (CSCW ’13). ACM, New York, NY, USA, 1181–1192. https://doi.org/10.1145/2441776.2441910

[8] Meredith Ringel Morris and Eric Horvitz. 2007. SearchTogether: An Interface for Collaborative Web Search. In Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology (UIST ’07). ACM, New York, NY, USA, 3–12. https://doi.org/10.1145/1294211.1294215

[9] Tuukka Ruotsalo, Giulio Jacucci, Petri Myllymäki, and Samuel Kaski. 2015. Interactive intent modeling: Information discovery beyond search. Commun. ACM 58, 1 (2015), 86–92.

[10] Tuukka Ruotsalo, Jaakko Peltonen, Manuel JA Eugster, Dorota Głowacka, Patrik Floréen, Petri Myllymäki, Giulio Jacucci, and Samuel Kaski. 2018. Interactive Intent Modeling for Exploratory Search. ACM Transactions on Information Systems (TOIS) 36, 4 (2018), 44.

[11] Dominika Tkaczyk, Paweł Szostek, Mateusz Fedoryszak, Piotr Jan Dendek, and Łukasz Bolikowski. 2015. CERMINE: automatic extraction of structured metadata from scientific literature. International Journal on Document Analysis and Recognition (IJDAR) 18, 4 (01 Dec 2015), 317–335. https://doi.org/10.1007/s10032-015-0249-8

[12] Ryen W White and Resa A Roth. 2009. Exploratory search: Beyond the query-response paradigm. Synthesis lectures on information concepts, retrieval, and services 1, 1 (2009), 1–98.