=Paper= {{Paper |id=Vol-2327/ESIDA3 |storemode=property |title=SciNoon: Exploratory Search System for Scientific Groups |pdfUrl=https://ceur-ws.org/Vol-2327/IUI19WS-ESIDA-3.pdf |volume=Vol-2327 |authors=Yaroslav Nedumov,Anton Babichev,Ivan Mashonsky,Natalia Semina |dblpUrl=https://dblp.org/rec/conf/iui/NedumovBMS19 }} ==SciNoon: Exploratory Search System for Scientific Groups== https://ceur-ws.org/Vol-2327/IUI19WS-ESIDA-3.pdf
SciNoon: Exploratory Search System for Scientific Groups

Yaroslav Nedumov, ISP RAS, Moscow, Russia, yaroslav.nedumov@ispras.ru
Anton Babichev, ISP RAS, Moscow, Russia, babichev@ispras.ru
Ivan Mashonsky, ISP RAS, Moscow, Russia, ivan2110@ispras.ru
Natalia Semina, ISP RAS, Moscow, Russia, semina@ispras.ru

ABSTRACT
The exploratory search task poses three challenges to search engines: low specificity of the search goal, long duration of the search, and search results that are hard to consume. Exploratory searches are iterative, multi-tactical, and better performed by groups.
   We present a demonstration of the first prototype of an exploratory search system for scientific groups. Its main goal is to help with the collection of scientific articles related to a scientific group’s current project. We tried to meet all exploratory search challenges with a focus on supporting teamwork.
   SciNoon provides a shared workspace where articles can be collected and annotated. The workspace can be visualized either as an interactive graphical research map or as a table. The research map shows citation relations between articles and can be used to better understand the structure of the field. Search progress can be estimated by coloring articles according to the values of their attributes. SciNoon also simplifies keyword search by extracting candidate keywords from already collected articles and integrating them into existing search engines through a browser plugin.
   Using SciNoon, the members of a scientific group can search for, collect, and process articles and get notifications about each other’s progress from a chat bot.

CCS CONCEPTS
• Information systems → Collaborative search; Digital libraries and archives; • Human-centered computing → Computer supported cooperative work.

KEYWORDS
Exploratory search, collaborative search, academic search engines

ACM Reference Format:
Yaroslav Nedumov, Anton Babichev, Ivan Mashonsky, and Natalia Semina. 2019. SciNoon: Exploratory Search System for Scientific Groups. In Joint Proceedings of the ACM IUI 2019 Workshops, Los Angeles, USA, March 20, 2019, 6 pages.

IUI Workshops’19, March 20, 2019, Los Angeles, USA
Copyright ©2019 for the individual papers by the papers’ authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1    INTRODUCTION
Exploration of a new topic is an important task for many people. Students and postgraduates have to learn the state of the art while working on their first research project. R&D department researchers need to understand available task definitions and solutions while solving a customer problem. Reviewers often have to review a paper that doesn’t perfectly fit their main scope of work, so they need to refresh their understanding of the adjacent field of study.
   Exploration of a new topic requires a large time investment. In the beginning you often don’t fully understand the task and don’t know the right keywords to use. You have to ask someone for help or use general-purpose information sources like Wikipedia [2]. You cannot quickly understand the results of a search engine, and you have to repeat the search while improving your understanding of the domain.
   Such search tasks, with an open-ended, persistent, and multi-faceted problem context and an opportunistic, iterative, and multi-tactical search process, are called exploratory search tasks [12]. Exploratory search tasks are hard and challenging for search engines, which are mostly intended for lookup search. Lookup search is focused on high precision, but exploratory search needs high recall. Lookup search lasts seconds, but exploratory search can take weeks. Results of lookup search are easy to consume, but for exploratory search you need time to estimate their relevance. All of this is particularly relevant to the academic search domain.
   There are specialized search engines for scientific articles, such as Google Scholar, Microsoft Academic, or Semantic Scholar. They have big databases and good text search engines, but their support for exploration is quite limited. While query formulation support is good enough, exploration of search results, teamwork, and long-lasting searches are poorly supported.
   In our demo system¹ we try to augment existing systems and provide the user with a shared workspace which may be used by a team for collection and exploration of intermediate results.

¹ https://scinoon.com/research/esida-demo

2    SCINOON
The main use case supported by the current SciNoon prototype is the exploration of a new topic. According to the study [2] and our own examination of existing academic search systems, one of the most notable missing features is integrated support for collaboration. There are three key features for collaborative search: awareness, division of labor, and persistence [8]. We are providing users with




Figure 1: SciNoon user interface elements. a1−3 – time-based orbits of radial layout (from the old ones to the new ones), b –
article’s node with navigational controls, c – layout managing dialog, d – clustering managing dialog, e – PDF upload drop
zone, f – visualization of a cluster, g – research questionnaire managing dialog.


the shared workspace (see Figure 1), where articles can be collected and processed, and a set of corresponding tools. To maintain awareness we have implemented a chat bot which can be added to a research group chat and will then report each team member’s activity.
   Exploration of a new topic is a long-lasting iterative process that includes several activities: collection of potentially relevant articles, selection and reading of the most interesting ones, and summarization of the articles read according to research-specific aspects.
   In the following subsections we present the tools implemented in SciNoon that support each of these activities.

2.1    Collecting articles
Collecting data about an unknown domain is challenging because the search goal is quite unspecific. A user doesn’t know what to search for and needs help getting started. This is a typical situation for an exploratory search task, and there are well-known partial solutions: query suggestion, dynamic queries, and recommendations.
   In the case of scientific research one probably already has either a couple of relevant articles obtained from a scientific adviser or some keywords to start the search from.
   If the user already has several relevant articles, he or she can upload them to the workspace. SciNoon will parse them, extract metadata and full texts, and display them in the workspace. In the background it will extract keywords which may be used later.
   If the user doesn’t have PDFs, he or she will probably use a search engine to find articles. SciNoon doesn’t maintain its own full-text index of articles and instead integrates with Google Scholar. We implemented a plugin for Chromium-based browsers (checked on Chromium and Opera) which is able to grab data automatically from Google Scholar pages for the user. With the plugin installed, the user can click the "Add to research map" button on a search results page, and the selected articles will be added to the workspace.
   Google Scholar provides query suggestions and "Related searches" based on the current query. We augment this functionality by providing research-specific terms. We assume that the collected articles, not the current query, explain what a user wants to find. SciNoon extracts terms from the collected articles using the ComboBasic algorithm [1]. With the SciNoon browser plugin, the extracted terms are integrated directly into Google Scholar pages, and the user can either search with them alone or append them to the current query.
   An alternative strategy for collecting articles is snowballing. Using data extracted from uploaded PDFs and cached data from Google Scholar, we make it possible to "expand" an article node by adding either citing articles or cited articles.
   The last tool for collecting articles is content-based recommendations. They require some number of already collected articles. Four types are currently implemented:
   (1) Most cited locally. Recommends articles which haven’t been added yet but are highly cited by the articles on the map. It could be hard to spot such articles manually, but it is a trivial task for the system.
   (2) Cutting edge. Aggregated "cited by" for all novel articles in the workspace. Can be used for finding novel research.
   (3) Old surveys. Recommends survey articles cited by the articles on the research map. Can be used as a general domain knowledge source.
   (4) New surveys. Recommends survey articles citing articles on the map.
   After the user has collected enough articles, he or she will need to process them further. The first task is selecting the article to


start from, and here we provide the user with an interactive graphical interface called the research map.

2.2    Selecting articles
The home page of a research is called the research map view. It displays the already collected articles with the citation links between them, that is, a subgraph of the citation graph.
   There are three possible layouts: manual layout, force-based layout, and radial layout. Using the manual layout the user can place articles as he or she wishes. The force-based layout takes into account links between articles and moves connected articles close to each other.
   Our novel radial layout is similar to the time layout by Chen [4], but we use polar coordinates in order to better deal with the larger number of newer articles. The oldest articles are placed in the center, and the newer ones are placed on concentric orbits depending on the publication year and citations. The position of an article within its orbit is determined by citation links, as in the force-based layout, so a research field tends to form a sector.
   Each article is represented by a rounded rectangle with several text compartments depending on the zoom level. There are four zoom levels and corresponding views:
   (1) The 10000ft view can be used for understanding the general structure of the field and the processing progress. Each article is represented as a rounded-corner square without text, with its size depending on the number of citations. The whole research graph can be seen at once.
   (2) At the 1000ft view (Figure 2) there is a single-line text compartment with the first author’s name and the year of publication of the article. A couple of dozen articles can be seen at once.
   (3) At the 100ft view there are compartments for the full list of authors (up to 10), the article title, and controls for graph expansion and search in Google Scholar. Only a couple of articles can be seen at once.
   (4) At the 10ft view there is also a compartment for the article abstract. We see mostly a single article with only parts of neighbouring articles.
   Each particular node at any zoom level can be manually expanded by double-clicking.
   Using the questionnaire described in the next section, the user can color an article’s node depending on the answer to a selected question and then easily find articles with a particular answer on the research map. By coloring according to the "familiarity" question, the overall research progress can be estimated.

2.3    Processing articles
Collecting articles implies some further summarization. In SciNoon we provide the users with a customizable questionnaire which can be answered by each research group member for each article in the research. By default there are seven questions:
   (1) Familiarity. Assumes the following reading order: abstract, conclusion, brief reading, full text reading, complete understanding of the whole article text.
   (2) Relevance. Ranges from 1 (not relevant) to 5 (very relevant). Irrelevant articles should probably be deleted from the map, but they can be kept to save the other group members’ time when an article looks relevant until its full text is read.
   (3) Main contribution. Highlights the contribution most important for the research: survey, original method, or experiments. Most probably this question should be adapted to the field of study.
   (4) Readability. Ranges from 1 (hard to read) to 5 (easy to read). Particularly useful for recommending articles to students.
   (5) Reproducibility. Subjective estimation of the possibility of reproducing the research; ranges from 1 (hard to reproduce) to 5 (easy to reproduce).
   (6) Reliability. Subjective estimation of the reliability of the article’s results.
   (7) Notes. Free-text notes.
   The user is free to use or drop them and can add additional questions if needed.
   Users can collect, select, and process articles iteratively, populating the common workspace. They can work simultaneously and independently while maintaining awareness of each other’s work using SciNoon’s Telegram chat bot. The chat bot (@ScigraphLoggerBot) allows subscription to various events from the research: adding a new article, answering a question, and so on. In the next section we provide examples of how all these features can be used together.

3     USAGE SCENARIOS EXAMPLES
In this section we show how different SciNoon features can be used together to deal with two important tasks: directing a student’s work and writing a review.

3.1     Maintaining students’ work
A typical situation for exploratory search is giving a student a research task that is interesting both to the student and to his scientific adviser. The field of study is completely new to the student, and the scientific adviser is not so familiar with the given problem either. Moreover, he is too busy to dive into the details and do the research together with the student, so he gives him a basic understanding of the task and two or three articles to start from. The adviser prepares a new research map in SciNoon, creates an initial list of questions for the questionnaire, and sets up a group chat with the student and the SciNoon chat bot for getting updates.
   The student starts by studying the articles that his scientific adviser gave him and adds them to the research map. While reading articles he answers questions from the questionnaire. For example, the "familiarity" question shows his progress in studying the articles, the "relevance" question represents each article’s relevance to the given task, and "notes" is where the student puts his thoughts and a summary of the research. The adviser is notified by the chat bot about each such update and is able to correct his student when needed.
   As the student continues his research he needs some more articles. SciNoon helps him with search queries in Google Scholar by showing keywords extracted from his research map and personal recommendations. It also provides him with forward citation chaining, either from the SciNoon internal database or from Google Scholar, for recent advances in the field, along with backwards citation chaining for a better understanding of the field’s roots. For articles




Figure 2: A fragment of research map at 1000ft view with one expanded article node and several unexpanded connected by
citation links. Articles are colored according to their relevance from yellow to blue. Radial layout is used.


with full text available backwards citation chaining is also available.
The other features for article collection can also be used.
   Since the adviser is notified about his student’s progress he can
easily correct the student’s article selection mistakes. After working
together in this manner for some time, the team will have a list of
relevant articles and answers to the questions important for their
research. As the last step, all work can be exported to a CSV file.
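
   The export step can be illustrated with a minimal sketch. The paper does not describe the CSV layout, so the column names, the `Answers` structure, and the `export_csv` helper below are assumptions made only for illustration; the question names follow the default questionnaire of Section 2.3.

```python
import csv
import io
from dataclasses import dataclass, field

# Question names follow Section 2.3; using them as column keys is an assumption.
DEFAULT_QUESTIONS = ("familiarity", "relevance", "main_contribution",
                     "readability", "reproducibility", "reliability", "notes")


@dataclass
class Answers:
    """Answers one group member gave for one article (hypothetical structure)."""
    article: str
    member: str
    values: dict = field(default_factory=dict)


def export_csv(rows, questions=DEFAULT_QUESTIONS):
    """Serialize answers to CSV text, one row per (article, member) pair."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["article", "member", *questions],
                            lineterminator="\n")
    writer.writeheader()
    for r in rows:
        row = {"article": r.article, "member": r.member}
        # Unanswered questions are exported as empty cells.
        row.update({q: r.values.get(q, "") for q in questions})
        writer.writerow(row)
    return buf.getvalue()


rows = [
    Answers("Article A", "student", {"familiarity": "abstract", "relevance": 5}),
    Answers("Article B", "student", {"familiarity": "full text reading",
                                     "relevance": 3, "notes": "uses a time layout"}),
]
print(export_csv(rows))
```

   One row per article and group member keeps individual answers visible, which matches the paper's statement that each member can answer the questionnaire for each article.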

3.2      Writing a review
The second frequent case where fast exploration of a new topic
may be needed is writing a critical review. In this case preliminary
familiarity with the topic is much higher, but you still need
to recall the exact topic of the article and refresh your knowledge
of recent advances in the field. SciNoon can probably help.
   A reviewer can add the article’s PDF to a fresh research map and
then do one backwards citation chaining step using the "Cites" button
on the added article. Following this with the "Cutting edge" recommen-
dation, all competing articles can be easily found.
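
   The two steps above can be sketched on a toy citation graph. This is only an illustration under stated assumptions: the article IDs and years are invented, the function names are hypothetical, and "novel" is modelled as a simple publication-year threshold (`novel_since`), which the paper does not specify.

```python
from collections import Counter

# Toy citation data: each key cites the articles in its list (hypothetical IDs).
CITES = {
    "submission": ["a", "b"],
    "a": ["old1"],
    "b": ["old1"],
    "c": ["a", "b"],
    "d": ["b"],
    "e": ["old1"],
}
YEAR = {"submission": 2019, "a": 2017, "b": 2018, "old1": 2005,
        "c": 2018, "d": 2019, "e": 2010}


def backward_chain(article, cites):
    """Articles cited by `article` (one backwards citation chaining step)."""
    return set(cites.get(article, []))


def cited_by(article, cites):
    """Articles that cite `article`."""
    return {src for src, targets in cites.items() if article in targets}


def cutting_edge(workspace, cites, year, novel_since):
    """Aggregate "cited by" over novel workspace articles, most frequent first."""
    counts = Counter()
    for article in workspace:
        if year[article] >= novel_since:  # assumed definition of "novel"
            for citing in cited_by(article, cites):
                if citing not in workspace:  # recommend only unseen articles
                    counts[citing] += 1
    return counts.most_common()


# Step 1: add the reviewed paper and everything it cites.
workspace = {"submission"} | backward_chain("submission", CITES)
# Step 2: articles citing the same recent work are candidate competitors.
print(cutting_edge(workspace, CITES, YEAR, novel_since=2017))
```

   In this toy graph, articles "c" and "d" surface because they cite the same recent work as the reviewed paper, while "e", which cites only the old article, does not.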
   In the next section we will briefly explain SciNoon’s internals.

4     SYSTEM DESIGN
SciNoon is a client-server web application. The server part
has a modular design and is based on the Play framework² and Akka³.
The client part consists of a rich web application and a browser
plugin. The client and server communicate via an HTTP API.

² https://www.playframework.com/
³ https://akka.io/

Figure 3: Architecture of SciNoon


4.1      SciNoon server
The Play framework assumes an MVC architecture. Since we use a rich client, the view component is reduced: it consists of a trivial HTML template and the client’s JavaScript code, which builds the main part of the HTML in the browser.
   We use a graph-based data model, so all data is represented either as nodes or as edges. The main types of nodes are articles, scientists, answers, and researches. Nodes have properties depending on their type. For example, an article node has a title, a scientist node has a first name, and so on. There are several types of edges: a directed cites edge starts at the citing article node and ends at the cited article node, an author edge connects an article with its author, and so on. We use the JanusGraph⁴ graph database with a Cassandra backend for persistence.
   The HTTP API is divided into several controllers (see Figure 3), which convert data and pass it to the Akka actor system for asynchronous processing.
   We use external systems for extracting metadata and bibliography from PDFs: CERMINE [11] and GROBID [6].

⁴ http://janusgraph.org/

4.2      SciNoon client
SciNoon’s client part is written in TypeScript, which is translated into JavaScript. We use the D3.js and Bootstrap frameworks. There are several interconnected modules for getting data from the server, drawing the research map, processing the user’s input, and so on.
   SciNoon’s browser plugin is written in JavaScript using the WebExtensions API. It integrates with the SciNoon and Google Scholar sites using content scripts.

5     RELATED WORK
There are several well-known search engines for scientists, such as Google Scholar, Microsoft Academic, Semantic Scholar, aMiner, CiteSeerX, PubMed, and others. There are specialized social networks like ResearchGate or Academia.edu that also provide some search capabilities. Finally, many digital libraries provide search tools on their sites. But their support for the exploratory search task is quite limited.
   However, there are several systems and approaches that are less known but better suited for exploratory search (in general or for scientific articles). In this section we describe some of them.
   SearchTogether [8] was aimed at a task very similar to ours: small-group collaborative searching. This system was general purpose and was built on top of classic search engines, integrating them with instant messaging and recommendations and providing a common workspace. The authors showed that searching with SearchTogether is more efficient than searching without it. Unfortunately the project didn’t evolve, and after some time, with the rise of social networks, it was claimed by the authors to be outdated [7]. Despite this, we believe that its focus on awareness, division of labor, and persistence will be much more useful for professional users such as scientists and R&D specialists.
   An interesting approach to organizing a scientist’s workspace was proposed by Beel et al. [3]. Their tool, Docear, was developed as "Microsoft Office for scientists". It contains several modules: a digital library module providing access to research articles, a research module providing keyword search, a PDF viewer for reading and annotating articles, a mind mapping module for managing all information, a word processing module for writing articles, and a reference manager for managing bibliography. Docear is an amazing demo, but it is not built for search, and it is a single-user application lacking any collaboration features, which makes it difficult to use in research groups.
   In [9, 10] the authors describe the IntentRadar search system, specifically designed for exploratory search for scientific articles. The main idea is to model the user’s search intents by keywords and interactively evolve them using relevance information obtained from a specially designed user interface. The authors demonstrate the efficiency of the proposed technique in a series of experiments. The developed interactive user interface covers both query formulation and search results exploration, and so could be very helpful for the exploration of a new topic. However, like Docear, IntentRadar is a single-user application.
   There is some research regarding the exploration of particular fields of study: science mapping. One of the best-known science mapping tools is CiteSpace [5]. It is a standalone Java application implementing methods for co-citation analysis, which enable users to reveal the structure of a field and emerging trends. CiteSpace doesn’t maintain its own database of articles and has to be provided with data exported from Web of Science. This complicates the use of CiteSpace for exploratory search, but we are going to implement some methods of co-citation analysis in future versions of SciNoon in order to make them available for interactive use by a team of scientists.

6   CONCLUSION
Exploratory search is a complex task that challenges search systems in many ways.
   We developed SciNoon, an exploratory search system for scientific groups that provides a unique combination of tools for exploring new domains.
   To collect articles, we provide the user with three tools: navigation on the citation graph, which enables snowballing; integration with the Google Scholar search engine (with custom keyword generation); and recommendations based on already collected articles.
   To help the user select the most interesting articles, we implemented an interactive graphical interface visualizing the citation graph of the collected articles. It supports several graph layouts and different levels of detail. We also proposed a new radial layout for easier overview of a research.
   The user can create a questionnaire specific to his or her research and then fill it in for each collected article. Article nodes are colored depending on the given answers.
   All research team members can work simultaneously and independently, and they are notified of each other’s activity by SciNoon’s chat bot.
   All this makes our system a helpful companion to existing full-text academic search engines, enabling several team members to carry out long-lasting research together.
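
   As a small illustration of the graph-based data model described in Section 4.1, the sketch below stores typed nodes with properties and typed directed edges in memory. It is not the JanusGraph schema used by SciNoon; the class names, the `out` helper, the sample data, and the direction of the author edge (from article to scientist) are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node; `kind` is one of the node types named in Section 4.1."""
    node_id: str
    kind: str  # "article", "scientist", "answer" or "research"
    props: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Edge:
    """A directed, typed edge; a "cites" edge runs from citing to cited article."""
    kind: str
    src: str
    dst: str


class Graph:
    """Minimal in-memory stand-in for a property graph (illustrative only)."""

    def __init__(self):
        self.nodes = {}
        self.edges = set()

    def add_node(self, node_id, kind, **props):
        self.nodes[node_id] = Node(node_id, kind, props)

    def add_edge(self, kind, src, dst):
        self.edges.add(Edge(kind, src, dst))

    def out(self, src, kind):
        """IDs reachable from `src` over outgoing edges of the given kind."""
        return sorted(e.dst for e in self.edges if e.src == src and e.kind == kind)


g = Graph()
g.add_node("p1", "article", title="Paper one")
g.add_node("p2", "article", title="Paper two")
g.add_node("s1", "scientist", first_name="Ada")
g.add_edge("cites", "p1", "p2")   # p1 cites p2
g.add_edge("author", "p1", "s1")  # assumed direction: article -> author
print(g.out("p1", "cites"))
```

   Keeping everything as nodes and edges means that snowballing and the recommendations of Section 2.1 reduce to traversals over cites edges.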


7    ACKNOWLEDGMENTS
We thank Denis Turdakov, Nikita Astrakhantsev and Anna Loik for reading early versions of the paper. The reported study was partially funded by RFBR according to the research project 17-07-00978 A.

REFERENCES
[1] Nikita Astrakhantsev. 2015. Methods and software for terminology extraction from domain-specific text collection. Ph.D. Dissertation. Institute for System Programming of Russian Academy of Sciences.
[2] Kumaripaba Athukorala, Eve Hoggan, Anu Lehtiö, Tuukka Ruotsalo, and Giulio Jacucci. 2013. Information-seeking behaviors of computer scientists: Challenges for electronic literature search tools. Proceedings of the Association for Information Science and Technology 50, 1 (2013), 1–11.
[3] Joeran Beel, Bela Gipp, Stefan Langer, and Marcel Genzmehr. 2011. Docear: An Academic Literature Suite for Searching, Organizing and Creating Academic Literature. In Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL ’11). ACM, New York, NY, USA, 465–466. https://doi.org/10.1145/1998076.1998188
[4] Chaomei Chen. 2006. CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the Association for Information Science and Technology 57, 3 (2006), 359–377.
[5] Chaomei Chen, Fidelia Ibekwe-SanJuan, and Jianhua Hou. 2010. The structure and dynamics of cocitation clusters: A multiple-perspective cocitation analysis. Journal of the American Society for Information Science and Technology 61, 7 (2010), 1386–1409. https://doi.org/10.1002/asi.21309 arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/asi.21309
[6] Patrice Lopez. 2009. GROBID: Combining Automatic Bibliographic Data Recognition and Term Extraction for Scholarship Publications. In Research and Advanced Technology for Digital Libraries, Maristella Agosti, José Borbinha, Sarantos Kapidakis, Christos Papatheodorou, and Giannis Tsakonas (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 473–474.
[7] Meredith Ringel Morris. 2013. Collaborative Search Revisited. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work (CSCW ’13). ACM, New York, NY, USA, 1181–1192. https://doi.org/10.1145/2441776.2441910
[8] Meredith Ringel Morris and Eric Horvitz. 2007. SearchTogether: An Interface for Collaborative Web Search. In Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology (UIST ’07). ACM, New York, NY, USA, 3–12. https://doi.org/10.1145/1294211.1294215
[9] Tuukka Ruotsalo, Giulio Jacucci, Petri Myllymäki, and Samuel Kaski. 2015. Interactive intent modeling: Information discovery beyond search. Commun. ACM 58, 1 (2015), 86–92.
[10] Tuukka Ruotsalo, Jaakko Peltonen, Manuel JA Eugster, Dorota Głowacka, Patrik Floréen, Petri Myllymäki, Giulio Jacucci, and Samuel Kaski. 2018. Interactive Intent Modeling for Exploratory Search. ACM Transactions on Information Systems (TOIS) 36, 4 (2018), 44.
[11] Dominika Tkaczyk, Paweł Szostek, Mateusz Fedoryszak, Piotr Jan Dendek, and Łukasz Bolikowski. 2015. CERMINE: automatic extraction of structured metadata from scientific literature. International Journal on Document Analysis and Recognition (IJDAR) 18, 4 (01 Dec 2015), 317–335. https://doi.org/10.1007/s10032-015-0249-8
[12] Ryen W White and Resa A Roth. 2009. Exploratory search: Beyond the query-response paradigm. Synthesis lectures on information concepts, retrieval, and services 1, 1 (2009), 1–98.