=Paper=
{{Paper
|id=Vol-2950/paper-04
|storemode=property
|title=Towards Supporting Complex Retrieval Tasks Through Graph-Based Information Retrieval and Visual Analytics
|pdfUrl=https://ceur-ws.org/Vol-2950/paper-04.pdf
|volume=Vol-2950
|authors=Aleksandar Bobic,Jean-Marie Le Goff,Christian Gütl
|dblpUrl=https://dblp.org/rec/conf/desires/BobicGG21
}}
==Towards Supporting Complex Retrieval Tasks Through Graph-Based Information Retrieval and Visual Analytics==
<pdf width="1500px">https://ceur-ws.org/Vol-2950/paper-04.pdf</pdf>
<pre>
Towards Supporting Complex Retrieval Tasks Through
Graph-Based Information Retrieval and Visual Analytics
Aleksandar Bobic^1,2, Jean-Marie Le Goff^1 and Christian Gütl^2
^1 CERN, Espl. des Particules 1, Meyrin, 1211, Switzerland
^2 Graz University of Technology, Rechbauerstraße 12, Graz, 8010, Austria


Abstract
The result analysis approaches of existing retrieval solutions tend to be too simple, provide too few features for
exploring retrieval results, or are very narrowly focused. To address these issues and help the wider community gain
more insight from retrieved data, this paper presents an enhanced graph-based retrieval prototype built on the
Collaboration Spotting (CS) platform. It combines information retrieval and visual analytics concepts to provide an
advanced solution for data retrieval and exploration. It enables users to retrieve information, explore it from
different perspectives using a graph representation and interactively perform further searches based on their
navigation and selection. Compared to traditional retrieval solutions, a search action in CS can reveal more detailed
aspects and techniques when the search output is analysed visually. To gain initial feedback, we interviewed five
domain experts in related fields. Findings reveal that the developed retrieval approach provides users with helpful
ways of exploring search results and with mechanisms for connecting features that are not explicitly linked otherwise.
Furthermore, several research directions and improvements were identified for future work.

Keywords
information retrieval, visual analytics, knowledge discovery, visualization system


DESIRES 2021 – 2nd International Conference on Design of Experimental Search & Information REtrieval Systems,
September 15–18, 2021, Padua, Italy
aleksandar.bobic@cern.ch (A. Bobic); jean-marie.le.goff@cern.ch (J. L. Goff); c.guetl@tugraz.at (C. Gütl)
ORCID: 0000-0001-5403-8475 (A. Bobic); 0000-0001-9589-1966 (C. Gütl)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073


1. Introduction

With the recent digitalisation efforts and steadily growing data piles, the amount of generated information rapidly
increased over a short period. This increase in data quantity made the need for efficient retrieval and visual
analytics tools apparent. This need is also reflected in multiple works which identified the necessity for
information retrieval (IR) applications that would enable users to carry out complex retrieval tasks, visualise
hidden connections by leveraging interaction and visualisation and extract implicit insights from retrieved data
automatically [1, 2]. Examples of such complex retrieval tasks could include retrieval of institutions collaborating
in a specific field, identification of author collaboration networks, retrieval of upcoming research topics connected
to existing topics and more.
   As one example, the need for the above-mentioned features to analyse data and grasp connections is also present in
bibliometric data. For this application scenario, data are traditionally gathered, indexed and made accessible by
services such as Google Scholar [3], Microsoft Academic [4] and ArXiv [5] which present search results as an ordered
list based on assumed relevance and do not offer advanced analytics approaches which would support analysing
correlations between papers. This ordered list format does not help users to extract complex relationships and gain
deeper insights from large retrieval results [6, 7, 1]. In the context of bibliometric data, examples of data
retrieval insights might include identifying author collaboration networks, identifying trending research areas in
recent years, and discovering common concepts shared among fields. User-centred interactive analysis of bibliometric
data can lead to better insights, novel research projects, and more informed decision-making [8, 9].
   A variety of visual analytics (VA) tools and visualisation approaches were created as a result of the
above-outlined needs for supporting bibliometric data exploration and analysis workflows by different interest groups
[10, 8, 6]. A straightforward and broad division can be made between solutions created for bibliometric mapping and
general-purpose VA tools [6]. Both groups leverage multiple visualisation techniques to provide users with an
insightful exploration process and reveal hidden connections which cannot be easily inferred from an ordered list of
retrieval results. A common approach to representing large connected datasets is displaying and analysing them as a
connected graph. The potential of the graph representation has been apparent to researchers and tool creators for
quite some time [11].
   Another example of a graph-based representation is Collaboration Spotting (CS). It is a graph-based VA platform
created to address the limitations of existing graph-based exploration tools such as limited leveraging of
interactivity and network visualisations,
and visualisation of explicit and implicit connections between features [12]. It enables users to explore sizeable
connected datasets by navigating through or changing perspectives^1 and contexts^2.
    ^1 Data features represented as the graph nodes.
    ^2 Data features represented as graph edges.
   To enable users to execute complex retrieval tasks and gain further insight into their retrieval results, and
based on existing work, we develop an enhanced CS-based retrieval system as a prototype. As an example, and due to
the large amounts of available data, we focus on bibliometric data. As our main contribution, we integrate an
enhanced retrieval mechanism in the CS platform’s main version. We combine graph-based VA and IR by introducing an
enhanced IR system that retrieves data from a search provider and leverages an interactive graph representation to
display the search results. It also provides a mechanism for further search refinement through simple graph
interactions. Furthermore, to identify the needs of experts, understand how to develop a system supporting users at
multiple steps of their retrieval tasks and potentially expand the system for broader use, we interview five experts
with a semi-structured approach.
   This paper is structured in the following manner: Section 2 briefly introduces related concepts and related work.
Section 3 describes the requirements, architecture, technical details and the user interface (UI) of the retrieval
system. Section 4 describes three sample case studies with real-world data and presents how the retrieval solution in
CS could be used to gain further insight into bibliometric data. It also describes the feedback gathered from experts
and discusses potential future research directions. The paper concludes with Section 5 where we discuss the current
implementation and future work.


2. Related Work

2.1. Visual Analysis Bibliometric Approaches

A variety of modern solutions such as search engines [3, 4, 13], repositories [14, 15, 5] and services [16, 17]
collect, create and retrieve large amounts of bibliometric data which can potentially provide new insights. This data
can be analysed using VA, which is a science that aims to provide explainable insight into large abstract data
through interactive data visualisation [18]. It can be combined with IR approaches to provide a deeper insight into
retrieval results by visualising them and enabling the use of advanced tools for their analysis [19, 20, 21].
   Bibliometric data analysis is usually demanding, tedious and time-consuming and can overwhelm novice researchers
and the broader community. Even though various natural language processing (NLP) and IR approaches can be applied to
bibliometric data, they might not produce insightful results [10]. Therefore, multiple visualisation techniques, VA
tools and bibliometric-oriented solutions were created to provide better insights into the increasing amount of
bibliometric data.
   A commonly used graph-based visual analysis tool with a broad application range, including analysis of
bibliometric data, is Gephi [11]. An example of a more narrowly focused bibliometric data tool is Galex which
represents disciplines, areas and institutions as an interactive galaxy [22]. BiblioViz focuses specifically on table
and graph visualisation to enable users to investigate bibliometric data from various perspectives [23]. Another tool
for analysing publication data is VISPubComPAS which focuses on the analysis of institutions and authors [24].
Additionally, a solution for exploring university bibliometric data for driving strategic decisions is presented by
[9].
   As a result of this research area’s growing popularity, multiple surveys were created covering different aspects
and solutions. [10] provides an overview of interactive VA approaches for patent and publication data. Next, [8]
report on approaches for extracting and visualising bibliometric data. Finally, [6] identify multiple solutions and
two common workflows for processing and visualising publication data. These surveys indicate the potential of
reviewed approaches but also identify multiple open challenges, such as lack of applications leveraging user
interaction for analysis, lack of empirical research regarding the effectiveness of visualisation techniques and
tools, visualisation of relationships between different data features and more.

2.2. Bibliometric-Oriented Search Systems

Although some of the aforementioned search engines and repositories provide further insight into author influence,
relations between papers, and more, their main focus is still on representing content as an ordered list. This
almost never-ending list of results ranked by assumed relevance does not provide a way of gaining in-depth insights
into data [6, 7, 1]. As identified by [2], IR systems should enable the execution of elaborate retrieval tasks, which
might lead to more significant insights and drive decision-making processes by leveraging visualisation methods to
display connections in the retrieved data. Multiple approaches have been created to mitigate the issues of
traditional bibliometric search engines by combining VA with IR. An example that leverages the above-mentioned
connections is Rexplore, an analytics tool that enables retrieval of research publication data via facets and sorting
of results [19]. Another example
is PivotSlice, which focuses on searching and analysis of retrieval results using a combination of filters and facets
[20].
   An example from industry is Connected Papers^3 which visualises retrieval results as connected graphs where papers
are connected based on their similarity. Another similar solution is Open Knowledge Maps which visualises retrieval
results as a multi-level bubble chart where papers are grouped based on text similarity [21]. Although there are many
existing approaches and services, [7] identify the limits of these tools, which focus either on providing search
results in the form of individual papers or on bibliometric analysis, and provide a conceptual solution.
    ^3 https://www.connectedpapers.com/

2.3. Collaboration Spotting

The approaches above are, to the authors’ knowledge, either no longer actively developed, not accessible, not usable
on large-scale data, or too simple to provide users with advanced analytics insights.
   As a possible alternative, CS is a graph-based VA platform that enables users to analyse large quantities of
connected data through the use of filters, facets, and contexts [12]. Unlike other approaches, it can be used to
analyse a wide variety of datasets and enables users to change the graph structure dynamically. A separate CSC
version was developed to explore how to provide users with a complete retrieval and analytics experience [25].
However, this version did not enable users to manipulate their subsequent searches with a finer granularity (for
example by combining their selected nodes that represent the search result features with Boolean operators) since it
relied on document embeddings and was never implemented in the primary CS version. Furthermore, it did not explicitly
combine graph interactions with the retrieval process.
   Based on insights and data analysis requirements, we aim to incorporate a prototype IR system into the primary CS
platform to support users in performing complex retrieval tasks. As part of this process, we introduce a novel way of
performing searches by exploring intrinsic graph patterns and selecting graph nodes from combinations of different
features using the prototype. Furthermore, we add new connections to external services in CS and an analytics
integration to perform empirical evaluation studies. Finally, we discuss use cases in bibliometric data analysis,
describe possible approaches to analysing such data with CS, report feedback from expert interviews and discuss
potential future research directions.


3. Design and Implementation

3.1. Prototype Requirements

Based on the identified gaps and needs outlined in the previous sections, our goal is to build an enhanced
graph-based retrieval and exploration prototype based on the existing CS system. As an example application scenario,
we chose to use bibliometric data due to its vast accessibility. To this end, the retrieval system should provide
sufficient flexibility to enable CS users to search via multiple queries and through a wide variety of data.
Additionally, the retrieval system should leverage users’ interactions to provide an efficient retrieval and
exploration workflow. Furthermore, the system should enable the investigation of implicit connections between the
entities of a dataset (e.g. institution collaborations based on co-authorship). Finally, the expanded CS system
should be ready for empirical analysis studies and gather interaction data. High-level requirements can be
summarized as:
     1. Support integration of multiple datasets and search providers.
     2. Support exploration of implicit and explicit entity connections.
     3. Collect user interaction data for empirical studies.
     4. Visualise complex search results using various visual cues.
     5. Enable exploration of search results using graph interactions.
     6. Enable search query refinement through graph interactions.
     7. Support complex search query creation.
     8. Provide explainable report generation.
     9. Enable visual creation of retrieval queries and filtering steps.
    10. Enable graph analysis approaches to gain further insight.
    11. Enable usage of graphs for knowledge retrieval.
   As part of the initial prototype we focus on requirements 1 to 6.

3.2. Prototype Architecture

To address the novel combination of graph interaction and retrieval concepts described in this work, and based on the
above-listed requirements, we enhanced the existing architecture seen in Fig. 1 with new components. The architecture
is split into multiple conceptual components for clarity. However, in reality, the Graph Calculation, API Request
Handlers and the Search are one module. This architecture is set to change once the move from a prototype to a
production system is made. The Graph Vis. & Interaction (Fig. 1 a) component and the Menus &
Side-Panels (Fig. 1 b) component handle interactions such as selecting a search source, entering search queries,
navigating graphs and selecting graph elements for search refinement. Once users start a new search, the Search
Handler (Fig. 1 c) sends a request with their query and selected dataset to the API Request Handler (Fig. 1 d), which
forwards the information to the Search Source & Provider Selector (Fig. 1 e) component. Here the request is parsed,
and the appropriate data search provider is selected based on a project environment variable.

[Figure 1: architecture diagram. Labels in the diagram: (a) Graph Vis. & Interaction; (b) Menus & Side-Panels;
(c) Search Handler; (d) API Request Handlers; (e) Search Source & Provider Selector; (f) Search Handlers; (g) Data
Search Providers; (h) Graph Calculation (Graph Generator, Community Detection, Layout Generation); (i) Analytics
System; group labels Front-end and Search.]
Figure 1: Simplified prototype architecture diagram. The filled rectangles represent conceptual modules in which the
code is grouped. Light-grey rectangles represent previously existing modules; dark-grey rectangles represent
previously existing modules that were updated, while the blue rectangles represent newly introduced modules. The
arrows represent a simplified data flow between components. The dashed rectangles represent groupings of conceptual
modules.

   The currently supported data search providers include Elasticsearch^4, Whoosh^5 and the ArXiv API. Users who aim
to perform an initial shallow exploration with a small amount of data and no advanced pre-processing can use the
ArXiv API or an API from another existing hosted search provider. However, the introduction of new search providers
would require implementing a new Python search component that would communicate with the search providers. On the
other hand, users who aim to get a deeper insight into their data and perform a more thorough exploration can use an
existing search provider like Elasticsearch or Whoosh. Furthermore, the latter search providers enable the use of
time-demanding pre-processing and pre-analytic steps outside of the expanded CS system. For example, a user might
wish to extract named entities or add additional data features before importing them into the system.
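
   As an illustration only, the provider selection and the Python search component mentioned above could be sketched
as follows. The interface SearchProvider, the registry, the environment variable name CS_SEARCH_PROVIDER and the toy
InMemoryProvider are assumptions made for this sketch; they are not taken from the CS code base, and a real component
would wrap Elasticsearch, Whoosh or the ArXiv API instead of an in-memory list.

```python
import os
from typing import Callable, Dict, List, Protocol


class SearchProvider(Protocol):
    """Hypothetical interface that every search component implements."""

    def search(self, dataset: str, queries: List[str], operator: str) -> List[dict]:
        ...


class InMemoryProvider:
    """Toy provider (assumption) that matches query terms against titles."""

    def __init__(self, documents: Dict[str, List[dict]]) -> None:
        self.documents = documents

    def search(self, dataset: str, queries: List[str], operator: str) -> List[dict]:
        # "AND" requires every term to match; any other operator is treated as "OR".
        combine = all if operator == "AND" else any
        terms = [query.lower() for query in queries]
        return [
            doc
            for doc in self.documents.get(dataset, [])
            if combine(term in doc["title"].lower() for term in terms)
        ]


# Registry of provider factories, keyed by the name used in the environment variable.
PROVIDERS: Dict[str, Callable[[], SearchProvider]] = {}


def register_provider(name: str, factory: Callable[[], SearchProvider]) -> None:
    """Make a provider available under the given name."""
    PROVIDERS[name] = factory


def select_provider(env_var: str = "CS_SEARCH_PROVIDER", default: str = "memory") -> SearchProvider:
    """Pick the provider named in the environment variable (hypothetical name)."""
    name = os.environ.get(env_var, default)
    if name not in PROVIDERS:
        raise ValueError(f"Unknown search provider: {name!r}")
    return PROVIDERS[name]()
```

   Under these assumptions, adding a provider amounts to writing one class with a search method and registering it;
the rest of the request flow only depends on the name stored in the environment variable.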
   The selected dataset, query and search operator are sent to the Search Handler (Fig. 1 f) component, where the
search is executed using the previously selected search provider (Fig. 1 g), and the results are transformed into a
CS-specific format. Results represent a network of data out of which a graph corresponding to the users’ selection is
built using the Graph Calculation (Fig. 1 h) module, which retrieves the graph id from the newly generated graph. The
id is then sent back to the Front-end to retrieve the newly generated graph. The users’ retrieval and exploration
process is enhanced by features such as navigation through the result graph and multiple iterations on their search.
The interactions users perform on the Front-end are tracked using Matomo^6 as part of the Analytics System (Fig. 1 i)
component for user behaviour and engagement analysis.
    ^4 https://www.elastic.co/elasticsearch
    ^5 https://whoosh.readthedocs.io/en/latest/index.html
    ^6 https://matomo.org/

3.3. User Interface

To retrieve information on an initial dataset, users perform searches by opening the search modal seen in Fig. 2.
Once they enter their queries, they can select one of the available data sources visible in the left drop-down
(Fig. 2 k) and select a binding Boolean operator for the queries using the right drop-down. Additionally, they can
remove queries from the query list (Fig. 2 l) by hovering over them and clicking the "x" button.

Figure 2: Prototype search interface. Users start by writing queries in the search field (j). They then select a
search source and the binding operator for their queries (k). Next, they can inspect their search terms and delete
terms from the search term list (l). Finally, they run their search by pressing the search button.

   They access the results by expanding the corresponding node into a graph, selecting the graph parameters of their
choice from the menu, and navigating through the network. Furthermore, users can select one or multiple nodes,
communities or connected components to perform another search. Once they have selected the relevant nodes, they can
open the search modal, whose search box is populated with the labels of the selected nodes as keyphrases. If they
select a community or a connected component, only the three most significant nodes (based on size) are retained for
the search. Furthermore, users can navigate through the data network by selecting multiple perspectives and contexts
and exploring either explicit connections in the dataset or identifying new implicit connections.
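
   The search refinement step described above can be illustrated with a minimal sketch. It assumes that every node
carries a label and a numeric size, and that the populated search box is equivalent to a list of quoted keyphrases
joined by the chosen Boolean operator. The function names, the node size mapping and the query string format are
hypothetical and only serve the illustration; the source states only that labels become keyphrases and that three
nodes are retained for communities and connected components.

```python
from typing import Dict, Iterable, List


def terms_from_selection(
    selected_labels: Iterable[str],
    node_sizes: Dict[str, float],
    is_group: bool = False,
    limit: int = 3,
) -> List[str]:
    """Turn a graph selection into search keyphrases.

    For individually selected nodes all labels are kept. For a community or
    connected component (is_group=True) only the `limit` largest nodes are kept.
    """
    labels = list(selected_labels)
    if is_group:
        # Sort by node size, largest first; unknown nodes count as size 0.
        labels = sorted(labels, key=lambda label: node_sizes.get(label, 0), reverse=True)
        labels = labels[:limit]
    return labels


def build_query(terms: List[str], operator: str = "AND") -> str:
    """Join keyphrases with a binding Boolean operator (assumed query syntax)."""
    if operator not in {"AND", "OR"}:
        raise ValueError(f"Unsupported operator: {operator!r}")
    return f" {operator} ".join(f'"{term}"' for term in terms)


if __name__ == "__main__":
    # Hypothetical community of keyphrase nodes with their sizes.
    sizes = {"information retrieval": 12, "nlp": 7, "ontology": 3, "graphs": 9}
    community_terms = terms_from_selection(sizes, sizes, is_group=True)
    print(build_query(community_terms, operator="OR"))
```

   Keeping the selection step separate from the query construction mirrors the two user actions in the interface:
choosing nodes in the graph, then choosing the binding operator in the search modal.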
3.4. Data Preparation                                             Table 1
                                                                  Frequency of Special Characters
The dataset should be appropriately pre-processed to
leverage the prototype’s features effectively. The ex-             What is your occupation, and what are your daily tasks?
ample dataset is retrieved from the Journal of Univer-             Where do you see the strengths of CS?
sal Computer Science (J.UCS) [26] since the authors                Where do you see the weaknesses of CS?
had full access to its detailed metadata. The data                 What could be improved in CS?
includes the DOI, title, abstract, authors, affiliations, author   Did you identify any other use cases for the system?
keyphrases and publication categories. Since the author-defined keyphrases might be biased
and reflect only a subset of the paper content, we extract additional keyphrases using the
keyphrase extraction tool YAKE! [27] to provide an alternative view on the paper content.
Additionally, we split the publication categories into level 1, 2 and 3 to provide users with
the possibility of exploring categorical data through graph navigation. We extend this data
with data from Scopus (see footnote 7) by extracting the affiliation name, affiliation city
and affiliation country. Finally, we convert the data into a format that the CS platform can
process. Once a graph is generated from the data, users can explore which authors and
institutions collaborate, identify authors’ focus categories, and more.

Footnote 7: The data was downloaded from Scopus in winter of 2020-2021 using the Python
library Pybliometrics [28].

4. Case Studies and Evaluation

The focus of the case studies is on bibliometric data from the J.UCS journal as described
above.

4.1. Case Studies

4.1.1. Potential Reviewers

A journal editor would like to identify potential reviewers for an IR and NLP paper. Using the
prototype system, they search for "IR" and "NLP" on the J.UCS dataset. They first filter out
the resulting journal categories that do not fall into one of the two above-mentioned topics.
Next, they navigate to a new keyphrase graph where they select phrases closely related to NLP
or IR and navigate to the author view. The authors are connected if they have joint
publications. The editor can now identify potential candidates who are likely knowledgeable in
the fields mentioned above and avoid authors who have previously published papers with the
submission author.

4.1.2. Identification of Potential Collaborators

A company executive searches for "online education" using the prototype system to identify
potential collaborators in online education. They explore the results from the keyphrases’
perspective to identify relevant phrases and use them to perform a search. Next, they explore
and filter out countries that are not easily accessible from their location. They then explore
institutions, which are connected if their representatives wrote a joint paper. Using this
view, the executive can identify institutions where they might know someone and establish a
collaboration.

4.1.3. Introduction to a New Topic

A novice researcher in software engineering explores the J.UCS categories using the prototype
system. They further explore the author keyphrases of papers in the software engineering
category to identify points of interest relevant to their research. They notice that software
engineering is connected to formal methods and decide to investigate both topics’ authors.
Only a few authors published in J.UCS about these topics, so they return to the previous
author keyphrase graph and search for the same topics using the ArXiv API. The search results
represent a more diverse set of documents that can be used to identify prominent authors in
the field of interest by observing the node sizes.

4.2. Expert Evaluation

4.2.1. Study Environment

To identify further potential users’ needs, we organised individual interviews with five
experts from different domains who could benefit from using CS. The interviews were
semi-structured to gain quick feedback that will guide further research and development
efforts and potentially enable the discovery of additional edge cases that the authors might
not have identified yet. Furthermore, we aimed to identify how future versions of CS should be
implemented so that they enable users to perform complex retrieval and analysis tasks and
support users at multiple steps of the retrieval process, and to gain potential users’ views
for shaping future system features. As part of the interview, which was held as an online
meeting, we presented the enhanced CS system, discussed the three use cases mentioned earlier
and demonstrated through screen sharing how users could use CS for the first use case, using a
dataset from J.UCS as an example. Finally, the experts were asked the five questions depicted
in Table 1. During the interview, they could ask to view specific sections of CS again and
asked further questions about how the system works.
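
The reviewer search from the first case study (Section 4.1.1) can be approximated in a few
lines. The following is a minimal sketch, not the CS platform's implementation: the record
layout, the function names coauthor_graph and reviewer_candidates, and the toy data are all
assumptions made for illustration. It connects authors who share a paper, as in the author
view, and removes the submitting author and their co-authors from the on-topic candidates.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records; the real J.UCS data carries many more fields.
papers = [
    {"authors": ["Alice", "Bob"], "keyphrases": {"information retrieval"}},
    {"authors": ["Bob", "Carol"], "keyphrases": {"natural language processing"}},
    {"authors": ["Dave"], "keyphrases": {"information retrieval", "ranking"}},
    {"authors": ["Erin", "Frank"], "keyphrases": {"formal methods"}},
]


def coauthor_graph(records):
    """Map each author to the set of authors they share at least one paper with."""
    graph = defaultdict(set)
    for record in records:
        for author in record["authors"]:
            graph[author]  # touch the key so solo authors appear as isolated nodes
        for a, b in combinations(record["authors"], 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def reviewer_candidates(records, topics, submitting_author):
    """Authors of on-topic papers, minus the submitter and their co-authors."""
    on_topic = [r for r in records if r["keyphrases"] & topics]
    graph = coauthor_graph(records)
    excluded = {submitting_author} | graph.get(submitting_author, set())
    authors = {a for r in on_topic for a in r["authors"]}
    return sorted(authors - excluded)


candidates = reviewer_candidates(
    papers, {"information retrieval", "natural language processing"}, "Alice"
)
print(candidates)
```

In this sketch the co-authorship graph is built from all records rather than only the on-topic
ones, so a conflict of interest is detected even when the joint paper was on another topic.
That is a design choice of the example, not something the paper specifies.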
4.2.2. Study Participants

The first participant was a librarian with more than 30 years of experience who also had
experience in database usage and has been leading the library services for the last 11 years.
The next participant was a computer scientist and doctoral student focusing on learning
environments and learning analytics. The third participant was a postdoctoral researcher
focusing on computer science and psychology who participated in research projects focusing on
VA, UI design, mitigation of cognitive biases and more. The fourth participant was a senior
data scientist who analyses literature based on clients’ requirements and implements machine
learning algorithms for various datasets based on this analysis. The final participant was a
Knowledge Transfer Officer, who, among other things, focuses on patent and research paper
exploration and retrieval. All participants were previously vaguely familiar with the project
but did not know how it works, how it can be used in detail or what its features are.

4.2.3. Study Results

A commonly identified strength of the prototype compared to traditional web search systems is
that users can explore results efficiently and avoid fine-tuning precision and recall through
keyphrases by navigating through the graph. Additional strengths include the ability to
identify relations between fields and authors, the visual feedback provided through the node
sizes, the ability to make sense of information that would be difficult to analyse with
simpler representations and the ability to explore implicit connections. An expert also
mentioned that "Navigation is the door to serendipity". In the context of the prototype
system, navigation is well supported by enabling different perspectives and contexts.

We also identified much room for improvement. Suggestions include visualising other data
relationships such as the impact of papers on different fields, using a wider variety of
visual cues to display new dimensions and avoiding node label overlap. The need to handle
visualisations of multidimensional datasets was also identified by [10, 8]. Moreover, data
should also be presented with traditional charts to give the user a familiar overview of the
data. Furthermore, more quantitative details about the retrieved data and more insightful
details such as the largest clusters and what they include were among the suggestions. Experts
also proposed exploring ways of integrating financial data and general impact data (see
footnote 8) to increase the added value of data exploration. A similar conclusion was reached
by [6], who suggest using social media for the expansion of scientific datasets. Furthermore,
similarly to what was concluded in [10], experts suggested the use of other data types such as
source code and multimedia attached to scientific work. Menus could be improved by including
wording which calls for action (see footnote 9) and is understandable for the general public.
Additionally, it was proposed that they should take up less space. To simplify the graph
search and exploration, the system should support natural language queries that can be
automatically translated into search and exploration actions. The UI could be additionally
improved by providing an onboarding tutorial with short introductory examples, introducing an
advanced UI mode with the complete set of features and a simple UI mode that can be used to
navigate through predefined templates, and presenting a traditional list view of results
alongside the graph view. The accommodation of novice users was also recognised as a critical
feature by [6], who suggested that the amount of data shown should be adjustable in order not
to overwhelm novice users. Furthermore, it was mentioned that it would be beneficial to create
reports based on the performed actions and to enable easy graph export with the search and
navigation history and the option to customise the background colour to better fit in
professional reports.

Footnote 8: For example, if a solution is mentioned in news articles without an explicit
citation it should still count as a mention which contributes to the general impact of a work.

Footnote 9: For example "Select by:"

We also identified additional use cases such as creating yearly reports about larger
institutions’ publications, code analysis evaluation where concepts used and bugs encountered
by each user could be visualised, and analysis of personal email corpora. A use case that two
experts mentioned is the visualisation and exploration of employee skills and project
participation inside companies.

In conclusion, the combination of IR and VA helps facilitate user exploration through graph
navigation and helps avoid fine-tuning keyphrases for relevant results.

4.3. Future Research Directions

Based on the expert feedback, literature survey, initial requirements and our own experience,
we identified several future research directions. Some of the identified directions are listed
below.

IR aspects include the use of retrieved graphs not only for gaining analytical insights but
also for advanced knowledge retrieval, for example by exploiting graph patterns for further
retrieval processes. Furthermore, we need to identify how to support user groups to perform
multi-user retrieval and analysis tasks together. Another broad question identified by [1] is
how to support users in complex retrieval tasks.

Graph analysis aspects may include content summarization of larger graph clusters, entity
generation from
graph patterns and identification of improved clustering and layout techniques which might be
more appropriate for the dynamic nature of the graphs in this work.

Machine learning aspects contain an exploration of conversational IR approaches to enhance
users’ analytical abilities on result graphs as well as the generation of user models based on
user interactions, which could aid users in the retrieval process [1, 2].

Engineering aspects of future work include improved connection generation and system
refactoring. The current system is not scalable and should be rewritten in modern technologies
with modularity in mind. Furthermore, the connection calculation process should be refactored
to avoid implying connections between points that might not directly connect in the retrieved
dataset.

Evaluation aspects, which represent the final key aspect and are a prevalent issue in VA
systems, are concerned with efficient quantitative evaluation, which will provide a clearer
picture about the usefulness of the system [1].

5. Conclusion and Future Work

This paper describes a graph-based visual analytics and IR prototype that enables the search
and exploration of data through a combination of IR and VA approaches. The solution is built
as an enhancement to the CS system. As part of the IR process, users perform a traditional
search whose results are then presented as an interactive graph that can be explored or used
to perform multiple additional searches. To investigate how the introduced solution could help
users in their retrieval process and to identify users’ needs and ideas for future system
development, we held interviews with five experts. Their answers indicate that the prototype
does provide a helpful workflow for analysing data but that there is also room for
improvement. Among the areas of improvement, we identified enrichment of the dataset using
data from other domains, UI simplifications, the introduction of new interaction approaches
and displaying the search result data in traditional and graph form. Furthermore,
visualisations could be enhanced by additional visual cues. We also discuss future research
directions that would be beneficial for the proposed system. We plan to improve and refactor
the system and conduct an empirical study to gain further insight into how this approach can
help support users in their retrieval process.

Acknowledgments

We want to thank the five interviewed experts for their time and for contributing valuable
feedback. We would also like to thank André Rattinger for scraping and supplying the primary
J.UCS dataset.

References

[1] Z. Chen, X. Cheng, S. Dong, Z. Dou, J. Guo, X. Huang, Y. Lan, C. Li, R. Li, T.-Y. Liu,
     et al., Information retrieval: a view from the Chinese IR community, Frontiers of
     Computer Science 15 (2021) 1–15. doi:10.1007/s11704-020-9159-0.
[2] J. S. Culpepper, F. Diaz, M. D. Smucker, Research frontiers in information retrieval:
     Report from the third strategic workshop on information retrieval in Lorne (SWIRL 2018),
     SIGIR Forum 52 (2018) 34–90. URL: https://doi.org/10.1145/3274784.3274788.
     doi:10.1145/3274784.3274788.
[3] P. Jacsó, Google Scholar: the pros and the cons, Online Information Review 29 (2005)
     208–214. doi:10.1108/14684520510598066.
[4] A. Sinha, Z. Shen, Y. Song, H. Ma, D. Eide, B.-J. Hsu, K. Wang, An overview of Microsoft
     Academic Service (MAS) and applications, in: Proceedings of the 24th International
     Conference on World Wide Web, 2015, pp. 243–246. doi:10.1145/2740908.2742839.
[5] P. Ginsparg, ArXiv at 20, Nature 476 (2011) 145–147. doi:10.1038/476145a.
[6] M. E. Bales, D. N. Wright, P. R. Oxley, T. R. Wheeler, Bibliometric visualization and
     analysis software: State of the art, workflows, and best practices (2020).
[7] J. P. Bascur, N. J. van Eck, L. Waltman, An interactive visual tool for scientific
     literature search: Proposal and algorithmic specification, in: BIR@ECIR, 2019, pp. 76–87.
[8] J. Liu, T. Tang, W. Wang, B. Xu, X. Kong, F. Xia, A survey of scholarly data
     visualization, IEEE Access 6 (2018) 19205–19221. doi:10.1109/ACCESS.2018.2815030.
[9] P. Rosenthal, N. H. Müller, F. Bolte, Visual analytics of bibliographical data for
     strategic decision support of university leaders: A design study, in: VISIGRAPP
     (3: IVAPP), 2019, pp. 297–305. doi:10.5220/0007396302970305.
[10] P. Federico, F. Heimerl, S. Koch, S. Miksch, A survey on visual approaches for analyzing
     scientific literature and patents, IEEE Transactions on Visualization and Computer
     Graphics 23 (2017) 2179–2198. doi:10.1109/TVCG.2016.2610422.
[11] M. Bastian, S. Heymann, M. Jacomy, Gephi: an open
     source software for exploring and manipulating networks, in: Proceedings of the
     International AAAI Conference on Web and Social Media, volume 3, 2009.
     doi:10.13140/2.1.1341.1520.
[12] A. Agocs, D. Dardanis, R. Forster, J.-M. Le Goff, X. Ouvrard, A. Rattinger,
     Collaboration spotting: A visual analytics platform to assist knowledge discovery,
     ERCIM News (2017) 46–48.
[13] S. Fricke, Semantic Scholar, Journal of the Medical Library Association: JMLA 106 (2018)
     145–147. URL: http://jmla.pitt.edu/ojs/jmla/article/view/280. doi:10.5195/jmla.2018.280.
[14] J. F. Burnham, Scopus database: a review, Biomedical Digital Libraries 3 (2006) 1–8.
     doi:10.1186/1742-5581-3-1.
[15] M. Ley, DBLP: some lessons learned, Proceedings of the VLDB Endowment 2 (2009)
     1493–1500. doi:10.14778/1687553.1687577.
[16] S. Ovadia, ResearchGate and Academia.edu: Academic social networks, Behavioral & Social
     Sciences Librarian 33 (2014) 165–169. doi:10.1080/01639269.2014.934093.
[17] V. Henning, J. Reichelt, Mendeley - a Last.fm for research?, in: 2008 IEEE Fourth
     International Conference on eScience, IEEE, 2008, pp. 327–328.
     doi:10.1109/eScience.2008.128.
[18] J. J. Thomas, K. A. Cook, A visual analytics agenda, IEEE Computer Graphics and
     Applications 26 (2006) 10–13. doi:10.1109/MCG.2006.5.
[19] F. Osborne, E. Motta, P. Mulholland, Exploring scholarly data with Rexplore, in:
     International Semantic Web Conference, Springer Berlin Heidelberg, 2013, pp. 460–477.
[20] J. Zhao, C. Collins, F. Chevalier, R. Balakrishnan, Interactive exploration of implicit
     and explicit relations in faceted datasets, IEEE Transactions on Visualization and
     Computer Graphics 19 (2013) 2080–2089. doi:10.1109/TVCG.2013.167.
[21] P. Kraker, C. Kittel, A. Enkhbayar, Open Knowledge Maps: Creating a visual interface to
     the world’s scientific knowledge based on natural language processing, 027.7 Zeitschrift
     für Bibliothekskultur 4 (2016) 98–103. doi:10.12685/027.7-4-2-157.
[22] Z. Li, C. Zhang, S. Jia, J. Zhang, Galex: Exploring the evolution and intersection of
     disciplines, IEEE Transactions on Visualization and Computer Graphics 26 (2020)
     1182–1192. doi:10.1109/TVCG.2019.2934667.
[23] Z. Shen, M. Ogawa, S. T. Teoh, K.-L. Ma, Biblioviz: a system for visualizing
     bibliography information, in: Proceedings of the 2006 Asia-Pacific Symposium on
     Information Visualisation-Volume 60, volume 60, Citeseer, 2006, pp. 93–102.
     doi:10.1145/1151903.1151918.
[24] Y. Wang, M. Yu, G. Shan, H.-W. Shen, Z. Lu, Vispubcompas: a comparative analytical
     system for visualization publication data, Journal of Visualization 22 (2019) 941–953.
     doi:10.1007/s12650-019-00585-2.
[25] A. Rattinger, J.-M. Le Goff, C. Guetl, Collaboration spotting cite: An exploration
     system for the bibliographic information of publications and patents, in: Proceedings of
     the 11th International Joint Conference on Knowledge Discovery, volume 1, 2019,
     pp. 548–554. doi:10.5220/0008366105480554.
[26] N. Baloian, J. A. Pino, G. Zurita, V. Lobos-Ossandón, H. Maurer, Twenty-five years of
     Journal of Universal Computer Science: A bibliometric overview, JUCS - Journal of
     Universal Computer Science 27 (2021) 3–39. URL: https://doi.org/10.3897/jucs.64594.
     doi:10.3897/jucs.64594.
[27] R. Campos, V. Mangaravite, A. Pasquali, A. M. Jorge, C. Nunes, A. Jatowt, YAKE!
     collection-independent automatic keyword extractor, in: European Conference on
     Information Retrieval, Springer International Publishing, Cham, 2018, pp. 806–810.
[28] M. E. Rose, J. R. Kitchin, pybliometrics: Scriptable bibliometrics using a Python
     interface to Scopus, SoftwareX 10 (2019) 100263. doi:10.1016/j.softx.2019.100263.

</pre>