=Paper=
{{Paper
|id=Vol-2950/paper-04
|storemode=property
|title=Towards Supporting Complex Retrieval Tasks Through Graph-Based Information Retrieval and Visual Analytics
|pdfUrl=https://ceur-ws.org/Vol-2950/paper-04.pdf
|volume=Vol-2950
|authors=Aleksandar Bobic,Jean-Marie Le Goff,Christian Gütl
|dblpUrl=https://dblp.org/rec/conf/desires/BobicGG21
}}
==Towards Supporting Complex Retrieval Tasks Through Graph-Based Information Retrieval and Visual Analytics==
Aleksandar Bobic (1, 2), Jean-Marie Le Goff (1) and Christian Gütl (2)
(1) CERN, Espl. des Particules 1, Meyrin, 1211, Switzerland
(2) Graz University of Technology, Rechbauerstraße 12, Graz, 8010, Austria
Abstract
The result analysis offered by existing retrieval solutions tends to be too simple, to provide too few features for exploring retrieval results, or to be very narrowly focused. To address these issues and help the wider community gain more insight from retrieved data, this paper presents an enhanced graph-based retrieval prototype built on the Collaboration Spotting (CS) platform. It combines information retrieval and visual analytics concepts to provide an advanced solution for data retrieval and exploration. It enables users to retrieve information, explore it from different perspectives using a graph representation and interactively perform further searches based on their navigation and selection. Compared to traditional retrieval solutions, a search action in CS can reveal more detailed aspects and techniques when the search output is analysed visually. To gain initial feedback, we interviewed five domain experts in related fields. Findings reveal that the developed retrieval approach provides users with helpful ways of exploring search results and with mechanisms for connecting features that are not explicitly linked otherwise. Furthermore, several research directions and improvements have been identified for future work.
Keywords
information retrieval, visual analytics, knowledge discovery, visualization system
DESIRES 2021 – 2nd International Conference on Design of Experimental Search Information REtrieval Systems, September 15–18, 2021, Padua, Italy
Email: aleksandar.bobic@cern.ch (A. Bobic); jean-marie.le.goff@cern.ch (J. L. Goff); c.guetl@tugraz.at (C. Gütl)
ORCID: 0000-0001-5403-8475 (A. Bobic); 0000-0001-9589-1966 (C. Gütl)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
1. Introduction
With the recent digitalisation efforts and steadily growing data piles, the amount of generated information has rapidly increased over a short period. This increase in data quantity made the need for efficient retrieval and visual analytics tools apparent. This need is also reflected in multiple works which identified the necessity for IR applications that would enable users to carry out complex retrieval tasks, visualise hidden connections by leveraging interaction and visualisation and extract implicit insights from retrieved data automatically [1, 2]. Examples of such complex retrieval tasks could include retrieval of institutions collaborating in a specific field, identification of author collaboration networks, retrieval of upcoming research topics connected to existing topics and more.
As one example, the need for the above-mentioned features to analyse data and grasp connections is also present in bibliometric data. For this application scenario, data are traditionally gathered, indexed and made accessible by services such as Google Scholar [3], Microsoft Academic [4] and ArXiv [5], which present search results as an ordered list based on assumed relevance and do not offer advanced analytics approaches which would support analysing correlations between papers. This ordered list format does not help users to extract complex relationships and gain deeper insights from large retrieval results [6, 7, 1]. In the context of bibliometric data, examples of data retrieval insights might include identifying author collaboration networks, identifying trending research areas in recent years, and discovering common concepts shared among fields. User-centred interactive analysis of bibliometric data can lead to better insights, novel research projects, and more informed decision-making [8, 9].
A variety of visual analytics (VA) tools and visualisation approaches were created as a result of the above-outlined needs for supporting bibliometric data exploration and analysis workflows by different interest groups [10, 8, 6]. A straightforward and broad division can be made between solutions created for bibliometric mapping and general-purpose VA tools [6]. Both groups leverage multiple visualisation techniques to provide users with an insightful exploration process and reveal hidden connections which cannot be easily inferred from an ordered list of retrieval results. A common approach to representing large connected datasets is displaying and analysing them as a connected graph. The potential of the graph representation has been apparent to researchers and tool creators for quite some time [11].
Another example of a graph-based representation is Collaboration Spotting (CS). It is a graph-based visual analytics (VA) platform created to address the limitations of existing graph-based exploration tools such as limited leveraging of interactivity and network visualisations,
and visualisation of explicit and implicit connections between features [12]. It enables users to explore sizeable connected datasets by navigating through or changing perspectives (data features represented as the graph nodes) and contexts (data features represented as graph edges).
Building on existing work, we develop an enhanced CS-based retrieval system as a prototype that enables users to execute complex retrieval tasks and gain further insight into their retrieval results. As an example, and because of the large amount of available data, we focus on bibliometric data. As our main contribution, we integrate an enhanced retrieval mechanism in the CS platform's main version. We combine graph-based VA and information retrieval (IR) by introducing an enhanced IR system that retrieves data from a search provider and leverages an interactive graph representation to display the search results. It also provides a mechanism for further search refinement through simple graph interactions. Furthermore, to identify the needs of experts, understand how to develop a system supporting users at multiple steps of their retrieval tasks and potentially expand the system for broader use, we interview five experts with a semi-structured approach.
This paper is structured in the following manner: Section 2 briefly introduces related concepts and related work. Section 3 describes the requirements, architecture, technical details and the user interface (UI) of the retrieval system. Section 4 describes three sample case studies with real-world data and presents how the retrieval solution in CS could be used to gain further insight into bibliometric data. It also describes the feedback gathered from experts and discusses potential future research directions. The paper concludes with Section 5 where we discuss the current implementation and future work.
2. Related Work
2.1. Visual Analysis Bibliometric Approaches
A variety of modern solutions such as search engines [3, 4, 13], repositories [14, 15, 5] and services [16, 17] collect, create and retrieve large amounts of bibliometric data which can potentially provide new insights. This data can be analysed using VA, which is a science that aims to provide explainable insight into large abstract data through interactive data visualisation [18]. It can be combined with IR approaches to provide a deeper insight into retrieval results by visualising them and enabling the use of advanced tools for their analysis [19, 20, 21].
Bibliometric data analysis is usually demanding, tedious and time-consuming and can overwhelm novice researchers and the broader community. Even though various natural language processing (NLP) and IR approaches can be applied to bibliometric data, they might not produce insightful results [10]. Therefore, multiple visualisation techniques, VA tools and bibliometric-oriented solutions were created to provide better insights into the increasing amount of bibliometric data.
A commonly used graph-based visual analysis tool with a broad application range, including analysis of bibliometric data, is Gephi [11]. An example of a more narrowly focused bibliometric data tool is Galex, which represents disciplines, areas and institutions as an interactive galaxy [22]. BiblioViz focuses specifically on table and graph visualisation to enable users to investigate bibliometric data from various perspectives [23]. Another tool for analysing publication data is VISPubComPAS, which focuses on the analysis of institutions and authors [24]. Additionally, a solution for exploring university bibliometric data for driving strategic decisions is presented by [9].
As a result of this research area's growing popularity, multiple surveys were created covering different aspects and solutions. [10] provides an overview of interactive VA approaches for patent and publication data. Next, [8] report on approaches for extracting and visualising bibliometric data. Finally, [6] identify multiple solutions and two common workflows for processing and visualising publication data. These surveys indicate the potential of reviewed approaches but also identify multiple open challenges, such as lack of applications leveraging user interaction for analysis, lack of empirical research regarding the effectiveness of visualisation techniques and tools, visualisation of relationships between different data features and more.
2.2. Bibliometric-Oriented Search Systems
Although some of the aforementioned search engines and repositories provide further insight into author influence, relations between papers, and more, their main focus is still related to representing content as an ordered list. This almost never-ending list of results ranked by assumed relevance does not provide a way of gaining in-depth insights into data [6, 7, 1]. As identified by [2], IR systems should enable the execution of elaborate retrieval tasks, which might lead to more significant insights and drive decision-making processes by leveraging visualisation methods to display connections in the retrieved data. Multiple approaches have been created to mitigate the issues of traditional bibliometric search engines by combining VA with IR. An example that leverages the above-mentioned connections is Rexplore, an analytics tool that enables retrieval of research publication data via facets and sorting of results [19]. Another example
is PivotSlice, which focuses on searching and analysis of retrieval results using a combination of filters and facets [20].
An example from industry is Connected Papers (https://www.connectedpapers.com/), which visualises retrieval results as connected graphs where papers are connected based on their similarity. Another similar solution is Open Knowledge Maps, which visualises retrieval results as a multi-level bubble chart where papers are grouped based on text similarity [21]. Although there are many existing approaches and services, [7] identify the limits of these tools, focusing on providing search results in the form of individual papers or focusing on bibliometric analysis and provide a conceptual solution.
2.3. Collaboration Spotting
The approaches above are, to the authors' knowledge, either not actively developed anymore, not accessible, not usable on large-scale data, or too simple to provide users with advanced analytics insights.
As a possible alternative, CS is a graph-based VA platform that enables users to analyse large quantities of connected data through the use of filters, facets, and contexts [12]. Unlike other approaches, it can be used to analyse a wide variety of datasets and enables users to change the graph structure dynamically. A separate CSC version was developed to explore how to provide users with a complete retrieval and analytics experience [25]. However, this version did not enable users to manipulate their subsequent searches with a finer granularity (for example by combining their selected nodes that represent the search result features with Boolean operators) since it relied on document embeddings and was never implemented in the primary CS version. Furthermore, it did not explicitly combine graph interactions with the retrieval process.
Based on insights and data analysis requirements, we aim to incorporate a prototype IR system into the primary CS platform to support users in performing complex retrieval tasks. As part of this process, we introduce a novel way of performing searches by exploring intrinsic graph patterns and selecting graph nodes from combinations of different features using the prototype. Furthermore, we add new connections to external services in CS and an analytics integration to perform empirical evaluation studies. Finally, we discuss use cases in bibliometric data analysis, describe possible approaches to analysing such data with CS, report feedback from expert interviews and discuss potential future research directions.
3. Design and Implementation
3.1. Prototype Requirements
Based on the identified gaps and needs outlined in the previous sections, our goal is to build an enhanced graph-based retrieval and exploration prototype based on the existing CS system. As an example application scenario, we chose to use bibliometric data due to its vast accessibility. To this end, the retrieval system should provide sufficient flexibility to enable CS users to search via multiple queries and through a wide variety of data. Additionally, the retrieval system should leverage users' interactions to provide an efficient retrieval and exploration workflow. Furthermore, the system should enable the investigation of implicit connections between the entities of a dataset (e.g. institution collaborations based on co-authorship). Finally, the expanded CS system should be ready for empirical analysis studies and gather interaction data. High-level requirements can be summarized as:
1. Support integration of multiple datasets and search providers.
2. Support exploration of implicit and explicit entity connections.
3. Collect user interaction data for empirical studies.
4. Visualise complex search results using various visual cues.
5. Enable exploration of search results using graph interactions.
6. Enable search query refinement through graph interactions.
7. Support complex search query creation.
8. Provide explainable report generation.
9. Enable visual creation of retrieval queries and filtering steps.
10. Enable graph analysis approaches to gain further insight.
11. Enable usage of graphs for knowledge retrieval.
As part of the initial prototype we focus on requirements 1 to 6.
3.2. Prototype Architecture
To address the novel combination of graph interaction and retrieval concepts described in this work and based on the above-listed requirements, we enhanced the existing architecture seen in Fig. 1 with new components. The architecture is split into multiple conceptual components for clarity. However, in reality, the Graph Calculation, API Request Handlers and the Search are one module. This architecture is set to change once the move from a prototype to a production system is made. The Graph Vis. & Interaction (Fig. 1 a) component and the Menus &
Side-Panels (Fig. 1 b) component handle interactions such as selecting a search source, entering search queries, navigating graphs and selecting graph elements for search refinement. Once users start a new search, the Search Handler (Fig. 1 c) sends a request with their query and selected dataset to the API Request Handler (Fig. 1 d), which forwards the information to the Search Source & Provider Selector (Fig. 1 e) component. Here the request is parsed, and the appropriate data search provider is selected based on a project environment variable.
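The selection step can be illustrated with a minimal Python sketch. It is not the prototype's code: the variable name `CS_SEARCH_PROVIDER`, the registry keys, the fallback to the ArXiv API and the stub search functions are all assumptions made for illustration. Only the idea of choosing among the providers this section names (Elasticsearch, Whoosh and the ArXiv API) through an environment variable comes from the text.

```python
import os
from typing import Callable, Dict, List, Mapping

# Signature shared by all provider stubs: (dataset, queries, operator) -> list of hits.
SearchFn = Callable[[str, List[str], str], List[dict]]


def _stub(provider_name: str) -> SearchFn:
    """Build a placeholder search function; a real one would query the provider."""
    def search(dataset: str, queries: List[str], operator: str) -> List[dict]:
        return [{"provider": provider_name, "dataset": dataset,
                 "queries": queries, "operator": operator}]
    return search


# Registry of the providers named in the text; the keys are hypothetical identifiers.
PROVIDERS: Dict[str, SearchFn] = {
    "elasticsearch": _stub("elasticsearch"),
    "whoosh": _stub("whoosh"),
    "arxiv": _stub("arxiv"),
}


def select_provider(env: Mapping[str, str] = os.environ) -> SearchFn:
    """Pick the search function named by the (assumed) CS_SEARCH_PROVIDER variable."""
    # Falling back to "arxiv" is an assumption, not documented behaviour.
    name = env.get("CS_SEARCH_PROVIDER", "arxiv").lower()
    try:
        return PROVIDERS[name]
    except KeyError:
        raise ValueError(f"Unknown search provider: {name!r}") from None


if __name__ == "__main__":
    search = select_provider({"CS_SEARCH_PROVIDER": "whoosh"})
    print(search("jucs", ["information retrieval"], "AND"))
```

Keeping the providers behind one shared function signature is one way to let a new provider be added by registering a single additional function.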
The currently supported data search providers include Elasticsearch (https://www.elastic.co/elasticsearch), Whoosh (https://whoosh.readthedocs.io/en/latest/index.html) and the ArXiv API. Users who aim to perform an initial shallow exploration with a small amount of data and no advanced pre-processing can use the ArXiv API or an API from another existing hosted search provider. However, the introduction of new search providers would require implementing a new Python search component that would communicate with the search providers. On the other hand, users who aim to get a deeper insight into their data and perform a more thorough exploration can use an existing search provider like Elasticsearch or Whoosh. Furthermore, the latter search providers enable the use of time-demanding pre-processing and pre-analytic steps outside of the expanded CS system. For example, a user might wish to extract named entities or add additional data features before importing them into the system.
The selected dataset, query and search operator are sent to the Search Handler (Fig. 1 f) component, where the search is executed using the previously selected search provider (Fig. 1 g), and the results are transformed into a CS-specific format. Results represent a network of data out of which a graph corresponding to the users' selection is built using the Graph Calculation (Fig. 1 h) module, which retrieves the graph id from the newly generated graph. The id is then sent back to the Front-end to retrieve the newly generated graph. The users' retrieval and exploration process is enhanced by features such as navigation through the result graph and multiple iterations on their search. The interactions users perform on the Front-end are tracked using Matomo (https://matomo.org/) as part of the Analytics System (Fig. 1 i) component for user behaviour and engagement analysis.
Figure 1: Simplified prototype architecture diagram. The filled rectangles represent conceptual modules in which the code is grouped. Light-grey rectangles represent previously existing modules; dark-grey rectangles represent previously existing modules that were updated, while the blue rectangles represent newly introduced modules. The arrows represent a simplified data flow between components. The dashed rectangles represent groupings of conceptual modules.
3.3. User Interface
To retrieve information on an initial dataset, users perform searches by opening the search modal seen in Fig. 2. Once they enter their queries, they can select one of the available data sources visible in the left drop-down (Fig. 2 k) and select a binding Boolean operator for the queries using the right drop-down. Additionally, they can remove queries from the query list (Fig. 2 l) by hovering over them and clicking the "x" button.
Figure 2: Prototype search interface. Users start by writing queries in the search field (j). They then select a search source and the binding operator for their queries (k). Next, they can inspect their search terms and delete terms from the search term list (l). Finally, they run their search by pressing the search button.
They access the results by expanding the corresponding node into a graph, selecting the graph parameters of their choice from the menu, and navigating through the network. Furthermore, users can select one or multiple nodes, communities or connected components to perform another search. Once they have selected the relevant nodes, they can open the search modal, whose search box is populated with the labels of the selected nodes as keyphrases. If they select a community or a connected component, only the three most significant nodes (based on size) will be retained for the search. Furthermore, users can navigate through the data network by selecting multiple perspectives and contexts and exploring either explicit connections in the dataset or identifying new implicit connections.
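The refinement step described in this section, where selected node labels become keyphrases and a community or connected component is reduced to its three largest nodes, can be sketched as follows. This is an illustrative sketch only: the `Node` class, the meaning of `size`, the function names and the quoted-phrase query syntax are assumptions and do not reflect the actual CS code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    label: str
    size: int  # assumed to be the visual node size, e.g. a document count


def keyphrases_from_selection(nodes: List[Node], is_group: bool, keep: int = 3) -> List[str]:
    """Turn a selection into keyphrases.

    For a community or connected component (is_group=True) only the `keep`
    largest nodes are retained; individually selected nodes are all kept.
    """
    if is_group:
        nodes = sorted(nodes, key=lambda n: n.size, reverse=True)[:keep]
    return [n.label for n in nodes]


def build_query(phrases: List[str], operator: str = "AND") -> str:
    """Join quoted keyphrases with one binding Boolean operator (assumed syntax)."""
    if operator not in {"AND", "OR"}:
        raise ValueError("operator must be 'AND' or 'OR'")
    return f" {operator} ".join(f'"{p}"' for p in phrases)


if __name__ == "__main__":
    community = [Node("ontology", 4), Node("e-learning", 12),
                 Node("semantic web", 9), Node("xml", 2)]
    phrases = keyphrases_from_selection(community, is_group=True)
    print(build_query(phrases, "OR"))
```

Separating keyphrase selection from query construction keeps the size-based cut-off independent of whichever query syntax a given search provider expects.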
3.4. Data Preparation
The dataset should be appropriately pre-processed to leverage the prototype's features effectively. The example dataset is retrieved from the Journal of Universal Computer Science (J.UCS) [26] since the authors had full access to its detailed metadata. The data includes the doi, title, abstract, authors, affiliations, author keyphrases and publication categories. Since the author-defined keyphrases might be biased and reflect only a subset of the paper content, we extract additional keyphrases using the keyphrase extraction tool YAKE! [27] to provide an alternative view on the paper content. Additionally, we split the publication categories into level 1, 2 and 3 to provide users with the possibility of exploring categorical data through graph navigation. We extend this data with data from Scopus by extracting the affiliation name, affiliation city and affiliation country (the data was downloaded from Scopus in the winter of 2020-2021 using the Python library Pybliometrics [28]). Finally, we convert the data into a format that the CS platform can process. Once a graph is generated from the data, users can explore which authors and institutions collaborate, identify authors' focus categories, and more.
4. Case Studies and Evaluation
The focus of the case studies is on bibliometric data from the J.UCS journal as described above.
4.1. Case Studies
4.1.1. Potential Reviewers
A journal editor would like to identify potential reviewers for an IR and NLP paper. Using the prototype system, they search for "IR" and "NLP" on the J.UCS dataset. They first filter out the resulting journal categories that do not fall into one of the two above-mentioned topics. Next, they navigate to a new keyphrase graph where they select phrases closely related to NLP or IR and navigate to the author view. The authors are connected if they have joint publications. The editor can now identify potential candidates who are likely knowledgeable in the fields mentioned above and avoid authors who have previously published papers with the submission author.
4.1.2. Identification of Potential Collaborators
A company executive searches for online education using the prototype system to identify potential collaborators in online education. They explore the results from the keyphrase perspective to identify relevant phrases and use them to perform a search. Next, they explore and filter out countries that are not easily accessible from their location. They then explore institutions that are connected if their representatives wrote a joint paper. Using this view, the executive can identify institutions where they might know someone and establish a collaboration.
4.1.3. Introduction to a New Topic
A novice researcher in software engineering explores the J.UCS categories using the prototype system. They further explore the author keyphrases of papers in the software engineering category to identify points of interest relevant to their research. They notice that software engineering is connected to formal methods and decide to investigate both topics' authors. Only a few authors published in J.UCS about these topics, so they return to the previous author keyphrase graph and search for the same topics using the ArXiv API. The search results represent a more diverse set of documents that can be used to identify prominent authors in the field of interest by observing the node sizes.
4.2. Expert Evaluation
4.2.1. Study Environment
To identify further potential users' needs, we organised individual interviews with five experts from different domains who could benefit from using CS. The interviews were semi-structured to gain quick feedback that will guide further research and development efforts and potentially enable the discovery of additional edge cases that the authors might not have identified yet. Furthermore, we aimed to identify how to implement future versions of CS in a way which will enable users to perform complex retrieval and analysis tasks and support users at multiple steps of the retrieval process, and to gain potential users' views for shaping future system features. As part of the interview, which was held as an online meeting, we presented the enhanced CS system, discussed the three use cases mentioned earlier and demonstrated through screen sharing how users could use CS for the first use case, using a dataset from J.UCS as an example. Finally, the experts were asked the five questions depicted in Table 1. During the interview, they could ask to view specific sections of CS again and ask further questions about how the system works.
Table 1: Frequency of Special Characters
1. What is your occupation, and what are your daily tasks?
2. Where do you see the strengths of CS?
3. Where do you see the weaknesses of CS?
4. What could be improved in CS?
5. Did you identify any other use cases for the system?
4.2.2. Study Participants
The first participant was a librarian with more than 30 years of experience who also had experience in database usage and has been leading the library services for the last 11 years. The next participant was a computer scientist and doctoral student focusing on learning environments and learning analytics. The third participant was a postdoctoral researcher focusing on computer science and psychology who participated in research projects focusing on VA, UI design, mitigation of cognitive biases and more. The fourth participant was a senior data scientist who analyses literature based on clients' requirements and implements machine learning algorithms for various datasets based on this analysis. The final participant was a Knowledge Transfer Officer, who, among other things, focuses on patent and research paper exploration and retrieval. All participants were previously vaguely familiar with the project but did not know how it works, how it can be used in detail or what its features are.
4.2.3. Study Results
A commonly identified strength of the prototype compared to traditional web search systems is that users can explore results efficiently and avoid fine-tuning precision and recall through keyphrases by navigating through the graph. Additional strengths include the ability to identify relations between fields and authors, the visual feedback provided through the node sizes, the ability to make sense of information that would be difficult to analyse with simpler representations and the ability to explore implicit connections. An expert also mentioned that "Navigation is the door to serendipity". In the context of the prototype system, navigation is well supported by enabling different perspectives and contexts.
We also identified much room for improvement. Suggestions include visualising other data relationships such as the impact of papers on different fields, using a wider variety of visual cues to display new dimensions and avoiding node label overlap. The need to handle visualisations of multidimensional datasets was also identified by [10, 8]. Moreover, data should also be presented with traditional charts to give the user a familiar overview of the data. Furthermore, more quantitative details about the retrieved data and more insightful details such as the largest clusters and what they include were among the suggestions. Experts also proposed exploring ways of integrating financial data and general impact data (for example, if a solution is mentioned in news articles without an explicit citation, it should still count as a mention which contributes to the general impact of a work) to increase the added value of data exploration. A similar conclusion was reached by [6], who suggest using social media for the expansion of scientific datasets. Furthermore, similarly to what was concluded in [10], experts suggested the use of other data types such as source code and multimedia attached to scientific work. Menus could be improved by including wording which calls for action (for example "Select by:") and is understandable for the general public. Additionally, it was proposed that they should take up less space. To simplify the graph search and exploration, the system should support natural language queries that can be automatically translated into search and exploration actions. The UI could be additionally improved by providing an onboarding tutorial with short introductory examples, introducing an advanced UI mode with the complete set of features and a simple UI mode that can be used to navigate through predefined templates, and presenting a traditional list view of results alongside the graph view. The accommodation of novice users was also recognised as a critical feature by [6], who suggested that the amount of data shown should be adjustable in order not to overwhelm novice users. Furthermore, it was mentioned that creating reports based on the performed actions, enabling easy graph export with the search and navigation history, and offering the option to customise the background colour to better fit in professional reports would be beneficial.
We also identified additional use cases such as creating yearly reports about larger institutions' publications, code analysis evaluation where concepts used and bugs encountered by each user could be visualised, and analysis of personal email corpora. A use case that two experts mentioned is the visualisation and exploration of employee skills and project participation inside companies.
In conclusion, the combination of IR and VA helps facilitate user exploration through graph navigation and helps avoid fine-tuning keyphrases for relevant results.
4.3. Future Research Directions
Based on the expert feedback, literature survey, initial requirements and our own experience, we identified several future research directions. Some of the identified directions are listed below.
IR aspects include the use of retrieved graphs not only for gaining analytical insights but also for advanced knowledge retrieval, for example by exploiting graph patterns for further retrieval processes. Furthermore, we need to identify how to support user groups to perform multi-user retrieval and analysis tasks together. Another broad question identified by [1] is how to support users in complex retrieval tasks.
Graph analysis aspects may include content summarization of larger graph clusters, entity generation from
graph patterns and identification of improved clustering and layout techniques which might be more appropriate for the dynamic nature of the graphs in this work.
Machine learning aspects include exploring conversational IR approaches to enhance users' ability to analyse result graphs, as well as generating user models based on user interactions, which could aid users in the retrieval process [1, 2].
Engineering aspects of future work include improved connection generation and system refactoring. The current system is not scalable and should be rewritten using modern technologies with modularity in mind. Furthermore, the connection calculation process should be refactored to avoid implying connections between points that may not be directly connected in the retrieved dataset.
Evaluation aspects, the final key aspect and a prevalent issue in VA systems, are concerned with efficient quantitative evaluation, which would provide a clearer picture of the usefulness of the system [1].
5. Conclusion and Future Work
This paper describes a graph-based visual analytics and IR prototype that enables the search and exploration of data through a combination of IR and VA approaches. The solution is built as an enhancement to the CS system. As part of the IR process, users perform a traditional search whose results are then presented as an interactive graph that can be explored or used to perform multiple additional searches. To investigate how the introduced solution could help users in their retrieval process, and to identify users' needs and ideas for future system development, we held interviews with five experts. Their answers indicate that the prototype does provide a helpful workflow for analysing data but that there is also room for improvement. Among the areas of improvement, we identified enrichment of the dataset using data from other domains, UI simplifications, the introduction of new interaction approaches and displaying the search result data in both traditional and graph form. Furthermore, visualisations could be enhanced by additional visual cues. We also discuss future research directions that would be beneficial for the proposed system. We plan to improve and refactor the system and conduct an empirical study to gain further insight into how this approach can help support users in their retrieval process.
Acknowledgments
We want to thank the five interviewed experts for their time and for contributing valuable feedback. We would also like to thank André Rattinger for scraping and supplying the primary J.UCS dataset.
References
[1] Z. Chen, X. Cheng, S. Dong, Z. Dou, J. Guo, X. Huang, Y. Lan, C. Li, R. Li, T.-Y. Liu, et al., Information retrieval: a view from the Chinese IR community, Frontiers of Computer Science 15 (2021) 1–15. doi:10.1007/s11704-020-9159-0.
[2] J. S. Culpepper, F. Diaz, M. D. Smucker, Research frontiers in information retrieval: Report from the third strategic workshop on information retrieval in Lorne (SWIRL 2018), SIGIR Forum 52 (2018) 34–90. URL: https://doi.org/10.1145/3274784.3274788. doi:10.1145/3274784.3274788.
[3] P. Jacsó, Google Scholar: the pros and the cons, Online information review 29 (2005) 208–214. doi:10.1108/14684520510598066.
[4] A. Sinha, Z. Shen, Y. Song, H. Ma, D. Eide, B.-J. Hsu, K. Wang, An overview of Microsoft Academic Service (MAS) and applications, in: Proceedings of the 24th international conference on world wide web, 2015, pp. 243–246. doi:10.1145/2740908.2742839.
[5] P. Ginsparg, ArXiv at 20, Nature 476 (2011) 145–147. doi:10.1038/476145a.
[6] M. E. Bales, D. N. Wright, P. R. Oxley, T. R. Wheeler, Bibliometric visualization and analysis software: State of the art, workflows, and best practices (2020).
[7] J. P. Bascur, N. J. van Eck, L. Waltman, An interactive visual tool for scientific literature search: Proposal and algorithmic specification, in: BIR@ECIR, 2019, pp. 76–87.
[8] J. Liu, T. Tang, W. Wang, B. Xu, X. Kong, F. Xia, A survey of scholarly data visualization, IEEE Access 6 (2018) 19205–19221. doi:10.1109/ACCESS.2018.2815030.
[9] P. Rosenthal, N. H. Müller, F. Bolte, Visual analytics of bibliographical data for strategic decision support of university leaders: A design study, in: VISIGRAPP (3: IVAPP), 2019, pp. 297–305. doi:10.5220/0007396302970305.
[10] P. Federico, F. Heimerl, S. Koch, S. Miksch, A survey on visual approaches for analyzing scientific literature and patents, IEEE transactions on visualization and computer graphics 23 (2017) 2179–2198. doi:10.1109/TVCG.2016.2610422.
[11] M. Bastian, S. Heymann, M. Jacomy, Gephi: an open source software for exploring and manipulating networks, in: Proceedings of the International AAAI Conference on Web and Social Media, volume 3, 2009. doi:10.13140/2.1.1341.1520.
[12] A. Agocs, D. Dardanis, R. Forster, J.-M. Le Goff, X. Ouvrard, A. Rattinger, Collaboration spotting: A visual analytics platform to assist knowledge discovery, ERCIM News (2017) 46–48.
[13] S. Fricke, Semantic scholar, Journal of the Medical Library Association: JMLA 106 (2018) 145–147. URL: http://jmla.pitt.edu/ojs/jmla/article/view/280. doi:10.5195/jmla.2018.280.
[14] J. F. Burnham, Scopus database: a review, Biomedical digital libraries 3 (2006) 1–8. doi:10.1186/1742-5581-3-1.
[15] M. Ley, DBLP: some lessons learned, Proceedings of the VLDB Endowment 2 (2009) 1493–1500. doi:10.14778/1687553.1687577.
[16] S. Ovadia, ResearchGate and Academia.edu: Academic social networks, Behavioral & social sciences librarian 33 (2014) 165–169. doi:10.1080/01639269.2014.934093.
[17] V. Henning, J. Reichelt, Mendeley - a Last.fm for research?, in: 2008 IEEE fourth international conference on eScience, IEEE, 2008, pp. 327–328. doi:10.1109/eScience.2008.128.
[18] J. J. Thomas, K. A. Cook, A visual analytics agenda, IEEE computer graphics and applications 26 (2006) 10–13. doi:10.1109/MCG.2006.5.
[19] F. Osborne, E. Motta, P. Mulholland, Exploring scholarly data with rexplore, in: International semantic web conference, Springer Berlin Heidelberg, 2013, pp. 460–477.
[20] J. Zhao, C. Collins, F. Chevalier, R. Balakrishnan, Interactive exploration of implicit and explicit relations in faceted datasets, IEEE Transactions on Visualization and Computer Graphics 19 (2013) 2080–2089. doi:10.1109/TVCG.2013.167.
[21] P. Kraker, C. Kittel, A. Enkhbayar, Open knowledge maps: Creating a visual interface to the world’s scientific knowledge based on natural language processing, 027.7 Zeitschrift für Bibliothekskultur 4 (2016) 98–103. doi:10.12685/027.7-4-2-157.
[22] Z. Li, C. Zhang, S. Jia, J. Zhang, Galex: Exploring the evolution and intersection of disciplines, IEEE transactions on visualization and computer graphics 26 (2020) 1182–1192. doi:10.1109/TVCG.2019.2934667.
[23] Z. Shen, M. Ogawa, S. T. Teoh, K.-L. Ma, Biblioviz: a system for visualizing bibliography information, in: Proceedings of the 2006 Asia-Pacific Symposium on Information Visualisation-Volume 60, volume 60, Citeseer, 2006, pp. 93–102. doi:10.1145/1151903.1151918.
[24] Y. Wang, M. Yu, G. Shan, H.-W. Shen, Z. Lu, Vispubcompas: a comparative analytical system for visualization publication data, Journal of Visualization 22 (2019) 941–953. doi:10.1007/s12650-019-00585-2.
[25] A. Rattinger, J.-M. Le Goff, C. Guetl, Collaboration spotting cite: An exploration system for the bibliographic information of publications and patents, in: Proceedings of the 11th International Joint Conference on Knowledge Discovery, volume 1, 2019, pp. 548–554. doi:10.5220/0008366105480554.
[26] N. Baloian, J. A. Pino, G. Zurita, V. Lobos-Ossandón, H. Maurer, Twenty-five years of journal of universal computer science: A bibliometric overview, JUCS - Journal of Universal Computer Science 27 (2021) 3–39. URL: https://doi.org/10.3897/jucs.64594. doi:10.3897/jucs.64594.
[27] R. Campos, V. Mangaravite, A. Pasquali, A. M. Jorge, C. Nunes, A. Jatowt, Yake! collection-independent automatic keyword extractor, in: European Conference on Information Retrieval, Springer International Publishing, Cham, 2018, pp. 806–810.
[28] M. E. Rose, J. R. Kitchin, pybliometrics: Scriptable bibliometrics using a Python interface to Scopus, SoftwareX 10 (2019) 100263. doi:10.1016/j.softx.2019.100263.