<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Joint Proceedings of BIR</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Which topics are visualized by science maps? A topic-driven clustering effectiveness analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Juan Pablo Bascur</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Suzan Verberne</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nees Jan van Eck</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ludo Waltman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre for Science and Technology Studies (CWTS), Leiden University</institution>
          ,
          <addr-line>Leiden</addr-line>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Leiden Institute of Advanced Computer Science (LIACS), Leiden University</institution>
          ,
          <addr-line>Leiden</addr-line>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>14</volume>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>In this paper, we analyze the effectiveness of science maps based on citation similarity for clustering biomedical topics, in particular MeSH terms. Science maps group documents by topic to provide an overview of the academic literature. Because documents have multiple topics, some topics will correspond to the document clusters, while other topics might not be represented by a cluster. To identify these topics, we analyze the extent to which document clusters are formed around topics and use this as a signal to evaluate the topical clustering effectiveness. We found that the best clustered topics are related to diseases, organisms, anatomy, and techniques and equipment for diagnostics and therapy, while the worst are related to geographical entities, information sciences, natural science fields, and health care and occupations. Our findings indicate that science maps are better at clustering some topics than others, which is relevant for users of science maps. The analysis method that we developed is also a contribution to the topic-driven evaluation of science maps.</p>
      </abstract>
      <kwd-group>
        <kwd>Science maps</kwd>
        <kwd>Clustering</kwd>
        <kwd>MeSH terms</kwd>
        <kwd>Topical analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Science maps [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] are visualizations that provide an overview of the content of academic
documents collections. The goal of science mapping is to find meaningful structures in the data
sets of scientific publications, which can then be used for literature analysis or information
retrieval [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Some of the uses of science maps are field delimitation [4], research policy [5],
and enhanced document browsing [6]. A well-established practice to create science maps is
to cluster similar academic documents together, and then to summarize the content of these
documents. In other words, the map is a set of clusters that emerged from document similarity,
and the topic of each cluster is inferred from the documents it contains.
      </p>
      <p>When using science maps, it is important to remember that academic documents can have
more than a single topic (e.g. a document about lung cancer is about both lungs and cancer), but
in a science map they can typically be assigned to only one cluster with a single topic. Losing
information when creating overviews is unavoidable, but it does raise the question of which of
the topics (aspects) of the documents the clustering will be based on. This is not an idle question,
as there is increasing disagreement between expert-identified topics and cluster-identified
topics [7], indicating that the expert-identified topics are poorly represented. More specifically,
experts find that the documents of their identified topics are placed in clusters where they are a
minority, instead of forming clusters where they are a majority.</p>
      <p>In the current work, we investigate clustering for biomedical topics – in particular, MeSH
terms – on clustering solutions based on citation similarity. We aim to find out which of the MeSH
terms form clusters where they are a majority, a phenomenon that we will refer to as clustering
effectiveness. Our approach is to group the biomedical topics into topic categories and to measure
the clustering effectiveness of each topic, so as to create a ranking of topic categories.</p>
      <p>Our main research question is: Which topic categories have the highest and lowest clustering
effectiveness in science maps? Our contributions are twofold:
• We develop a methodology to compare the representation of different topic categories in
a clustering solution.
• We find that some topic categories are better represented in a science map than others,
and we identify which topic categories these are.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>Klavans and Boyack [8] analyzed how the areas of science relate to each other by aggregating
several maps of science that were created using different methods. Ruiz-Castillo and Waltman [9]
analyzed a method to normalize the citation count of documents according to the citation count
of their neighboring documents. This normalization method is used by the Leiden Ranking
to evaluate the scientific production of universities [10]. Science mapping has also been used for
information retrieval purposes, by grouping the documents in hierarchical levels to facilitate
the navigation of the map at different granularities [11, 12, 13, 14, 6]. Creating new groupings of
documents allows the user to increase the topic diversity of the retrieved documents [15].</p>
      <p>The most common method to evaluate the quality of a map of science is to ask experts
from the scientific fields of the documents if they think that the map of science reflects their
knowledge about the fields. The utility of this evaluation method is limited because it usually
gives an inconclusive result: The experts tend to agree with most of the science map but identify
caveats about certain details [16]. Additionally, there are several problematic issues intrinsic to
the expert evaluation method: The evaluation criteria may change between experts; seeing the
map may affect the expert's own interpretation of the fields; the expert may be biased towards
the fields of their interest; and the expert may have limited competence in some (sub)fields [16].</p>
      <p>An alternative method to evaluate the quality of a map of science is to consider the intrinsic
properties of the network, which does not require additional metadata about the documents.
Commonly used intrinsic properties are specific desirable characteristics: Homogeneous cluster
sizes, few small clusters, stable clustering solutions between different runs of the cluster
algorithm, and a short computing time to create the clusters [17]. Another approach to evaluate a
map by its intrinsic properties is to assume that there exists an ideal map that can be intuitively
recognized by users and that any similarity metric that one can calculate between documents
is a proxy of the similarity metric that would generate the ideal map [18]. Ahlgren et al. [19]
used this method to measure the accuracy of several similarity metrics using a MeSH terms
similarity network for evaluation.</p>
      <p>A third approach to evaluate the quality of a map of science is to define a ground truth
made of documents that correspond to a given field, and to evaluate the overlap between the
clustering solution and the ground truth: either the extent to which all the documents of each
field are contained in a single cluster [7, 20], or the extent to which each cluster contains only
documents of a single field [21, 22, 23]. Some works obtain the ground truth from the references
of review articles [24, 13], but most works obtain the ground truth using expert knowledge. To
our knowledge, MeSH terms have not been used as ground truths, but instead have been used
as one of the tools to support expert knowledge ground truths.</p>
      <p>Evaluations that use expert knowledge ground truths have recently questioned the quality
of maps of science by challenging their capacity to identify fields of science [23, 7, 20, 22].
One explanation for this negative result is that a single document can
belong to several fields but can only belong to a single cluster [7, 22], although some maps
allow documents to belong to multiple clusters [25, 26]. Another explanation is that the use of
different clustering algorithms can have a significant influence on the quality of the maps, and
it is impossible to know beforehand which clustering algorithm will give the best result for a
given map [27, 21]. In this paper, we investigate the clustering effectiveness of MeSH terms as a
start to closing this gap.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <sec id="sec-3-1">
        <title>3.1. Data selection</title>
        <p>This section has the following structure: In subsection 3.1, we define how we select the
documents, clusters, MeSH tree branches, and MeSH terms that we use in our analysis. In subsection
3.2, we explain how we measure the clustering effectiveness of the topics.</p>
        <p>Documents The collection of documents that we use in our work comes from the work by
Ahlgren et al. [19]. This is a collection of 2,941,119 PubMed documents published between 2013
and 2017.</p>
        <p>Clustering solutions The clustering solutions that we use are the ones generated by Ahlgren
et al., who clustered the above-mentioned documents using different similarity metrics and
granularities, and the Leiden algorithm [28] for clustering, where the parameter Resolution
controls the granularity of the clustering solution (a higher Resolution value generates smaller
clusters). We selected the clustering solutions that Ahlgren et al. generated with the metric
Extended direct citation (EDC), which is calculated using the direct citation between documents
plus the citations to documents outside the document collection [18]. We selected clustering
solutions based on this metric because it showed good performance in the work by Ahlgren et al.
From the EDC based solutions, we selected the clustering solutions of three Resolution values:
2×10⁻⁶, 2×10⁻⁵, and 2×10⁻⁴, because the first and second values yield cluster sizes similar to
those in the CWTS algorithmic mapping of science [11] (which is used in the Leiden Ranking
[10]), while the third value enables us to evaluate clusters of smaller size.</p>
        <p>We cleaned the clustering solutions by removing the clusters with a size of fewer than 10
documents because these clusters usually had documents that were disconnected from the
largest connected component of the similarity network. Removing these clusters removed only
a minor fraction of the total number of documents. The statistics of each clustering solution
after this process can be seen in Table 1.</p>
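This cleaning step can be sketched as a small filter over a document-to-cluster assignment. The `clean_solution` helper and the mapping layout are our illustrative assumptions, not the authors' code:

```python
from collections import Counter

def clean_solution(assignments, min_size=10):
    """Drop clusters with fewer than min_size documents.

    assignments maps document id -> cluster id; documents belonging to
    a removed cluster are dropped from the solution altogether.
    """
    sizes = Counter(assignments.values())
    return {doc: c for doc, c in assignments.items() if sizes[c] >= min_size}
```

As the paper notes, this removes only a minor fraction of documents, since the affected clusters are mostly disconnected from the largest connected component.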
        <p>Topics Our topics are the Medical Subject Headings (MeSH) terms, a controlled vocabulary
thesaurus from the National Library of Medicine (NLM) used for indexing PubMed. MeSH terms
are semi-automatically annotated to documents by the NLM [29]. We obtained the MeSH terms
annotated for each document in our document collection, plus the metadata of the MeSH terms,
from the in-house CWTS database of PubMed and MeSH (version from 2023).</p>
        <p>We expanded the MeSH terms per document using the exploding MeSH terms functionality of
PubMed, where a MeSH term query retrieves not only the documents annotated with the query
MeSH term but also the documents annotated with a descendant of the MeSH term in the
MeSH term tree. The result of this search is equivalent to adding to the documents the MeSH
term parents of the MeSH terms annotated in the documents and then performing the query.
We replicated the latter process by annotating the parents of the MeSH terms to the documents.
A single MeSH term can belong to more than one MeSH term tree branch (whose importance
is explained below), so, to avoid documents connecting to additional branches through the
added parent MeSH terms, we restricted each added parent MeSH term to the branches of the
annotated MeSH terms it came from.</p>
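A minimal sketch of this expansion, assuming annotations are represented by MeSH tree numbers (the function name is ours). A tree number such as C04.588 encodes its full ancestry within a single branch, so expanding dot-separated prefixes automatically keeps each added parent in the branch the annotation came from:

```python
def explode_tree_numbers(annotated):
    """Add every ancestor of each annotated MeSH tree number.

    A tree number like 'C04.588' descends from 'C04', so its ancestors
    are exactly its dot-separated prefixes, all in the same branch.
    """
    expanded = set()
    for tree_number in annotated:
        parts = tree_number.split(".")
        for i in range(1, len(parts) + 1):
            expanded.add(".".join(parts[:i]))
    return expanded
```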
        <p>We also removed the annotated parents that were too redundant with their children, that
is, those present in about the same documents. We did this by creating groups of MeSH terms
with a Jaccard similarity of at least 0.9 with each other, where the similarity is calculated over the
sets of documents in which each MeSH term is present, and then removed all but the MeSH
term with the fewest documents from each group. After that, we made three groups of MeSH
terms according to the number of MeSH term documents they have, which we will refer to as Size
bins: 501-1,000, 2,001-4,000, and 8,001-16,000. We made Size bins because the number of MeSH
term documents affects the evaluation, as explained in the next subsection, and we chose these
three Size bins because they contain a high number of MeSH terms, while also being spread
apart from each other in terms of number of MeSH term documents.</p>
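The redundancy filter can be sketched as a greedy pass from the smallest to the largest MeSH term, keeping a term only if no already-kept term near-duplicates it; this is a greedy approximation of the grouping described above, and `drop_redundant` and the data layout are our own assumptions:

```python
def jaccard(a, b):
    """Jaccard similarity between two document sets."""
    return len(a & b) / len(a | b)

def drop_redundant(term_docs, threshold=0.9):
    """Keep, within each group of near-duplicate MeSH terms, only the
    term with the fewest documents.

    term_docs maps MeSH term -> set of document ids. Visiting terms
    from fewest to most documents makes the smallest member of each
    near-duplicate group survive.
    """
    terms = sorted(term_docs, key=lambda t: len(term_docs[t]))
    kept = []
    for t in terms:
        if all(jaccard(term_docs[t], term_docs[k]) < threshold for k in kept):
            kept.append(t)
    return kept
```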
        <p>Topic categories Our topic categories are the 16 nodes at the first level of the MeSH
hierarchical tree of topics [29], also known as the branches of the MeSH tree. The MeSH terms Male
and Female are not part of any branch, so we ignore them in our analysis. We use branches
because they group the MeSH terms into epistemological categories (e.g. organisms), which are
the categories sometimes used for topical analysis of clusters [7, 30]. We removed the branches
with fewer than 10 MeSH terms per bin, ending up with the 14 branches shown in Table 2.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Clustering efectiveness</title>
        <p>Selection of clusters In Section 1 we loosely defined clustering effectiveness as the extent
to which the documents of a given MeSH term form clusters where they are a majority, which can be
interpreted as the extent to which the MeSH term documents create clusters about the academic
topic. To measure this, we first identify the clusters in a clustering solution that contain the
MeSH term documents. Then, we sort these clusters by number of MeSH term documents, from
highest to lowest, and sequentially select them until we cover half of the MeSH term documents.
We only use the selected clusters in our evaluation because we want our analysis to be resistant
to MeSH term documents with outlier behaviour, which could be caused by poorly annotated
MeSH terms or by very small clusters. Our cluster selection criterion is inspired by the cluster
quality metrics of Yuan, Zobel and Lin [31], and we expect that this criterion reflects the kind
of clusters a user is likely to select while browsing documents in a science map.
Clustering effectiveness metrics Once we have the selected clusters for a given MeSH
term, we measure clustering effectiveness using two metrics:
1. Purity: Purity represents the extent to which the selected clusters are composed of MeSH
term documents. It is the fraction of documents in the selected clusters that are MeSH
term documents. In mathematical terms, Purity is defined as:
Purity = (∑_{i=1}^{n} |C_i ∩ M|) / (∑_{i=1}^{n} |C_i|),</p>
        <p>where n denotes the number of selected clusters, C_i denotes the documents in selected
cluster i, and M denotes the MeSH term documents. The higher the Purity, the more effective
the clustering. Purity is bounded between zero and one.
2. Inverse count of clusters (ICC): ICC represents the extent to which the MeSH term documents are
contained in a small number of clusters. ICC is defined as one divided by the number
of selected clusters. The higher the ICC, the more effective the clustering. ICC is bounded
between zero and one.</p>
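Under the definitions above, the cluster selection and the two metrics can be sketched as follows. This is a toy implementation under our own assumptions about the data layout, not the authors' code:

```python
def clustering_effectiveness(cluster_docs, mesh_docs, coverage=0.5):
    """Compute (Purity, ICC) for one MeSH term.

    cluster_docs maps cluster id -> set of document ids; mesh_docs is
    the set of documents annotated with the MeSH term. Assumes at least
    one cluster contains MeSH term documents.
    """
    # MeSH term documents per cluster, highest first
    counts = {c: len(docs & mesh_docs) for c, docs in cluster_docs.items()}
    ranked = sorted((c for c in counts if counts[c] > 0),
                    key=lambda c: counts[c], reverse=True)
    # select clusters until half of the MeSH term documents are covered
    target = coverage * len(mesh_docs)
    selected, covered = [], 0
    for c in ranked:
        selected.append(c)
        covered += counts[c]
        if covered >= target:
            break
    # Purity: sum of intersections over sum of selected cluster sizes
    purity = (sum(counts[c] for c in selected)
              / sum(len(cluster_docs[c]) for c in selected))
    icc = 1 / len(selected)
    return purity, icc
```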
        <p>We use two metrics instead of only one so we can take the influence of Size bin and Resolution
into account, as they affect each metric in opposite ways: a higher Size bin and Resolution increase
Purity because they increase the probability that a given cluster has a high concentration of
MeSH term documents, while they decrease ICC because they increase the number of clusters necessary
to cover the MeSH term documents.</p>
        <p>The Purity or ICC of a branch is the median metric of its MeSH terms for a given Resolution
and Size bin. We intend to answer our research question by ranking the branches according to
Purity or ICC, for each Resolution and Size bin combination.</p>
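The branch-level aggregation and ranking can be sketched as follows; the names are ours, and `term_metric` holds the Purity or ICC of each MeSH term for one Resolution and Size bin combination:

```python
from statistics import median

def rank_branches(term_metric, term_branch):
    """Rank branches by the median metric of their MeSH terms.

    term_metric maps MeSH term -> Purity or ICC value;
    term_branch maps MeSH term -> branch label (e.g. 'C' for Diseases).
    Returns (branch, median) pairs sorted from most to least effective.
    """
    by_branch = {}
    for term, value in term_metric.items():
        by_branch.setdefault(term_branch[term], []).append(value)
    return sorted(((b, median(vals)) for b, vals in by_branch.items()),
                  key=lambda bv: bv[1], reverse=True)
```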
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>Our main findings are the following:
• The Purity and ICC of any given Resolution and Size Bin combination tend to rank the
branches in the same order.
• The positions of the branches tend to be similar among the Resolution and Size Bin
combinations.
• On the top four positions, the following branches appear more than half of the time: C
(Diseases), B (Organisms), E (Analytical, Diagnostic and Therapeutic Techniques, and
Equipment) and A (Anatomy).
• On the bottom four positions, the following branches appear more than half of the time: Z
(Geographicals), L (Information Science), H (Disciplines and Occupations) and N (Health
Care).
• C and Z are always the highest and lowest branches, respectively, and their Purity and ICC are
significantly higher and lower than those of the closest branch.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>In this section we discuss what we learn from the results on the highest and lowest clustering
effectiveness, and the strengths and weaknesses of our work.</p>
      <sec id="sec-5-1">
        <title>5.1. Which topic categories have the highest and lowest clustering effectiveness in science maps?</title>
        <p>The similarity in the rankings between Purity and ICC suggests that the branches with higher
Purity than others also have higher ICC, which is to be expected as both metrics reflect clustering
effectiveness. The similarity in the rankings between different combinations of Resolution
and Size bin values suggests that the clustering is systematically biased toward certain topic
categories, with the strongest bias found at the top and bottom four positions of the ranking.
This is a relevant result because the clusters could, in principle, have been indifferent to the topic categories.</p>
        <p>The top and bottom four branches are the most relevant information for a user of science
maps because they inform which topics will be best identified by the clustering. We found a
use case of science maps in the literature whose findings agree with our results: Held
and Velden [22] reported that, in a science map, documents from the field of invasion biology
are spread among several clusters whose main topic is not this field but rather a given species.
Invasion biology belongs to the branch H (Disciplines and Occupations), which is one of the
bottom four branches, and species belong to the branch B (Organisms), which is one of the top
four branches. Therefore, we would have expected the science maps to be poor at showing the
field of invasion biology, and to place the invasion biology documents in clusters about species.</p>
        <p>It is especially interesting that H (Disciplines and Occupations) is among the bottom branches,
because this branch contains the MeSH terms for natural science fields. The low clustering
effectiveness of this branch suggests a difference between how scientists define fields of science
and how scientists cite each other. A solution could be to update how scientists define a scientific
field by using science map clusters.</p>
        <p>The positions of J (Technology, Industry, and Agriculture) and M (Named Groups) vary
significantly in each ranking. The variance between different Resolution rankings is likely to be
stochastic. The variance between different Size bins could be due to the topics of their MeSH
terms being very different between Size bins. This is particularly likely for branch J because it
has a broad scope of topics.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Strengths and weaknesses</title>
        <p>Using a knowledge organization system of academic publications such as MeSH terms provides
a massive number of topics, and very diverse compared to what a single expert could propose.
Using a Biomedical knowledge organization system like MeSH terms instead of the ones from
other fields, like the Mathematics Subject Classification, the ACM Computing Classification
System, or the Physics Subject Headings, has the advantage that the Biomedical field produces
the highest number of documents among the fields of science. On the other hand, a high number
of Biomedical documents is detrimental to the creation of non-Biomedical clusters (the kind
of problems that emerge in Internal Mapping vs External Mapping [22]), and we diminished
this effect using a parameter for the Leiden algorithm (Objective function: Constant Potts Model)
that is specifically intended to resist this effect. We also diminished the effect of mislabeled
documents (e.g. the document with DOI 10.1007/s12603-020-1457-6 is incorrectly labeled with the
MeSH term Alcohol Drinking) by ignoring a share of the MeSH term documents through the Coverage
parameter. Exploding the MeSH terms while respecting their source branch made our labeling
of documents more robust. Using the MeSH tree branches as topic categories allowed us to
use a curated classification of topics. However, some topic categories might be absent from the
MeSH tree (e.g. the topic that links diseases with their medicines), and some lower levels of the
MeSH tree might work better as topic categories (e.g. the children of the branch Disciplines
and Occupations [H] are Natural Science Disciplines and Health Occupations, which might be
more informative topic categories than the branch itself).</p>
        <p>We only used the Leiden algorithm because it is the one most commonly used by the science
mapping community. However, similar works have used more than one algorithm: Held,
Laudel and Gläser [7, 20] analyzed the topics in clusters created by the Leiden algorithm and the
Infomap algorithm. Held [27] assessed the suitability of the Leiden, Louvain, OSLOM and Infomap
algorithms for creating topical clusters. Rossetti, Pappalardo and Rinzivillo [21] showed that
different clustering algorithms (Louvain, Infohiermap, cFinder, Demon, iLCD and Ego-Network)
perform differently on different types of networks (DBLP co-authorship network,
Amazon co-purchase network, YouTube users network, and LiveJournal users network). We
only used one citation similarity metric (EDC), so we cannot say whether our results are
representative of all citation similarity metrics.</p>
        <p>MeSH terms are oriented to Biomedical topics, which makes our findings difficult to generalize
beyond the Biomedical domain. However, the same methodology can be used with knowledge
organization systems oriented to other types of topics, such as computer science or astronomy.</p>
        <p>Our work has the advantage that we calculate clustering effectiveness per MeSH term, while
other methods, like that of Waltman et al. [18], evaluate the clustering solution as a whole. Our method
is also agnostic of the size relation between the MeSH terms and the clusters because we are
trying to rank the clustering effectiveness of different topic categories instead of achieving
optimal clustering effectiveness. Future work could ideally control for different MeSH term sizes
by normalizing the Purity and ICC values instead of only comparing MeSH terms of similar
sizes, but creating a normalizing function was beyond the scope of our work. The disadvantages
of our method are that we lack a clustering effectiveness baseline to compare against, and that it
is hard to compare our results with the literature because we are using a novel experimental
setup.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In the current work we explored the extent to which the documents of given topic categories form
clusters where they are a majority, a phenomenon that we refer to as clustering effectiveness. The
purpose of this is to inform users of science maps about what to expect from the maps. We found that
some topic categories have consistently higher clustering effectiveness than others. Describing
the topics in our own words, the highest-ranked topic categories are diseases, organisms, anatomy,
and techniques and equipment for diagnostics and therapy, while the lowest-ranked are geographical
entities, information sciences, natural science fields, and health care and occupations.</p>
      <p>Our work has shown that science maps serve some topics better than others in a predictable
way. Our analysis approach contributes to future topic-driven analysis of science maps. Future
research should evaluate the clustering effectiveness of topic categories in science maps based
on text similarity, the other most popular technique, besides citation similarity, for creating
science maps.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration of Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT for grammar and
spelling checking, drafting content, improving writing style, citation management, and
formatting assistance. After using this tool, the authors reviewed and edited the content as
needed and take full responsibility for the publication’s content.</p>
      <p>
[4] M. Zitt, Meso-level retrieval: Ir-bibliometrics interplay and hybrid citation-words
methods in scientific fields delineation, Scientometrics 102 (2015) 2223–2245. doi: 10.1007/
s11192-014-1482-5.
[5] R. Sullivan, S. Eckhouse, G. Lewison, Using bibliometrics to inform cancer research policy
and spending, Monitoring financial flows for health research (2007) 67–78.
[6] J. P. Bascur, S. Verberne, N. J. van Eck, L. Waltman, Academic information retrieval using
citation clusters: in-depth evaluation based on systematic reviews, Scientometrics (2023)
1–27. doi:10.1007/s11192-023-04681-x.
[7] M. Held, G. Laudel, J. Gläser, Challenges to the validity of topic reconstruction,
Scientometrics 126 (2021) 4511–4536. doi:10.1007/s11192-021-03920-3.
[8] R. Klavans, K. W. Boyack, Toward a consensus map of science, Journal of the American
Society for information science and technology 60 (2009) 455–476. doi:10.1002/asi.
20991.
[9] J. Ruiz-Castillo, L. Waltman, Field-normalized citation impact indicators using
algorithmically constructed classification systems of science, Journal of Informetrics 9 (2015) 102–117.
doi:10.1016/j.joi.2014.11.010.
[10] CWTS, Leiden ranking fields, 2023. URL: https://www.leidenranking.com/information/
fields, [Online; accessed 20-March-2023].
[11] L. Waltman, N. J. Van Eck, A new methodology for constructing a publication-level
classification system of science, Journal of the American Society for Information Science
and Technology 63 (2012) 2378–2392. doi:10.1002/asi.22748.
[12] P. Sjögårde, Improving overlay maps of science: Combining overview and detail,
Quantitative Science Studies (2022) 1–40. doi:10.1162/qss_a_00216.
[13] P. Sjögårde, P. Ahlgren, Granularity of algorithmically constructed publication-level
classifications of research publications: Identification of topics, Journal of Informetrics 12
(2018) 133–152. doi:10.1016/j.joi.2017.12.006.
[14] P. Sjögårde, P. Ahlgren, Granularity of algorithmically constructed publication-level
classifications of research publications: Identification of specialties, Quantitative Science
Studies 1 (2020) 207–238. doi:10.1162/qss_a_00004.
[15] A. Nishikawa-Pacher, A typology of research discovery tools, Journal of Information
Science (2021) 01655515211040654. doi:10.1177/01655515211040654.
[16] J. Gläser, Opening the black box of expert validation of bibliometric maps, in: Lockdown
Bibliometrics: Papers not submitted to the STI Conference 2020 in Aarhus, 2020, pp. 27–36.</p>
      <p>
[17] L. Šubelj, N. J. Van Eck, L. Waltman, Clustering scientific publications based on citation
relations: A systematic comparison of diferent methods, PloS one 11 (2016) e0154404.
doi:10.1371/journal.pone.0154404.
[18] L. Waltman, K. W. Boyack, G. Colavizza, N. J. van Eck, A principled methodology for
comparing relatedness measures for clustering publications, Quantitative Science Studies
1 (2020) 691–713. doi:10.1162/qss_a_00035.
[19] P. Ahlgren, Y. Chen, C. Colliander, N. J. van Eck, Enhancing direct citations: A comparison
of relatedness measures for community detection in a large set of pubmed publications,
Quantitative Science Studies 1 (2020) 714–729. doi:10.1162/qss_a_00027.
[20] M. Held, G. Laudel, J. Gläser, Topic reconstruction from networks of papers may not
be possible if only one algorithm is applied to only one data model, in: Lockdown
Bibliometrics: Papers not submitted to the STI Conference 2020 in Aarhus, 2020, p. 18.
[21] G. Rossetti, L. Pappalardo, S. Rinzivillo, A novel approach to evaluate community
detection algorithms on ground truth, in: Complex Networks VII: Proceedings of the
7th Workshop on Complex Networks CompleNet 2016, Springer, 2016, pp. 133–144.
doi:10.1007/978-3-319-30569-1_10.
[22] M. Held, T. Velden, How to interpret algorithmically constructed topical structures of
scientific fields? a case study of citation-based mappings of the research specialty of
invasion biology, Quantitative Science Studies 3 (2022) 651–671. doi:10.1162/qss_a_
00194.
[23] R. Haunschild, H. Schier, W. Marx, L. Bornmann, Algorithmically generated subject
categories based on citation relations: An empirical micro study using papers on overall
water splitting, Journal of Informetrics 12 (2018) 436–447. doi:10.1016/j.joi.2018.
03.004.
[24] R. Klavans, K. W. Boyack, Which type of citation analysis generates the most accurate
taxonomy of scientific and technical knowledge?, Journal of the Association for Information
Science and Technology 68 (2017) 984–998. doi:10.1002/asi.23734.
[25] S. Xu, J. Liu, D. Zhai, X. An, Z. Wang, H. Pang, Overlapping thematic structures extraction
with mixed-membership stochastic blockmodel, Scientometrics 117 (2018) 61–84. doi:10.
1007/s11192-018-2841-4.
[26] F. Havemann, J. Gläser, M. Heinz, Memetic search for overlapping topics based on a
local evaluation of link communities, Scientometrics 111 (2017) 1089–1118. doi:10.1007/
s11192-017-2302-5.
[27] M. Held, Know thy tools! limits of popular algorithms used for topic reconstruction,
Quantitative Science Studies (2022) 1–30. doi:10.1162/qss_a_00217.
[28] V. A. Traag, L. Waltman, N. J. Van Eck, From louvain to leiden: guaranteeing well-connected
communities, Scientific reports 9 (2019) 5233. doi:10.1038/s41598-019-41695-z.
[29] National Institutes of Health, Medical subject headings, available at https://www.nlm.nih.gov/mesh/
meshhome.html, accessed: 2023-09-07.
[30] C. Seitz, M. Schmidt, N. Schwichtenberg, T. Velden, A case study of the epistemic function
of citations—implications for citation-based science mapping, in: Proceedings of the 18th
International Conference of the International Society for Scientometrics and Informetrics
(ISSI), 2021.
[31] M. Yuan, J. Zobel, P. Lin, Measurement of clustering effectiveness for document collections,
Information Retrieval Journal 25 (2022) 239–268. doi:10.1007/s10791-021-09401-8.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Science mapping: a systematic review of the literature</article-title>
          ,
          <source>Journal of data and information science 2</source>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>40</lpage>
          . doi:10.1515/jdis-2017-0006.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Cobo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>López-Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Herrera-Viedma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <article-title>Science mapping software tools: Review, analysis, and cooperative study among tools</article-title>
          ,
          <source>Journal of the American Society for information Science and Technology</source>
          <volume>62</volume>
          (
          <year>2011</year>
          )
          <fpage>1382</fpage>
          -
          <lpage>1402</lpage>
          . doi:10.1002/asi.21525.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N. J.</given-names>
            <surname>van Eck</surname>
          </string-name>
          ,
          <article-title>Methodological advances in bibliometric mapping of science</article-title>
          ,
          <source>EPS-2011-247- LIS</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>