<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Research Inequality in NLP: How Resource Disparities Shape Topic Trends and Methodological Diffusion via Citations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lizhen Liang</string-name>
          <email>lliang06@syr.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bei Yu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Syracuse University</institution>
          ,
          <addr-line>Syracuse, New York</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>The growing resource gap between institutions raises critical questions about transparency, replicability, and inclusiveness in AI research. While some AI research topics remain accessible, research in areas such as large language models (LLMs) necessitates resources such as computational power and data access: resources largely concentrated among industry companies and a few top universities. This study investigates research inequality in NLP by analyzing topic shifts, institutional resource gaps, and citation intent patterns in ACL Anthology papers. The rise of LLMs and generative tasks has driven increased attention to topics such as Language Modeling, Generation, and Multimodality, while traditional areas like Machine Translation and Syntax/Parsing have declined. High-resource institutions are more likely to publish on these trending topics, as indicated by higher topic shift ratios; in contrast, low-resource teams are concentrated in declining topics. Citation intent analysis reveals that methodology-use citations, which indicate resource transfer, are decreasing over time, particularly in trending topics. This trend is especially pronounced in citations from low-resource to high-resource teams, suggesting that widening computational and infrastructural gaps limit the ability of low-resource institutions to adopt and build upon frontier research. These findings highlight a growing divide in NLP research participation and impact, underscoring the need for more inclusive and equitable research practices.</p>
      </abstract>
      <kwd-group>
        <kwd>Methodological</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Modern AI research demands increasing resources, especially access to large-scale infrastructure and
datasets, creating a significant advantage to institutions with greater financial and computational
capacity. For instance, in 2020, private enterprises reportedly spent over $80 billion on AI, while
U.S. federal non-defense investment in AI-related research and development amounted to just $1.5
billion [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This disparity has enabled well-resourced teams, especially those affiliated with major
technology companies, to drive the development of increasingly sophisticated AI models. In contrast,
many academic and public-sector institutions lack the resources necessary to reproduce, extend, or
critically evaluate these advances [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], raising concerns about the inclusiveness and reproducibility of
progress in the field.
      </p>
      <p>
        This resource gap not only affects what institutions can build but also shapes what research questions
they choose to ask [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. While industry actors often drive progress through proprietary models that
require vast resources [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], academic and under-resourced teams often focus on problems that are more
computationally tractable or theoretically grounded [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ].
      </p>
      <p>
        Despite this significant resource disparity, the growing availability of open-source software
frameworks, pretrained models, and benchmark datasets, has contributed to a broader participation in AI
research [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], as evidenced by the influx of new authors in recent years [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This raises a critical question:
to what extent do high-resource teams, while pushing the frontier, also act as enablers (e.g., through
the release of resources) for broader access? To better understand this dynamic, we focus on the NLP
research community and address two research questions: 1) How do research topics differ between
research teams from high-resource and low-resource institutions? And 2) To what extent has research
from high-resource institutions lowered or heightened the barriers for low-resource teams in NLP?
      </p>
      <p>CEUR Workshop Proceedings (ISSN 1613-0073). https://liamliang.github.io/ (L. Liang); https://ischool.syracuse.edu/bei-yu/ (B. Yu)</p>
      <p>To investigate the first question, we analyzed temporal shifts in topic distributions across institutions
with varying resource levels, examining whether low-resource teams are increasingly constrained in
the scope of topics they pursue.</p>
      <p>
        Drawing on theories of citation that consider citations as framing devices and connectors of
intellectual lineages [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and previous studies that have used citation analysis to understand research
dynamics [
        <xref ref-type="bibr" rid="ref10 ref11 ref9">9, 10, 11, 12</xref>
        ], we investigate our second research question by analyzing citation patterns.
Specifically, we examine how low-resource teams cite the work of high-resource teams, with a focus on
methodological adoption, such as the use of models, datasets, or software developed by high-resource
teams. This approach interprets methodology-use citations as a proxy for resource transfer from the
cited institutions to the citing institutions, as the increasing prevalence of methodology-use citations
has been linked to the growing availability of reusable technologies and evaluations [
        <xref ref-type="bibr" rid="ref11">13, 11</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Data and Methods</title>
      <p>In this work, we synthesized data from various sources and trained two prediction models to generate
variables for downstream analyses. To study research topic shift, we retrieved titles and abstracts of
ACL Anthology papers published after 2010 from the ACL-OCL corpus [ 14]. We selected a smaller
time window since AI research intensified in the past 10-15 years. Each paper’s author affiliation
metadata was retrieved from OpenAlex, an openly accessible database containing metadata on scientific
research publications [15]. To estimate each institution’s resource level, we generated a proxy variable
predicted by a machine learning model trained on research expenditure data and bibliometric features.
Citation context data were obtained through Semantic Scholar’s S2AG API [16]. To analyze patterns
of methodological diffusion, we fine-tuned a citation intent classifier to identify method-use citations,
which are instances where the citing paper adopts tools, models, or methods from the cited work. The
following subsections describe the above tasks respectively.</p>
      <sec id="sec-2-1">
        <title>2.1. Modeling Research Topic Shift</title>
        <p>The ACL-OCL corpus includes the full text of 73k papers from the ACL Anthology up to September
2022. We selected the papers published since 2010, since AI research intensified in the past 10-15 years.</p>
        <p>We used paper titles and abstracts as input for topic modeling. We first embedded each paper’s title
and abstract using the SPECTER 2 language model [17]. The resulting embeddings were then reduced in
dimensionality using UMAP. HDBSCAN was then applied to further cluster the dimension-reduced
embeddings into topic clusters.</p>
        <p>Using topic coherence [18] as the evaluation metric, we compared BERTopic [19] under various
parameter settings. The highest coherence score was achieved by a BERTopic model configured with
275 neighbors, 125 UMAP components, and a minimum cluster size of 275 for HDBSCAN.</p>
        <p>We further validated the model by manually reviewing sample articles and representative keywords
from each topic cluster, comparing them with the ACL submission topics. Using this comparison, we
were able to assign each topic cluster a label based on keywords from the ACL topic list (see Table 3
in the appendix). Finally, each paper was assigned a topic label based on the highest topic probability
generated by the topic model.</p>
        <p>We then calculated the topic shift ratio to measure the annual change in a research topic’s popularity,
i.e. whether it gained or lost attention, using the following equation:</p>
        <p>Topic Shift Ratio = P(paper assigned to topic | paper published in year) / P(paper assigned to topic | paper published before year)
(1)</p>
        <p>If the topic shift ratio for Topic X is greater than 1 in Year Y, it means that Topic X became more
prevalent in Year Y, compared to its prevalence in the years before.</p>
        <p>A slightly modified equation was designed to compare the popularity of a topic before and after a
cutoff year, e.g. 2016:</p>
        <p>Topic Shift Ratio = P(paper assigned to topic | paper published after year) / P(paper assigned to topic | paper published before year)
(2)</p>
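        <p>As a minimal sketch (our own illustrative code, not the paper's implementation), both ratios can be computed from per-paper (year, topic) records:</p>

```python
from collections import Counter

def topic_share(papers, topic, keep_year):
    """P(paper assigned to topic, conditioned on its publication year)."""
    pool = [t for (y, t) in papers if keep_year(y)]
    return Counter(pool)[topic] / len(pool) if pool else 0.0

def topic_shift_ratio(papers, topic, year):
    """Equation (1): topic share in `year` over its share in prior years."""
    before = topic_share(papers, topic, lambda y: year > y)
    in_year = topic_share(papers, topic, lambda y: y == year)
    return in_year / before if before else float("inf")

def topic_shift_ratio_cutoff(papers, topic, cutoff):
    """Equation (2): topic share after a cutoff year over its share before."""
    before = topic_share(papers, topic, lambda y: cutoff > y)
    after = topic_share(papers, topic, lambda y: y >= cutoff)
    return after / before if before else float("inf")

# Toy corpus of (year, topic) pairs: "lm" gains attention in 2016.
papers = [(2014, "mt"), (2014, "lm"), (2015, "mt"), (2015, "mt"),
          (2016, "lm"), (2016, "lm"), (2016, "mt")]
```

        <p>On this toy corpus, the 2016 shift ratio for "lm" is (2/3)/(1/4), i.e. greater than 1, so the topic gained attention, while "mt" falls below 1.</p>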
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Estimating Institutional AI Resources</title>
        <p>We trained a regression model to estimate an institution’s AI resources. The training data come from
the 2023 Higher Education Research and Development (HERD) Survey. The HERD Survey is an annual
census of 501 U.S. colleges and universities that expended at least $150,000 in research and development.
The survey includes data for various research areas. We used expenditure in the area of information
and computer science as a proxy measure for an institution’s AI research resources.</p>
        <p>Bibliometric features have long served as tools in the science of science [20] and the scientometric
community [21]. Employing these features allows for investigations of the characteristics and dynamics
inherent in scientific activities and entities. We extracted author affiliations for each ACL paper from
OpenAlex, an openly accessible database containing metadata on scientific research publications. For
each institution, we aggregated 15 bibliometric features, including (1) basic counts, i.e. the number of
publications, citations, co-institutions, and researchers for each institution; (2) researcher seniority for
each institution including the mean, median, min, and max researcher h-index; (3) outbound citation
targets, such as the number of unique authors, institutions, and publishing venues (such as journals,
conference proceedings); (4) outbound citations aggregated at different research entity levels, such as
author-, institution-, and publishing venue-level citations. See Table 5 in the appendix for a full list of
features and their definitions.</p>
        <p>Using the bibliometric features and expenditure data as training data, we trained and cross-validated
linear regression and random forest regression models with different hyper-parameters. The model
with the best performance is a random forest regression model with the maximum depth set to 20,
minimum samples split set to 2. The model achieves 0.407 R-squared on the testing dataset and 0.907
R-squared on the training dataset.</p>
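        <p>The training setup can be sketched with scikit-learn, using synthetic stand-in data since the HERD expenditures and real bibliometric features are not reproduced here (a sketch under those assumptions, not the actual trained model):</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in: 15 bibliometric features per institution and a
# pseudo-expenditure target; the real model was trained on HERD data.
X = rng.normal(size=(500, 15))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Best configuration reported above: max_depth=20, min_samples_split=2.
model = RandomForestRegressor(max_depth=20, min_samples_split=2, random_state=0)
model.fit(X_train, y_train)

print("train R^2:", round(model.score(X_train, y_train), 3))
print("test R^2:", round(model.score(X_test, y_test), 3))
```

        <p>As in the paper, `feature_importances_` on the fitted model reveals which bibliometric features (here, the two signal columns) drive the predicted expenditure.</p>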
        <p>Using the best prediction model and institution-level bibliometric data for all institutions that
ACL-OCL authors are affiliated with, we predicted a pseudo-expenditure value for each affiliation as a proxy
for the amount of their AI resources. The feature importance of the model is shown in Table 4 in the
appendix. We found that the features “Number of citations” and “Number of publications” are among
the most important features. This makes sense, since research spending should be positively correlated
with the size and capacity of institutions.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Identifying Methodology-use Citations</title>
        <p>For each citation to ACL-OCL papers, we retrieved the citation context, or the citation sentence, from
the S2AG database. We applied the method proposed by [22] because it doesn’t require external data
such as author and affiliation information to achieve performance comparable to the state-of-the-art.
We fine-tuned a SciBERT model using SciCite and ACL-ARC data as a multi-task learning task.</p>
        <p>The ACL-ARC dataset [13] provides annotations for citation intents with six classes, including
Extends, Future, Motivation, Compares, Uses, and Background, for 1,969 citation sentences from 10
ACL Anthology articles.</p>
        <p>The SciCite data set includes 11,020 citation sentences from computer science and medicine articles
sampled from the Semantic Scholar corpus [23]. The SciCite data schema was simplified based on
ACL-ARC, after removing citation intent categories that are rare or not useful for meta-analysis of
scientific publications. SciCite includes three categories: background information, use of methods, and
comparing results. Here we refer to them briefly as background, methodology, and result citations.
        <p>Using SciCite as the main training set and ACL-ARC as the auxiliary training set, the resulting model
has achieved a macro 0.86 F1 score on the SciCite dataset, with balanced precision and recall values on
all categories. This result is comparable to [24]. See Table 1 for category-level performance.</p>
        <p>Using the fine-tuned citation intent classification model, we predicted citation intent for each citation
context retrieved from S2AG.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <sec id="sec-3-1">
        <title>3.1. Research Topic Shifts in NLP</title>
        <sec id="sec-3-1-1">
          <title>3.1.1. Topic Differences Between Low-resource and High-resource Teams</title>
          <p>We compared the distribution of research topics between high-resource and low-resource teams.
Using our regression model to estimate each institution’s AI resource level, we classified the top 10% of
institutions as high-resource, and the remaining 90% as low-resource. We define a research team as
the group of authors on a single paper. The team’s resource level is determined by the highest-ranked
institution among the authors’ affiliations. We then assigned each paper a topic shift ratio, based on its
publication year and its assigned research topic (see Figure 1).</p>
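          <p>The top-decile split and the max-over-affiliations team label can be sketched as follows (illustrative code with made-up institution names and values):</p>

```python
def resource_labels(pseudo_expenditure, top_share=0.10):
    """Label the top decile of institutions (by pseudo-expenditure) high-resource."""
    ranked = sorted(pseudo_expenditure, key=pseudo_expenditure.get, reverse=True)
    n_high = max(1, int(len(ranked) * top_share))
    high = set(ranked[:n_high])
    return {inst: inst in high for inst in ranked}

def team_is_high_resource(author_affiliations, is_high):
    """A paper's team takes the level of its highest-ranked affiliation."""
    return any(is_high.get(inst, False) for inst in author_affiliations)

# Hypothetical pseudo-expenditure values for ten institutions.
expenditure = {"uni_a": 9.0, "uni_b": 5.0, "uni_c": 1.0, "uni_d": 0.5,
               "uni_e": 0.4, "uni_f": 0.3, "uni_g": 0.2, "uni_h": 0.1,
               "uni_i": 0.05, "uni_j": 0.01}
is_high = resource_labels(expenditure)  # only uni_a lands in the top 10%
```

          <p>A paper co-authored by `uni_c` and `uni_a` counts as high-resource, because the team inherits its highest-ranked affiliation.</p>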
          <p>We found that papers from high-resource teams were associated with significantly higher topic
shift ratios than those from low-resource teams, indicating that high-resource teams are more likely to
publish on trending topics. This difference is statistically significant, as shown by a Mann–Whitney U
test (U = 230,316,303.0, p &lt; 0.001).</p>
          <p>We then compared the topic distributions between low-resource and high-resource research teams by
counting the number of papers published in each topic and applying chi-squared tests to assess statistical
differences. Figure 3 shows the residuals from these tests for papers published in 2016 (χ² = 70.913,
p &lt; 0.0001), 2018 (χ² = 55.188, p &lt; 0.0001), and 2020 (χ² = 58.455, p &lt; 0.0001). The topics are ordered
by their overall trend, with those declining in popularity near the top and trending topics near the
bottom (based on topic shift ratios calculated in 2016 and held consistent across all panels).</p>
          <p>Positive residuals (in blue color) indicate over-representation by high-resource teams, and negative
residuals (in red color) indicate over-representation by low-resource teams. The pattern across all three
years suggests a persistent topic divide: low-resource teams are more concentrated in declining topics,
while high-resource teams increasingly dominate emerging and computationally intensive areas.</p>
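          <p>The residuals plotted in Figure 3 can be reproduced from a contingency table of paper counts; a minimal pure-Python sketch with illustrative counts (not the paper's data):</p>

```python
import math

def pearson_residuals(table):
    """Pearson residuals (observed - expected) / sqrt(expected) for a
    2 x k contingency table: rows = resource level, columns = topics.
    The chi-squared statistic is the sum of the squared residuals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    resid = []
    for i, row in enumerate(table):
        resid.append([])
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            resid[i].append((obs - expected) / math.sqrt(expected))
    return resid

# rows: [high-resource, low-resource]; columns: two illustrative topics
table = [[30, 10],
         [20, 40]]
res = pearson_residuals(table)
```

          <p>Here the positive residual in row 0, column 0 signals over-representation of high-resource teams in that topic, mirroring the blue/red coloring described above.</p>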
          <p>(Figure 3: chi-squared residuals of topic distributions; panels A. 2016, B. 2018, C. 2020.)</p>
          <p>Considering methodology-use citation as an indicator of “resource transfer” from one institution to
another, we analyzed the intent of citations to the ACL Anthology papers. Overall, about one half
of citations are background information, about one third methodology use, and the remainder result
discussion. Figure 4A presents the proportions of these citation intent types, normalized by
the number of citations per year, from 2010 to 2022, illustrating a trend that background citations are
increasing (Mann-Kendall test: τ = 0.923, p &lt; 0.0001), while methodology-use citations (Mann-Kendall
test: τ = −0.718, p &lt; 0.001) and result citations (Mann-Kendall test: τ = −0.923, p &lt; 0.001) have been
declining.</p>
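          <p>The Mann-Kendall statistics reported here reduce to Kendall's τ over a yearly series of proportions; a minimal sketch of the statistic (illustrative only, the full test also yields a p-value):</p>

```python
def mann_kendall_tau(series):
    """Kendall's tau over a time-ordered series: the proportion of
    later-vs-earlier pairs that increase, minus the proportion that
    decrease, so +1.0 is a strictly rising trend and -1.0 a falling one."""
    n = len(series)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            if diff > 0:
                s += 1
            elif 0 > diff:
                s -= 1
    return s / (n * (n - 1) / 2)

# Hypothetical yearly citation-intent proportions.
rising = [0.40, 0.43, 0.47, 0.52, 0.55]    # e.g. background citations
falling = [0.36, 0.34, 0.31, 0.29, 0.27]   # e.g. methodology citations
```

          <p>A value of τ near ±1 with a small p-value, as in the figures above, indicates a near-monotone trend across years.</p>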
          <p>As NLP literature expands, it is not surprising to see researchers citing more prior work as background
information. However, the decrease in methodology-use citations needs further examination to see
whether it indicates a decline in resource transfer, since the growing resource gap may prevent
low-resource teams from adopting certain methods due to limitations in computing power, data access,
and funding. If this is true, we should expect the increase in background citations and the decrease in
methodology-use citations to be more pronounced in citations from low-resource teams to high-resource
teams, especially among trending topics, since high-resource teams are more likely to work on trending
topics.</p>
          <p>(Figure 4: proportions of citation intent types per year, 2010–2022. A. All citations; B. Background citations and C. Methodology citations, for trending vs. declining topics.)</p>
          <p>To compare methodology-use citations in trending and declining topics, we selected papers from
the top five most trending topics (Language Model, Computational Social Science and Cultural Analytics,
Generation, Question Answering, Multimodality and Language Grounding to Vision, Robotics and Beyond),
and the top five most declining topics (Grammar Correction, Resources and Evaluation, Syntax: Tagging,
Chunking and Parsing / ML, Machine Translation, Speech Recognition).</p>
          <p>Figure 4B shows that background citations have increased over time for both trending and declining
topics. This trend is supported by the Mann-Kendall test results: for trending topics, τ = 0.769, p &lt; 0.001,
with a slope of 0.0046 and intercept of 0.50; for declining topics, τ = 0.718, p &lt; 0.001, with a slope of
0.0061 and intercept of 0.43. While both trends are significant, the higher intercept for trending topics
suggests that they generally require more background citations than declining topics. This is consistent
with expectations: researchers working in fast-evolving areas may need to cite a broader base of prior
work to contextualize and support their arguments.</p>
          <p>Figure 4C shows the proportion of methodology-use citations over time for both trending and
declining topics. For trending topics, the Mann-Kendall test reveals a decreasing trend in Methodology
citation proportion (slope = −0.0022, intercept = 0.2911), whereas declining topics show a more stable
and higher baseline level (slope = −0.0001, intercept = 0.3150). This suggests that in fast-moving areas,
researchers are less likely to cite existing models, datasets, or tools.
We further examined whether low-resource teams experienced a more pronounced decline in
methodology citations to work produced by high-resource teams.</p>
          <p>Figure 5 clearly shows a declining trend of methodology-use citations from low-resource teams
to high-resource teams, with the downward trend accelerating after 2016 (Mann-Kendall test: τ =
−0.846, p &lt; 0.0001, slope = −0.0058, intercept = 0.3310). In comparison, the overall trend across all
ACL papers shows a much more gradual decline (Mann-Kendall test: τ = −0.718, p &lt; 0.001, slope =
−0.0014, intercept = 0.3082).</p>
          <p>The strong and accelerating decrease in methodology-use citations from low-resource teams to
high-resource teams suggests that it is increasingly more challenging for low-resource teams to engage
with or build upon the methodologies developed by high-resource teams.</p>
          <p>(Figure 5: proportion of methodology-use citations from low-resource to high-resource teams, 2010–2022.)</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.2.2. Citations from Low-resource Teams to High-resource Teams between Trending Topics and Declining Topics</title>
          <p>Combining the resource and topic factors, we further compared methodology-use citations from
low-resource teams to high-resource teams among trending and declining topics.</p>
          <p>Figure 6C shows the same trend that methodology-use citations have been declining for both trending
topics (Mann-Kendall test: τ = −0.600, p &lt; 0.05, slope = −0.0067, intercept = 0.2983) and declining
topics (Mann-Kendall test: τ = −0.527, p &lt; 0.05, slope = −0.0047, intercept = 0.3448). Additionally,
methodology-use citations are consistently less prevalent in trending topics (average proportion for
trending topics: 0.27; declining topics: 0.32). Both patterns support the interpretation that low-resource
teams face greater challenges in adopting methods from high-resource teams, particularly in trending
research areas.</p>
          <p>(Figure 6: citation intent proportions for citations from low-resource to high-resource teams. A. Trending topics; B. Declining topics; C. Methodology citations.)</p>
          <p>To further validate our findings, we conducted a linear regression analysis, using the proportion
of methodology citations received by each cited paper as the dependent variable. We aggregated
methodology-use citations at the paper level and included three key predictors: (1) the maximum
predicted research expenditure among the cited paper’s co-author affiliations as a proxy for institutional
resource level, (2) the publication year, and (3) the normalized topic popularity of the cited paper based
on the topic popularity ranking from Figure 1.</p>
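          <p>This regression setup can be sketched with ordinary least squares on simulated data; the coefficients below are made up to mirror only the sign pattern reported in Table 2, not the paper's estimates:</p>

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Illustrative stand-ins for the three predictors (standardized): resource
# level, publication year, and topic popularity of the cited paper.
resource = rng.normal(size=n)
year = rng.normal(size=n)
popularity = rng.normal(size=n)

# Simulated dependent variable with negative associations on all three
# predictors; magnitudes are hypothetical, for illustration only.
meth_share = (0.33 - 0.02 * resource - 0.01 * year - 0.03 * popularity
              + rng.normal(scale=0.01, size=n))

# Design matrix with an intercept column, fit by least squares.
X = np.column_stack([np.ones(n), resource, year, popularity])
coef, *_ = np.linalg.lstsq(X, meth_share, rcond=None)
print("intercept, resource, year, popularity:", coef.round(3))
```

          <p>With this setup the three slope estimates come back negative, matching the direction of the associations discussed below.</p>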
          <p>As shown in Table 2, the results indicate that the proportion of methodology-use citations is
significantly and negatively associated with the cited team’s resource level, the popularity of the cited paper’s
topic, and publication year. In additional models, we find that larger gaps in resource levels and topic
popularity between citing and cited teams are also significantly associated with fewer methodology
citations. These findings reinforce the interpretation that institutional and topical asymmetries may
increasingly constrain methodological reuse, particularly disadvantaging low-resource teams when
citing high-resource work in trending areas.</p>
          <p>
            The results combined provide more evidence that resource barriers limit the adoption of methods
proposed by high-resource teams, and such a phenomenon is more serious for publications related to
trending topics. According to [13], such a trend could indicate a decrease in reusable technologies such
as models and datasets, and evaluations of tools. Such a decrease could be related to the increasing
resource gap in the field of AI. We also visualized the proportion of non-methodology citations. Latour
[
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] suggests that non-methodological citations are important for defending proposed ideas. The increase
in non-methodology citations can thus be read as a growing need to defend newly proposed ideas, and
as decreasing consensus within the ACL Anthology community. This interpretation makes sense, as AI
is fast-growing and new ideas must be introduced to an increasingly interdisciplinary community.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this work, by analyzing the research topics and citation intent, we investigate the disparities between
low-resource and high-resource institutions in the natural language processing research community.
Our findings indicate that high-resource teams have been focusing on research topics that are gaining
popularity, whereas low-resource teams have been more likely to work on topics that are becoming
less prominent. This suggests that access to resources such as computational power and large datasets
plays a significant role in determining what research topic a team can study. Furthermore, our results
reveal that research produced by high-resource teams is becoming increasingly difficult for other
researchers to build upon. Such results suggest a growing divide in AI research, where advancements
driven by high-resource industry corporations and universities may inadvertently limit the accessibility
of cutting-edge research to those with fewer resources. These findings indicate the need for more
inclusive research practices and collaborative efforts to ensure that AI innovation remains accessible to
a broader research community. Future work should explore potential strategies for bridging this gap.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Declaration on Generative AI</title>
      <p>During the preparation of this work, ChatGPT was used for editing to improve the style and tone of
writing.</p>
      <p>[12] K. Nishikawa, How and why are citations between disciplines made? A citation context analysis
focusing on natural sciences and social sciences and humanities, Scientometrics 128 (2023) 2975–2997.</p>
      <p>URL: https://link.springer.com/10.1007/s11192-023-04664-y. doi:10.1007/s11192- 023- 04664- y.
[13] D. Jurgens, S. Kumar, R. Hoover, D. McFarland, D. Jurafsky, Measuring the Evolution of a Scientific
Field through Citation Frames, Transactions of the Association for Computational Linguistics 6
(2018) 391–406. URL: https://direct.mit.edu/tacl/article/43437. doi:10.1162/tacl_a_00028.
[14] S. Rohatgi, Y. Qin, B. Aw, N. Unnithan, M.-Y. Kan, The ACL OCL corpus: Advancing open science
in computational linguistics, in: H. Bouamor, J. Pino, K. Bali (Eds.), EMNLP, acl, Singapore, 2023.
[15] J. Priem, H. Piwowar, R. Orr, Openalex: A fully-open index of scholarly works, authors, venues,
institutions, and concepts, arXiv preprint arXiv:2205.01833 (2022).
[16] A. D. Wade, The semantic scholar academic graph (s2ag), in: Companion Proceedings of the Web</p>
      <p>Conference 2022, 2022, pp. 739–739.
[17] A. Cohan, S. Feldman, I. Beltagy, D. Downey, D. Weld, SPECTER: Document-level representation
learning using citation-informed transformers, Online, 2020.
[18] M. Röder, A. Both, A. Hinneburg, Exploring the Space of Topic Coherence Measures, in:
Proceedings of the Eighth ACM International Conference on Web Search and Data Mining,
WSDM ’15, Association for Computing Machinery, New York, NY, USA, 2015, pp. 399–408. URL:
https://doi.org/10.1145/2684822.2685324. doi:10.1145/2684822.2685324.
[19] M. Grootendorst, Bertopic: Neural topic modeling with a class-based tf-idf procedure, arXiv
preprint arXiv:2203.05794 (2022).
[20] S. Fortunato, C. T. Bergstrom, K. Börner, J. A. Evans, D. Helbing, S. Milojević, A. M. Petersen,
F. Radicchi, R. Sinatra, B. Uzzi, A. Vespignani, L. Waltman, D. Wang, A.-L. Barabási, Science of
science, Science 359 (2018) eaao0185. URL: https://www.science.org/doi/10.1126/science.aao0185.
doi:10.1126/science.aao0185.
[21] L. Leydesdorff, S. Milojević, Scientometrics, arXiv preprint arXiv:1208.4566 (2012).
[22] Z. Shui, P. Karypis, D. S. Karls, M. Wen, S. Manchanda, E. B. Tadmor, G. Karypis, Fine-Tuning
Language Models on Multiple Datasets for Citation Intention Classification, 2024. URL: http:
//arxiv.org/abs/2410.13332. doi:10.48550/arXiv.2410.13332, arXiv:2410.13332 [cs].
[23] A. Cohan, W. Ammar, M. van Zuylen, F. Cady, Structural Scaffolds for Citation Intent Classification
in Scientific Publications, 2019. URL: http://arxiv.org/abs/1904.01608.
[24] L. Paolini, S. Vahdati, A. Di Iorio, R. Wardenga, I. Heibi, S. Peroni, Why do you cite? an investigation
on citation intents and decision-making classification processes, arXiv preprint arXiv:2407.13329
(2024).
[25] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, u. Kaiser, I. Polosukhin,
Attention is All you Need, in: Advances in Neural Information Processing Systems, volume 30,
Curran Associates, Inc., 2017. URL: https://proceedings.neurips.cc/paper_files/paper/2017/hash/
3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
[26] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, et al., Improving language understanding by
generative pre-training (2018).
[27] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding, in: J. Burstein, C. Doran, T. Solorio (Eds.), Proceedings of the
2019 Conference of the North American Chapter of the Association for Computational Linguistics:
Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational
Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. URL: https://aclanthology.org/N19-1423/.
doi:10.18653/v1/N19-1423.
[28] W. Ma, S. Liu, W. Wang, Q. Hu, Y. Liu, C. Zhang, L. Nie, Y. Liu,
ChatGPT: Understanding Code Syntax and Semantics, ???? URL: https://www.
semanticscholar.org/paper/ChatGPT%3A-Understanding-Code-Syntax-and-Semantics-Ma-Liu/
a7088c0dc34115ce38e6a37feba3c03497708047.
[29] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal,
E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, G. Lample, LLaMA: Open and Efficient
Foundation Language Models, 2023. URL: http://arxiv.org/abs/2302.13971. doi:10.48550/arXiv.
2302.13971, arXiv:2302.13971 [cs].
[30] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis,
W.-t. Yih, T. Rocktäschel, S. Riedel, D. Kiela, Retrieval-Augmented Generation for
Knowledge-Intensive NLP Tasks, 2021. URL: http://arxiv.org/abs/2005.11401. doi:10.48550/arXiv.2005.
11401, arXiv:2005.11401 [cs].
[31] A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, I. Sutskever, Zero-Shot
Text-to-Image Generation, 2021. URL: http://arxiv.org/abs/2102.12092. doi:10.48550/arXiv.2102.12092,
arXiv:2102.12092 [cs].</p>
    </sec>
    <sec id="sec-6">
      <title>A. Appendix</title>
      <p>Institution-level resource features used in this study: number of publications; number of citations; number of co-institutions; number of researchers; average, maximum, minimum, and median researcher h-index; number of works cited; number of author-level citations; number of authors cited; number of venue-level citations; number of venues cited; number of institution-level citations; number of institutions cited.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Littman</surname>
          </string-name>
          , I. Ajunwa, G. Berger,
          <string-name>
            <given-names>C.</given-names>
            <surname>Boutilier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Currie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Doshi-Velez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hadfield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Horowitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Isbell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kitano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lyons</surname>
          </string-name>
          , M. Mitchell,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sloman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vallor</surname>
          </string-name>
          , T. Walsh, Gathering Strength,
          <source>Gathering Storms: The One Hundred Year Study on Artificial Intelligence (AI100) 2021 Study Panel Report</source>
          (
          <year>2022</year>
          ). URL: https://arxiv.org/abs/2210.15767. doi:10.48550/arXiv.2210.15767.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <article-title>Google Gemini Eats The World - Gemini Smashes GPT-4 By 5X, The GPU-Poors</article-title>
          ,
          <year>2023</year>
          . URL: https://www.semianalysis.com/p/google-gemini-eats-the-world-gemini.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Movva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Balachandar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Peng</surname>
          </string-name>
          , G. Agostini,
          <string-name>
            <given-names>N.</given-names>
            <surname>Garg</surname>
          </string-name>
          , E. Pierson, Topics, Authors, and Networks in
          <source>Large Language Model Research: Trends from a Survey of 17K arXiv Papers</source>
          (
          <year>2023</year>
          ). URL: https://arxiv.org/abs/2307.10700. doi:10.48550/arXiv.2307.10700.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wahed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. C.</given-names>
            <surname>Thompson</surname>
          </string-name>
          ,
          <article-title>The growing influence of industry in AI research</article-title>
          ,
          <source>Science</source>
          <volume>379</volume>
          (
          <year>2023</year>
          )
          <fpage>884</fpage>
          -
          <lpage>886</lpage>
          . URL: https://www.science.org/doi/10.1126/science.ade2420. doi:10.1126/science.ade2420, publisher: American Association for the Advancement of Science.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>O.</given-names>
            <surname>Ignat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abzaliev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Biester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Gunal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kazemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khalifa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Koh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Nwatu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Perez-Rosas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          ,
          <article-title>Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models</article-title>
          , in: N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.),
          <source>Proceedings of the 2024 Joint International Conference on Computational Linguistics</source>
          ,
          Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italia,
          <year>2024</year>
          , pp.
          <fpage>8050</fpage>
          -
          <lpage>8094</lpage>
          . URL: https://aclanthology.org/2024.lrec-main.708/.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Togelius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. N.</given-names>
            <surname>Yannakakis</surname>
          </string-name>
          ,
          <article-title>Choose Your Weapon: Survival Strategies for Depressed AI Academics</article-title>
          (
          <year>2023</year>
          ). URL: https://arxiv.org/abs/2304.06035. doi:10.48550/arXiv.2304.06035.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gururaja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bertsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Na</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Widder</surname>
          </string-name>
          , E. Strubell,
          <article-title>To build our future, we must know our past: Contextualizing paradigm shifts in natural language processing</article-title>
          , in: H. Bouamor, J. Pino, K. Bali (Eds.), EMNLP, Association for Computational Linguistics, Singapore,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Latour</surname>
          </string-name>
          ,
          <source>Science in Action: How to Follow Scientists and Engineers through Society</source>
          , Harvard University Press, Cambridge, Mass.,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Evaluating scientific impact of publications: combining citation polarity and purpose</article-title>
          ,
          <source>Scientometrics</source>
          <volume>127</volume>
          (
          <year>2022</year>
          )
          <fpage>5257</fpage>
          -
          <lpage>5281</lpage>
          . URL: https://doi.org/10.1007/s11192-021-04183-8. doi:10.1007/s11192-021-04183-8.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , J. Liu,
          <article-title>Extracting the evolutionary backbone of scientific domains: The semantic main path network analysis approach based on citation context analysis</article-title>
          ,
          <source>Journal of the Association for Information Science and Technology</source>
          <volume>74</volume>
          (
          <year>2023</year>
          )
          <fpage>546</fpage>
          -
          <lpage>569</lpage>
          . URL: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24748. doi:10.1002/asi.24748.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <article-title>Natural Language Processing: A Historical Review</article-title>
          , in: Current Issues in Computational Linguistics: In Honour of Don Walker, Springer, Dordrecht,
          <year>1994</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>16</lpage>
          . URL: https://link.springer.com/chapter/10.1007/978-0-585-35958-8_1. doi:10.1007/978-0-585-35958-8_1.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>