<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>April</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Automated Identification of Emerging Technologies: Open Data Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ljiljana Dolamic</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julian Jang-Jaccard</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alain Mermoud</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vincent Lenders</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cyber-Defence Campus, armasuisse Science and Technology</institution>
          ,
          <addr-line>Thun</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>23</volume>
      <issue>24</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Identifying emerging technologies and forecasting their trends is pivotal for stakeholders and decision-makers across academia, industry, and government agencies. Current strategies for tracking technology trends often rely on proprietary closed datasets and on the insights of human domain experts. These approaches are expensive, manual, and time-consuming. In this study, we introduce an automated method for identifying emerging trends through a quantitative approach that utilizes extensive publicly available data, including patents, publications, and Wikipedia Pageview statistics. Our method proposes four criteria (novelty, growth, impact, and coherence) to automatically score technologies, based on a mathematical foundation. This approach enables the monitoring of tech trends across various sectors in an automated manner, without the need for domain experts. The results obtained through rigorous evaluation, benchmarked against similar reports from leading market research firms, illustrate a low recall rate paired with high precision, affirming the reliability of our proposed method. Furthermore, our method identifies emerging technologies not present in similar market reports, highlighting its unique capabilities.</p>
      </abstract>
      <kwd-group>
        <kwd>technology monitoring</kwd>
        <kwd>emerging technologies</kwd>
        <kwd>attributes of emergence</kwd>
        <kwd>scientometrics</kwd>
        <kwd>open source data</kwd>
        <kwd>machine learning</kwd>
        <kwd>informetrics</kwd>
        <kwd>natural language processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Understanding emerging technologies is crucial for
various entities, including industry, academia, and government
agencies. It can shape strategic decisions, improve
competitive positions, and create opportunities for technology
strategies. Owing to these considerations, there is a
substantial need for identifying emerging technologies, prompting
widespread media coverage on the topic and leading market
research firms like Gartner and Forrester to offer services
promising deeper insights.</p>
      <p>
        Despite the widespread use of the term
‘emerging technologies,’ there is no agreed definition
of what the term covers. This lack of a clear
definition makes it challenging to develop a scientifically
sound methodology to identify emerging technologies.
Gartner’s renowned Hype Cycle for Emerging Technologies,
while intuitive, cannot serve as an underlying model and
has been criticized in the literature as
unscientific, inconsistent, generic, and subjective [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Other
market research firms, such as Forrester and IHS Markit,
also produce annual reports on emerging technologies, yet
the methodology for identifying these technologies remains
unclear.
      </p>
      <p>
        Research in the area of identifying emerging
technologies primarily relies on qualitative methods, expert systems,
and survey-based approaches. For quantitative methods,
researchers have utilized open datasets and S-curve models to
identify technology emergence [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5">2, 3, 4, 5</xref>
        ]. S-Curve models,
based on logistic or Gompertz growth concepts, provide a
solid mathematical foundation. However, most studies
focus on specific predetermined sets of technologies, making
it challenging to devise a general method for identifying
emerging technologies [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>In this paper, we introduce a novel approach for
identifying emerging technologies based on their coverage in
publicly available data sources, including patents,
publications, and Wikipedia Pageview statistics. Unlike previous
studies, we have not preselected any specific set of
technologies. Our method is transparent, does not require expert
input, and gives reproducible results for any technology.</p>
      <p>The remainder of this paper is organized as follows:
Section 2 provides a survey of existing research. In Section 3,
we describe the data used. Section 4 outlines
the proposed methodology. We present the evaluation
results in Section 5. The limitations of our proposed method
are discussed in Section 6. Finally, Section 7 concludes the
paper with future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Definitions for the term ’emerging technologies’ in the
literature often overlap but are based on distinct characteristics.
For example, some authors (e.g., [
        <xref ref-type="bibr" rid="ref10 ref11 ref7 ref8 ref9">7, 8, 9, 10, 11</xref>
        ]) emphasize
the potential impact of the technology on the economy or
society, covering both evolutionary change and disruptive
innovations. Others, like Boon [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], prioritize uncertainty
about a technology’s future evolution. Some researchers
combine both potential and uncertainty aspects [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ],
while others underline novelty and growth [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        The myriad of characteristics chosen to define
emerging technologies has given rise to diverse scientometric
approaches for measurement [
        <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
        ], lacking a
standardized definition of the underlying concept of emergence. A
comprehensive analysis by Rotolo, Hicks, and Martin [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]
explores existing research on the definition of emerging
technologies, aggregating comparable approaches. They
identify five main characteristics—radical novelty,
relatively fast growth, coherence, prominent impact, and
uncertainty—commonly appearing across the studied research.
We adopt this definition as a foundational framework for
our study.
      </p>
      <p>
        Predicting emerging technologies often relies on
publicly available datasets, commonly leveraging patents such
as those from the United States Patent and Trademark
Office (USPTO), Global Patent Index (GPI), and Thomson
Innovation. Numerous publications advocate for the use
of bibliometric methods to extract data and identify
emerging technologies, followed by deploying growth models for
prediction. In the work of Daim et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], bibliometric
methods, US patent analysis, and S-curves were employed
for forecasting technologies such as fuel cells, food safety,
and optical storage. Similarly, Ranaei et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] used
expert interviews to fit data acquired by text-mining patents
into growth curve models for predicting hybrid cars and
fuel cells. Text-mining on patents and fitting to S-curves
were also proposed in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], and Bengisu et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] found
correlations between patent and publication data extracted
by scientometric methods for 20 technologies, deploying
S-curves for forecasting. S-Curve models for predicting
emerging technologies were also proposed by [
        <xref ref-type="bibr" rid="ref2 ref22">2, 22</xref>
        ].
      </p>
      <p>
        In recent times, artificial intelligence has regained
significant attention, leading to the use of machine learning to
model and predict emerging technologies. Kyebambe and
Hwang [
        <xref ref-type="bibr" rid="ref23 ref24">23, 24</xref>
        ] employed supervised learning on citation
graphs from USPTO data to automatically label and forecast
emerging technologies. Similarly, Zhou [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] applied
supervised deep learning on worldwide patent data, with training
sets labeled based on Gartner’s Hype Cycle.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Data</title>
      <p>We primarily use three different datasets: patent data from
USPTO, publication data from arXiv, and statistical data
from Wikipedia Pageviews.</p>
      <p>Patents from PatentsView<sup>1</sup>: Patent information
provides valuable insights into the latest innovations, trends,
and competitive landscapes within various industries. We
utilize PatentsView to acquire patent information from the
USPTO for patents granted since 1976. As of December 5,
2023, there are over 8 million records of granted patents
available for free download. Figure 1
provides a glimpse of the top 200 locations worldwide for
patents granted by the USPTO since 2013. We utilize a subset
of around 6.6 million patent records for our study.</p>
      <p>Publications from arXiv<sup>2</sup>: We employ arXiv as a
primary publication source, taking advantage of its free
distribution model for open-access scholarly articles. The
repository hosts over 2.4 million publications spanning computer
science and diverse scientific disciplines since 1993. Figure
2 displays the number of submissions to arXiv since August
1991. Our study focuses on a subset of approximately 1.4
million arXiv publications.</p>
      <p>Wikipedia Pageview Statistics<sup>3</sup>: In addition, we
incorporate Wikipedia Pageview statistics, which indicate the
number of visitors to a Wikipedia article within a specified
time frame. This offers insight into real-time public
interest and engagement, serving as a dynamic and accessible
indicator of emerging trends and technologies. Figure 3
illustrates an example of monthly pageview statistics for
the keyword ’deep learning’.</p>
      <p>Leveraging the Wikipedia API, we retrieved the monthly
views for the 50,954 articles classified as technology-related.</p>
      <sec id="sec-3-1">
        <title>1https://patentsview.org/</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>In this section, we outline our methodology; Figure 4
offers an overview of the entire process.</p>
      <p>The proposed method begins by classifying each
Wikipedia article as either technology-related or not,
employing a binary classification approach termed
technology classification.</p>
      <sec id="sec-4-1">
        <title>2https://arxiv.org/ 3https://en.wikipedia.org/wiki/Wikipedia:Pageview_statistics</title>
        <p>Once this classification is established, we extract abstracts
from USPTO patents and arXiv publications. These
abstracts are annotated using the DBpedia tool<sup>4</sup>,
which links the abstract content to the relevant Wikipedia
articles. To reduce noise, we eliminate annotations occurring
fewer than 5 times and those not aligned with the
technology classification.</p>
        <p>The resulting filtered annotations, all within the
technology classification, serve as the basis for constructing time
series. The count of mentions for each technology <italic>t</italic> ∈ <italic>T</italic> per
year is summed across each data source <italic>d</italic> ∈ <italic>D</italic>, reflecting
the increasing occurrences of patents and publications over
time. Mathematically, this can be represented as:
<disp-formula><tex-math>\mathrm{Total\ Count}(t) = \sum_{d \in D} \mathrm{count}(t, d)</tex-math></disp-formula>
where count(<italic>t</italic>, <italic>d</italic>) is the count of mentions for technology
<italic>t</italic> in data source <italic>d</italic>. We then compute relative counts in
relation to the total number of technology mentions per
year, represented as:
<disp-formula><tex-math>\mathrm{Relative\ Count}(t) = \frac{\mathrm{Total\ Count}(t)}{\text{Total Technology Mentions per Year}}</tex-math></disp-formula></p>
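        <p>The two counts above can be illustrated with a minimal sketch. The tuple layout (technology, source, year, count) and the function name <monospace>relative_counts</monospace> are assumptions made for illustration, not part of the original pipeline.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict


def relative_counts(mentions):
    """Compute per-year relative counts from raw mention counts.

    mentions: iterable of (technology, source, year, count) tuples
              (hypothetical layout chosen for this sketch).
    Returns {(technology, year): share of that year's technology mentions}.
    """
    total = defaultdict(int)     # (technology, year) -> count summed over sources
    per_year = defaultdict(int)  # year -> mentions of all technologies
    for tech, _source, year, count in mentions:
        total[(tech, year)] += count
        per_year[year] += count
    # Years without any mention are skipped to avoid dividing by zero.
    return {
        key: value / per_year[key[1]]
        for key, value in total.items()
        if per_year[key[1]] > 0
    }


example = [
    ("Deep learning", "patents", 2020, 30),
    ("Deep learning", "arxiv", 2020, 50),
    ("Lidar", "patents", 2020, 20),
]
shares = relative_counts(example)
print(shares[("Deep learning", 2020)])  # 0.8
```
]]></preformat>
        <p>Dividing by the yearly total keeps a technology from appearing to grow merely because the overall volume of patents and publications grows.</p>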
        <p>
          Furthermore, monthly Wikipedia Pageviews are obtained
for all technologies and transformed into time series. These
time series, along with Wikipedia categories, contribute to
the computation of four scores—Novelty, Growth, Impact,
and Coherence—each derived from the definitions provided
by [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Finally, we aggregate and normalize these four
scores to generate an emergence score for each technology.
        </p>
        <sec id="sec-4-1-1">
          <title>4.1. Technology Classification</title>
          <p>The output of annotated abstracts from patents and
publications contains noise, as each annotation refers to a
Wikipedia article, not necessarily related to technology.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4https://www.dbpedia.org/</title>
        <p>To address this issue, we devised a two-step
methodology named ’technology classification,’ which involves
the process of selecting relevant technology articles from
Wikipedia.</p>
        <p>Step 1: Cleaning and Selecting Relevant Categories
Each Wikipedia article is linked to categories, forming a
complex graph with parent-child relationships. The edges
between categories are loosely defined as "is related to"
and often connect Wikipedia articles from
non-technology areas. This looseness appears to limit the
reliability of extracting only technology articles using these
graph-based relationships.</p>
        <p>To address this, we first clean up the directed categories
graph by removing hidden categories, admin and user pages.
Furthermore, we apply regular expression filters to eliminate
categories not related to technologies, such as companies,
people names, brands, currencies, and countries.</p>
        <p>Additionally, we utilize Wikipedia’s Main Topic
Classifications (MTC), encompassing categories like Technology,
Business, Arts, Health, etc. We then calculate the
shortest path from each category in the filtered graph
to each of the 28 MTC and retain the articles with the smallest
distance to Technology, Science, or Engineering concepts.
This resulted in 7,876 technology classification candidates,
still containing some categories that may not belong to
technology. After a human domain expert manually
reviewed these 7,876 candidates, we
ultimately obtained a list of 1,356 technology categories.</p>
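        <p>The distance-based filter can be sketched as a breadth-first search from each main topic. This is a minimal illustration under stated assumptions: the graph is a plain dictionary of parent-to-child edges, paths follow edge direction, and a category tied between a wanted and an unwanted topic is kept. None of these details is specified in the text, and all names are hypothetical.</p>
        <preformat><![CDATA[
```python
from collections import deque


def distances_from(graph, start):
    """Breadth-first search: number of edges from start to each reachable category."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in dist:
                dist[child] = dist[node] + 1
                queue.append(child)
    return dist


def closest_topics(graph, topics):
    """Map each category to (minimal distance, set of main topics at that distance)."""
    best = {}
    for topic in topics:
        for category, d in distances_from(graph, topic).items():
            current = best.get(category)
            if current is None or d < current[0]:
                best[category] = (d, {topic})
            elif d == current[0]:
                current[1].add(topic)  # tie: remember every nearest topic
    return best


def technology_candidates(graph, topics,
                          wanted=("Technology", "Science", "Engineering")):
    """Keep categories whose nearest main topic is one of the wanted topics."""
    best = closest_topics(graph, topics)
    return sorted(c for c, (_d, nearest) in best.items() if nearest & set(wanted))


# Toy category graph (parent -> children), invented for this example.
toy_graph = {
    "Technology": ["Computing"],
    "Computing": ["Machine learning"],
    "Arts": ["Music", "Digital art"],
}
print(technology_candidates(toy_graph, ["Technology", "Arts"]))
# ['Computing', 'Machine learning', 'Technology']
```
]]></preformat>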
        <p>This process is summarized as pseudocode in Algorithm 1.</p>
        <p>Algorithm 1 Cleaning and Selecting Relevant Categories</p>
        <preformat>procedure CleanUpDirectedGraph
    Remove hidden categories, admin and user pages
        from the directed categories graph
    Apply regular expression filters to eliminate irrelevant categories
        (e.g., companies, people names, brands, currencies, and countries)
end procedure
procedure UtilizeMainTopicClassifications
    Use Main Topic Classifications (MTC) encompassing
        categories like Technology, Business, Arts, Health, etc.
    Calculate the shortest path for each category in the
        filtered graph to MTC
end procedure
procedure FilterByDistanceToMTC
    Retain articles with the smallest distance to
        Technology, Science, or Engineering concepts within MTC
end procedure</preformat>
        <p>Step 2: Technology Classification using SVM
The overall process of machine learning-based training to
obtain the final technology classification is detailed in
Algorithm 2.</p>
        <p>To create an input dataset for the Support Vector
Machine (SVM), which serves as our classifier, we extract
abstracts from Wikipedia articles identified within the
technology categories established in Step 1. The abstracts from
all Wikipedia pages directly linked to a technology
category are concatenated, stemmed, and then subjected to
TF-IDF-based weighting. This process generates a weighted
bag-of-words for each technology category. Subsequently,
feature reduction is applied to form usable feature vectors.
The best results were observed
using mutual information-based feature reduction, targeting
a vector length of 1000. Distances to each MTC topic are
appended to this vector, producing the final feature vectors
used as input to the classifier.</p>
        <p>
          To address the imbalance in class distribution caused by
our small training set of 1,356 positive samples, we employ
oversampling techniques, using Borderline-SMOTE [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ], to
increase the size of the input samples. The list of
technologies identified through SVM training is considered the final
list pertaining to technology.
        </p>
        <p>This final list is subsequently used to filter annotations
from patents and publications.</p>
        <p>Algorithm 2 Technology Classification using SVM</p>
        <preformat>procedure CreateDataset
    Extract abstracts from Wikipedia articles in
        identified technology categories
    Concatenate and stem abstracts, apply TF-IDF-based weighting
    Perform feature reduction for usable feature vectors
    Append distances to each MTC topic to create final feature vectors
end procedure
procedure HandleClassImbalance
    Employ Borderline-SMOTE for oversampling
end procedure
procedure FinalizeTechnologyList
    Use SVM training outcome as the final list of technologies
end procedure</preformat>
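        <p>To make the weighting step of Algorithm 2 concrete, the following sketch computes a plain TF-IDF weighted bag-of-words per category. It is an illustration only: stemming, mutual information-based feature reduction, the appended MTC distances, Borderline-SMOTE, and the SVM itself are omitted, and the exact TF-IDF variant used in the study is not stated, so the formula below (term frequency times the logarithm of inverse document frequency) is an assumption.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def tf_idf(documents):
    """Weighted bag-of-words per category.

    documents: {category: list of tokens from its concatenated abstracts}
    Returns {category: {term: tf-idf weight}}.
    """
    n_docs = len(documents)
    doc_freq = Counter()  # term -> number of categories containing it
    for tokens in documents.values():
        doc_freq.update(set(tokens))
    weights = {}
    for category, tokens in documents.items():
        counts = Counter(tokens)
        length = len(tokens)
        weights[category] = {
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


# Toy token lists, invented for this example.
docs = {
    "Machine learning": "neural network model training model".split(),
    "Optics": "laser lens model".split(),
}
bags = tf_idf(docs)
print(bags["Machine learning"]["model"])  # 0.0, the term occurs in every category
```
]]></preformat>
        <p>Terms shared by all categories receive zero weight, so the remaining features are those that distinguish one category from another.</p>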
        <sec id="sec-4-2-1">
          <title>4.2. Emergence Score</title>
          <p>
            Novelty Score: Novelty in emerging technologies
signifies their distinctive newness, pioneering concepts,
breakthrough advancements, and creative problem-solving,
distinguishing them from existing solutions and suggesting
transformative potential [
            <xref ref-type="bibr" rid="ref15 ref18">15, 18</xref>
            ].
          </p>
          <p>In our study, we define novelty for a technology based
on increased mentions in recent years. For instance, if a
particular technology has a significant portion of references
occurring in the last few years, it receives a high novelty
score. To implement this, we considered the time span of the
last 10 years and calculated the percentage of annotations
for each year. Linearly decreasing weights ranging from
10 to 1 were assigned, respectively, thereby giving higher
weight to more recent years. Technologies for which the
majority of annotations occurred more than 10 years ago
are considered not meeting the novelty criterion and are
consequently discarded.</p>
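          <p>To express this more formally, we first define the
yearly time series <italic>C</italic><sub><italic>t,d</italic></sub> using Eq. 1:</p>
          <p><disp-formula><label>(1)</label><tex-math>C_{t,d} = \{\, c_{t,d,y} : y \in Y \,\}</tex-math></disp-formula>
where:</p>
          <list list-type="bullet">
            <list-item><p><italic>c</italic><sub><italic>t,d,y</italic></sub> is the number of times technology <italic>t</italic> is
referenced in dataset <italic>d</italic> during year <italic>y</italic>.</p></list-item>
            <list-item><p><italic>y</italic> ∈ <italic>Y</italic> denotes the year within the specified range.</p></list-item>
          </list>
          <p>Thus, the total number of occurrences of a technology
<italic>t</italic> ∈ <italic>T</italic> in a dataset <italic>d</italic> ∈ <italic>D</italic> over all years <italic>y</italic> ∈ <italic>Y</italic> is represented
mathematically as Eq. 2:</p>
          <p><disp-formula><label>(2)</label><tex-math>\mathrm{Total}(t, d) = \sum_{y \in Y} c_{t,d,y}</tex-math></disp-formula>
where:</p>
          <list list-type="bullet">
            <list-item><p>Total(<italic>t</italic>, <italic>d</italic>) denotes the total count of mentions or
occurrences of technology <italic>t</italic> in dataset <italic>d</italic>.</p></list-item>
            <list-item><p>The sum runs over all years <italic>y</italic>
within the specified range <italic>Y</italic>.</p></list-item>
          </list>
          <p>The novelty score Novelty(<italic>t</italic>) of a technology <italic>t</italic> ∈ <italic>T</italic> is
then expressed mathematically as Eq. 3:</p>
          <p><disp-formula><label>(3)</label><tex-math>\mathrm{Novelty}(t) = \sum_{d \in D} \sum_{y \in Y} \left( \frac{c_{t,d,y}}{\mathrm{Total}(t, d)} \times 100 \times w_y \right)</tex-math></disp-formula>
where:</p>
          <list list-type="bullet">
            <list-item><p>Novelty(<italic>t</italic>) represents the novelty score for technology <italic>t</italic>.</p></list-item>
            <list-item><p><italic>w</italic><sub><italic>y</italic></sub> is a weight assigned to each year based on Eq. 4.</p></list-item>
            <list-item><p>The double sum runs over all
datasets <italic>d</italic> ∈ <italic>D</italic> and years <italic>y</italic> ∈ <italic>Y</italic>.</p></list-item>
          </list>
          <p>The weight for each year is computed from
its relative position within the given range. The weight
increases linearly with the year’s distance from the earliest
year, providing a higher weight to more recent years, as in Eq.
4:
<disp-formula><label>(4)</label><tex-math>w_y = y + 1 - \min_{y' \in Y} y'</tex-math></disp-formula></p>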
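          <p>A minimal sketch of the novelty computation follows, assuming the yearly counts of one technology are available as a dictionary per dataset. The function name and data layout are hypothetical, and datasets without any mention are simply skipped, a choice the text does not specify.</p>
          <preformat><![CDATA[
```python
def novelty(counts_by_dataset, years):
    """Novelty score of one technology (sketch of Eqs. 2 to 4).

    counts_by_dataset: {dataset: {year: mentions of the technology}}
    years: the years considered, e.g. the last 10 years
    """
    first = min(years)
    score = 0.0
    for counts in counts_by_dataset.values():
        total = sum(counts.get(y, 0) for y in years)  # Eq. 2
        if total == 0:
            continue  # assumption: a dataset without mentions contributes nothing
        for y in years:
            weight = y + 1 - first                    # Eq. 4: 1 for the earliest year
            score += counts.get(y, 0) / total * 100 * weight  # Eq. 3
    return score


window = range(2014, 2024)  # ten years, so weights run from 1 to 10
print(novelty({"patents": {2023: 5}}, window))  # 1000.0
print(novelty({"patents": {2014: 7}}, window))  # 100.0
```
]]></preformat>
          <p>With a ten-year window, a technology mentioned only in the most recent year reaches the maximum of 1000 per dataset, while one mentioned only in the earliest year scores 100.</p>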
          <p>
            Growth Score: Emerging technologies exhibit relatively
fast growth rates compared to non-emerging technologies
[
            <xref ref-type="bibr" rid="ref18">18</xref>
            ]. The growth rate of a technology, assessed through
growth curves in patents and publications, has been studied
extensively [
            <xref ref-type="bibr" rid="ref30 ref31 ref32">30, 31, 32</xref>
            ]. Using the concept of growth curves,
we employ a two-step approach to compute the growth
score of a technology.
          </p>
          <p>
            In Step 1, we apply regression techniques to fit the
number of yearly technology mentions to four different curve
models: Linear, Quadratic, Gaussian, and Exponential<sup>5</sup>. We
select the model with the highest R-squared (<italic>R</italic><sup>2</sup>) measure
[
            <xref ref-type="bibr" rid="ref33">33</xref>
            ] and compute the slope of the curve based on the
regression coefficients. We assume that the
sign of the slope determines whether
the trend is increasing or decreasing. Subsequently, based
on the best-fitting model and the slope, we assign the
technology to one of the classes defined in Table 1 to compute
the model_score.
          </p>
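          <p><sup>5</sup> We utilize Apache Commons SimpleRegression and
OLSMultipleLinearRegression for the linear and quadratic models. The same regression
tools are used with the logarithm of the data points to derive the
exponential and Gaussian models, respectively.</p>
          <p>In Step 2, the slope of the technology growth curve
Slope(<italic>t</italic>, <italic>d</italic>) is calculated by taking the difference between
the absolute counts of the last and the first year and
dividing it by the total number of years, as depicted in Eq. 5.
This equation quantifies the rate of change in technology
mentions over time for a specific technology <italic>t</italic> within a
dataset <italic>d</italic>.</p>
          <p><disp-formula><label>(5)</label><tex-math>\mathrm{Slope}(t, d) = \frac{c_{t,d,y_{\max}} - c_{t,d,y_{\min}}}{|Y|}</tex-math></disp-formula>
where <italic>c</italic><sub><italic>t,d,y</italic></sub> is the number of mentions of technology <italic>t</italic> in dataset <italic>d</italic> during year <italic>y</italic>, and <italic>y</italic><sub>min</sub> and <italic>y</italic><sub>max</sub> are the first and last year of the range <italic>Y</italic>.</p>
          <p>The slope is then normalized over all technologies in the dataset (Eq. 6):</p>
          <p><disp-formula><label>(6)</label><tex-math>\mathrm{slope\_score}(t, d) = \frac{\mathrm{Slope}(t, d) - \min(\mathrm{Slope}(T, d))}{\max(\mathrm{Slope}(T, d)) - \min(\mathrm{Slope}(T, d))}</tex-math></disp-formula>
where:</p>
          <list list-type="bullet">
            <list-item><p>Slope(<italic>t</italic>, <italic>d</italic>) denotes the slope of the growth curve
for technology <italic>t</italic> in dataset <italic>d</italic>.</p></list-item>
            <list-item><p>min(Slope(<italic>T</italic>, <italic>d</italic>)) represents the minimum slope
value among all technologies in dataset <italic>d</italic>.</p></list-item>
            <list-item><p>max(Slope(<italic>T</italic>, <italic>d</italic>)) represents the maximum slope
value among all technologies in dataset <italic>d</italic>.</p></list-item>
          </list>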
          <p>This normalization process facilitates comparative
analysis across different technologies and datasets.</p>
          <p>The technology’s final growth score is then computed by
combining the model score, which is determined based
on the best-fitting growth curve model, and the slope score,
reflecting the rate of change in the technology’s mentions
over time, using Eq. 7.</p>
          <p><disp-formula><label>(7)</label><tex-math>\mathrm{Growth}(t) = \sum_{d \in D} \left( \mathrm{model\_score}(t, d) + \mathrm{slope\_score}(t, d) \right)</tex-math></disp-formula></p>
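          <p>The slope part of the growth score can be sketched as follows. The slope follows the description of Eq. 5, and the normalization is the min-max scaling implied by the minimum and maximum slope terms. The model score depends on Table 1 and is therefore left out. Function names, the data layout, and the handling of identical slopes are assumptions of this sketch.</p>
          <preformat><![CDATA[
```python
def slope(counts, years):
    """Eq. 5: (count in the last year - count in the first year) / number of years."""
    years = sorted(years)
    return (counts.get(years[-1], 0) - counts.get(years[0], 0)) / len(years)


def slope_scores(slopes):
    """Min-max normalisation of the slopes of all technologies in one dataset."""
    low, high = min(slopes.values()), max(slopes.values())
    if high == low:
        # Assumption: when all slopes are equal there is no spread to rank by.
        return {tech: 0.0 for tech in slopes}
    return {tech: (s - low) / (high - low) for tech, s in slopes.items()}


# Toy yearly counts for three technologies in one dataset (invented numbers).
years = range(2019, 2024)
yearly = {
    "A": {2019: 10, 2023: 60},
    "B": {2019: 40, 2023: 20},
    "C": {2019: 5, 2023: 20},
}
raw = {tech: slope(counts, years) for tech, counts in yearly.items()}
print(raw)                # {'A': 10.0, 'B': -4.0, 'C': 3.0}
print(slope_scores(raw))  # {'A': 1.0, 'B': 0.0, 'C': 0.5}
```
]]></preformat>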
          <p>Impact Score: Wikipedia Pageviews represent the
number of times a particular article has been accessed on the
Wikipedia website, providing insights into the level of
public interest and engagement with specific topics or content.
Utilizing this information, we leverage Wikipedia Pageview
statistics to compute the impact score of a technology. We
use monthly views to gather more data points. After
extracting the monthly views, denoted as <italic>v</italic>, we apply a
3-month moving average filter to smooth the time series.
This filter calculates the average of each data point along
with the two preceding and two succeeding months,
effectively reducing noise and revealing underlying trends (see
Eq. 8).</p>
          <p><disp-formula><label>(8)</label><tex-math>\bar{v}_m = \frac{v_{m-2} + v_{m-1} + v_m + v_{m+1} + v_{m+2}}{5}</tex-math></disp-formula></p>
          <p>The smoothed data <inline-formula><tex-math>\bar{v}</tex-math></inline-formula> then takes the place of the dataset <italic>d</italic> in the
two-step approach used for the growth score. We classify the
trends into the same five classes (as seen in Table 1).
<disp-formula><label>(9)</label><tex-math>\mathrm{Impact}(t) = \mathrm{model\_score}(t, \bar{v}) + \mathrm{slope\_score}(t, \bar{v})</tex-math></disp-formula></p>
          <p>Eq. 9 represents the calculation of the impact score
Impact(<italic>t</italic>) for a technology <italic>t</italic>. It combines the model score
<inline-formula><tex-math>\mathrm{model\_score}(t, \bar{v})</tex-math></inline-formula> and the normalized slope score
<inline-formula><tex-math>\mathrm{slope\_score}(t, \bar{v})</tex-math></inline-formula> obtained from the 3-month
moving average <inline-formula><tex-math>\bar{v}</tex-math></inline-formula> of Wikipedia Pageviews. This score
reflects both the growth pattern and the temporal trends in
Wikipedia Pageviews, providing a comprehensive
assessment of the technology’s impact.</p>
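          <p>The smoothing step can be sketched in a few lines. The sketch follows Eq. 8 as written, that is, a centered window of five monthly values. The text does not say how the first and last two months are treated, so dropping them is an assumption, and the function name is hypothetical.</p>
          <preformat><![CDATA[
```python
def smooth(views):
    """Centered moving average over five monthly values (Eq. 8).

    Months without two neighbours on each side are dropped (assumption),
    so the result is four entries shorter than the input.
    """
    return [sum(views[i - 2:i + 3]) / 5 for i in range(2, len(views) - 2)]


monthly_views = [10, 20, 30, 40, 50, 60, 70]  # invented pageview counts
print(smooth(monthly_views))  # [30.0, 40.0, 50.0]
```
]]></preformat>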
          <p>
            Coherence Score: In our study, we consider coherence
as the persistence of a technology over time, as referred to
by [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ]. When identifying emerging technologies, we
assume that the presence of a category on Wikipedia signifies
a thematic grouping that brings together related
technological concepts. The coherence within such categories is
established through shared characteristics, applications, and
underlying principles of the technologies they encompass.
This alignment allows for consistent trends to emerge within
the category over time, reflecting the collective evolution of
technologies. Wikipedia categorization serves as a valuable
indicator of how various technologies within a category
develop in tandem, providing insights into the overarching
trends and advancements in related technological domains.
          </p>
          <p>To compute the coherence score, we begin by collecting
all unique categories from Wikipedia, forming what we
refer to as the ’Category Set.’ Subsequently, we perform
a mapping process, converting plural category names to
their singular counterparts, and then matching them with
articles sharing identical names. The coherence score is
then computed with the following Eq. 10:</p>
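          <p><disp-formula><label>(10)</label><tex-math><![CDATA[\mathrm{Coherence}(t) = \begin{cases} 0.5, & \text{if } t \in \text{Category Set} \\ 0, & \text{otherwise} \end{cases}]]></tex-math></disp-formula>
In other words, if the technology <italic>t</italic> is part of the
Category Set, the coherence score is 0.5; otherwise, it is 0. This
mathematical expression reflects the coherent presence of a
technology within a specific thematic category.</p>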
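          <p>Emergence Score: To calculate the emergence
score, we compute a weighted sum of the novelty, growth, impact, and coherence
scores. We then normalize the result to the range [0.0, 1.0],
as shown in Eq. 11.</p>
          <p><disp-formula><label>(11)</label><tex-math>\mathrm{Emergence}(t) = \mathrm{norm}\left[ n \cdot \mathrm{Novelty}(t) + g \cdot \mathrm{Growth}(t) + i \cdot \mathrm{Impact}(t) + c \cdot \mathrm{Coherence}(t) \right]</tex-math></disp-formula>
We introduce the control variables <italic>n</italic>, <italic>g</italic>, <italic>i</italic>, and <italic>c</italic> to
empirically manage the impact of biases arising from data
imbalance, aiming to achieve the highest precision.</p>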
          <p>Technology Class and Technology Class Score:
Individuals often generate multiple articles on Wikipedia that
closely relate to one another, such as those on Machine
Learning, Deep Learning, and Artificial Neural Networks.
To establish connections between these closely related
technologies, we employ Wikidata properties such as ’subclass
of,’ ’part of,’ ’instance of,’ or ’said to be the same as.’ We
refer to this group of related technologies as a ’Technology
Class’ (TC). The Technology Class score (TCs) is the
maximum emergence score among the technologies in the set
of related technologies, as shown in Eq. 12.</p>
          <p><disp-formula><label>(12)</label><tex-math>\mathrm{TCs} = \max_{t \in \mathrm{TC}} \mathrm{Emergence}(t)</tex-math></disp-formula></p>
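          <p>The aggregation into an emergence score and a Technology Class score can be sketched as below. The normalization function is not specified in the text, so min-max scaling over all technologies is assumed here. The input scores are invented and taken to be on comparable scales, and all function names are hypothetical.</p>
          <preformat><![CDATA[
```python
def emergence_scores(scores, n=1.0, g=1.0, i=1.0, c=1.0):
    """Weighted sum of the four scores, rescaled to [0, 1].

    scores: {technology: (novelty, growth, impact, coherence)}
    n, g, i, c: control variables; all 1 corresponds to the 'base' run.
    """
    raw = {
        tech: n * nov + g * gro + i * imp + c * coh
        for tech, (nov, gro, imp, coh) in scores.items()
    }
    low, high = min(raw.values()), max(raw.values())
    if high == low:
        return {tech: 0.0 for tech in raw}  # assumption: no spread, score 0
    # Assumption: 'normalize' is read as min-max scaling.
    return {tech: (value - low) / (high - low) for tech, value in raw.items()}


def class_score(emergence, technology_class):
    """Eq. 12: maximum emergence score among the technologies of a class."""
    return max(emergence[tech] for tech in technology_class)


toy_scores = {  # invented values for illustration
    "Deep learning": (0.9, 0.8, 0.8, 0.5),
    "Machine learning": (0.5, 0.6, 0.9, 0.5),
    "Lidar": (0.2, 0.7, 0.1, 0.0),
}
emergence = emergence_scores(toy_scores)
ai_class = ["Deep learning", "Machine learning"]
print(class_score(emergence, ai_class))  # 1.0
```
]]></preformat>
          <p>Taking the maximum means a Technology Class ranks as high as its strongest member, which lets closely related articles be reported once rather than several times.</p>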
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Evaluation</title>
      <p>For patents, we gathered the abstracts of 6,647,699 patents
from PatentsView. From this dataset, we derived 112,199
unique annotations, of which 77,995 had more than 5
occurrences. Similarly, for publications, we collected the
abstracts of 1,425,558 research papers from arXiv. Within this
dataset, we identified 111,627 unique annotations with
technology classification, of which 65,162
occurred more than 5 times. Our proposed technology
classification method identifies 50,954 technologies among
the 4,996,310 Wikipedia articles we utilized in our study.</p>
      <sec id="sec-5-1">
        <title>5.1. Results</title>
        <p>In this section, we discuss the observations obtained after
applying our proposed methodology to the public datasets
described in Section 3.</p>
        <p>Individual Scores: Table 2 displays the top 20
technologies with the highest novelty, growth, and impact scores.
Notably, technologies related to Artificial Intelligence (AI)†
appear among the top 20 across all scores, including Deep
Learning and Convolutional Neural Network (CNN) for
novelty, and Artificial Intelligence, Machine Learning, and
Artificial Neural Network for impact; all except CNN correspond
to categories in Wikipedia and are considered coherent.</p>
        <p>In the top 20 novel technologies, alongside AI-related
technologies, there are notable mentions of vehicle-related
technologies such as Multirotor, Autonomous Car, and
Vehicle-to-everything. The Nanosheet closes the novelty
list, being the only technology not related to either computer
science or vehicle technology. Communication ranks first in
the list of the top 20 technologies according to the growth
score, with Communication-related technologies like
Wireless and Data Transmission being other fast-growing terms.
The list also includes older technologies that receive
continuous or renewed interest, such as Lidar or Rechargeable
Battery. Apart from vehicle-related technologies like
Unmanned Aerial Vehicle and Autonomous Car, this list is
completed by the Internet of Things and Quantum
Computing.</p>
        <p>Overall Score: Table 3 presents the overall top 20
technologies after combining the individual scores.</p>
        <p>Deep Learning emerges as the top technology in our
methodology, with Convolutional Neural Network (CNN)
also making the list as a sub-category of Deep Learning. As
anticipated, Machine Learning is present, alongside the
Internet of Things, both demonstrating coherence and ranking
in the top 20 for impact and novelty, respectively.
Cyberattack holds a high position, accompanied by various
technologies related to Computer security, forming the second
group in the result list. Key-Value Database, the simplest
form of NoSQL databases, secures the seventh spot in the
top 20 emerging technologies. Communication and
Smartphone, technologies that have garnered attention for years,
are also on the list. We observe the inclusion of technologies
such as Autonomous Car, Knowledge Graph, and 5G in the
top 20 scored technologies.</p>
        <p>
          Our findings align well with similar observations made by
Zhou et al. [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ] and Daim et al. [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ], returning four
Convergence Emerging Technologies (CET) in the top five results,
with the fifth (CNN) being a sub-class of Deep Learning.
        </p>
        <p>Table 4 displays the top 20 technology classes identified
from the top 100 technologies based on the emergence score.
This method of presenting results enhances the visibility of
other technologies, such as Virtual Assistant or Exoskeleton.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Benchmarking</title>
        <p>To benchmark our proposed emergence scoring against
comparable work, we compiled the union
set of emerging technologies identified by leading
technology analysts: Gartner, Forrester, IHS Markit, and
the World Economic Forum (WEF). Gartner listed 35
technologies in its technology hype cycle, Forrester
listed 12, IHS Markit 8, and WEF 10 emerging technologies.
After merging the overlapping technologies from these four
lists, we obtained a consolidated list of 36 unique technology
classes, which we use as ground truth. Table 5 provides an
overview of these classes.</p>
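        <p>As an illustration of the merging step, the following minimal Python sketch builds a union set from several lists and records which sources name each class. It is not the procedure used in this study: the function names (normalize, merge_lists), the toy input, and the matching on lower-cased labels are assumptions, and mapping synonymous labels to a common class would in practice still require manual judgment.</p>
        <preformat>
```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Lower-case a label and collapse whitespace so near-identical labels match."""
    return " ".join(name.lower().split())


def merge_lists(lists: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map every unique technology class to the set of sources that list it."""
    merged: dict[str, set[str]] = defaultdict(set)
    for source, technologies in lists.items():
        for tech in technologies:
            merged[normalize(tech)].add(source)
    return dict(merged)


# Hypothetical miniature input, NOT the analysts' actual lists.
toy_lists = {
    "source_a": ["Edge AI", "Digital Twin", "5G"],
    "source_b": ["digital twin", "Quantum Computing"],
    "source_c": ["5G", "Edge  AI"],
}

ground_truth = merge_lists(toy_lists)
print(len(ground_truth), "unique classes")
for tech in sorted(ground_truth):
    print(tech, sorted(ground_truth[tech]))
```
        </preformat>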
        <p>Notably, the majority of technologies in this table appear
to belong to the Computer Science-related domain, with
72% of them being linked to it. Technologies marked with
’†’ are those we were unable to directly map to a Wikipedia
article or category. Additionally, articles judged as
non-technologies by the SVM classifier are indicated in the table
with ’.’</p>
        <p>It is worth mentioning that Wikipedia articles on
Augmented, Mixed, and Virtual Reality are collectively
presented, following Forrester’s proposal to consider them as a
single technology class.</p>
        <p>Table 6 reports the Average
Precision (AP) and Recall (R) for the top 20 technologies (T)
and Technology Classes (TC) identified in the evaluation
set.</p>
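        <p>For readers who wish to reproduce metrics of this kind, the sketch below shows one common way to compute average precision over a ranked list and recall over ground-truth classes. It is an assumption-laden illustration: the exact AP definition used for Table 6 is not restated here, the function names and the toy ranking are hypothetical, and AP is normalized by the number of relevant results retrieved rather than by the size of the evaluation set.</p>
        <preformat>
```python
from typing import Optional


def average_precision(relevance: list[bool]) -> float:
    """Mean of precision@k, taken at every rank k that holds a relevant result."""
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def class_recall(matched_classes: list[Optional[str]], n_classes: int) -> float:
    """Share of ground-truth classes covered by at least one ranked result."""
    covered = {c for c in matched_classes if c is not None}
    return len(covered) / n_classes


# Hypothetical top-5 ranking: each result maps to a ground-truth class or to None.
top = ["class_a", "class_a", None, "class_b", "class_a"]

ap = average_precision([c is not None for c in top])
rec = class_recall(top, n_classes=36)
print(f"AP = {ap:.4f}, R = {rec:.4f}")

# Six covered classes out of 36 give 6/36, roughly 0.167,
# which corresponds to the recall reported as 0.16 for the base run.
six_of_36 = class_recall(["a", "b", "c", "d", "e", "f"], n_classes=36)
```
        </preformat>
        <p>Several results that map to the same class raise precision but not recall, which is how a ranking can combine a high AP with a low R.</p>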
        <p>In the ’base’ run, all control variables in Eq. 10 are set to
1. Additionally, alongside the ’max_prec’ parameter set, we
present the average precision and recall of the Computer
Science technology class (max_prec_cs). Within the top 20
technologies with the highest emergence score, only one
non-technology result was observed. The average precision
(AP) was 0.72 for the base run. However, all the relevant
concepts from this subset relate to only 6 out of the 36
technologies mentioned before, resulting in a recall (R) of
0.16. By changing the control variables for the max_prec,
where non-Computer Science technology does not grow and
have entries in Wikipedia articles, we were able to increase</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Limitations</title>
      <p>The identified emerging technologies are visibly biased
toward Computer Science: as already noticed for the
evaluation set, 70% of the technologies
within the top 100 results belong to this domain. This
bias complicates the exploration of trends in other domains.
Taking chemistry as an example, the International Union of
Pure and Applied Chemistry (IUPAC) issued a list of
emerging technologies for this domain that contains, among others,
3D bioprinting and Flow chemistry. Neither figures in
our evaluation set, but both are present in our technology result
set, ranked 4,897 and 12,421, respectively. To address this
bias, we split the result set as well as the evaluation set into
distinct domains (CS, Nanotechnology, Medicine, etc.). This
approach allowed us to navigate around the bias. The third
row (CS TC) of Table 6 provides the average precision and
recall when only results related to the Computer Science
field are considered, as this class is predominant in our
result/evaluation sets. Although this approach results in only
a 10% increase in average precision, the increase in recall
reaches 30%.</p>
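      <p>The domain split described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used in this study: the helper names, the (name, domain) pair representation, and the toy labels are hypothetical, and exact name matching stands in for the mapping between results and evaluation classes.</p>
      <preformat>
```python
from collections import defaultdict


def split_by_domain(items):
    """Group (name, domain) pairs into a dict of domain -> list of names."""
    groups = defaultdict(list)
    for name, domain in items:
        groups[domain].append(name)
    return dict(groups)


def recall_per_domain(results, evaluation):
    """Recall of each domain's results against that domain's evaluation set only."""
    results_by_domain = split_by_domain(results)
    evaluation_by_domain = split_by_domain(evaluation)
    recall = {}
    for domain, expected in evaluation_by_domain.items():
        found = set(results_by_domain.get(domain, [])).intersection(expected)
        recall[domain] = len(found) / len(set(expected))
    return recall


# Hypothetical labels, NOT entries from the result or evaluation sets.
results = [("tech_a", "CS"), ("tech_b", "CS"), ("tech_c", "Medicine")]
evaluation = [("tech_a", "CS"), ("tech_d", "CS"), ("tech_c", "Medicine")]

per_domain = recall_per_domain(results, evaluation)
print(per_domain)
```
      </preformat>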
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>This paper presents an automated method for identifying
emerging technologies using publicly available data. Our
approach is applicable across various technology sectors
without the need for human domain experts, as it relies on
a clear mathematical foundation.</p>
      <p>We propose an emergence scoring system based on
novelty, growth, impact, and coherence scores. Novelty and
growth scores are computed from time series data of
annotations applied to USPTO patents and arXiv publications.
The impact score is derived from the Wikipedia Pageview
time series, while the coherence score utilizes Wikipedia
categories.</p>
      <p>To assess the effectiveness of our proposed method, we
compiled an evaluation set of 36 emerging technologies by
amalgamating lists from prominent market research firms
like Gartner and Forrester Research. The evaluation
revealed a high average precision (0.72) paired with a low
recall (0.16) in identifying emerging technologies.</p>
      <p>This research lays the groundwork for further
investigations, including the development of a methodology to
determine the more fine-grained stages of emergence (e.g.,
pre-emergence, emergence, post-emergence) for a particular
technology within different timeframes.</p>
      <p>
        Our study can be enhanced by incorporating OpenAlex
concepts<sup>6</sup>, which have gained more popularity than
the now-defunct DBpedia concepts. Additionally,
we plan to employ more advanced deep learning models
instead of the SVM model, as mentioned in [
        <xref ref-type="bibr" rid="ref36 ref37">36, 37</xref>
        ],
specifically a combination of LSTM and Transformer [
        <xref ref-type="bibr" rid="ref38 ref39">38, 39</xref>
        ], to
conduct more efficient time series analysis. This will be
performed using a larger publication dataset than arXiv,
such as the one available on OpenAlex<sup>7</sup>. Finally, since
our methodology still requires a certain degree of manual
intervention, such as inspecting Wikipedia categories and
adjusting bias variables, we want to explore techniques that
can minimize these manual components to enhance
scalability and reduce potential subjectivity.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>We extend our thanks to the developers at Trivo
Systems—Pratiksha Jain, Himanshu Jain, and Marc Liechti—for
their work on the Technology Market Monitoring 1.0 project.
We appreciate their valuable contributions to shaping the
initial stage of our study. We also extend our thanks to
armasuisse Science and Technology for supporting the study.</p>
      <sec id="sec-8-1">
        <p><sup>6</sup> https://docs.openalex.org/api-entities/concepts</p>
        <p><sup>7</sup> https://openalex.org/</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>O.</given-names>
            <surname>Dedehayir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Steinert</surname>
          </string-name>
          ,
          <article-title>The hype cycle model: A review and future directions</article-title>
          ,
          <source>Technological Forecasting and Social Change</source>
          <volume>108</volume>
          (
          <year>2016</year>
          )
          <fpage>28</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Intepe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Koc</surname>
          </string-name>
          ,
          <article-title>The use of s curves in technology forecasting and its application on 3d tv technology</article-title>
          ,
          <source>International Journal of Industrial and Manufacturing Engineering</source>
          <volume>6</volume>
          (
          <year>2012</year>
          )
          <fpage>2491</fpage>
          -
          <lpage>2495</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ranaei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Karvonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Suominen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kässi</surname>
          </string-name>
          ,
          <article-title>Forecasting emerging technologies of low emission vehicle</article-title>
          ,
          <source>in: Proceedings of PICMET'14 Conference: Portland International Center for Management of Engineering and Technology; Infrastructure and Service Integration</source>
          , IEEE,
          <year>2014</year>
          , pp.
          <fpage>2924</fpage>
          -
          <lpage>2937</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. W. Z.</given-names>
            <surname>Sossa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. P.</given-names>
            <surname>Marro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Alzate</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M. V.</given-names>
            <surname>Salazar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. F. A.</given-names>
            <surname>Patiño</surname>
          </string-name>
          ,
          <article-title>S-curve analysis and technology life cycle. application in series of data of articles and patents</article-title>
          ,
          <source>Revista ESPACIOS</source>
          <volume>37</volume>
          (
          <issue>7</issue>
          ) (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Kar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>Understanding the s-curve of ambidextrous behavior in learning emerging digital technologies</article-title>
          ,
          <source>IEEE Engineering Management Review</source>
          <volume>49</volume>
          (
          <year>2021</year>
          )
          <fpage>76</fpage>
          -
          <lpage>98</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Adner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kapoor</surname>
          </string-name>
          ,
          <article-title>Innovation ecosystems and the pace of substitution: Re-examining technology s-curves</article-title>
          ,
          <source>Strategic Management Journal</source>
          <volume>37</volume>
          (
          <year>2016</year>
          )
          <fpage>625</fpage>
          -
          <lpage>648</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Porter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Roessner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-Y.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. C.</given-names>
            <surname>Newman</surname>
          </string-name>
          ,
          <article-title>Measuring national 'emerging technology' capabilities</article-title>
          ,
          <source>Science and Public Policy</source>
          <volume>29</volume>
          (
          <year>2002</year>
          )
          <fpage>189</fpage>
          -
          <lpage>200</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <article-title>Foresight in science and technology</article-title>
          ,
          <source>Technology Analysis &amp; Strategic Management</source>
          <volume>7</volume>
          (
          <year>1995</year>
          )
          <fpage>139</fpage>
          -
          <lpage>168</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Corrocher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Malerba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Montobbio</surname>
          </string-name>
          ,
          <article-title>The emergence of new technologies in the ICT field: main actors, geographical distribution and knowledge sources</article-title>
          ,
          <source>Technical Report</source>
          , Department of Economics, University of Insubria,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Halaweh</surname>
          </string-name>
          ,
          <article-title>Emerging technology: What is it</article-title>
          ,
          <source>Journal of Technology Management &amp; Innovation</source>
          <volume>8</volume>
          (
          <year>2013</year>
          )
          <fpage>108</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.-C.</given-names>
            <surname>Hung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-Y.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <article-title>Stimulating new industries from emerging technologies: challenges for the public sector</article-title>
          ,
          <source>Technovation</source>
          <volume>26</volume>
          (
          <year>2006</year>
          )
          <fpage>104</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>W.</given-names>
            <surname>Boon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Moors</surname>
          </string-name>
          ,
          <article-title>Exploring emerging technologies using metaphors-a study of orphan drugs and pharmacogenomics</article-title>
          ,
          <source>Social Science &amp; Medicine</source>
          <volume>66</volume>
          (
          <year>2008</year>
          )
          <fpage>1915</fpage>
          -
          <lpage>1927</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cozzens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gatchair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ordóñez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Porter</surname>
          </string-name>
          ,
          <article-title>Emerging technologies: quantitative identification and measurement</article-title>
          ,
          <source>Technology Analysis &amp; Strategic Management</source>
          <volume>22</volume>
          (
          <year>2010</year>
          )
          <fpage>361</fpage>
          -
          <lpage>376</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Stahl</surname>
          </string-name>
          ,
          <article-title>What does the future hold? a critical view of emerging information and communication technologies and their social consequences</article-title>
          ,
          <source>in: Researching the Future in Information Systems: IFIP WG 8.2 Working Conference, Turku, Finland, June 6-8, 2011, Proceedings</source>
          , Springer,
          <year>2011</year>
          , pp.
          <fpage>59</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H.</given-names>
            <surname>Small</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. W.</given-names>
            <surname>Boyack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Klavans</surname>
          </string-name>
          ,
          <article-title>Identifying emerging topics in science and technology</article-title>
          ,
          <source>Research Policy</source>
          <volume>43</volume>
          (
          <year>2014</year>
          )
          <fpage>1450</fpage>
          -
          <lpage>1467</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>W.</given-names>
            <surname>Glänzel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thijs</surname>
          </string-name>
          ,
          <article-title>Using 'core documents' for detecting and labelling new emerging topics</article-title>
          ,
          <source>Scientometrics</source>
          <volume>91</volume>
          (
          <year>2012</year>
          )
          <fpage>399</fpage>
          -
          <lpage>416</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tavazzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. P.</given-names>
            <surname>David</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jang-Jaccard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mermoud</surname>
          </string-name>
          ,
          <article-title>Measuring technological convergence in encryption technologies with proximity indices: A text mining and bibliometric analysis using openalex</article-title>
          ,
          <source>arXiv preprint arXiv:2403.01601</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>Rotolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <article-title>What is an emerging technology?</article-title>
          ,
          <source>Research Policy</source>
          <volume>44</volume>
          (
          <year>2015</year>
          )
          <fpage>1827</fpage>
          -
          <lpage>1843</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T. U.</given-names>
            <surname>Daim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rueda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gerdsri</surname>
          </string-name>
          ,
          <article-title>Forecasting emerging technologies: Use of bibliometrics and patent analysis</article-title>
          ,
          <source>Technological Forecasting and Social Change</source>
          <volume>73</volume>
          (
          <year>2006</year>
          )
          <fpage>981</fpage>
          -
          <lpage>1012</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kucharavy</surname>
          </string-name>
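          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Schenk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>De Guio</surname>
          </string-name>
          ,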
          <article-title>Long-run forecasting of emerging technologies with logistic models and growth of knowledge</article-title>
          ,
          <source>in: 19th CIRP design conference</source>
          ,
          <year>2009</year>
          , p.
          <fpage>277</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bengisu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nekhili</surname>
          </string-name>
          ,
          <article-title>Forecasting emerging technologies with the aid of science and technology databases</article-title>
          ,
          <source>Technological Forecasting and Social Change</source>
          <volume>73</volume>
          (
          <year>2006</year>
          )
          <fpage>835</fpage>
          -
          <lpage>844</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Nieto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <article-title>Performance analysis of technology using the s curve model: the case of digital signal processing (dsp) technologies</article-title>
          ,
          <source>Technovation</source>
          <volume>18</volume>
          (
          <year>1998</year>
          )
          <fpage>439</fpage>
          -
          <lpage>457</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Kyebambe</surname>
          </string-name>
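          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,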
          <string-name>
            <given-names>C.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Forecasting emerging technologies: A supervised learning approach through patent analysis</article-title>
          ,
          <source>Technological Forecasting and Social Change</source>
          <volume>125</volume>
          (
          <year>2017</year>
          )
          <fpage>236</fpage>
          -
          <lpage>244</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>S.-Y.</given-names>
            <surname>Hwang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.-J.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Systematic review on identification and prediction of deep learning-based cyber security technology and convergence fields</article-title>
          ,
          <source>Symmetry</source>
          <volume>14</volume>
          (
          <year>2022</year>
          )
          <fpage>683</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Forecasting emerging technologies with deep learning and data augmentation: convergence emerging technologies vs non-convergence emerging technologies</article-title>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>P. USPTO</surname>
          </string-name>
          , Locations that drive innovation,
          <year>2023</year>
          . URL: https://datatool.patentsview.org/, accessed: December 9, 2023.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27] arXiv, Monthly submissions,
          <year>2024</year>
          . URL: https://arxiv.org/stats/monthly_submissions, accessed: February 5, 2024.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>P.</given-names>
            <surname>Analysis</surname>
          </string-name>
          , Comparison of pageviews across multiple pages,
          <year>2023</year>
          . URL: https://pageviews.wmcloud.org/, accessed: February 12, 2024.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>H.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-H.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <article-title>Borderline-smote: a new over-sampling method in imbalanced data sets learning</article-title>
          ,
          <source>in: International conference on intelligent computing</source>
          , Springer,
          <year>2005</year>
          , pp.
          <fpage>878</fpage>
          -
          <lpage>887</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>B.</given-names>
            <surname>Andersen</surname>
          </string-name>
          ,
          <article-title>The hunt for s-shaped growth paths in technological innovation: a patent study</article-title>
          ,
          <source>Journal of Evolutionary Economics</source>
          <volume>9</volume>
          (
          <year>1999</year>
          )
          <fpage>487</fpage>
          -
          <lpage>526</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>M.</given-names>
            <surname>Meyer</surname>
          </string-name>
          ,
          <article-title>Patent citation analysis in a novel field of technology: An exploration of nano-science and nano-technology</article-title>
          ,
          <source>Scientometrics</source>
          <volume>51</volume>
          (
          <year>2001</year>
          )
          <fpage>163</fpage>
          -
          <lpage>183</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Day</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Schoemaker</surname>
          </string-name>
          ,
          <article-title>Avoiding the pitfalls of emerging technologies</article-title>
          ,
          <source>California Management Review</source>
          <volume>42</volume>
          (
          <year>2000</year>
          )
          <fpage>8</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Moore</surname>
          </string-name>
          ,
          <source>Introduction to the Practice of Statistics</source>
          , W. H. Freeman and Company,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Forecasting emerging technologies using data augmentation and deep learning</article-title>
          ,
          <source>Scientometrics</source>
          <volume>123</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>T.</given-names>
            <surname>Daim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yalcin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alsoubie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>Forecasting technological positioning through technology knowledge redundancy: Patent citation analysis of IoT, cybersecurity, and blockchain</article-title>
          ,
          <source>Technological Forecasting and Social Change</source>
          <volume>161</volume>
          (
          <year>2020</year>
          )
          <fpage>120329</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mayr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Suominen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <article-title>An editorial of “AI + informetrics”: Robust models for large-scale analytics</article-title>
          ,
          <source>Information Processing and Management</source>
          (
          <year>2023</year>
          )
          <fpage>103495</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jang-Jaccard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sabrina</surname>
          </string-name>
          ,
          <article-title>Improving performance of autoencoder-based network anomaly detection on NSL-KDD dataset</article-title>
          ,
          <source>IEEE Access</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>140136</fpage>
          -
          <lpage>140146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jang-Jaccard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sabrina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Camtepe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Boulic</surname>
          </string-name>
          ,
          <article-title>LSTM-autoencoder-based anomaly detection for indoor air quality time-series data</article-title>
          ,
          <source>IEEE Sensors Journal</source>
          <volume>23</volume>
          (
          <year>2023</year>
          )
          <fpage>3787</fpage>
          -
          <lpage>3800</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jang-Jaccard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sabrina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Camtepe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dunmore</surname>
          </string-name>
          ,
          <article-title>Reconstruction-based LSTM-autoencoder for anomaly-based DDoS attack detection over multivariate time-series data</article-title>
          ,
          <source>arXiv preprint arXiv:2305.09475</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>