=Paper= {{Paper |id=Vol-1437/ipamin2015_submission_3 |storemode=property |title=An Approach for Predicting Hype Cycle Based on Machine Learning |pdfUrl=https://ceur-ws.org/Vol-1437/ipamin2015_paper3.pdf |volume=Vol-1437 }} ==An Approach for Predicting Hype Cycle Based on Machine Learning== https://ceur-ws.org/Vol-1437/ipamin2015_paper3.pdf
          An Approach for Predicting Hype Cycle Based on
                        Machine Learning
Zhijun Ren
Institute of Scientific and Technical Information of China, Beijing 100038, China
The China Patent Information Center, the State Intellectual Property Office, Beijing 100088, China
renzhijun@cnpat.com.cn

Shuo Xu
Institute of Scientific and Technical Information of China, Beijing 100038, China
xush@istic.ac.cn

Xiaodong Qiao
Institute of Scientific and Technical Information of China, Beijing 100038, China
qiaox@istic.ac.cn

Hongqi Han
Institute of Scientific and Technical Information of China, Beijing 100038, China
bithhq@163.com

Kai Zhang
The China Patent Information Center, the State Intellectual Property Office, Beijing 100088, China
zhangkai@cnpat.com.cn


ABSTRACT
Analyzing mass information and supporting insight based on the analysis results are very important work, but they need much effort and time. Therefore, in this paper, we propose an approach for predicting the hype cycle based on machine learning for effective, systematic, and objective information analysis and future forecasting in the science and IT fields. Additionally, we carry out a comparative evaluation between the suggested model and Hype Cycle for Big Data, 2013 to validate the suggested model and its general use for information analysis and forecasting.

Keywords
Hype Cycle; Data Mining; Technology Prediction; KNN

Copyright © 2015 for the individual papers by the papers' authors.
Copying permitted for private and academic purposes.
This volume is published and copyrighted by its editors.
Published at Ceur-ws.org
Proceedings of the Second International Workshop on Patent Mining and its Applications (IPAMIN). May 27–28, 2015, Beijing, China.

1. OVERVIEW
Hype Cycle is a conceptual model widely used by Gartner, Inc. It can reveal the basic laws of the evolution of the technological innovation chain, and it is a powerful tool for grasping the overall development trend of technological innovation, for assessing the maturity of a technological innovation objectively, for choosing a reasonable time for innovative intervention, and for seeking the late-mover advantage in technological innovation.

Some scholars believe that hype means being advertised in publicity and exaggerated propaganda[1], and that the best measure is expectation, which means people's expectation for technology innovation. Enterprises should use the hype cycle to target emerging technologies, and use the concept of digital business transformation to predict future business trends.

Gartner believes that the hype cycle is a qualitative decision-making tool which, like other management methods, relies mainly on the judgment of experts. To complete the hype cycle assessment and prediction of a set of technologies in a certain field, a variety of evaluation methods is needed. Other scholars have explored how to measure expectations for innovation; the quantitative indicators are mainly the number of participants[2], the number or the ratio of technological innovation documents[3], patent statistical data[4] and the search traffic of Google and other search engines[5]. When using the network measurement method, other sources may need to be added: USPTO patents, news reports from the Google News archive, the market share of a certain product from its official website, etc. These methods mainly create the hype cycle manually. The InSciTe[6,7] system developed by the Korea Institute of Science and Technology Information adopts the decision tree and statistical feature analysis methods, based on Gartner research, to provide the technical life cycle diagram, and adopts an emerging technologies discovering model to provide the key technologies on the life cycle diagram. In the particular judgment process, there might be problems of forced compliance in a certain stage. For example, a technology is in the Plateau of Productivity from 2000 to 2005, but its data for 2006 to 2007 are in line with the Slope of Enlightenment stage, so forced compliance is needed to judge the technology life cycle of the last two years[1]. The related references do not describe how the coordinates of a specific technology term are obtained.

Therefore, the paper uses paper and patent information to
realize an emerging technology discovering method by machine
learning; then acquires some features by feature selection and uses a machine learning algorithm to classify and locate the coordinates, and produces the hype cycle according to the prediction.

The paper is structured as follows. The prediction frame of technology maturity is described in Chapter 2. The learning method and model of technology maturity are introduced in Chapter 3. The prediction method and model of technology maturity are illustrated in Chapter 4. Experiment and analysis are conducted in Chapter 5, and the conclusion and forecast are given in the last chapter.

2. TECHNOLOGY MATURITY PREDICTION FRAME
The model of the approach for predicting the hype cycle based on machine learning includes a learning part and a prediction part. The learning model mainly relates to data training. Data acquisition and learning include acquiring training data, data annotation and feature calculation. The prediction model covers the identification and discovery of emerging technology in a certain field, and the process of discovering technology system innovation by producing the hype cycle, predicting maturity and discovering emerging technologies, as well as determining the input ratio of partial innovation and overall innovation. It includes the term selection, feature calculation, stage classification, technology position and visible information modules. The hype cycle prediction module is shown in Fig. 1.

Fig.1 Technology maturity prediction frame

3. TECHNOLOGY MATURITY LEARNING

3.1 Data Acquirement
Since 1995, Gartner has paid attention to the hype and disillusionment that come with every appearance of new technologies and innovations, tracking the trends of the technology life cycle and studying the common pattern between them, in order to provide guidance on when and where all types of organizations should deploy a technology.
Data about the Gartner Hype Cycle between 2001 and 2014 were manually collected from the Internet, covering terminology, stage and coordinate, to be used as training data.

3.2 Feature Calculation
Technical features are the foundation of the technology life cycle discovering model. The technical feature calculation uses papers and patents as data, and uses the paper index $S(Pp) = \{Pp_1, Pp_2, \ldots, Pp_n\}$, the patent index $S(Pt) = \{Pt_1, Pt_2, \ldots, Pt_n\}$ and the combined index of paper and patent $S(Ppt) = \{Ppt_1, Ppt_2, \ldots, Ppt_n\}$ as calculation objects. It can study the interaction and exclusion between papers and patents, and explore the rule of development between science and technology.

The paper index includes the paper growth rate
$$\mathrm{PpGrowthRate} = \frac{AN_{Pp}^{k} - AN_{Pp}^{k-1}}{AN_{Pp}^{k-1}},$$
the paper relative growth rate
$$\mathrm{PpRelativeGrowthRate} = \frac{N_{Pp}^{k} - N_{Pp}^{k-1}}{N_{Pp}^{k-1}},$$
the paper author rate
$$\mathrm{PpAuthorRate} = \frac{A_{Pp}^{k}}{AA_{Pp}^{k-1}},$$
the paper author growth rate
$$\mathrm{PpAuthorGrowthRate} = \frac{A_{Pp}^{k} - A_{Pp}^{k-1}}{A_{Pp}^{k-1}},$$
the paper institution rate
$$\mathrm{PpInstitutionRate} = \frac{I_{Pp}^{k}}{AI_{Pp}^{k-1}},$$
and the paper institution growth rate
$$\mathrm{PpInstitutionGrowthRate} = \frac{I_{Pp}^{k} - I_{Pp}^{k-1}}{I_{Pp}^{k-1}}.$$

The patent index includes the patent growth rate
$$\mathrm{PtGrowthRate} = \frac{AN_{Pt}^{k} - AN_{Pt}^{k-1}}{AN_{Pt}^{k-1}},$$
the patent relative growth rate
$$\mathrm{PtRelativeGrowthRate} = \frac{N_{Pt}^{k} - N_{Pt}^{k-1}}{N_{Pt}^{k-1}},$$
the inventor rate
$$\mathrm{PtInventorRate} = \frac{I_{Pt}^{k}}{AI_{Pt}^{k-1}},$$
the inventor growth rate
$$\mathrm{PtInventorGrowthRate} = \frac{AI_{Pt}^{k} - AI_{Pt}^{k-1}}{AI_{Pt}^{k-1}},$$
the applicant rate
$$\mathrm{PtApplicantRate} = \frac{A_{Pt}^{k}}{AA_{Pt}^{k-1}},$$
and the applicant growth rate
$$\mathrm{PtApplicantGrowthRate} = \frac{AA_{Pt}^{k} - AA_{Pt}^{k-1}}{AA_{Pt}^{k-1}}.$$
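As a minimal sketch of the paper and patent features defined above, the following Python computes a few of them from yearly series. It rests on assumptions the text does not state: that the superscript k indexes consecutive years, that N-type symbols are per-year values, and that symbols with a leading A (AN, AA, AI) are values accumulated up to a year. The function names and the sample numbers are hypothetical and are not data from the paper.

```python
from typing import Optional, Sequence


def growth_rate(series: Sequence[float], k: int) -> Optional[float]:
    """(x^k - x^(k-1)) / x^(k-1); None when the previous value is zero."""
    if k < 1:
        raise ValueError("k must be at least 1 so that year k-1 exists")
    prev = series[k - 1]
    if prev == 0:
        return None
    return (series[k] - prev) / prev


def rate(current: Sequence[float], accumulated: Sequence[float], k: int) -> Optional[float]:
    """x^k / X^(k-1), e.g. A_Pp^k / AA_Pp^(k-1); None when the denominator is zero."""
    if k < 1:
        raise ValueError("k must be at least 1 so that year k-1 exists")
    denom = accumulated[k - 1]
    if denom == 0:
        return None
    return current[k] / denom


# Hypothetical series for one technology term (index = year).
papers_per_year = [10, 15, 30]        # assumed reading of N_Pp
papers_accumulated = [10, 25, 55]     # assumed reading of AN_Pp
authors_per_year = [20, 30, 45]       # assumed reading of A_Pp
authors_accumulated = [20, 50, 95]    # assumed reading of AA_Pp

k = 2
features = {
    "PpGrowthRate": growth_rate(papers_accumulated, k),              # (55 - 25) / 25
    "PpRelativeGrowthRate": growth_rate(papers_per_year, k),         # (30 - 15) / 15
    "PpAuthorRate": rate(authors_per_year, authors_accumulated, k),  # 45 / 50
    "PpAuthorGrowthRate": growth_rate(authors_per_year, k),          # (45 - 30) / 30
}
```

Returning `None` for a zero denominator is a design choice of this sketch; the text does not say how the first year of a term, where no earlier data exist, is handled.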
The combined index of paper and patent includes the paper and patent relative growth rate
$$\mathrm{PptRelativeGrowthRate} = \frac{(N_{Pp}^{k} - N_{Pp}^{k-1}) \circ (N_{Pt}^{k} - N_{Pt}^{k-1})}{N_{Pt}^{k-1} \circ N_{Pp}^{k-1}},$$
the paper and patent ratio rate
$$\mathrm{PptRatioRate} = \frac{N_{Pp}^{k}}{N_{Pt}^{k}},$$
the paper and patent people growth rate
$$\mathrm{PptPeopleGrowthRate} = \frac{(A_{Pp}^{k} - A_{Pp}^{k-1}) \circ (I_{Pt}^{k} - I_{Pt}^{k-1})}{A_{Pp}^{k-1} \circ I_{Pt}^{k-1}},$$
the paper and patent people ratio rate
$$\mathrm{PptPeopleRatioRate} = \frac{A_{Pp}^{k}}{I_{Pt}^{k}},$$
the paper and patent institution growth rate
$$\mathrm{PptInstitutionGrowthRate} = \frac{(I_{Pp}^{k} - I_{Pp}^{k-1}) \circ (A_{Pt}^{k} - A_{Pt}^{k-1})}{I_{Pp}^{k-1} \circ A_{Pt}^{k-1}},$$
and the paper and patent institution ratio rate
$$\mathrm{PptInstitutionRatioRate} = \frac{I_{Pp}^{k}}{A_{Pt}^{k}}.$$
Here $\circ$ stands for the operator that joins the paper term and the patent term; it is not legible in the available text of the paper.

4. TECHNOLOGY MATURITY PREDICTION

4.1 Terminology Extraction
Technology terminology refers to terms used in a certain field, which denote concepts, features or relationships in the field. In this paper, terminology extraction is based on keywords of papers and on templates. Keywords are words or phrases extracted from papers to meet the needs of literature indexing or retrieval work. They express the subject of the literature and can therefore be used as emerging technology terminology.

Template technology is a common method for terminology recognition. By analyzing the characteristics of papers and patents, some fixed sentence structures are found, for example ‘The development application of XX technology in XY’, so ‘3D printing’ can be recognized in ‘The development application of 3D printing technology in medical science’. After candidate strings are extracted with the template, frequency and subjection degree can be used to perform terminology recognition. If a collocation is found in the corpus, it must appear more than once, so frequency is an important index in terminology extraction. Only when the term frequency in the corpus exceeds a certain threshold value is the term believed to have reached the standard of a technological terminology, with relatively high awareness in the field. Subjection degree refers to the degree of relevance between the terminology and its field. It represents the degree to which a term belongs to a field. A string that meets both the frequency and the subjection degree criteria is a technology terminology.

4.2 Classify for Stage
By calculating the technical terminology and technical features extracted from patents and papers, the technology life cycle discovering (TLCD) model can determine the stage of the emerging technology using the five-stage classification of the SKNN algorithm. The specific stages follow Gartner's Hype Cycle, which includes Technology Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, and Plateau of Productivity.

The SKNN algorithm is an improvement of the KNN algorithm. The KNN algorithm computes the distance between the training points from the training set and a test point from the test set, considers the closest points to have the most similar features, and classifies the test point into the same group, so that the test point receives the label of the training points with the same features.

SKNN mainly considers the time sequence of the terminologies to be classified, so the data of the next year need to be larger than the data of the last stage, in order to avoid the forced classification problem. The specific algorithm is as follows.

(1) Redescribe the training technology terminologies and feature vectors according to the feature set.
(2) When the feature vector of a new technology terminology arrives, calculate the 18 features respectively to establish the feature vector for each year.
(3) Select the K technical terminologies that are most similar to the new technical terminology in the year to be calculated from the training technical terminology set, following the formula below,

$$Sim(t_i, t_j) = \frac{\sum_{k=1}^{M} W_{ik} \times W_{jk}}{\sqrt{\left(\sum_{k=1}^{M} W_{ik}^{2}\right)\left(\sum_{k=1}^{M} W_{jk}^{2}\right)}} \tag{1}$$

(4) Among the K neighbors of the new terminology, the weight of each classification is calculated respectively as follows,

$$p(x, C_j) = \sum_{d_i \in STKNN} Sim(x, d_i)\, y(d_i, C_j) \tag{2}$$

In the formula, x refers to the feature vector of the new terminology, the sum runs over the K neighbors selected in step (3), Sim(x, d_i) is the similarity defined in formula (1), and y(d_i, C_j) is the categorical attribute function: if d_i belongs to C_j the function equals 1, otherwise it equals 0.

(5) Compare the weights of the classifications and assign the terminology to the stage with the greatest weight. Record the stage.
(6) Go back to step (3) and calculate for the next year.

4.3 Technical Position
Based on the classification results, the PKNN algorithm calculates the position of terminologies on the hype cycle curve and S-curve. The algorithm finds the K most similar technologies in all classifications, and the position of the technical terminology is the average position of these terminologies.

Input data include the trained technical terminologies with their feature vectors and positions under the classification, and the technical terminology and feature vector to be calculated.

Output data include the technical terminology position.
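The stage classification of Section 4.2 and the position estimate of Section 4.3 can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: similarity is the cosine form of formula (1), the class weight is the sum of formula (2), the time-sequence rule is read as "a term may not move to an earlier stage than the one assigned for the previous year", the position is treated as a single number, and the averaging is weighted by similarity. All names (`Example`, `classify_stage`, `predict_position`, `min_stage`) are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

STAGES = [
    "Technology Trigger",
    "Peak of Inflated Expectations",
    "Trough of Disillusionment",
    "Slope of Enlightenment",
    "Plateau of Productivity",
]

# One training example: (feature vector, stage index into STAGES, position).
Example = Tuple[Sequence[float], int, float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Formula (1): dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(x: Sequence[float], training: List[Example], k: int) -> List[Tuple[float, Example]]:
    """Return the k most similar training examples as (similarity, example)."""
    if not training:
        raise ValueError("training set must not be empty")
    scored = [(cosine_similarity(x, ex[0]), ex) for ex in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def classify_stage(x: Sequence[float], training: List[Example],
                   k: int = 10, min_stage: int = 0) -> int:
    """Formula (2): sum neighbour similarities per stage, return the heaviest.

    min_stage encodes the assumed time-sequence rule: stages earlier than the
    one assigned for the previous year are not eligible.
    """
    weights: Dict[int, float] = defaultdict(float)
    for sim, (_, stage, _) in nearest(x, training, k):
        if stage >= min_stage:
            weights[stage] += sim
    if not weights:
        return min_stage  # no eligible neighbour: stay in the previous stage
    return max(weights, key=weights.get)


def predict_position(x: Sequence[float], training: List[Example], k: int = 10) -> float:
    """Similarity-weighted average of the neighbours' positions."""
    neighbours = nearest(x, training, k)
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return sum(ex[2] for _, ex in neighbours) / len(neighbours)
    return sum(sim * ex[2] for sim, ex in neighbours) / total
```

To classify one term year by year, the stage returned for year t would be passed as `min_stage` when classifying year t+1. Because growth-rate features can be negative, cosine similarities may be negative as well; the sketch does not treat that case specially.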
                                                             Fig.2 Feature Extraction

Algorithm steps are listed as follows.                                         (3) Draw 5 stages line and label the text.
 (1) Select K technical terminologies which are most similar                   (4) Draw the technical terminology position according to the
to the emerging technical terminology in the year to be                        position calculated by maximum similarity algorithm.
calculated from the training technical terminology set.

                         M
                                                                               5. EXPERIMENT AND ANALYSIS
                          [sim( x , y) * r ]
                                    k        k                                 Gartner Emerging Technologies report describes some
              pred(y)  k 1 M                                   (3)           technologies that become famous because of hype, or
                              sim( x , y)
                             k 1
                                        k                                      technologies that Gartner believes will have significant impact.
wherein K is 10.
(2) Among the K neighbors of a new terminology, consider the
deviation between the K most similar terminologies and the
emerging technical terminology, and calculate the predicted
position of the new terminology.

4.4 Visualization of Hype Cycle Curve
The paper uses a fitting algorithm to produce the Hype Cycle. The
process is as follows.

(1) Draw the horizontal and vertical coordinate axes, and mark
the time and expectation information.
(2) Produce the Hype Cycle with the curve fitting formula.

In order to validate the technical life cycle discovery model,
960 training samples released in the 'Hype Cycle Report' from 2001
to 2013 are used for training technical maturity. Hype Cycle for
Big Data, 2013 is used as the validation set to evaluate the
prediction results.
Using the SKNN method to perform the Hype Cycle model stage test,
Table 1 shows the experiment results of the technical life cycle
model on the Hype Cycle for Big Data, 2013 data. Compared with
Gartner, the technical life cycle model achieves a precision of
67.24% and a recall rate of 68.46%. The precision and recall rate
in the fourth and fifth stages are lower than those of the other
stages because the sample sizes in these two stages are too small,
so problem data has a greater impact.
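The neighbor-based prediction described before Section 4.4 can be illustrated with a minimal sketch of a similarity-weighted K-nearest-neighbor average. Only the denominator of Equation (3), the sum of sim(x_k, y) over the neighbors, and K = 10 come from the paper. The cosine similarity, the feature vectors, the weighted-sum numerator, and the names `cosine_similarity` and `predict_weighted` are assumptions made for illustration and may differ from the authors' SKNN.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-negative feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_weighted(train, query, k=10):
    """Similarity-weighted average over the k most similar known items.

    train: list of (feature_vector, value) pairs for known terminologies,
           where value is a numeric stage or position (assumed encoding).
    query: feature vector of the new terminology.
    """
    scored = [(cosine_similarity(vec, query), value) for vec, value in train]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    neighbors = scored[:k]
    weight_sum = sum(sim for sim, _ in neighbors)
    if weight_sum == 0.0:
        # No neighbor is similar at all: fall back to the plain mean.
        return sum(value for _, value in neighbors) / len(neighbors)
    # Assumed numerator: sum of sim(x_k, y) * value(x_k).
    return sum(sim * value for sim, value in neighbors) / weight_sum


# Hypothetical toy data: two known terminologies with positions 1.0 and 3.0.
known = [([1.0, 0.0], 1.0), ([0.0, 1.0], 3.0)]
print(predict_weighted(known, [1.0, 1.0], k=2))
```

A query that is equally similar to both known items is placed halfway between their positions, while a query identical to one item takes that item's position.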

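The paper does not state its curve fitting formula, so the following is only a sketch of the curve-drawing step in Section 4.4, with polynomial (Lagrange) interpolation swapped in as one simple way to pass a smooth curve through chosen points. The control points, the time and expectation scales, and the names `lagrange_curve` and `CONTROL_POINTS` are hypothetical.

```python
def lagrange_curve(points):
    """Return f(x), the polynomial passing through all (x, y) control points."""
    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return f


# Hypothetical (time, expectation) control points: trigger, peak, trough,
# slope, and plateau. These values are illustrative, not taken from the paper.
CONTROL_POINTS = [
    (0.0, 0.0),
    (1.0, 1.0),
    (2.0, 0.2),
    (3.0, 0.4),
    (4.0, 0.55),
    (5.0, 0.6),
]

curve = lagrange_curve(CONTROL_POINTS)

# Sample the curve on a regular time grid; these pairs can be handed to any
# plotting tool to draw the Hype Cycle on the axes from step (1).
samples = [(i / 10, curve(i / 10)) for i in range(51)]
```

High-degree interpolation can oscillate between control points, so a real implementation might prefer a spline or a least-squares fit.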

            Table 1. Technical life cycle discovering model experiment results

Stage                           Gartner   Suggested   Number of results   Precision   Recall
                                          approach    same in both
Technology Trigger              11        11          10                  91%         91%
Peak of Inflated Expectations   14        15          11                  73.33%      78.6%
Trough of Disillusionment       11        9           8                   88.89%      72.7%
Slope of Enlightenment          2         3           1                   33%         50%
Plateau of Productivity         2         2           1                   50%         50%
Total                           40        40          33                  67.24%      68.46%
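The per-stage figures in Table 1 can be checked with a short sketch that computes precision as agreed results divided by the suggested approach's count, and recall as agreed results divided by Gartner's count. The counts are copied from the table. Reading the Total row as an unweighted average over the five stages is an assumption, although it appears consistent with the reported values. The names `STAGE_COUNTS` and `precision_recall` are illustrative.

```python
# (Gartner count, suggested-approach count, results that agree), from Table 1.
STAGE_COUNTS = {
    "Technology Trigger": (11, 11, 10),
    "Peak of Inflated Expectations": (14, 15, 11),
    "Trough of Disillusionment": (11, 9, 8),
    "Slope of Enlightenment": (2, 3, 1),
    "Plateau of Productivity": (2, 2, 1),
}


def precision_recall(gartner, predicted, agreed):
    """Precision = agreed / predicted, recall = agreed / Gartner's count."""
    return agreed / predicted, agreed / gartner


per_stage = {
    stage: precision_recall(*counts) for stage, counts in STAGE_COUNTS.items()
}

# Assumed aggregation: unweighted (macro) average over the five stages.
macro_precision = sum(p for p, _ in per_stage.values()) / len(per_stage)
macro_recall = sum(r for _, r in per_stage.values()) / len(per_stage)

for stage, (p, r) in per_stage.items():
    print(f"{stage}: precision={p:.2%}, recall={r:.2%}")
print(f"Macro average: precision={macro_precision:.2%}, recall={macro_recall:.2%}")
```

Computed from unrounded per-stage values, the averages land within a few hundredths of a percentage point of the reported 67.24% and 68.46%. The small gap would be explained by averaging the rounded per-stage percentages instead.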



Using the SKNN method to perform Hype Cycle model position
prediction, and following the Hype Cycle visualization method, the
prediction result for Hype Cycle for Big Data, 2013 is produced and
given as follows. (This paper does not cover how each technology
reaches the Plateau, so Gartner's result is used in the following
visualized graph.)

        Fig.3 Gartner's Hype Cycle for Big Data, 2013

6. CONCLUSION AND FORECASTING
The approach for predicting the Hype Cycle based on machine
learning discussed in this paper can effectively analyze paper
and patent information, and forecast and produce the Hype Cycle.
The approach consists of two processes, learning and training,
and provides users with a more diverse information analysis
and forecasting tool. Compared with traditional methods and
services, the approach is more systematic and objective.

Experimental results show that, compared to Hype Cycle for
Big Data, 2013, the approach for predicting the Hype Cycle based
on machine learning achieves a precision of 67.24% and a
recall rate of 68.46% in forecasting stages.
Therefore, this model can provide more accurate information
analysis and forecasting information to interested users, can
locate the position of a technology accurately, and can produce
the Hype Cycle automatically.

As for future work, considering the features of paper data and
patent data, more features will be extracted and prediction
experiments will be conducted on different databases. In addition,
since more training data generally yields more accurate results,
simulation experiments and different data sets will be used to
further improve the accuracy of the prediction model.

This paper is funded and supported by the
China Postdoctoral Science Foundation (2013M540125).

7. REFERENCES
[1] Fenn J, Raskino M. Mastering the hype cycle: How to
    choose the right innovation at the right time[M]. Boston:
    Harvard Business School Press, 11-19, 2008.
[2] Guo Daoquan. The Study of Technology Maturity Model
    and Assessment based on TRL[D]. Changsha: National
    University of Defense Technology, 2010.
[3] Järvenpää H M, Mäkinen S J. An empirical study of the
    existence of the Hype Cycle: A case of DVD
    technology[C]. IEEE International Engineering
    Management Conference. Estoril, Europe, 257-261, 2008.
[4] Budde B, Alkemade F, Hekkert M. On the relation
    between communication and innovation activities: A
    comparison of hybrid electric and fuel cell vehicles[J].
    Environmental Innovation and Societal Transitions,
    101:1-15, 2013.
[5] Steinert M, Leifer L. Scrutinizing Gartner's hype cycle[C].
    Portland International Center for Management of
    Engineering and Technology, Portland, 2010.
[6] Kim, J., Lee, S., Lee, J., Lee, M., & Jung, H. Design of
    TOD Model for Information Analysis and Future
    Prediction. Communications in Computer and Information
    Science, 264(1): 301-305, 2011.
[7] Jinhyung Kim, Myunggwon Hwang, Do Jeong, Hanmin
    Jung. Technology trends analysis and forecasting
    application based on decision tree and statistical
    feature analysis[J]. Expert Systems with Applications, 39,
    2012.
[8] Wang Xin, Qiao Xiaodong, Xu Shuo, Han Hongqi. The
    Overview of Technology Life Cycle Analysis Method
    Based on Factual Database[J]. Digital Library Forum.
[9] Bin Sun. A Summarization of Information Extraction (2)
    (In Chinese). Terminology Standardization & Information
    Technology, 2003.