A Comparison of Time Series Model Forecasting Methods on Patent Groups

Mick Smith
Department of Computer Systems Technology
North Carolina A&T State University
csmith715@gmail.com

Rajeev Agrawal
Department of Computer Systems Technology
North Carolina A&T State University
ragrawal@ncat.edu

Abstract

The ability to create forecasts and discover trends is of value to almost any industry. The challenge comes in finding the right data and the appropriate tools to analyze and model such data. This paper aims to demonstrate that it may be possible to create technology forecasting models through the use of patent groups. The focus will be on applying time series modeling techniques to a collection of USPTO patents from 1996 to 2013. The techniques used are Holt-Winters Exponential Smoothing and ARIMA. Cross-validation methods were used to determine the best-fitting models and, ultimately, whether or not patent data could be modeled as a time series.

1. Introduction

As innovation and technology have grown over the last several decades, there has arisen a greater need for tracking, grouping, and analyzing such progress. This need is satisfied through the issuance of patents. Each patent can be thought of as an index of technological advancement, since each one introduces a new, innovative idea or theory. If these pieces of knowledge are to be considered benchmarks in the constantly changing landscape of technology, then it may be possible to examine the trends in quantities of patents.

The goal of this paper is to show that an opportunity exists to create a technology forecasting model based on the sequence of patents issued over a given time period. To accomplish this it is necessary to demonstrate that a time series model can accurately predict the fluctuations in patent volume from month to month. Due to the overwhelmingly large amount of patent data, this research will focus on three classes of data processing patents: Generic Control Systems or Specific Applications (GCSSA), Artificial Intelligence (AI), and Database and File Management or Data Structures (DFMDS). Furthermore, this subset will only include patents from 1996 to 2013. Two univariate time series forecasting models will be applied to each series of patents: Exponential Smoothing and Autoregressive Integrated Moving Averages (ARIMA).

Due to a decrease in storage costs and an increase in processing power, Big Data has created a situation in which a vast amount of information has been made available. As we progress into the next several years, there will be a great need to understand the massive amounts of structured and unstructured data that are a product of the Big Data phenomenon. As will be demonstrated by this research, the analysis of patents represents an area of great analytic potential. This paper will show that patent data is certainly a prospective source for a Technology Forecasting (TF) model. This differs from other research in TF, since other techniques do not consider the sequence of patent grants as a trend. Instead, they focus only on the cumulative content of patents for a set period of time with no respect to changes over that time period. Furthermore, the creation of TF models with patent data can go a long way in helping us understand the underlying meanings within a given technological sector. The trends and analyses that result from such models would benefit other areas of government, politics, economics, and social well-being.

2. Related Work

When attempting to forecast univariate time series data, it is generally accepted that parsimonious model techniques are followed. A simple approach that has been used in many applications is the Holt-Winters Exponential Smoothing (HWES) technique. Exponential smoothing techniques are simple tools for smoothing and forecasting a time series. Smoothing a time series aims at eliminating the irrelevant noise and extracting the general path followed by the series (Fried and George 2014). It is based on a recursive computing scheme, where the forecasts are updated for each new incoming observation, and is sometimes considered a naive prediction method (Gelper et al. 2010). Exponential smoothing methods were originally used in the 1950s as a collection of ad hoc techniques for extrapolating various types of univariate time series (De Gooijer and Hyndman 2006). In 1960 C.C. Holt and his student Peter Winters introduced a variation to the technique which ultimately became known as the Holt-Winters technique (De Gooijer and Hyndman 2006)(Goodwin 2010). Holt's initial model extended simple exponential smoothing to allow forecasting of data with a trend. Winters would later collaborate with his mentor to produce a seasonal component (Hyndman and Athanasopoulos 2013).

While Autoregressive (AR) and Moving Average (MA) models have been in existence since the early 1900s, it was the work of Box and Jenkins in 1970 that integrated these techniques into one approach and ultimately created ARIMA (De Gooijer and Hyndman 2006). The Box-Jenkins approach allowed non-stationary time series trends to be modeled (Shumway and Stoffer 2006). Non-stationary data can be made stationary through a process known as differencing. In some time series models there is also a need to adjust for seasonality, and as previously mentioned both HWES and ARIMA offer methods to adjust models accordingly. However, that is not the case with the data selected for this paper.

Time series modeling has been applied in several different settings and situations. Research has been carried out in economics (Kang 1986)(Dongdong 2010)(Timmermann and Granger 2004), climate change and weather forecasting (Kumar and De Ridder 2010)(Leixiao et al. 2013), utility forecasting (Conejo et al. 2005)(Contreras et al. 2003)(De Gooijer and Hyndman 2006), and many more. Even though the only forecasting methods mentioned here are univariate, it is worth mentioning that multivariate techniques exist as well. Some of the more popular multivariate time series models include VARIMA, VARMA, VAR, and BVAR. The impact that one patent trend may have on another might be substantial and should not be overlooked, so when considering further research in patent analysis it is possible that these modeling techniques could be used.

It should be reiterated that the main objective of this paper is to demonstrate that groupings of patent data over time can be represented as a time series and that a forecasting model can be fitted to the trend. There is a lot of value in such technology forecasting, especially as it pertains to some level of patent mining. Technology forecast modeling on patent data has been done to show areas of technological development opportunities (Jun et al. 2011)(Tseng et al. 2007). Daim et al. (2006) suggest that the use of multiple methods, including Patent Mining, Bibliometrics, and Delphi processes, improves technology forecasting. Shin and Park (2009) have demonstrated that technology forecasting methods can be a key factor in economic growth; in their methods they use Brownian agents to detect regions of technology growth.

3. Proposed Methodology

In this analysis, each patent group is considered independently of the other patent groups. It was important to use this approach so that it could first be shown that a sequence of patents over a given time represents a meaningful time series and that predictive modeling could be carried out. However, in building on this research it will be important to understand the relationships between each group and the effect each one may have on the others.

The patent data for this project was obtained from the UC Berkeley Fung Institute (https://github.com/funginstitute/downloads). Their patent data has been extracted from the USPTO website and converted from XML to a SQLite table structure. The patent databases provided include patent data ranging from 1975 to 2013. From these tables it was possible to filter out the number of patents in a given classification over a period of time (1996 to 2013). While the selection of dates is somewhat arbitrary, it does coincide with a rough starting date of commercial internet use. The USPTO classes and numbers of patents used in this research are shown in Table 1.

Name     USPTO Class    Number of Patents (1996-2013)
GCSSA    700            27,503
AI       706            8,699
DFMDS    707            53,415
Table 1 – Quantities and Classifications of Patents

Each particular class has several subclasses which offer greater specificity in the classification of the patent. It should be noted that if each class were broken into its smaller subclass components, additional trends might appear. However, such granularity should not be necessary for this study. Every entry in the database also included the application and grant date for each patent. In this research the grant date was used to compile the total number of patents per month from January of 1996 to March of 2013. However, in generating the forecasting models only the data from January 1996 to December 2011 was used. This allowed a portion of the actual data to be used in comparison to the proposed forecast values.

For each patent group two models will be applied, HWES and ARIMA. Two functions within R Studio were used to generate the models for each class of patents: HoltWinters() and auto.arima().
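The holdout scheme used here, training on January 1996 through December 2011 and keeping the remaining months for comparison, can be sketched outside of R. The Python fragment below is illustrative only; the `monthly_counts` list is a placeholder standing in for the per-month patent counts pulled from the SQLite tables.

```python
# Sketch of the train/test split described above, assuming one patent
# count per month from Jan 1996 through Mar 2013 (207 months).

def month_range(start, end):
    """Inclusive list of (year, month) pairs, e.g. (1996, 1) ... (2013, 3)."""
    (y, m), out = start, []
    while (y, m) <= end:
        out.append((y, m))
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return out

months = month_range((1996, 1), (2013, 3))
monthly_counts = [0] * len(months)          # placeholder series

# Train on Jan 1996 - Dec 2011; hold out the remaining 15 months
# (Jan 2012 - Mar 2013) as testing data for the forecast comparison.
cut = months.index((2012, 1))
train, test = monthly_counts[:cut], monthly_counts[cut:]
```

With this split each fitted model produces a 15-month forecast that can be compared point by point against the withheld `test` values.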
Each series was plotted and 15-month forecasts for the two models were produced. The forecast values were then compared to the actual values previously withheld, and forecast error metrics were calculated. A third forecast, using Simple Exponential Smoothing (SES), was applied and graphed for purposes of visual comparison. However, SES models in their most basic form tend to overfit the data and may not be the best option. Furthermore, as has been stated, the actual selection of a forecasting method is not the objective of this paper. It is the hope of this research to identify possible candidates for future patent mining/technology forecasting research.

In this paper, we make the assumption that the classifications proposed by the USPTO are correct. It may be argued that other meaningful patents related to a given technology are classified elsewhere. For instance, Wu et al. (2010) suggest that most industries rely on the International Patent Classification (IPC) process too heavily. This can sometimes make searching for specific patents within a classification difficult, slow business decision processes, and increase the possibility of patent infringement. It may be possible to cluster patents with similar content to create less arbitrary classifications. From these groupings, themes could be determined and trend analysis analogous to this research could be carried out. One proposed approach is to cluster the patents using Genetic Algorithms and Support Vector Clustering (Wu et al. 2010).

4. Experimental Results

R Studio was used in this project to compile, plot, and forecast each time series trend. The first step in the process was to graph each series. Figure 1 illustrates the time series graphs of all three groupings.

[Figure 1: three panels plotting monthly patent counts against time, titled GCSSA Time Series, AI Time Series, and DFMDS Time Series.]
Figure 1 – Patent Time Series

From each of these graphs an upward trend can be observed. Additionally, it should be noted that none of the series is stationary by itself, and stationarity is a requirement for the ARIMA model. However, R implements ARIMA in such a manner that the required level of differencing is determined automatically.

4.1 Exponential Smoothing

For each dataset both the HoltWinters and auto.arima functions were used to fit appropriate models. The smoothing parameters and sum-of-squared-errors (SSE) values for each HWES model are shown in Table 2. The alpha values were automatically generated by R and indicate how closely the model will fit the actual data. The parameter can range in value from zero to one. If the value is close to one, then the resulting model is influenced more by the later values of the data. However, all of the alpha values in Table 2 indicate that both recent and less recent data points were used in creating the forecast. The coefficient value represents the final component estimate.

Name     Alpha    Coefficient    SSE
GCSSA    0.277    215.64         135,767.9
AI       0.3      89.52          21,146.18
DFMDS    0.338    472.32         515,876.8
Table 2 – HW Exponential Smoothing Model Values

The trend lines generated from the HWES model appear to fit each instance very well. In fact, it may be argued that they overfit each data series. However, for the purposes of this research such a similarity is acceptable, since this study is primarily concerned with determining whether modeling such data is possible to begin with. Another feature to note is that in the forecast of each HW model the trend seems to become flat. According to Hyndman and Athanasopoulos (2013), empirical evidence suggests that Exponential Smoothing methods tend to over-forecast. To compensate for this, a technique known as damping is applied, which creates a flattened forecasting line.
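Because the series here are modeled without a seasonal component, the computation behind an R HoltWinters() fit of this kind reduces to Holt's two-parameter level and trend recursions. The Python sketch below illustrates that recursion along with the damping idea just described; the alpha, beta, and phi values in the example are made up for illustration, not the R-estimated ones.

```python
def holt_linear(series, alpha, beta, horizon, phi=1.0):
    """Holt's linear-trend exponential smoothing (no seasonal term).

    Level and trend are updated recursively for each observation:
        level_t = alpha * x_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
        trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
    The h-step forecast is level + (phi + phi^2 + ... + phi^h) * trend.
    phi = 1 recovers the undamped forecast level + h * trend, while
    phi < 1 is the damped variant that flattens the forecast line.
    """
    level, trend = series[0], series[1] - series[0]
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + trend * sum(phi ** k for k in range(1, h + 1))
            for h in range(1, horizon + 1)]

# On an exactly linear series the recursion tracks the line, so the
# 3-step-ahead forecast simply continues it.
print(holt_linear([2 * t for t in range(10)], 0.5, 0.5, 3))  # [20.0, 22.0, 24.0]
```

Setting phi below one shrinks each successive trend increment, which is why the damped forecasts in the figures level off instead of continuing the fitted trend indefinitely.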
Figures 2 through 7 show the forecasts for each patent group projected 15 months out for the SES and HWES models. The SES plots are included to illustrate the predictive potential that other Exponential Smoothing models offer. However, due to the error-correction options it offers, HWES will continue to be the primary model of demonstration for this paper.

[Figure 2 – SES Model and 15 month forecast for GCSSA]
[Figure 3 – HW Model and 15 month forecast for GCSSA]
[Figure 4 – SES Model and 15 month forecast for AI]
[Figure 5 – HW Model and 15 month forecast for AI]
[Figure 6 – SES Model and 15 month forecast for DFMDS]
[Figure 7 – HW Model and 15 month forecast for DFMDS]

4.2 ARIMA

The ARIMA model has three parameters (p, d, q) and is often written as arima(p, d, q). The Autoregressive (AR) portion of the model is based on the idea that the current value of the series, x_t, can be explained as a function of p past values, x_{t-1}, x_{t-2}, ..., x_{t-p}, where p determines the number of steps into the past needed to forecast the current value (Shumway and Stoffer 2006). The parameter d represents the number of levels of differencing the original time series needs to undergo to become stationary. As an alternative to the autoregressive representation, in which the x_t on the left-hand side of the equation are assumed to be combined linearly, the moving average model of order q, abbreviated MA(q), assumes the white noise terms w_t on the right-hand side of the defining equation are combined linearly to form the observed data (Shumway and Stoffer 2006). Therefore, in the ARIMA model q represents the number of lags in the moving average.
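The mechanics of the d parameter can be illustrated in a few lines: difference the series once, model the now-stationary differences, and undo the differencing when forecasting. The sketch below substitutes a simple least-squares AR(1) fit for the full ARMA estimation that auto.arima performs, so it is a toy illustration of the arima(1, 1, 0) idea, not a re-implementation of R's routine.

```python
def difference(series):
    """One level of differencing (the d in arima(p, d, q))."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(series):
    """Least-squares slope for x_t = phi * x_{t-1} + noise (no intercept)."""
    num = sum(a * b for a, b in zip(series, series[1:]))
    den = sum(a * a for a in series[:-1])
    return num / den

def forecast_arima_1_1_0(series, horizon):
    """Forecast an arima(1, 1, 0)-style model: difference once,
    extrapolate the differences with AR(1), then integrate back
    (cumulative sum) to the original scale."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    last_diff, level, out = diffs[-1], series[-1], []
    for _ in range(horizon):
        last_diff *= phi          # AR(1) step on the differenced series
        level += last_diff        # undo the differencing
        out.append(level)
    return out
```

Because the AR contribution decays as phi is applied repeatedly, the integrated forecasts flatten out after a few steps, which is consistent with the flat-looking ARIMA forecast lines in the figures.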
Normally the creation of an ARIMA model requires determining the level of differencing necessary to make a time series stationary. Thankfully, R has a function (auto.arima) that accomplishes this task in one step. It may be worthwhile to note that the middle term of each proposed ARIMA model is 1; this corresponds to the single level of differencing needed to make each time series stationary. The model parameters for each patent group are shown in Table 3. As with the HWES and SES examples, the forecasts for each patent group were projected out 15 months, and the results are shown in Figure 8.

Name     ARIMA Model    σ²       AIC       BIC
GCSSA    (2, 1, 1)      687.5    1798.6    1811.6
AI       (2, 1, 0)      100.2    1431.1    1444.1
DFMDS    (1, 1, 3)      2407     2040.3    2056.5
Table 3 – ARIMA Model Parameters

[Figure 8 – ARIMA 15 Month Forecasts]

4.3 Model Comparison

In the early stages of time series modeling the selection of models was very subjective. Since then, many techniques and methods have been suggested to add mathematical rigor to the search process for an ARMA model, including Akaike's information criterion (AIC), Akaike's final prediction error (FPE), and the Bayes information criterion (BIC). Often these criteria come down to minimizing (in-sample) one-step-ahead forecast errors, with a penalty term for overfitting (De Gooijer and Hyndman 2006). It should be noted that these model comparison techniques are only useful for selecting the best model among those of similar structure. For instance, if there are three ARIMA models on one dataset to choose from, AIC or BIC can be used to select among them. It is for this reason that measures of forecast accuracy like MAE, MAPE, and MASE are used to compare models of different structures.

For each model and 15-month forecast, four error statistics were calculated: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Mean Absolute Scaled Error (MASE). The results are shown in Table 4. All of these values used the 15 months not included in the original model training data as testing data. For each error calculation lower values are preferred. According to Hyndman and Koehler (2006), values of MASE greater than one indicate that the forecasts are worse, on average, than in-sample one-step forecasts from naive (random-walk) methods. Based on this measurement, the MASE values indicate that all of the models have adequate forecasting capabilities. The results from Table 4 suggest that ARIMA acts as a better predictor for the GCSSA and DFMDS data, while the AI patent data seems to be better suited to an Exponential Smoothing model. Given the forecasting results, it does not seem reasonable to state that a specific time series model is best for these three patent groupings. For additional reference, the full lists of testing and forecast values are given in the appendices at the end of this paper.

Patent Group    Model    RMSE     MAE      MAPE     MASE
GCSSA           HWES     42.52    31.23    12.11    0.6436
GCSSA           ARIMA    40.66    29.66    11.62    0.6142
AI              HWES     11.61    8.03     7.84     0.731
AI              ARIMA    13.08    9.64     9.46     0.7906
DFMDS           HWES     85.08    65.57    11.45    0.6754
DFMDS           ARIMA    80.4     60.98    10.64    0.6351
Table 4 – Model Forecast Error Statistics

4.4 Discussion

At first glance it appears that the models generated may be overfitting the data. However, the MASE values calculated indicate that each of the models produced performs very well in predicting the testing data. It is possible that both are true. The trend lines produced do seem to be very similar to the actual trends. Moreover, the testing data may not have been fully representative of the full flow of each trend. In future research a different proportion of training and testing data should be considered.

Another interesting observation from the experimentation is that the Database and Control System patent groups favored an ARIMA model, while the Artificial Intelligence patents fit better with a Holt-Winters model. A possible explanation comes from an intuitive look at the initial time series for each classification group. In the AI trend the data seems to be fairly stationary until about 2008, when the number of patents spiked rapidly. Thus it appears that not much differencing would be needed on this series, and this may automatically make it a better candidate for a HWES model.

5. Conclusions and Future Work

The first goal of this paper was to demonstrate that groups of patents could be represented as a time series. From observing the initial plots it appears that this is certainly the case. An interesting observation that can be made is the consistent increase in these technology-based patents over the past 20 years. The second objective of this research was to confirm that time series models could be applied to each patent group. This too was successful. Obviously it is debatable whether the models presented are the most optimal for the situations provided. However, it seems safe to state that with additional work, patent and technology forecasting models could be produced using time series modeling techniques.

Future work would benefit from exploring the validity of the groupings of patents. A possible approach would be to use textual mining techniques to first group the patents and then conduct an analysis similar to the one carried out in this paper. It may also be worthwhile to explore multivariate autoregression techniques such as Vector Autoregression or Bayesian Vector Autoregression. As mentioned earlier in the paper, there may be associations between patent groupings that influence the rate of change in another. Furthermore, if the patent classifications are not a good enough representation of a technological theme, then both a re-clustering of patents and a multivariate analysis may be necessary.

References

Conejo, A.J.; Plazas, M.A.; Espinola, R.; Molina, A.B. 2005. Day-Ahead Electricity Price Forecasting Using the Wavelet Transform and ARIMA Models. IEEE Transactions on Power Systems 20(2):1035-1042

Contreras, J.; Espinola, R.; Nogales, F.J.; Conejo, A.J. 2003. ARIMA Models to Predict Next-Day Electricity Prices. IEEE Transactions on Power Systems 18(3):1014-1020

Daim, T.U.; Rueda, G.; Martin, H.; Gerdsri, P. 2006. Forecasting Emerging Technologies: Use of Bibliometrics and Patent Analysis. Technological Forecasting & Social Change 73:981-1012

Dongdong, W. 2010. The Consumer Price Index Forecast Based on ARIMA Model. In Proceedings of the 2010 WASE International Conference on Information Engineering (ICIE) 307-310

Fried, R.; George, A.C. 2014. Exponential and Holt-Winters Smoothing. International Encyclopedia of Statistical Science, Springer Berlin Heidelberg

Gelper, S.; Fried, R.; Croux, C. 2010. Robust Forecasting with Exponential and Holt-Winters Smoothing. Journal of Forecasting 29:285-300

Goodwin, P. 2010. The Holt-Winters Approach to Exponential Smoothing: 50 Years Old and Going Strong. Foresight 19:30-33

Jun, S.; Park, S.S.; Jang, D.S. 2011. Technology Forecasting Using Matrix Mapping and Patent Clustering. Industrial Management & Data Systems 112(5):786-807

De Gooijer, J.G.; Hyndman, R.J. 2006. 25 Years of Time Series Forecasting. International Journal of Forecasting 22:443-473

Kang, H. 1986. Univariate ARIMA Forecasts of Defined Variables. Journal of Business & Economic Statistics 4(1):81-86

Kumar, U.; De Ridder, K. 2010. GARCH Modelling in Association with FFT-ARIMA to Forecast Ozone Episodes. Atmospheric Environment 44(34):4252-4265

Leixiao, L.; Zhiqiang, M.; Limin, L.; Yuhong, F. 2013. Hadoop-based ARIMA Algorithm and its Application in Weather Forecast. International Journal of Database Theory & Application 6(5):119-132
Hyndman, R.J.; Athanasopoulos, G. 2013. Forecasting: Principles and Practice. http://otexts.org/fpp/

Hyndman, R.J.; Koehler, A.B. 2006. Another Look at Measures of Forecast Accuracy. International Journal of Forecasting 22:679-688

Shin, J.; Park, Y. 2009. Brownian Agent Based Technology Forecasting. Technological Forecasting & Social Change 76:1078-1091

Shumway, R.H.; Stoffer, D.S. 2006. Time Series Analysis and Its Applications. Springer, New York

Timmermann, A.; Granger, C.W.J. 2004. Efficient Market Hypothesis and Forecasting. International Journal of Forecasting 20(1):15-27

Tseng, Y.-H.; Lin, C.-J.; Lin, Y.-I. 2007. Text Mining Techniques for Patent Analysis. Information Processing and Management 43:1216-1247

Wu, C.H.; Ken, Y.; Huang, T. 2010. Patent Classification System Using a New Hybrid Genetic Algorithm Support Vector Machine. Applied Soft Computing 10:1164-1177

Appendices

A1 – GCSSA Testing/Forecast Data

Point       Actual    HW Forecast    ARIMA Forecast
Jan 2012    243       215.6          216.6
Feb 2012    196       215.6          221.9
Mar 2012    179       215.6          219.9
Apr 2012    229       215.6          219.4
May 2012    304       215.6          220.0
Jun 2012    210       215.6          219.9
Jul 2012    288       215.6          219.8
Aug 2012    235       215.6          219.8
Sep 2012    235       215.6          219.9
Oct 2012    312       215.6          219.8
Nov 2012    230       215.6          219.8
Dec 2012    213       215.6          219.8
Jan 2013    224       215.6          219.8
Feb 2013    232       215.6          219.8
Mar 2013    244       215.6          219.8

A2 – AI Testing/Forecast Data

Point       Actual    HW Forecast    ARIMA Forecast
Jan 2012    92        89.5           83.2
Feb 2012    83        89.5           90.4
Mar 2012    83        89.5           84.7
Apr 2012    101       89.5           83.6
May 2012    126       89.5           89.7
Jun 2012    88        89.5           84.4
Jul 2012    102       89.5           83.8
Aug 2012    95        89.5           94.0
Sep 2012    99        89.5           87.4
Oct 2012    90        89.5           86.5
Nov 2012    98        89.5           88.1
Dec 2012    85        89.5           83.9
Jan 2013    97        89.5           86.2
Feb 2013    96        89.5           86.5
Mar 2013    89        89.5           85.2

A3 – DFMDS Testing/Forecast Data

Point       Actual    HW Forecast    ARIMA Forecast
Jan 2012    580       472.3          488.9
Feb 2012    486       472.3          475.9
Mar 2012    563       472.3          478.8
Apr 2012    493       472.3          476.8
May 2012    610       472.3          478.2
Jun 2012    501       472.3          477.2
Jul 2012    632       472.3          477.9
Aug 2012    516       472.3          477.4
Sep 2012    513       472.3          477.8
Oct 2012    643       472.3          477.5
Nov 2012    503       472.3          477.7
Dec 2012    472       472.3          477.6
Jan 2013    430       472.3          477.7
Feb 2013    558       472.3          477.6
Mar 2013    483       472.3          477.6
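The error statistics in Table 4 can be spot-checked against the appendix data. The sketch below recomputes RMSE, MAE, and MAPE for the HWES forecast of the AI group from the A2 values; MASE is omitted because it also requires the in-sample naive one-step errors from the training data, which are not listed in the appendices.

```python
import math

# Actual AI patent counts, Jan 2012 - Mar 2013 (Appendix A2), and the
# flat Holt-Winters forecast of 89.5 reported for every month.
actual = [92, 83, 83, 101, 126, 88, 102, 95, 99, 90, 98, 85, 97, 96, 89]
forecast = [89.5] * len(actual)

errors = [a - f for a, f in zip(actual, forecast)]
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
mae = sum(abs(e) for e in errors) / len(errors)
mape = 100 * sum(abs(e) / a for e, a in zip(errors, actual)) / len(errors)

# Table 4 lists RMSE 11.61, MAE 8.03, and MAPE 7.84 for this model;
# using the rounded published forecasts reproduces those values to
# within about 0.01.
print(round(rmse, 2), round(mae, 2), round(mape, 2))
```

The small residual difference in RMSE comes from the appendix forecasts being rounded to one decimal place, not from the metric definitions themselves.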