Identification of basic criteria of portfolio analysis based on the rolling verification principle

D A Gercekovich1, E Yu Gorbachevskaya2, I S Shilnikova1

1 Irkutsk State University, Karl Marx St. 1, Irkutsk, Russia, 664003
2 Irkutsk National Research Technical University, 83, Lermontov St., Irkutsk, Russia, 664074

eugorbachevskaya@mail.ru

Abstract. The article considers the problem of synthesizing the optimal training sample size specific to each financial instrument under study and tests the approach on real examples. The sample size is selected according to a quality criterion based on the accuracy of the generated forecasts. The stated algorithm, which serves as the basis for the synthesis of widely diversified portfolios, can significantly increase the efficiency of investment decisions because it takes into account the characteristics of the markets under study.

1. Introduction

The theory of investment has passed through several stages in its development. Until 1952, economists such as Fisher [1], Keynes [2-4] and many others believed that the size of investment is fully determined by the rate of return. The theory changed radically after the publication of the work by Markowitz [5]. The fundamental difference of the Markowitz model was the definition of a new criterion in investment decision making: the level of risk, measured by the scatter of returns around their expected values. Further development of portfolio theory is associated with the works of Sharpe [6], who proposed the so-called one-factor model of the capital market, in which an investment portfolio is formed on the basis of regression analysis. Later Tobin proposed including risk-free assets, for example government bonds, in the analysis [7].
The works of Sharpe, Lintner and Mossin [8, 9] opened the next stage in investment theory, associated with the Capital Asset Pricing Model (CAPM). The main result of the CAPM was to establish a relationship between the return and the risk of an asset in an equilibrium market. Further, Miller proposed an options model based on the possibility of a risk-free transaction with the simultaneous use of a share and an option written on it [10].

In the classical formulation, the task of forming a widely diversified investment portfolio involves calculating estimates of the expected returns, risk levels, covariances and other statistical characteristics of the assets from historical data of the same length. The task is to form an optimal portfolio from the considered set of securities: ordinary shares; financial derivatives; Real Estate Investment Trusts (REITs); Exchange-Traded Funds (ETFs); fixed income securities (preferred shares, bonds, etc.); cash in a certain currency with a given interest rate; and exchange commodities such as gold and oil [5-8, 11-16]. The assets listed above represent various sectors of industry and finance.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Most of the well-known technical indicators, such as the moving average, chart patterns, Moving Average Convergence/Divergence (MACD) and the Average Directional Movement Index (ADX) [17, 18], are equally effective for a wide range of instruments. The key characteristic that determines the specifics and individuality of various markets is the “lifetime” of an asset. Typical examples of assets with differing lifetimes are pork, beef, copper, wheat, the Japanese yen, options, futures or forward contracts, and many others.
Differences in their dynamics determine differences in such parameters as the order of the autoregressive equation and the number of averaging points used to calculate the moving average [12, 19]. These differences in the parameters of the listed models in turn determine the variability of the optimal training lengths for various financial instruments. The proposed dynamic formation of the historical database is based on the principle of rolling verification of the investment portfolio. It has practical significance, since it links the size of the training sample to the specifics of the corresponding market.

2. Research Methods

Critics of portfolio theory speak negatively about two of its basic provisions [20]:

1. Investment decisions assume that the values of profitability and risk calculated from past data can serve as the basis for investment policy at the present time. Even though numerous studies question the possibility of successfully predicting the dynamics of financial instruments, the authors of this article believe the following:

a) It is possible to list a fairly extensive class of models that make it possible, with an efficiency satisfactory for practical investors, to synthesize forecasts of the dynamics of financial instruments representing various markets, with a lead time from one month to a year. It is also possible to point to a number of successful examples demonstrating the effectiveness of portfolio analysis on independent material, both in educational examples and in practical tasks.

b) Even if we take the extreme point of view that predicting the dynamics of financial instruments is completely impossible, the basic principles of portfolio analysis will still help to form an investment portfolio whose “unsinkability” characteristics (such as the exclusion of financial instruments that showed negative investment results in the recent past, diversification, etc.)
will allow the investor to “survive” an unfavorable period with minimal losses.

2. The variation in returns is estimated according to the generally accepted formula, whereas for the investor only those cases are unfavorable in which the expected returns exceed the realized ones. Testing by practical investors has shown that the use of semi-variance does not improve the quality of the resulting risk estimates and, in some cases, has a negative impact [14].

Based on the above and on the foundations of portfolio theory, a procedure for synthesizing an investment portfolio on a variable training sample is proposed, based on the principle of sliding verification. Let the set of securities under study be given by time series of the same length, which are realizations of the quotes of the corresponding securities. In other words, the data available to the investor form a two-dimensional array in which the number of rows equals the number of securities under consideration, m, and the number of columns, n, is the amount of historical data, i.e. the number of time intervals (bars). Let us introduce the following notation: DR(i, j) is the realized profitability of the i-th security in time interval (bar) j; DA(i, k) is the expected (calculated by the investor) profitability of the i-th security, where k is the number of the time interval (bar) for which this value is calculated (Table 1).

The expected result of diversification is an optimal portfolio, i.e. a list of securities from the initial set together with the weights with which they enter the resulting investment portfolio. Since portfolio investors rely on two basic parameters when making investment decisions, expected return and level of risk, portfolio theory is called a two-parameter model.
That is why, to accomplish this task, it is necessary to estimate the future profitability and the level of risk both for each security individually and for the investment portfolio as a whole, using the time series of returns of the corresponding securities that the investor has available.

Table 1. Historical data on the profitabilities of m assets.

| Asset number \ Time slot number | 1 | 2 | ... | q1 | q1+1 | ... | n |
|---|---|---|---|---|---|---|---|
| 1 | DR(1, 1) | DR(1, 2) | ... | DR(1, q1) | DR(1, q1+1) | ... | DR(1, n) |
| 2 | DR(2, 1) | DR(2, 2) | ... | DR(2, q1) | DR(2, q1+1) | ... | DR(2, n) |
| ... | ... | ... | ... | ... | ... | ... | ... |
| m | DR(m, 1) | DR(m, 2) | ... | DR(m, q1) | DR(m, q1+1) | ... | DR(m, n) |

According to Markowitz [5], the risk of investing in a certain type of securities is determined by the probability of deviation of the realized profitability from the expected value. Suppose that the trends existing in the market will continue during the nearest time interval. Then the predicted value of profitability can be determined by processing historical data on the dynamics of the quotations of these assets; thus, the expected profitability is the result that the investor can expect during the nearest time interval (for example, one month). The difference between the investor's expectations and the actual profitability in the analyzed time interval for each of the considered assets then reflects the quality of the forecast. In other words, if DR(i, j) ≥ DA(i, j), the investor's expectations are met. Conversely, if DR(i, j) < DA(i, j), the investor's expectations turned out to be overestimated, and the real profitability is less than the expected one. Let us take this quantitative expression as the basis for forming an optimality criterion. In turn, the expected return is an average that can be calculated over the last one, two, three, etc. values of the return on the asset or group of assets under study.
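The averaging just described can be written out explicitly. The expression below is a sketch that assumes a simple arithmetic mean over the most recent realized returns; the window length $n_{tr}$ is the number of training observations that the text introduces later, and the summation index $l$ is introduced here only for illustration.

```latex
DA(i, j+1) = \frac{1}{n_{tr}} \sum_{l=0}^{n_{tr}-1} DR(i, j-l), \qquad 1 \le n_{tr} \le j
```

For $n_{tr} = 1$ the expected return is simply the last realized value, $DA(i, j+1) = DR(i, j)$; for $n_{tr} = 2$ it is the mean of $DR(i, j)$ and $DR(i, j-1)$, and so on.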
By continuing to increase the amount of historical data used to calculate the expected return, moving from the current point in time into the past, the investor obtains, according to an a priori chosen quality criterion, the optimal length of the training sample (i.e. the most suitable amount of historical data to use for calculating the expected return). To improve the quality of the developed investment strategies, a procedure is proposed for recovering the slowly time-varying basic criteria of portfolio analysis, profitability and risk, taking into account the effect of information “aging”. The problem of selecting the optimal length of the training sequence when choosing the best empirical model was first formulated by Gershengorn [21].

To illustrate the slow change over time of the basic criteria of portfolio analysis, the expected return DA and the risk RS, the known data for 1926–1993 were used [6]. The comparison was carried out according to the sliding verification principle [16]. For this purpose, the expected profitability and the level of risk were first calculated for 1926–1974; on the graph (abscissa), this corresponds to the time point “1975”. Then the “oldest” data, for 1926, were excluded from consideration and the “fresh” data, for 1975, were included in the analysis (time point “1976” on the graph), and so on up to and including 1993. From the values of the expected return and risk, the ratio of return to risk is calculated, which most informatively reflects the changes occurring in the US stock market (Figure 1). Figure 1, whose abscissa is time and whose ordinate is the ratio of return to risk, shows that from the mid-1970s the investment attractiveness of the US stock market steadily decreased, while in the mid-1980s there was a confident trend reversal, and the rise that began then continued until 1993.
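The sliding calculation behind Figure 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the expected return is taken as the window mean, the risk as the sample standard deviation, and the function name `rolling_return_to_risk` and the input series are hypothetical (the numbers are synthetic, not the 1926–1993 data).

```python
import statistics

def rolling_return_to_risk(returns, window):
    """Return-to-risk ratio for each sliding window of `window` observations.

    Element t of the result uses returns[t : t + window]; in the text's terms
    it belongs to the time point that immediately follows the window.
    """
    if not 2 <= window <= len(returns):
        raise ValueError("window must be between 2 and len(returns)")
    ratios = []
    for start in range(len(returns) - window + 1):
        sample = returns[start:start + window]
        expected = statistics.fmean(sample)  # assumed: expected return = window mean
        risk = statistics.stdev(sample)      # assumed: risk = sample standard deviation
        ratios.append(expected / risk if risk > 0 else float("nan"))
    return ratios

# Synthetic annual returns, for illustration only.
demo_returns = [0.10, -0.05, 0.20, 0.15, -0.10, 0.05, 0.12, 0.08]
demo_ratios = rolling_return_to_risk(demo_returns, window=5)
```

Each step drops the oldest observation and adds the newest one, so the window length stays fixed while its position moves forward in time.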
The sufficiently high variation of the criterion over the considered time interval indicates that the ratio of expected return to risk level is not only highly informative but also makes it possible to anticipate an unfavorable investment period sufficiently far in advance for practical purposes.

To implement the algorithm for optimizing the size of the training sample, we divide the entire series of returns in chronological order into two non-intersecting sequences, training (q1) and test (q2), so that q1 + q2 = n, with q1, q2 > 0. From the previous returns in the training subsequence q1, for each asset i the optimal interval of length s is selected, for which the values

$DR(i, t_{q_1-s}),\ DR(i, t_{q_1-s+1}),\ \ldots,\ DR(i, t_{q_1})$ (1)

give the most accurate forecasts for

$DR(i, t_{q_1+1}),\ DR(i, t_{q_1+2}),\ \ldots,\ DR(i, t_n)$. (2)

The forecast on the test sequence data is carried out on the basis of the sliding verification principle.

Figure 1. Drift in the ratio of expected return to risk level (panel label: Long-term government bonds).

Data (1) are used to “train” the model. Then, according to the criteria obtained, an “exam” is held on data (2). The algorithm for finding the optimal training length for an investment portfolio with adaptation of the basic criteria works as follows. Let there be q1 observations in the training sequence. From the latest ntr observations among these q1, the expected return DA(i, q1 + 1) is calculated. Based on the found value, a forecast is made for the next time point, whose realized value is DR(i, q1 + 1). The forecast error

$ER(i, q_1+1) = DR(i, q_1+1) - DA(i, q_1+1)$

is stored. Next, the first time point of the training sequence is discarded and the first point of the test sequence is added (where the adaptive algorithm was tested). The expected profitability is recalculated on the updated data. The model with a given number ntr of training observations is thus sequentially tested at all q2 time points of the test sequence.
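The testing loop described above can be sketched as follows for a single asset and a fixed training length. This is a minimal illustration, assuming that DA is the arithmetic mean of the last `n_tr` realized returns; the function name `rolling_forecast_errors` and the sample series are hypothetical, and the data are synthetic.

```python
from statistics import fmean

def rolling_forecast_errors(returns, q1, n_tr):
    """Forecast errors ER over the test part for a fixed training length n_tr.

    returns : realized returns DR of one asset, in chronological order
    q1      : length of the training sequence; the test part is returns[q1:]
    n_tr    : number of most recent observations averaged to obtain DA
    """
    n = len(returns)
    if not 0 < q1 < n:
        raise ValueError("need q1 > 0 and q2 = n - q1 > 0")
    if not 1 <= n_tr <= q1:
        raise ValueError("n_tr must lie between 1 and q1")
    errors = []
    for t in range(q1, n):                     # 0-based index of the bar being forecast
        expected = fmean(returns[t - n_tr:t])  # DA: mean of the last n_tr realized returns
        errors.append(returns[t] - expected)   # ER = DR - DA
    return errors

# Synthetic returns, for illustration only.
demo = [0.02, 0.04, 0.03, 0.05, 0.01, 0.06]
demo_errors = rolling_forecast_errors(demo, q1=3, n_tr=2)
```

Here the window slides one bar at a time, so every forecast uses only observations that precede the bar being forecast. Repeating the call for n_tr = 1, ..., q1 yields the error series from which a quality criterion can then be evaluated for each candidate training length.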
By sequentially increasing the length of the training sequence from 1 to q1, it is possible to find the optimal length of the training sequence (nopt) for each i-th asset from the condition of the maximum of the following criterion S(ntr):

S1 = …, for cases when ER(i, q1+1) > 0;
S2 = …, for cases when ER(i, q1+1) < 0.

Then the optimality criterion can be written as follows: max S(ntr), where S(ntr) = S1 + S2, subject to a limit on the maximum permissible level of risk:

$\sigma(n_{tr}) \le V$. (3)

Here the left-hand side of (3) is the standard deviation of the results of testing the model with a fixed amount of historical data used for its training, and V is the maximum risk level specified by the investor. The essence of this inequality is to exclude from consideration those model variants that do not meet the task of synthesizing the optimal investment portfolio. After identifying the optimal training length sequentially for each of the m assets, we obtain a one-dimensional array L(1:m) containing m elements (a unique training length for each financial asset under study).

Figure 2. Visualization of synthesis results for optimal training lengths

3. Approbation

The proposed method for synthesizing the optimal training length was tested on two problems:

1. Data on the long-term dynamics of the US stock market for 1926–1993 (treasury bills, ordinary shares, long-term government and corporate bonds, and changes in the consumer price index) [6].
2. Data for 1945–1996 on the dynamics of crop yields in Russia (cereals in general, sugar beets, vegetables, potatoes, sunflowers) [22].

Figure 2 visualizes the results of the synthesis of optimal training lengths for estimating the expected return in the Markowitz model. In addition to the original crops, the graph displays the results of averaging the optimization results. The abscissa shows the size of the training sample, and the ordinate shows the value of the optimality criterion.
The last of the six presented graphs shows the optimization results based on the averaged values of the optimality criterion for problem 2. The purpose of such averaging is to demonstrate the effectiveness of the proposed approach. Namely, supporters of the classical approach to forming an investment portfolio can use the result as follows: to assess the expected return and the acceptable level of risk, it is necessary and sufficient for an investor to use historical data for the last 19 years.

Computational experiments carried out on independent material have shown the practical suitability of the proposed approach. For each of the studied assets, its own specific training sample size was obtained, which correlates closely with the “lifetime” of the corresponding financial instrument. The extremum of the optimality criterion in the studied examples lies either inside the investigated interval of possible training sample lengths or on its border, when the values of the optimality criterion reach a plateau.

4. Conclusions

In the general case, once a full study on the synthesis of optimal training lengths has been completed for a sufficiently wide set of financial assets, the investor is recommended, in further practical work, to use training samples of different length for each asset when calculating the basic criteria of portfolio analysis (expected profitability and risk level). The right edge of these samples corresponds to the current moment in time, while the left edges will differ.

References

[1] Fisher I 1930 The Theory of Interest (New York: Macmillan)
[2] Keynes J M 1937 General Theory of Employment. Economic Journal 51(2) 214
[3] Keynes J M 2007 General Theory of Employment, Interest and Money (Moscow: Eksmo)
[4] Keynes J M 1936 General Theory of Employment, Interest and Money (New York: Harcourt, Brace) 154–158
[5] Markowitz H M 1959 Portfolio Selection: Efficient Diversification of Investment (New York: Wiley)
[6] Sharpe W F and Aleksandr J 2016 Investment (Moscow: Infra-M)
[7] Tobin J 1965 The Theory of Portfolio Selection. Theory of Interest Rates (London: MacMillan) 3–51
[8] Lintner J 1965 The Valuation of Risk Assets and the Selection of Risky Investments in Stock Portfolios and Capital Budgets. Review of Economics and Statistics 47(1) 13–37
[9] Mossin J 1966 Equilibrium in a Capital Asset Market. Econometrica 34(4) 768–783
[10] Miller M H and Scholes M 1978 Dividends and Taxes. Journal of Financial Economics 6 333–364
[11] Merton R C 1971 Optimum consumption and portfolio rules in a continuous time model. Journal of Economic Theory 3 373–413
[12] Gitman L J 1997 Fundamentals of investing (Moscow: Delo)
[13] Ferri R 2013 All About Asset Allocation. The Easy Way to Get Started (Moscow: Mann, Ivanov and Ferber)
[14] Bernstein U 2001 The Intelligent Asset Allocator. How to Build Your Portfolio to Maximize Returns and Minimize Risk (McGraw-Hill)
[15] Gertsekovich D A 2008 Quantitative methods for analyzing financial market (Irkutsk: ISU Publisher)
[16] Gertsekovich D A and Babushkin R V 2019 Dynamic portfolio analysis of stock indices. The world of economy and management 19(4) 14–30
[17] Elder A 2003 How to play and win on stock exchange (Moscow: Diagramma)
[18] LeBo Ch 1999 Computer analysis of Futures markets (Moscow: Alpina)
[19] Gertsekovich D, Gorbachevskaya L, Grigorova L and Peshkov V 2019 Return on investment in REIT real estate funds. IOP Conference Series: Materials Science and Engineering 667(1) 012025. Available at: https://iopscience.iop.org/article/10.1088/1757-899X/667/1/012025 (accessed: 01.12.2020)
[20] Roll R 1977 A critique of the asset pricing theory's tests. On past and potential testability of the theory. Journal of Financial Economics 4(2) (March) 129–176
[21] Gershengorn G I 1977 A software package for constructing empirical differential equations. Long-term forecasts of natural phenomena (Novosibirsk: Nauka) 133–137
[22] Investment in Russia. 2017 Statistical compilation.