=Paper=
{{Paper
|id=Vol-2022/paper24
|storemode=property
|title=
Towards Framework for Discovery of Export Growth Points
|pdfUrl=https://ceur-ws.org/Vol-2022/paper24.pdf
|volume=Vol-2022
|authors=Dmitry Devyatkin,Roman Suvorov,Ilya Tikhomirov,Yulia Otmakhova
|dblpUrl=https://dblp.org/rec/conf/rcdl/DevyatkinSTO17
}}
==Towards Framework for Discovery of Export Growth Points==
© Dmitry Devyatkin (1), © Roman Suvorov (1), © Ilya Tikhomirov (1), © Yulia Otmakhova (2)
(1) Federal Research Center Computer Science and Control of the Russian Academy of Sciences, Moscow, Russia
(2) Novosibirsk State University, Novosibirsk, Russia
devyatkin@isa.ru rsuvorov@isa.ru tih@isa.ru otmakhovajs@yandex.ru
Abstract. The export value of the Russian Federation has been declining in recent years, as has the corresponding relative yield. Most probably, this trend is caused by the decline of Russia's total export together with the growth of food export. Thus, it is very important not only to increase export volumes, but also to adjust the export structure to better fit today's reality. The paper presents a computer-aided framework for the discovery of export growth points. While the full framework is described briefly, more attention is paid to the first sub-task: ranking growth point candidates. The objective of this sub-task is to reveal combinations of commodities and partner countries with a high probability of successful export. The method uses open data about international trade flows and production from United Nations databases, together with modern machine learning methods. The experimental evaluation shows that taking into account retrospective data allows ranking growth point candidates significantly better. Finally, the limitations and possible directions of future research are discussed.
Keywords: export growth potential, data mining, international trade, customs statistics, open data, machine learning.
Proceedings of the XIX International Conference “Data Analytics and Management in Data Intensive Domains” (DAMDID/RCDL’2017), Moscow, Russia, October 10–13, 2017
1 Introduction
Sanctions pose both difficulties and opportunities for the Russian economy. On the one hand, traditional foreign markets may be restricted or their growth potential may be exhausted. On the other hand, exploring new markets may become a fruitful workaround. We believe that modern big data and machine learning technologies should be useful for discovering new foreign markets with a high probability of growth in the near future. We will refer to pairs of countries and commodities as potential growth points. This paper aims to take a step towards finding new growth points using machine learning and open data analysis.
The authors of [1] consider export growth potential as an opportunity to meet the primary demand for a certain product or service. At the same time, the possibility to satisfy the demand arises locally and has a specific territorial and, therefore, national binding.
There are two possible ways to satisfy growing demand: extensive and intensive. The intensive way implies improving technologies and scientific and engineering solutions, and increasing the resource potential and the efficiency of management. Therefore, a product may have high export growth potential if it has high added value, robust interbranch relations and stable external demand. In this paper, we propose a framework for the discovery of “export growth points”. The high-level procedure of this framework consists of two main steps: (1) finding candidates for “growth points”; (2) assessing each candidate and discovering difficulties with its implementation. The first step consists in ranking the pairs (Product_i, Country_j) so that the pairs most likely to grow appear at the beginning of the list.
In this paper we propose a machine-learning-based method that ranks the “growth point” candidates using features extracted from historical data from the FAOSTAT and UN Comtrade databases [2, 3]. The presented evaluation is preliminary, because it is based on retrospective data. We are aware of this weakness and are going to address it in future work.
The rest of the paper is organized as follows: in Section 2 we review the most closely related works published so far; in Sections 3 and 4 we briefly describe our framework and the task of ranking export growth point candidates; in Section 5 we describe our dataset and present the results of the experimental evaluation; in the final section we conclude and discuss future work.
2 Related work
The most commonly used approaches to foreign trade modeling include gravity models, computable general equilibrium models, heuristic ranking models, Markovian models, and common statistical approaches (regressions, histograms) for manual analysis of a situation.
The paper [4] presents an empirical evaluation of a spatial gravity model of Russian trade. The authors concluded that spatial variables such as the location of state border checkpoints have a significant effect on the volume and routes of Russian imports. In [5] the authors study factors of export and import value-added trade and suggest some recommendations for the management of industrial and trade policy. The techniques proposed in that paper make it possible to determine the main directions of economic policy to expand exports
and improve the Russian production structure. Duenas and Fagiolo [6] concluded that gravity models are poorly suited to predicting the presence of trade relations between two given countries. However, such models make it possible to accurately estimate and forecast the volume, given the knowledge that such a trade relation exists. In [7] researchers use gravity models to investigate the export destinations that could be effectively developed with internal financial support. The experimental work was carried out on firm-level food export data.
In [8] the authors consider Markov models for forecasting the variability of the network of foreign trade financial flows. In [9] an approach for detecting promising areas of export in both the service and goods sectors is proposed. The approach is based on the sequential filtering of potential markets via a number of heuristics, including estimation of the market volume, the level of demand, market openness, etc. In [10] the authors studied the relationships between migration flows and foreign trade. They concluded that the trade flows for some products are positively and significantly correlated with migration flows. That feature can be taken into account when analyzing and evaluating the prospects of an export.
In [11] Lall et al. investigated the relationship between export volume and the "complexity" of goods and introduced a metric of "complexity" or "manufacturability" of goods. They noted a dependence between the growth rate of a product's price and the degree of its manufacturability. This dependence can be used as one of the features for detecting and assessing export growth potential. Bernard et al. [12] proposed a method for estimating the feasibility of entering the international market for a particular company. They used indicators of the company's past activity, including participation in exports, the competitive environment, etc. It is worth noting the weak influence of sectoral state support for exports on the actual volume of exports. In [13] the authors considered the relationship between the topology of the international trade network between countries in general and the network topologies within each product group. They proposed a methodology for studying the dynamics of the changing structure of several heterogeneous networks that represent trade flows between countries for individual commodity groups. As a result, the most active exporters and importers were detected for separate groups.
In [14] the authors try to model the structure and dynamics of the international trade network using the classical methods for solving the problem of selecting balls from urns. The analysis is carried out at the level of countries, and the principle of preferential attachment is implemented ("the rich get richer, the poor get poorer").
In [15] the authors propose to model the structure and dynamics of the International Trade Network via a Hamiltonian system. The authors describe the dynamics of the International Trade Network in terms of a Hamiltonian, and also assume that the main provisions from the field of statistical physics will be applicable to modeling the International Trade Network.
Shen et al. [16] considered the international trade network at the level of countries and goods. They used flow analysis in graphs and statistics on tops to study the network. The authors draw a number of conclusions related to the specialization of countries, as well as the dominance of developed countries in terms of the diversity of exported products (the principle of preferential attachment). They empirically confirm that food products are mostly traded between the most closely located countries, while high-tech goods are distributed virtually all over the world. Also, the authors detect countries with an anomalous import profile, which may indicate a number of economic problems. In [17] the authors presented an analysis of export in the service sector using German companies as an example. The main goal of the analysis is to determine how the directions and the mode of export depend on various features of the exported services. They used a non-open dataset from Deutsche Bank. Among other things, the authors detected such heuristics as "exports are more preferable to countries with higher incomes (for countries with lower incomes, an international partnership is more preferable)" and "when selling in more remote countries, international partnership is more profitable."
In [18, 19] researchers developed machine learning models to forecast the export dynamics of agricultural products. They compare Support Vector Machines (SVM) and Autoregressive Integrated Moving Average (ARIMA). The experiments showed that SVM achieves significantly smaller error rates.
To sum up the review, quite extensive efforts have been devoted to analyzing and predicting international trade flows. However, most papers describe fragmentary studies that focus on a limited set of factors. Thus, a goal-oriented and comprehensive approach is in high demand.
3 Framework for discovering export growth points
In this section we try to formalize the problem of export growth point discovery. The objective is to find the combinations (Product_i, Country_j) that have the highest unrealized potential for export growth. Also, production and export management of these combinations has to be feasible in the Russian Federation. Product_i is a product or product category to export and Country_j is a country or a group of countries to export to.
We propose to use open data analysis and modern machine learning techniques to find such growth points. The high-level algorithm of our framework consists of the following steps:
1. Construct a list of growth point candidates (Product_i, Country_j). Reorder this list so that the candidates with a higher likelihood of becoming a successful export direction appear earlier.
2. Analyze supply chains which contain commodities from our candidate list. Products
with higher added value should be reviewed first. Consider the product lifecycle (including production, storage, transportation and processing for the selected products) in order to detect the most probable difficulties for each stage of the lifecycle in the context of the Russian Federation. Propose intensive or extensive ways of overcoming them. Products with too many difficulties are removed from the list.
The novelty of our approach consists in the maximum possible automation. We can automate step 1 (candidate ranking) and aid step 2. Ranking in step 1 can be carried out with a predictive machine-learning-based model. Step 2 can be greatly facilitated by developing a specialized information retrieval system which uses big collections of scientific and engineering documents, such as patents, scientific papers and grant reports. Step 1 is discussed in detail later in this paper. We are going to consider step 2 in future work.
4 Data Driven Candidates Ranking
Formally, the problem of candidate ranking is a Learning-To-Rank (LTR) problem. Traditionally, each LTR problem is specified by three components: a set of possible queries, a set of objects and a target metric to optimize. In this work each query is formulated as “Which products to which countries should we try to export to increase budget income, in the context of current macroeconomic situation and our state of industry?”. In other words, a query is specified by the current economic context (wide or narrow, depending on the implementation). The objects that are ranked relative to that query are export growth point candidates, or pairs (Product_i, Country_j) (what and where to export).
The main difficulty with the LTR problem statement is the construction of the target metric. This metric must reflect the likelihood of success if export of Product_i to Country_j from the Russian Federation is established. Such a metric cannot be constructed in a purely data-driven way, because no database of such cases exists. To overcome this issue, we propose to rely on two sources of knowledge: (1) the opinion of experts in the field of the food market and international trade; (2) retrospective data about the dynamics of international trade. On the one hand, retrospective data alone cannot be used to predict the future, because the world context is changing and will almost never be the same again. On the other hand, experts rely on a limited number of factors and limited knowledge (it may be very deep but is still limited). Thus, we propose to use experts to take into account factors which are hard to formalize, and retrospective data to measure the prior likelihood that the trade flow of Product_i to Country_j will grow.
Taking expert opinion into account requires labeling a training dataset. In this paper we conduct preliminary studies using only retrospective data, due to limitations of time and resources. Experiments with manually annotated datasets will be considered in the future.
In other words, in this paper we study only export dynamics prediction. One can dispute that LTR is a reasonable approach to this problem and claim that traditional regression is a better fit. We chose LTR for three main reasons. The first one is that information about order is more abstract than information about the exact increase of trade value or volume (and thus the corresponding predictive model should generalize better). The second reason is that we plan to use LTR in a more general case, and thus we want to conduct experiments as close to the proposed framework as possible. The third reason is that we can generate more data to train an LTR model and thus try to reduce overfitting.
To facilitate the solution of the described LTR problem, we treat it as a pairwise ranking problem: we build a regression model which is given a pair of two export growth point candidates (Product_1, Country_1) and (Product_2, Country_2) and returns the difference between the export flows for the first and the second candidate. Generally, such a model operates on a feature set consisting of three major parts: a description of the global macroeconomic situation; a description of trade flows for the first candidate; a description of trade flows for the second candidate. Ideally, the information about both candidates should also somehow describe prices, competitiveness, quality, etc.
The objective of the experimental evaluation in this paper is to verify that retrospective data is useful for comparing trade flow dynamics for different commodities and foreign markets. To achieve this goal, we applied an ARIMA model as a baseline and also built two machine learning models: “baseline” and “advanced”.
4.1 Dataset
We used excerpts from the FAOSTAT [2] and UN Comtrade (Comstat) [3] databases for the years 2011 to 2015. The main source of data is Comstat (import, export, re-import, re-export). From FAOSTAT we took information about production volumes. The last year for which FAOSTAT contains data is 2014, so 2015 is the last year we could predict for. The full dataset contained 307 million data points.
Due to limited time and computational resources, we conducted experiments only on the 10 commodities most exported from the Russian Federation. We also selected 20 countries in the same way. Thus, we got 200 growth points. Surely, in future experiments we should consider a much larger set of commodities and countries, not only those that are already well developed.
The testbed was set up as follows. All available data were split into two parts: train and test. The train subset contained information about trade from 2013 to 2014. The test subset contained information about 2015 only. Each subset consisted of data points, each representing a pair of export growth point candidates to compare. Features were constructed using the “current” and “previous” years. Outcomes were constructed on the basis of the “next” year. Thus, in the train subset features were constructed on the basis of 2011-2012 (2013 as “next”) and 2012-2013 (2014 as “next”), and outcomes were constructed on the basis of 2013 and 2014 correspondingly. In the test subset features
were constructed using 2013-2014 and outcomes represented the difference in dynamics in 2015. Each subset was symmetric: for each pair (A, B) there was also the pair (B, A). Samples with an outcome of 0 were excluded from both subsets.
4.2 Baseline model
The objective of the baseline model is to estimate how accurately candidates can be compared using only knowledge about the titles of these candidates. The baseline is implemented as a Bernoulli Naive Bayes classifier with a feature set consisting only of (Product_1, Country_1) (only the elements of the left-hand part of the comparison). Reference (target) outcomes for training the baseline model were constructed as $\operatorname{sign}(dEV_1 - dEV_2)$, where $dEV_i$ is the first difference of the export value of Product_i from the Russian Federation to Country_i.
Thus, this classifier estimates the prior marginal probability of each candidate growing faster than each other candidate. This model is very naive and measures the skewness of our dataset and the most frequent patterns of the international trade of the Russian Federation.
4.3 «Advanced» model
The objective of this model is to estimate how much simple context information can improve comparison accuracy. There are several differences from the baseline: the feature set, the machine learning method used and the loss function.
The feature set consists of two parts: historical information about the trade of the Russian Federation with Product_i and Country_i, and the same information about the second candidate. “Historical information about trade” includes the following basic values from the UN Comtrade database: export amount (in tonnes), export value (in USD), export prices (as the ratio of value to amount), export monopolization; and the same corresponding parameters for re-export, import and re-import. The feature set also contains information about production (from the FAOSTAT database). Prior dynamics is taken into account using first-order differences and ratios. The first-order difference (or ratio) is the difference (or ratio) of the value for the current year and that for the previous one. Monopolization (or competitiveness, or concentration) is estimated using the Herfindahl index (the sum of squares of country portions in a flow). Reference outcomes for the “advanced” model were constructed as
$\operatorname{sign}(dEV_1)\log(|dEV_1| + 1) - \operatorname{sign}(dEV_2)\log(|dEV_2| + 1)$,
where $dEV_i$ is the first difference of the export value of Product_i from the Russian Federation to Country_i. The training dataset for the “advanced” model consisted of 68370 samples (pairs of growth points) and 1398 features. The test dataset consisted of 35700 samples.
We tried Support Vector Machines with linear and polynomial kernels, a random forest regressor (bagging) and gradient tree boosting (as implemented in LightGBM [20]). Hyperparameters were optimized using grid search. To prevent overfitting during hyperparameter optimization, the training data was split so that the data for each year was used solely either for training or for evaluation. After the best hyperparameters were chosen, the model was refitted using all training data. Finally, we decided to use LightGBM to train that model, because it showed the most promising results. All the results presented for the “advanced” model were obtained using LightGBM.
One can notice that we do not explicitly use information about the global economic situation. We omitted it from the feature set for two main reasons: (1) it is very difficult to represent in such a way that a machine-learning-based model can take full advantage of it (it is unclear how to prepare the features); (2) some global information is implicitly encoded in the difference between production, import and export, and also in the monopolization estimates. Surely, explicitly taking into account the global economic situation is very important. We will consider it in future papers.
5 Experimental evaluation
As stated earlier in the paper, the main objective of the experimental evaluation is to estimate how useful detailed retrospective data about international trade is for the problem of growth point candidate ranking.
Because of the nature of the problem, the standard classification or regression scores are not well suited to measuring the prediction quality, since miscomparisons of different pairs may have very different significance. Therefore, we used the proportion of the predicted export growth points in the total export gain as the score. In other words, the bigger the part of export growth the model detects (the last “%” row in the tables), the better the model works. These percent values may be treated as quantitative measures of prediction quality.
Table 1 contains the scores for the top 5 actual growth points and for the predicted alternatives. The summed absolute export value growth for the predicted pairs is presented. The last row (%) contains the portion of the total growth of export from Russia in 2015, calculated for all growth point candidates (as specified above). From this table one can see that it is nearly impossible to predict short one-year trade flow dynamics without additional information about the global economic situation.
Table 1 Top 5 predicted export growth points and their summary proportion in the total export gain
(each cell lists partner country, commodity)
No | Actual | Predicted: ARIMA | Predicted: Baseline model | Predicted: Advanced model
1 | Saudi Arabia, Barley | Libya, Barley | Azerbaijan, Potatoes | Italy, Maize
2 | China, Soybeans | Spain, Soybeans | Georgia, Maize | Spain, Maize
3 | Turkey, Maize | Ukraine, Wheat | Uzbekistan, Wheat | Libya, Maize
4 | Azerbaijan, Wheat | Ukraine, Molasses | Ukraine, Potatoes | Spain, Rye
5 | Italy, Maize | Kazakhstan, Soybeans | China, Wheat | Ukraine, Molasses
Export gain, $ | 360059k | 11710k | 13830k | 19197k
Export gain, % | 76.2 | 2.4 | 2.9 | 4.0
A notable difficulty here is the high volatility of the product market, while the creation or development of a
food manufacture is a long-term process. Therefore, we think that prediction of averaged, long-term trends would yield a more meaningful ranking.
The advanced model achieved slightly better results than the baseline and ARIMA models. From that we conclude that retrospective data is useful for predicting flow dynamics. This in turn means that combining open retrospective data about international trade with expert opinions makes much sense in order to maximize both likelihood and novelty.
Table 2 Top 5 predicted commodities and their proportion in the total export gain
No | Actual | ARIMA | Baseline model | Advanced model
1 | Barley | Barley | Potatoes | Maize
2 | Soybeans | Soybeans | Maize | Rye
3 | Maize | Wheat | Wheat | Molasses
4 | Wheat | Molasses | Linseed | Soybeans
5 | Potatoes | Maize | Rye | Wheat
$ | 446903k | 440272k | 137694k | 225233k
% | 94.6 | 93.2 | 29.1 | 47.6
Table 2 presents the five commodities with the highest expected growth. The last row (%) contains the portion of total growth. One can see how non-diversified Russian food export is: 5 commodities account for more than 90% of the total export value growth. Also, we can see that ARIMA predicts commodity dynamics much better than both the baseline and the advanced model. We think that this is mostly due to the inertia of flows: if something grows today, it will most probably grow tomorrow. Again, the “advanced” model performed better than the baseline. This means that prior information is not very useful for predicting commodity dynamics.
Table 3 Top 5 predicted directions and their proportion in the total export gain
No | Actual | ARIMA | Baseline model | Advanced model
1 | Saudi Arabia | Libya | Azerbaijan | Italy
2 | China | Spain | Georgia | Spain
3 | Turkey | Ukraine | Uzbekistan | Libya
4 | Azerbaijan | Kazakhstan | Ukraine | Ukraine
5 | Italy | Georgia | China | Armenia
$ | 374755k | 49666k | 145263k | 47982k
% | 79.3 | 13.6 | 31.8 | 13.1
Table 3 presents the five countries with the highest expected import growth from the Russian Federation. From this table we conclude that Russian export is not only commodity-non-diversified, but also partner-non-diversified. We can also see that the purely prior-based “baseline” model performed best: it predicted more than 30% of the actual export growth. ARIMA and the “advanced” model performed approximately equally. So, we conclude that almost no new markets are explored: we will trade tomorrow with those with whom we trade today. Additional unaccounted factors may include politics, wars, sanctions, etc.
7 Conclusion and future work
In this paper we have reviewed and discussed the problem of export growth point discovery. The main contribution of this paper is an automated data-driven framework that addresses the problem. The framework uses open data from many data sources and modern machine learning techniques. We also conducted preliminary experiments to evaluate the possibility of using retrospective data to rank growth point candidates. The experiments were based on open data from FAOSTAT and UN Comtrade.
Currently, it is very difficult to say for sure which method is more useful for the final task, growth point discovery. Different methods compared to each other differently, depending on what was compared (top 5 growth points, top 5 commodities or top 5 directions). This fact gives some clues about what a better model should look like. Another thing that has to be changed is the objective function: predicting short-term export value changes is very difficult and useless, because developing a new manufacture needs much more than one year. Thus, it makes much more sense to predict long-term trends.
The main directions of future work include (a) repeating the experiments with an adjusted methodology; (b) creating a manually annotated dataset of growth points; (c) incorporating information about the global economic situation and substitutes.
Acknowledgment
The research is supported by Russian Foundation for Basic Research, project 16-29-12877.
References
[1] Rodrik D. Institutions, integration, and geography: In search of the deep determinants of economic growth // In Search of Prosperity: Analytic Narratives on Economic Growth. Princeton University Press, Princeton. – 2003
[2] Food and Agriculture Organization of the United Nations. URL: http://www.fao.org/faostat/en/
[3] UN Comtrade: International Trade Statistics. URL: https://comtrade.un.org/data/
[4] Kaukin A., Idrisov G. The gravity model of Russia’s international trade: the case of a large country with a long border. Working paper. – 2014
[5] Gordeev D. et al. Analysis of Global Supply Chains in International Trade Patterns. – 2016. – №. 765
[6] Duenas M., Fagiolo G. Modeling the International-Trade Network: a gravity approach // Journal of Economic Interaction and Coordination. – 2013. – Vol. 8(1). – pp. 155-178
[7] Jaud M., Kukenova M., Strieborny M. Financial Development and Sustainable Exports: Evidence from Firm-product Data // The World Economy. – 2015. – Vol. 38(7). – pp. 1090-1114
[8] Snijders T. A. B. Models for longitudinal network data // Models and methods in social network analysis. – 2005. – Vol. 1. – pp. 215-247
[9] Grater S. et al. Linking export opportunities of products and services: the case of South Africa.
[10] Sgrignoli P. The World Trade Web: A Multiple-Network Perspective // arXiv preprint arXiv:1409.3799. – 2014
[11] Lall S., Weiss J., Zhang J. The “sophistication” of exports: a new trade measure // World Development. – 2006. – Vol. 34(2). – pp. 222-237
[12] Bernard A. B., Jensen J. B. Why some firms export // Review of Economics and Statistics. – 2004. – Vol. 86(2). – pp. 561-569
[13] Barigozzi M., Fagiolo G., Garlaschelli D. Multinetwork of international trade: A commodity-specific analysis // Physical Review E. – 2010. – Vol. 81(4). – p. 46-104
[14] Peluso S. et al. International Trade: a Reinforced Urn Network Model. – 2016. – №. 1601.03067.
[15] Fronczak A. Structural Hamiltonian of the international trade network // No. – 2012. – Vol. 1. – No. arXiv: 1205.4589. – pp. 31-46
[16] Shen B., Zhang J., Zheng Q. Exploring multi-layer flow network of international trade based on flow distances // arXiv preprint arXiv:1504.02361. – 2015
[17] Kelle M. et al. Cross border and Foreign Affiliate Sales of Services: Evidence from German Microdata // The World Economy. – 2013. – Vol. 36(11). – pp. 1373-1392
[18] Sujjaviriyasup T., Pitiruek K. Agricultural Product Forecasting Using Machine Learning Approach // Int. Journal of Math. Analysis. – 2013. – Vol. 7. – №. 38. – pp. 1869-1875
[19] Sujjaviriyasup T., Pitiruek K. Hybrid ARIMA-support vector machine model for agricultural production planning // Applied Mathematical Sciences. – 2013. – Vol. 7. – №. 57. – pp. 2833-2840
[20] Microsoft. https://github.com/microsoft/lightgbm
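Illustrative sketch for Sections 4.1 to 4.3. The paper describes first-order differences, the Herfindahl index, symmetric candidate pairs and the two outcome definitions only in prose, so the following is a minimal sketch of how those pieces could fit together, not the authors' implementation. All function names (first_difference, herfindahl, signed_log, pairwise_samples) and all export values are hypothetical and chosen for illustration only.

```python
import math
from itertools import permutations


def first_difference(series, year):
    """First-order difference: value for `year` minus value for the previous year."""
    return series[year] - series[year - 1]


def herfindahl(flows):
    """Herfindahl index: sum of squared partner shares in a flow (1.0 = one partner)."""
    total = sum(flows)
    if total <= 0:
        return 0.0
    return sum((v / total) ** 2 for v in flows)


def signed_log(x):
    """sign(x) * log(|x| + 1): compresses large changes but keeps their direction."""
    return math.copysign(math.log(abs(x) + 1.0), x)


def pairwise_samples(d_ev):
    """Build symmetric pairwise samples from per-candidate first differences.

    `d_ev` maps a candidate (product, country) to dEV, the first difference of
    its export value. Returns tuples (first, second, baseline_y, advanced_y).
    """
    samples = []
    for a, b in permutations(d_ev, 2):  # yields both (A, B) and (B, A)
        diff = d_ev[a] - d_ev[b]
        if diff == 0:
            continue  # samples with an outcome of 0 are excluded
        baseline_y = 1 if diff > 0 else -1  # sign(dEV_1 - dEV_2)
        advanced_y = signed_log(d_ev[a]) - signed_log(d_ev[b])
        samples.append((a, b, baseline_y, advanced_y))
    return samples


# Hypothetical export values in USD, for illustration only (not from the paper).
export_value = {
    ("Barley", "Saudi Arabia"): {2013: 100.0, 2014: 160.0},
    ("Wheat", "Turkey"): {2013: 80.0, 2014: 60.0},
    ("Maize", "Italy"): {2013: 10.0, 2014: 70.0},
}
d_ev = {cand: first_difference(s, 2014) for cand, s in export_value.items()}
samples = pairwise_samples(d_ev)
```

Because every ordered pair is generated, each retained sample has a mirror sample with the opposite outcome, which matches the symmetry property stated in Section 4.1. The two candidates with identical first differences in this toy input are dropped, since their outcome is 0.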
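Illustrative sketch for the score used in Section 5. The paper scores a model by the proportion of the total export gain covered by its top 5 predicted growth points, but it does not spell out how pairwise predictions are turned into a ranking or how the total gain is computed. The sketch below therefore rests on two loudly stated assumptions: candidates are ordered by the sum of their predicted pairwise differences against all other candidates, and the total gain is the sum of positive actual changes over all candidates. The names rank_from_pairwise and top_k_gain_share, and all numbers, are hypothetical.

```python
def rank_from_pairwise(candidates, predict_diff):
    """Order candidates by the sum of predicted pairwise differences.

    Assumption (not stated in the paper): a candidate's score is the sum of
    predict_diff(candidate, other) over all other candidates.
    """
    scores = {
        a: sum(predict_diff(a, b) for b in candidates if b != a)
        for a in candidates
    }
    return sorted(candidates, key=lambda c: scores[c], reverse=True)


def top_k_gain_share(ranking, actual_gain, k=5):
    """Share (in %) of the total export gain captured by the top-k candidates.

    Assumption: the total gain is the sum of positive actual changes.
    """
    total = sum(g for g in actual_gain.values() if g > 0)
    if total == 0:
        return 0.0
    captured = sum(actual_gain[c] for c in ranking[:k])
    return 100.0 * captured / total


# Hypothetical actual one-year changes of export value, for illustration only.
actual_gain = {"A": 60.0, "B": 30.0, "C": 10.0, "D": -5.0}
candidates = list(actual_gain)

# An "oracle" comparator that knows the actual changes.
oracle_ranking = rank_from_pairwise(
    candidates, lambda a, b: actual_gain[a] - actual_gain[b]
)

# A hypothetical imperfect model with its own belief about each candidate.
believed = {"A": 1.0, "B": 5.0, "C": 2.0, "D": 9.0}
model_ranking = rank_from_pairwise(
    candidates, lambda a, b: believed[a] - believed[b]
)

oracle_share = top_k_gain_share(oracle_ranking, actual_gain, k=2)
model_share = top_k_gain_share(model_ranking, actual_gain, k=2)
```

With these toy numbers the oracle captures most of the gain with its top 2 candidates, while the imperfect model ranks a shrinking flow first and captures far less. This is the sense in which a single badly placed candidate can matter more than many small miscomparisons.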