<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multilabel Classification for Inflow Profile Monitoring</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dmitry I. Ignatov</string-name>
          <email>dignatov@hse.ru</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavel Spesivtsev</string-name>
          <email>PSpesivtsev@slb.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitry Kurgansky</string-name>
          <email>mykurgansky@mail.ru</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivan Vrabie</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>vrabie</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>@mail.ru</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Svyatoslav Elizarov</string-name>
          <email>sorkerrer@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vladimir Zyuzin</string-name>
          <email>VZyuzin@slb.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Moscow Institute of Physics and Technology</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Research University Higher School of Economics</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Schlumberger Moscow Research</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>The purpose of this study is to identify the position of nonperforming inflow zones (sources) in a wellbore by means of machine learning techniques. The training data are obtained using transient multiphase simulators and are represented as the following time series: bottomhole pressure, wellhead pressure, and flowrates of gas, oil, and water, along with a target vector of size N, where each element is a binary variable indicating the productivity of the respective inflow zone. The goal is to predict the target vector of active and non-active inflow sources given the surface parameters for an unseen well. A variety of machine learning techniques have been applied to solve this task, including feature extraction and generation, dimensionality reduction, ensembles and cascades of learning algorithms, and deep learning. The results of the study can be used to provide more efficient and accurate monitoring of gas and oil production and informed decision making.</p>
      </abstract>
      <kwd-group>
        <kwd>Multi-phase flow</kwd>
        <kwd>multilabel classification</kwd>
        <kwd>time series</kwd>
        <kwd>bottomhole pressure</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>During the production phase of oil and gas wells, it often happens that oil does
not enter every inflow point, which leads to a decrease in the efficiency of the
operation and undesired economic consequences. It is beneficial to determine
which of the inflow points are inactive in order to properly design the intervention
operations. The main research hypothesis here is as follows: using machine learning
approaches, the active and non-active inflow points can be predicted based on
the measurements of certain parameters at the wellhead, including pressure and
total gas and oil productivity.</p>
      <p>The paper is organized as follows. In Section 2 we formulate the studied
problem as a multilabel classification. Sections 3 and 4 explain the data generation
process and detail the performed data transformations, respectively. Section 5
describes the time-series-specific feature extraction process. Section 6 presents
the obtained classification results along with feature importance estimation.
Section 7 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Problem Statement</title>
      <p>The problem of inflow profile monitoring can be formulated as follows.</p>
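      <p>There are descriptions of objects <inline-formula><tex-math>X \subseteq \mathbb{R}^{d}</tex-math></inline-formula>, where <inline-formula><tex-math>d</tex-math></inline-formula> is the size of the feature space,
and a finite set of class labels <inline-formula><tex-math>Y \subseteq \{0, 1\}^{L}</tex-math></inline-formula>. A finite training set of observations
is given as follows:
<disp-formula><tex-math>\left\{ \left( x^{(i)}, y^{(i)} \right) \right\}_{i=1}^{N},</tex-math></disp-formula>
where <inline-formula><tex-math>x^{(i)} = (x_1, \ldots, x_d) \in X</tex-math></inline-formula> is the description vector of the <inline-formula><tex-math>i</tex-math></inline-formula>-th object (one
measurement), and <inline-formula><tex-math>y^{(i)} = (y_1, \ldots, y_L) \in Y</tex-math></inline-formula> is the label vector with
<disp-formula><tex-math><![CDATA[y_j = \begin{cases} 1, & \text{if there is an oil inflow at the } j\text{-th position,} \\ 0, & \text{otherwise.} \end{cases}]]></tex-math></disp-formula></p>
      <p>However, in our case, the description vector <inline-formula><tex-math>x^{(i)}</tex-math></inline-formula> can be recast as containing
time series of <inline-formula><tex-math>d</tex-math></inline-formula> sensors within a certain time interval <inline-formula><tex-math>T = \{1, 2, \ldots, t\}</tex-math></inline-formula>:
<disp-formula><tex-math>x^{(i)} = \left( (x_1, \ldots, x_t)_1, \ldots, (x_1, \ldots, x_t)_d \right) \in \mathbb{R}^{d \times t}.</tex-math></disp-formula></p>
      <p>Using a training set <inline-formula><tex-math>s_t = \left\{ \left( x^{(i)}, y^{(i)} \right) \right\}_{i=1}^{N}</tex-math></inline-formula>, it is necessary to construct a mapping
function (classifier):
<disp-formula><tex-math>h \colon X \to Y.</tex-math></disp-formula>
For each test instance <inline-formula><tex-math>\tilde{x} \in X</tex-math></inline-formula>, we get a prediction: <inline-formula><tex-math>\hat{y} = h(\tilde{x})</tex-math></inline-formula>.</p>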
      <p>Thus, the problem of multilabel classification is to be solved, in which an
object can belong to several classes at the same time, and the classes are not
mutually exclusive. For example, this type of problem arises in text mining, namely
in automatic tag assignment, text categorization and classification, and similarly
in the categorization of images. Multilabel classification is an extension of the
traditional classification problem with several classes, i.e., the multi-class problem.
Approaches to solving this problem are mentioned in Section 6 and can be partially
found in [10, 7].</p>
    </sec>
    <sec id="sec-3">
      <title>Data Generation</title>
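      <p>The training data are obtained as a result of numerical simulations that
describe the physical processes taking place in wells [9]. For the given input
parameters (wellbore geometry, initial distribution of volume fractions of phases,
pressure in the wellhead, choke size, etc.), the simulator models the behavior of
the wellbore for a given time interval <inline-formula><tex-math>T</tex-math></inline-formula> and generates the following time series:
        <list list-type="bullet">
          <list-item><p><inline-formula><tex-math>BHP(t)</tex-math></inline-formula> is the bottomhole pressure (measured at the source closest to the surface);</p></list-item>
          <list-item><p><inline-formula><tex-math>WHP(t)</tex-math></inline-formula> is the wellhead pressure;</p></list-item>
          <list-item><p><inline-formula><tex-math>Q_o(t)</tex-math></inline-formula> is the surface oil flowrate;</p></list-item>
          <list-item><p><inline-formula><tex-math>Q_w(t)</tex-math></inline-formula> is the surface water flowrate;</p></list-item>
          <list-item><p><inline-formula><tex-math>Q_g(t)</tex-math></inline-formula> is the surface gas flowrate.</p></list-item>
        </list>
      </p>
      <p>The target vector <inline-formula><tex-math>y</tex-math></inline-formula> of length 20 is generated randomly and consists of ones
and zeros, characterizing the presence or absence of inflow at each of the 20
predefined inflow points along the wellbore. In the present work, 5000 simulation
realizations are used.</p>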
    </sec>
    <sec id="sec-4">
      <title>Data preparation</title>
      <p>Given that each time series is large and has a complex structure, which may carry
latent complex patterns, it is necessary to transform it to a smaller space of
features that are more informative than the raw values of the series at certain
timesteps t. For example, one can extract minimal and maximal values, the number
of local maxima and minima ("peaks"), the average and median values,
etc. In addition, many machine learning algorithms are sensitive to data scaling.
Such algorithms include, for example, the nearest neighbor method and the Support Vector
Machine. In this study, we use two common types of data normalization:
normalization by standard deviation and Min-Max normalization. Another
important task is to reduce the dimension of the feature space using different
methods, and we examine the most popular ones, such as:</p>
      <sec id="sec-4-1">
        <title>1. Principal Component Analysis (PCA) 2. Independent Component Analysis (ICA) 3. Truncated Singular Value Decomposition (TSVD).</title>
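        <p>As an illustration of the normalization and dimensionality reduction steps named above, the following minimal sketch applies z-score and Min-Max scaling and then projects the data onto the leading principal components using a singular value decomposition in NumPy. The helper names (<monospace>zscore</monospace>, <monospace>minmax</monospace>, <monospace>pca_project</monospace>) and the toy feature matrix are assumptions made for illustration; they are not taken from the study.</p>
        <preformat preformat-type="code">
```python
import numpy as np

def zscore(X, eps=1e-12):
    """Normalize each column to zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)

def minmax(X, eps=1e-12):
    """Rescale each column to the [0, 1] range."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo + eps)

def pca_project(X, n_components):
    """Project rows of X onto the leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8)) * np.arange(1, 9)  # hypothetical feature matrix
Z = pca_project(zscore(X), n_components=3)
print(Z.shape)
```
        </preformat>
        <p>Keeping only the first rows of <monospace>Vt</monospace> without centering the data would correspond to a truncated SVD rather than PCA.</p>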
      </sec>
      <sec id="sec-4-2">
        <title>Hence, the original task is divided into two subtasks:</title>
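        <list list-type="order">
          <list-item><p>Determination of the appropriate feature space <inline-formula><tex-math>X'</tex-math></inline-formula>.</p></list-item>
          <list-item><p>The choice and tuning of the optimal classifier <inline-formula><tex-math>h</tex-math></inline-formula>.</p></list-item>
        </list>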
        <p>The average 0/1-loss on a test sample of size M is used as the quality
criterion. To characterize the average prediction accuracy for each inflow point,
one can consider the whole vector of 0/1-losses over all inflow points. In our
experiments, the averaged accuracy varies with the position of the inflow point,
showing higher values for the first several positions (closer to the surface, see Fig. 1).</p>
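        <p>The quality criterion described above can be sketched as follows. The function name and the small label matrices are hypothetical and serve only to show how the per-point loss vector and its average are obtained.</p>
        <preformat preformat-type="code">
```python
import numpy as np

def zero_one_loss_per_label(y_true, y_pred):
    """Fraction of misclassified test samples for each inflow point."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return (y_true != y_pred).mean(axis=0)

# Hypothetical test set: M = 4 samples, L = 3 inflow points.
y_true = np.array([[1, 0, 1],
                   [0, 0, 1],
                   [1, 1, 0],
                   [1, 0, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 1, 0],
                   [1, 0, 1]])

per_label = zero_one_loss_per_label(y_true, y_pred)
print(per_label)         # loss vector, one entry per inflow point
print(per_label.mean())  # loss averaged over all inflow points
```
        </preformat>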
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Feature extraction from time series</title>
      <p>The set of predictors for training in the initial sample is represented by time
series, from which it is possible to extract a set of additional parameters that
can positively affect the quality of algorithms [3].</p>
      <p>The Fourier transform is one of the basic tools in signal analysis. This transform
allows one to move from the time domain to the frequency domain, that is, to get rid of
the signal shifts in time. The Discrete Fourier Transform (DFT) is used for discrete
signals.</p>
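      <p>A minimal sketch of this property, assuming a synthetic signal and a circular time shift (for which the invariance is exact): the magnitudes of the DFT coefficients computed with NumPy do not change when the signal is delayed, so they can serve as shift-insensitive features.</p>
      <preformat preformat-type="code">
```python
import numpy as np

n = 64
t = np.arange(n)
signal = np.sin(2 * np.pi * 4 * t / n) + 0.5 * np.cos(2 * np.pi * 9 * t / n)
shifted = np.roll(signal, 10)  # the same signal, circularly delayed by 10 samples

# Magnitudes of the DFT coefficients (real-input FFT).
spectrum = np.abs(np.fft.rfft(signal))
spectrum_shifted = np.abs(np.fft.rfft(shifted))

print(np.allclose(spectrum, spectrum_shifted))
print(int(np.argmax(spectrum)))  # index of the dominant frequency bin
```
      </preformat>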
      <p>An alternative to the Fourier transform is the wavelet transform, which is a
convolution of the wavelet function with the signal. The wavelet transform
translates the signal from the time representation to its time-frequency representation.
For discrete signals, a discrete wavelet transform is applied by a set of filters.
First, the signal is passed through a low-frequency filter (LF-filter) with a pulse
response <inline-formula><tex-math>g</tex-math></inline-formula>:
<disp-formula><tex-math>\hat{s}[n] = \sum_{k=-\infty}^{+\infty} s[k]\, g[n-k].</tex-math></disp-formula></p>
      <p>At the same time, the signal is similarly decomposed using a high-frequency
filter <inline-formula><tex-math>f</tex-math></inline-formula> (HF-filter). The result contains detail coefficients (after the HF-filter)
and approximation coefficients (after the LF-filter). After completing the
procedure, the filtered signals are downsampled by a factor of 2.</p>
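      <p>One level of such a filter bank can be sketched in NumPy as follows. The Haar filters are an assumption chosen here as the simplest example; the text does not state which wavelet family was used, and the function name <monospace>dwt_level</monospace> is hypothetical.</p>
      <preformat preformat-type="code">
```python
import numpy as np

def dwt_level(s, g, f):
    """One filter-bank level: convolve with each filter, keep every second sample."""
    approx = np.convolve(s, g)[1::2]  # low-frequency branch
    detail = np.convolve(s, f)[1::2]  # high-frequency branch
    return approx, detail

# Haar filters, chosen here only as the simplest example.
g = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-frequency filter
f = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-frequency filter

s = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0])
approx, detail = dwt_level(s, g, f)
print(approx)
print(detail)
```
      </preformat>
      <p>Applying the same function again to <monospace>approx</monospace> would give the next, coarser decomposition level.</p>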
      <p>Different output values of linear regression were also used as features. In our
case, we used a sample from a time series as a predictor, and a discrete sequence
from 0 to a number equal to the length of the sample minus 1 as the target
variable.</p>
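      <p>Another attribute is the energy of the time series, i.e., the sum of its squared values, which is given
below:
<disp-formula><tex-math>E = \sum_{i=1,\ldots,n} x_i^{2}.</tex-math></disp-formula></p>
      <p>The average absolute change was also taken into account, which is simply
the following:
<disp-formula><tex-math>\frac{1}{n} \sum_{i=1,\ldots,n-1} \left| x_{i+1} - x_i \right|.</tex-math></disp-formula></p>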
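      <p>Both quantities are straightforward to compute; the sketch below uses NumPy and hypothetical function names. The absolute change is divided by the series length n, following the formula as written here, which is an interpretation of the text rather than a statement about any particular library.</p>
      <preformat preformat-type="code">
```python
import numpy as np

def energy(x):
    """E = sum of squared values of the series."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x ** 2))

def mean_abs_change(x):
    """Sum of absolute consecutive differences divided by the series length n."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.abs(np.diff(x))) / len(x))

x = [1.0, 3.0, 2.0, 6.0]
print(energy(x))
print(mean_abs_change(x))
```
      </preformat>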
      <p>Many more parameters can be used to enlarge the feature space, among them the
average, standard deviation, median, variance, min/max value, trend,
number of min/max values, lower/upper quartile, and last position of the min/max value.</p>
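      <p>A few of these descriptive features can be sketched with NumPy as follows. The function name, the dictionary keys, and the definition of a peak as a strict local maximum are assumptions made for illustration.</p>
      <preformat preformat-type="code">
```python
import numpy as np

def summary_features(x):
    """Return a dictionary of simple descriptive features of a 1-D series."""
    x = np.asarray(x, dtype=float)
    inner = x[1:-1]
    # Strict local maxima: larger than both neighbours.
    n_peaks = int(np.sum(np.logical_and(inner > x[:-2], inner > x[2:])))
    slope = np.polyfit(np.arange(len(x)), x, 1)[0]  # linear trend
    return {
        "mean": float(x.mean()),
        "std": float(x.std()),
        "median": float(np.median(x)),
        "min": float(x.min()),
        "max": float(x.max()),
        "lower_quartile": float(np.percentile(x, 25)),
        "upper_quartile": float(np.percentile(x, 75)),
        "n_local_maxima": n_peaks,
        "last_argmax": int(len(x) - 1 - np.argmax(x[::-1])),
        "trend": float(slope),
    }

features = summary_features([0.0, 2.0, 1.0, 3.0, 3.0, 1.0, 4.0])
for name, value in features.items():
    print(name, value)
```
      </preformat>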
      <p>All the aforementioned features in this section can be calculated by
specialized Python libraries. Here we have used the tsfresh library [2] and produced more
than 1200 features<fn><p>The full list of possible features to extract can be found at
<ext-link ext-link-type="uri" xlink:href="https://tsfresh.readthedocs.io/en/latest/text/list_of_features.html">https://tsfresh.readthedocs.io/en/latest/text/list_of_features.html</ext-link></p></fn>.</p>
    </sec>
    <sec id="sec-6">
      <title>Experiments</title>
      <p>To conduct experiments with the data obtained by the simulators, a set of 5000
numerical simulations was generated, each of which contains the readings of
4 different sensors that produce measurements for 3600 seconds with a sampling
rate of 1 Hz. The average 0/1-loss for each inflow point (or averaged over all of
them) on the test sample is used as the quality criterion. The split into training
and test sets was made by randomly sampling the generated observations in the
ratio of 4:1.</p>
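      <p>A random 4:1 split of this kind can be sketched as follows; the function name and the fixed seed are assumptions for reproducibility of the example and do not reflect the authors' code.</p>
      <preformat preformat-type="code">
```python
import numpy as np

def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Randomly split sample indices into training and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]

train_idx, test_idx = train_test_split_indices(5000)  # 4:1 ratio
print(len(train_idx), len(test_idx))
```
      </preformat>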
      <p>
        The first experiment was to test the approach of independent classifiers,
separately for each of the 20 sources. In addition to selecting the optimal classifier,
it is necessary to correctly determine the appropriate feature space <inline-formula><tex-math>X'</tex-math></inline-formula>. For this
purpose, many different methods of dimensionality reduction and normalization
have been tested both for the initial data and for the extracted time series
features. Every dimensionality reduction method was tested on the following set of
classification algorithms: Random Forest (RF), SVM, kNN, and XGBoost [<xref ref-type="bibr" rid="ref1">1</xref>]. The
mean 0/1-loss varies from 0.36 to 0.39.
      </p>
      <p>The best algorithm was XGBoost with PCA over z-score normalization of
features obtained from the time series. The same combination of the dimension
reduction method and the algorithm, but with min-max normalization, resulted
in the third best performance.</p>
      <p>Experiment 2 was to build an ensemble of the 10 best-performing
algorithms and determine each label by majority voting. As expected,
the results were slightly better: the average value of the 0/1-loss was
equal to 0.31.</p>
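      <p>Majority voting over binary label predictions can be sketched as below. The function name and the toy predictions are hypothetical, and resolving ties (possible with an even number of models) in favor of label 0 is an assumption, since the tie-breaking rule is not specified in the text.</p>
      <preformat preformat-type="code">
```python
import numpy as np

def majority_vote(predictions):
    """Combine binary predictions of shape (n_models, n_samples, n_labels)."""
    predictions = np.asarray(predictions)
    votes = predictions.sum(axis=0)  # number of models voting "1"
    # A label is set to 1 when more than half of the models vote for it.
    return (2 * votes > predictions.shape[0]).astype(int)

# Three hypothetical models, two samples, three inflow points.
preds = np.array([
    [[1, 0, 1], [0, 0, 1]],
    [[1, 1, 0], [0, 1, 1]],
    [[0, 0, 1], [1, 0, 1]],
])
print(majority_vote(preds))
```
      </preformat>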
      <p>During the third experiment, aimed at testing the approach of classifier chains [7],
a correlation matrix was built between the values of all sources. By a chain of
classifiers, Read et al. [7] mean a simple classifier cascade where, after prediction of
the first component of a target vector, the second component is predicted on the
same set of features plus the prediction for the first component (or its known
value for training data) as an extra feature, and similarly for the sequence of the
remaining components. In the resulting matrix there was no correlation greater
than 0.1, so building classifier chains was not expected to bring a significant
improvement in quality.</p>
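      <p>The correlation check can be sketched as follows. The target matrix here is drawn at random as a stand-in for the simulated labels, which is an assumption consistent with the randomly generated target vectors described in Section 3; the actual data are not reproduced.</p>
      <preformat preformat-type="code">
```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical target matrix: 5000 simulations, 20 independent random labels.
Y = rng.integers(0, 2, size=(5000, 20))

corr = np.corrcoef(Y, rowvar=False)       # 20 x 20 label correlation matrix
off_diag = corr[~np.eye(20, dtype=bool)]  # drop the trivial diagonal
max_abs_corr = float(np.max(np.abs(off_diag)))
print(max_abs_corr)
```
      </preformat>
      <p>When the largest off-diagonal value is small, the prediction for one source carries little information about the others, which is the reason a chain adds little over independent classifiers.</p>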
      <p>The fourth experiment was originally to predict the number of active inflow
zones. For each sample in the available training data, the number of active inflow
zones was counted and a multiclass classification task was formulated. The
prediction accuracy was 1. Having received such a good result, it was proposed
to build a version of the cascade classifier working according to the following scheme:
        <list list-type="order">
          <list-item><p>Predict the number of working sources.</p></list-item>
          <list-item><p>Obtain the probabilities of class 1 for each source separately.</p></list-item>
          <list-item><p>Sort the probabilities in descending order.</p></list-item>
          <list-item><p>Get the number of sources equal to one at different probability thresholds (calibration step).</p></list-item>
          <list-item><p>If the number of sources labeled by "1" (i.e., active sources) at a given probability threshold is greater than the predicted number of sources, then put the label "0" for the sources whose probability is the lowest until the number of active inflow points (predicted working sources) becomes equal to their predicted number.</p></list-item>
        </list>
      </p>
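      <p>The trimming step of this scheme can be sketched for a single simulation as follows. The function name, the fixed threshold of 0.5, and the example probabilities are assumptions; the calibration over several thresholds is not shown. As in the scheme above, nothing is done when fewer sources are labeled active than predicted.</p>
      <preformat preformat-type="code">
```python
import numpy as np

def cascade_predict(proba, n_active, threshold=0.5):
    """Threshold per-source probabilities, then trim to the predicted count."""
    proba = np.asarray(proba, dtype=float)
    labels = (proba >= threshold).astype(int)
    excess = int(labels.sum()) - int(n_active)
    if excess > 0:
        # Indices of active sources, ordered from least to most probable.
        active = np.flatnonzero(labels)
        weakest = active[np.argsort(proba[active])[:excess]]
        labels[weakest] = 0
    return labels

proba = [0.9, 0.55, 0.2, 0.7, 0.6]  # hypothetical class-1 probabilities
print(cascade_predict(proba, n_active=2))
```
      </preformat>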
      <p>However, this algorithm not only failed to reduce the 0/1-loss below that
of the ensemble, but significantly increased it to 0.44. This can be explained
by the fact that the current scheme of the cascade algorithm does not
handle the case when the number of sources labeled as active is less than the predicted
number. In addition, a significant part of the sources have very similar probabilities of
belonging to class 1, which does not allow one to exclude only the wrong values.</p>
      <p>The fifth experiment was designed to use both the initial data and the
features extracted from the initial data. XGBoost was chosen as the
algorithm, and the following set of features was used as the feature space:
        <list list-type="bullet">
          <list-item><p>300 ICA components applied on the training set transposed time series normalized by Z-score;</p></list-item>
          <list-item><p>300 PCA components applied to more than 1200 features extracted from the time series;</p></list-item>
          <list-item><p>the number of working sources for this simulation (can be predicted by a simple binary classifier, e.g., logistic regression).</p></list-item>
        </list>
The result of this method was the reduction of the loss function error to 0.26,
which is the best result in this study.</p>
      <p>For the sake of comparison, a series of experiments using deep neural
networks was conducted in Keras over TensorFlow. We used both LSTM [4] and
CNN [6] networks, as well as their mixture, over all 5000 examples given as
normalized and concatenated time series in a 4500/500 split for training
and validation. The highest validation accuracy was about 0.59.</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>We considered and tested methods of extracting significant features from
multivariate time series, as well as methods of data normalization and dimensionality reduction.
Several basic algorithms and their ensembles were tested, and a cascade of
two classification algorithms was proposed and applied.</p>
      <p>The best result, 0.26, in terms of average 0/1-loss was shown by the
XGBoost method with specially constructed sets of features.</p>
      <p>The results of our experiments are summarized in Table 1.</p>
      <p>Our analysis demonstrates that inflow profile monitoring using surface
measurements is a challenging problem. However, the combination of machine
learning techniques makes it possible to obtain results significantly better than a random guess. We
hope that further enhancement of specially designed methods based on classifier
ensembles, relevant deep neural network architectures, and time-series feature
extraction techniques may further improve the quality of multi-label prediction
in the studied problem.</p>
      <p><bold>Acknowledgments.</bold> The work of Dmitry Ignatov (Sections 2, 6, and 7) was
supported by the Russian Science Foundation under grant 17-11-01294 and
performed at National Research University Higher School of Economics, Russia.</p>
      <list list-type="simple">
        <list-item><label>2.</label><p>Maximilian Christ, Nils Braun, Julius Neuffer, and Andreas W. Kempa-Liehr. Time series feature extraction on basis of scalable hypothesis tests (tsfresh – a Python package). Neurocomputing, 307:72–77, 2018.</p></list-item>
        <list-item><label>3.</label><p>Marco Fagiani, Stefano Squartini, Leonardo Gabrielli, Marco Severini, and Francesco Piazza. A statistical framework for automatic leakage detection in smart water and gas grids. Energies, 9(9), 2016.</p></list-item>
        <list-item><label>4.</label><p>Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Comput., 9(8):1735–1780, November 1997.</p></list-item>
        <list-item><label>5.</label><p>Dmitry I. Ignatov, Konstantin Sinkov, Pavel Spesivtsev, Ivan Vrabie, and Vladimir Zyuzin. Tree-based ensembles for predicting the bottomhole pressure of oil and gas well flows. In Wil M. P. van der Aalst et al., editor, Analysis of Images, Social Networks and Texts - 7th International Conference, AIST 2018, Moscow, Russia, July 5-7, 2018, Revised Selected Papers, volume 11179 of Lecture Notes in Computer Science, pages 221–233. Springer, 2018.</p></list-item>
        <list-item><label>6.</label><p>Yann LeCun and Yoshua Bengio. Convolutional networks for images, speech, and time series. In Michael A. Arbib, editor, The Handbook of Brain Theory and Neural Networks, pages 255–258. MIT Press, Cambridge, MA, USA, 1998.</p></list-item>
        <list-item><label>7.</label><p>Jesse Read, Bernhard Pfahringer, Geoff Holmes, and Eibe Frank. Classifier chains for multi-label classification. Machine Learning, 85(3):333–359, 2011.</p></list-item>
        <list-item><label>8.</label><p>Pavel Spesivtsev, Konstantin Sinkov, Ivan Sofronov, Anna Zimina, Alexey Umnov, Ramil Yarullin, and Dmitry Vetrov. Predictive model for bottomhole pressure based on machine learning. Journal of Petroleum Science and Engineering, 166:825–841, 2018.</p></list-item>
        <list-item><label>9.</label><p>Pavel E. Spesivtsev, Andrey D. Kharlashkin, and Konstantin F. Sinkov. Study of the transient terrain-induced and severe slugging problems by use of the drift-flux model. SPE Journal, 22(SPE-186105-PA), 2017.</p></list-item>
        <list-item><label>10.</label><p>Grigorios Tsoumakas and Ioannis Katakis. Multi-label classification: An overview. IJDWM, 3(3):1–13, 2007.</p></list-item>
      </list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <label>1.</label>
        <mixed-citation>
          <string-name><given-names>Tianqi</given-names> <surname>Chen</surname></string-name> and <string-name><given-names>Carlos</given-names> <surname>Guestrin</surname></string-name>. <article-title>XGBoost: A scalable tree boosting system</article-title>. In <source>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>, San Francisco, CA, USA, August 13-17, 2016, pages <fpage>785</fpage>–<lpage>794</lpage>, <year>2016</year>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>