<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Deep Learning Algorithm For Personalized Blood Glucose Prediction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Taiyu Zhu</string-name>
          <email>taiyu.zhu17@imperial.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kezhi Li</string-name>
          <email>kezhi.li@imperial.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pau Herrero</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jianwei Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pantelis Georgiou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Electronic and Electrical Engineering, Imperial College London</institution>
          ,
          <addr-line>London SW5 7AZ</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
<p>A convolutional neural network (CNN) model is presented to forecast the future glucose levels of patients with type 1 diabetes. The model is a modified version of the recently proposed WaveNet model, which has proven very useful in acoustic signal processing. By transforming the task into a classification problem, the model is built mainly from causal dilated CNN layers and employs the fast WaveNet algorithm. The OhioT1DM dataset is the source of the four input fields: glucose level, insulin events, carbohydrate intake and a time index. The data are fed into the network together with targets representing the glucose change over 30 minutes. Several pre-processing approaches, such as interpolation, combination and filtering, are used to fill in the missing data in the training sets, and they improve the performance. Finally, we obtain predictions on the testing dataset and evaluate the results with the root mean squared error (RMSE). The mean of the best RMSE over the six patients is 21.72 mg/dl. This work is submitted to the Blood Glucose Level Prediction Challenge of the International Workshop on Knowledge Discovery in Healthcare Data at the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence (IJCAI-ECAI 2018). This work is supported by EPSRC, the ARISES project. T. Zhu and K. Li are the main contributors to the paper.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>With an increasing incidence worldwide, type 1 diabetes is a
severe chronic disease that requires long-term management of blood
glucose, which relies on glucose predictions [Daneman, 2006].
Aiming to improve the accuracy of these predictions, artificial
intelligence researchers have been investigating machine
learning approaches to develop efficient forecasting models.</p>
<p>In this paper, the main system is constructed from a deep
convolutional neural network (CNN). It originates from the
WaveNet model, first developed by DeepMind to process raw
audio signals [Van Den Oord et al., 2016]. The glucose data of
the patients are obtained sequentially by continuous glucose
monitoring (CGM) every five minutes. The change between the
current glucose value and the future glucose value is quantised
into 256 target categories. Under these circumstances, the
prediction problem is converted into a classification task, which
can be solved effectively. After pre-processing the datasets and
building the modified WaveNet model, the prediction results
for a 30-minute prediction horizon (PH) are obtained.</p>
    </sec>
    <sec id="sec-2">
      <title>Data Pre-processing</title>
      <sec id="sec-2-1">
        <title>Database</title>
<p>The training and testing data come from the OhioT1DM
dataset developed by [Marling and Bunescu, 2018]. Six patients
with type 1 diabetes wore Medtronic 530G insulin pumps and
Medtronic Enlite CGM sensors to collect data over an 8-week
period. Each patient also reported daily events via an app on a
smartphone and a fitness band. In the OhioT1DM dataset, the
patients are numbered 559, 563, 570, 575, 588 and 591. Two of
them, 563 and 570, are male, while the others are female. Three
of the nineteen data fields, namely previous CGM data
('glucose_level'), insulin values ('bolus') and carbohydrate intake
('bwz_carb_input'), together with a time index normalised to
the unit interval for each day, are used as the four input
channels. The meal information is obtained from the pump's Bolus
Wizard (BWZ), into which patients enter their carbohydrate
intake to calculate the bolus. Other fields of the patient data,
such as heart rate and skin temperature, were also tried as
inputs. However, in our experiments these fields slightly
degraded the classification performance and added variance
to the model.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Interpolation and Extrapolation</title>
<p>Inspection of the glucose data shows that several intervals
are missing values in both the training and testing sets. Since
the targets of the CNN model rely on the differences between
current and future data points, these discontinuities can have a
negative influence. We fill in the missing values of the training
dataset by first-order interpolation. For the testing data,
first-order extrapolation is used instead, to ensure that no future
values are involved. When evaluating the performance, the
predictions in extrapolated intervals are ignored, to guarantee
that the result has the same length as the CGM testing data.</p>
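        <p>As an illustrative sketch (not the exact implementation used in this work), the two gap-filling strategies can be written as follows, assuming the CGM series is held in a NumPy array with gaps marked as NaN; the helper name is ours:</p>

```python
import numpy as np

def fill_gaps(values, mode="interp"):
    """Fill NaN gaps in a CGM series.

    mode="interp": first-order (linear) interpolation, used on training data.
    mode="extrap": first-order extrapolation from the last two known points,
    so no future values are used (for testing data).
    """
    values = np.asarray(values, dtype=float).copy()
    idx = np.arange(len(values))
    known = ~np.isnan(values)
    if mode == "interp":
        # linear interpolation between the surrounding known samples
        values[~known] = np.interp(idx[~known], idx[known], values[known])
    else:
        for t in idx[~known]:
            past = idx[(idx < t) & known]
            if len(past) >= 2:
                t1, t2 = past[-2], past[-1]
                slope = (values[t2] - values[t1]) / (t2 - t1)
                values[t] = values[t2] + slope * (t - t2)
                known[t] = True  # an extrapolated point can seed later ones
    return values
```

        <p>Note that extrapolation only ever looks backwards, so an on-line predictor never sees values from the future.</p>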
      </sec>
      <sec id="sec-2-3">
<title>Combination</title>
<p>The training dataset contains data from six patients over
around 40 days, about 115,200 CGM data points in total. An
effective machine learning model usually needs much larger
training data. Moreover, missing intervals appear frequently
throughout the dataset. In our model, data points within large
missing gaps are discarded, and we interpolate values only over
short intervals. To obtain a longer training dataset, avoid
overfitting and improve generalisation, we expand the training set:
we take the longest continuous intervals from the other subjects
and combine them with the current subject's data to form an
extended training set. Notably, this strategy keeps half of the
data from the current subject, while the other five patients
contribute the other half with 10% each. The segments from the
other patients are selected by inspection to have the fewest
missing values. This method significantly improves the
performance for patient 591, who has a long missing interval
(967 points) in the training set.</p>
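        <p>A sketch of this combination strategy, under the assumption that each subject's series is a NumPy array with missing points marked as NaN (the helper name and segment selection are ours, simplified to the longest gap-free run):</p>

```python
import numpy as np

def combine_subjects(current, others, total_len=None):
    """Extend one subject's training series with segments from other subjects.

    Half of the extended set comes from the current subject; each of the
    other five subjects contributes 10%, taken from its longest
    gap-free (NaN-free) run.
    """
    total_len = total_len or 2 * len(current)
    chunk = total_len // (2 * len(others))      # 10% each when len(others) == 5
    parts = [np.asarray(current, dtype=float)]
    for series in others:
        series = np.asarray(series, dtype=float)
        ok = ~np.isnan(series)
        # find the longest run without NaNs
        best_start, best_len, start = 0, 0, None
        for i, flag in enumerate(np.append(ok, False)):
            if flag and start is None:
                start = i
            elif not flag and start is not None:
                if i - start > best_len:
                    best_start, best_len = start, i - start
                start = None
        seg = series[best_start:best_start + best_len]
        parts.append(seg[:chunk])
    return np.concatenate(parts)
```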
      </sec>
      <sec id="sec-2-4">
<title>Filtering</title>
<p>After the interpolation and combination, we found many
small spikes near the peaks and turning points of the CGM data
in the training dataset. These spikes add variance when the
batches are used to train the model. To remove this variance,
we use a median filter to suppress the noise, at the cost of a
slight increase in bias. The window size needs to be chosen
appropriately; it is five points in this work. The median filter
is not applied to the testing data, so on-line prediction remains
feasible. The outcome of the data pre-processing is shown in
Figure 1.</p>
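        <p>A minimal sketch of the five-point median filtering step (a NumPy re-implementation for illustration, not the original code):</p>

```python
import numpy as np

def median_filter(values, window=5):
    """Five-point median filter used to smooth spikes in the training CGM
    data; the testing data is left unfiltered so prediction stays on-line."""
    values = np.asarray(values, dtype=float)
    half = window // 2
    # edge padding keeps the output the same length as the input
    padded = np.pad(values, half, mode="edge")
    return np.array([np.median(padded[i:i + window])
                     for i in range(len(values))])
```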
<p>[Figure 1: CGM data after pre-processing, showing the original signal, linear interpolation and median filtering.]</p>
        <p>A feed-forward network is therefore a more suitable model, with
a four-channel input and the non-linear glucose-insulin
interaction. Moreover, our work is based on WaveNet, which is more
time-efficient in training and testing, with smaller weights, than
recurrent neural networks (RNNs) [Borovykh et al., 2017]. It
focuses on the long-term relationships between channels and is
conditioned on all previous samples [Van Den Oord et al., 2016],
which is modelled as (1).</p>
<p>p(x) = ∏_{t=1}^{T} p(x_t | x_1, ..., x_{t−1})    (1)</p>
        <p>where x = x_1, ..., x_T and p(x) is the joint probability,
computed as the product of conditional probabilities. The
output dimension is the same as the input because there are no
pooling layers. Convolutional layers model the conditional
probabilities, and a softmax layer is applied to maximise the
log-likelihood.</p>
      </sec>
      <sec id="sec-2-5">
<title>The Causal CNN</title>
<p>The main components of WaveNet are causal convolutional
layers. By shifting the outputs by several data points, 1-D
causal convolution layers can be implemented. Causality is
essential when a CNN forecasts time series, since it guarantees
that the model cannot use any information from future
timesteps. One special ingredient in particular is the causal
dilated convolutional (DCNN) layer, which greatly increases
the receptive field over the input signal. The structure is shown
in Figure 2. Compared with a regular causal convolutional
layer, the dilated one covers a larger number of input nodes.
This makes the system capable of learning long-term
dependencies by skipping a number of steps determined by the
dilation factor.</p>
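        <p>A minimal NumPy sketch of one causal dilated convolution, assuming kernel size 2 as in the original WaveNet; by construction, changing a future input never changes an earlier output:</p>

```python
import numpy as np

def causal_dilated_conv(x, w, dilation=1):
    """1-D causal convolution with kernel size 2 and a given dilation:
    y[t] = w[0] * x[t - dilation] + w[1] * x[t], with zero left-padding
    so the output never looks at future timesteps."""
    x = np.asarray(x, dtype=float)
    padded = np.concatenate([np.zeros(dilation), x])
    # padded[:-dilation] is x shifted right by `dilation` steps
    return w[0] * padded[:-dilation] + w[1] * padded[dilation:]
```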
<p>[Figure 2: A stack of causal dilated convolutional layers with dilation factors 1, 2, 4, 8 and 16 between the input and the output layer.]</p>
        <p>Predicting future values of a time series is one of the essential
problems in data science. A conventional method is to model
the patterns with an autoregressive moving-average (ARMA)
process. However, the performance of the ARMA model is not
satisfactory in this task, since it is incapable of capturing
non-linearities [Hamilton, 1994]. Feed-forward neural networks can
overcome this difficulty and learn the patterns of multivariate
time series well when fed with very large amounts of data
[Zhang et al., 1998].</p>
<p>In this work, there are three blocks, and each DCNN block
contains five layers of dilated convolution. The dilation
increases from one to a fixed maximum within each block. The
motivation behind this configuration is to grow the receptive
field exponentially, covering more of the past while keeping
the model efficient with 1 × 32 convolutions.</p>
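        <p>Under the assumption of kernel size 2, the receptive field of this configuration can be computed as:</p>

```python
def receptive_field(num_blocks=3, dilations=(1, 2, 4, 8, 16), kernel_size=2):
    """Receptive field of stacked causal dilated conv layers: each layer
    with dilation d and kernel size k reaches (k - 1) * d further back."""
    return 1 + num_blocks * sum((kernel_size - 1) * d for d in dilations)
```

        <p>With three blocks of dilations 1, 2, 4, 8 and 16 this gives 94 samples, i.e. nearly eight hours of 5-minute CGM history.</p>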
      </sec>
      <sec id="sec-2-6">
<title>System Architecture</title>
<p>We adopt the fast approach to implement the WaveNet
method, which removes redundant convolutional operations
and reduces the time complexity from O(2^L) to O(L) [Paine
et al., 2016], where L is the total number of layers. The
fundamental technique is to create convolution queues that
divide the operations into pop and push phases. The model
then functionally acts like a single step of an RNN. The system
model for this project with fast WaveNet is shown in Figure 3.</p>
<p>[Figure 3: System architecture of the fast WaveNet model: dilated convolutions with residual connections and ReLU activations across the output layers, followed by a 1×1 convolution and a softmax over the 256 output classes.]</p>
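        <p>The pop and push phases of the convolution queues can be sketched as follows; this is a scalar, single-channel illustration of the idea in [Paine et al., 2016], not the actual multi-channel implementation:</p>

```python
from collections import deque

class FastLayer:
    """One causal dilated layer (kernel size 2) with a convolution queue.
    The queue holds exactly `dilation` past activations: at each step we
    pop the activation from `dilation` steps ago and push the current one,
    so nothing is recomputed (O(L) per sample instead of O(2^L))."""
    def __init__(self, w_past, w_now, dilation):
        self.w_past, self.w_now = w_past, w_now
        self.queue = deque([0.0] * dilation, maxlen=dilation)

    def step(self, x):
        past = self.queue[0]     # pop phase: activation `dilation` steps ago
        self.queue.append(x)     # push phase: store the current activation
        return self.w_past * past + self.w_now * x

def generate_step(layers, x):
    """Push one input sample through all layers, like one RNN step."""
    for layer in layers:
        x = layer.step(x)
    return x
```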
<p>Compared with the work in [Van Den Oord et al., 2016],
we use a rectified linear unit (ReLU), denoted ReLU(x) :=
max(x, 0), as the activation function instead of the gated
function. How well the model learns the non-linearities of the data
series largely depends on the activation function. It was found
in [Borovykh et al., 2017] that the ReLU is very efficient for
non-stationary or noisy time series. Moreover, it also reduces
the training time and simplifies the model further. The output
of layer i is written as (2).</p>
        <p>f^i = [ReLU(w^i_1 *_d f^{i−1}) + b, ..., ReLU(w^i_{T_i} *_d f^{i−1}) + b]    (2)</p>
        <p>where f^i is the output of CNN layer i after the dilated
convolution *_d with weight filters w^i_l, l = 1, 2, ..., T_i, and
b stands for the bias. To model the conditional probabilities, a
softmax layer is applied to calculate the training loss and output
the predictions. The softmax is chosen for its flexibility, since
it places no requirement on the shape of the data, and it works
well for continuous 1-D data [Oord et al., 2016].</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Training WaveNet</title>
<p>After pre-processing the patient data and constructing the
WaveNet system, the next step is to train the network; the test
data can then be fed into the trained model.</p>
      <sec id="sec-3-1">
        <title>Make Batches</title>
<p>The inputs of the neural network are four channels: CGM
data, insulin events, carbohydrate intake, and time index. The
batches of the testing phase have the same structure. The PH is
30 minutes, so the model must forecast the CGM value 6 points
into the future. Therefore, we calculate the glucose change
between the current value and the value at the end of the PH.
By quantisation, we put the change in glucose values into 256
classes/categories as targets, with a difference of 1 mg/dl
between adjacent classes. The number of classes is chosen
carefully: too few classes cannot distinguish the differences,
while too many are unsuitable for a small training dataset.
After investigating the training dataset, we found that 256
classes cover more than 95% of the difference values,
corresponding to glucose changes in the range of ±128 mg/dl
within 30 minutes.</p>
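        <p>The target construction described above can be sketched as follows, assuming 5-minute CGM sampling; the helper name is ours:</p>

```python
import numpy as np

def make_targets(cgm, horizon=6, num_classes=256):
    """Quantise the glucose change over the prediction horizon (6 samples
    = 30 min at 5-min CGM intervals) into 256 classes, 1 mg/dl apart,
    covering changes in roughly [-128, +127] mg/dl."""
    cgm = np.asarray(cgm, dtype=float)
    delta = cgm[horizon:] - cgm[:-horizon]      # future minus current value
    half = num_classes // 2
    cls = np.clip(np.rint(delta), -half, half - 1) + half
    return cls.astype(int)                      # class 128 means no change
```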
      </sec>
      <sec id="sec-3-2">
<title>Weight Optimisation</title>
<p>The training process finds the weights that minimise the
cost function of the network. The cost function is one of the
most important indicators in the training phase: it represents
the error between the targets and the network output. In the
proposed system, sparse softmax cross entropy is applied to
optimise the model. Generally, the optimisation follows the
gradient descent method, which computes weight gradients
through backward propagation and updates the weights after
each iteration. Here we use the adaptive moment estimation
(Adam) optimiser to adjust the training steps by minimising the
mean cost, with the learning rate set to 0.0001. The Adam
optimiser performs well on non-stationary time series
[Kingma and Ba, 2014]; it uses first-order gradients and can
be implemented with high computational efficiency. We set
the total number of training iterations to 1,000 to avoid
underfitting and overfitting. The cost function loss versus the
global steps is shown in Figure 4.</p>
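        <p>For illustration, one Adam update step with the learning rate used here (0.0001) can be sketched as follows; the variable names are ours, not taken from any particular framework:</p>

```python
import numpy as np

def adam_step(w, grad, state, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: first-order gradients with bias-corrected running
    moments [Kingma and Ba, 2014]; lr=1e-4 as in this work."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad        # first moment
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2   # second moment
    m_hat = state["m"] / (1 - b1 ** state["t"])           # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)
```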
<p>[Figure 4: Training cost versus iterations of global steps.]</p>
<p>The input data has the same length as the outputs. The cost
curve decreases smoothly over the first 500 iterations, and some
spikes appear afterwards, a consequence of the mini-batch
operations of the Adam optimiser. It is noted that the cost is
still converging after 1,000 global steps, but continuing the
training would cause over-fitting. Over-fitting is likely here
because the model is complex while the training data is
limited.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Performance</title>
      <sec id="sec-4-1">
<title>Results</title>
<p>The unit of glucose level in this paper is mg/dl, and we use
one of the essential evaluation metrics, the root mean squared
error (RMSE), to evaluate the prediction performance; it can
be expressed as</p>
<p>RMSE = sqrt( (1/N) ∑_{t=1}^{N} (x̂(t | t − PH) − x_t)² )    (3)</p>
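        <p>A minimal implementation of this metric:</p>

```python
import numpy as np

def rmse(predictions, targets):
    """Root mean squared error between predicted and reference CGM values
    (both in mg/dl), as in equation (3)."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return np.sqrt(np.mean((predictions - targets) ** 2))
```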
<p>We mainly focus on the RMSE for each patient and record
the results after several runs; they are shown in Table 1. The
mean of the best results is 21.7267 mg/dl, with a standard
deviation of 2.5237. The performance varies between subjects,
and the method obtains the best result for subject 570.
The forecasting curve and original CGM data are plotted in
Figure 5. Notably, the predicted curve fits the original CGM
recording with similar trends in general. However, differences
remain, as can be seen in the detailed view of one day of CGM
data. First, there is an obvious delay between the predictions
and the raw data, especially at turning points and peaks. In the
bottom plot, the dashed lines mark the insulin events and the
red circles mark the meal events. Intuitively, the curve changes
significantly after these events; it fluctuates intensively and
the error is high in these periods. A possible reason is that it
is difficult for the model to learn the underlying biological
model explicitly, and those glucose level changes are not
determined by the input data alone. Nevertheless, the RMSE
improves by 0.8 mg/dl when feeding all four data channels
instead of the CGM data alone.</p>
<p>Another finding concerns the effect of extrapolation. There
are some missing values around 18:00, and we use first-order
extrapolation to fill in the time series. However, the error
remains high for the data after these regions. Because the
predictive curve is obtained by adding the predicted differences
onto the previous values, it depends heavily on the data 6
timesteps before. We only extrapolate the CGM field, because
the other three channels are discrete values. As shown in
Figure 5, insulin and carbohydrate intake have significant
impacts on the future values, so missing these data points
would cause more error. Several other interpolation methods,
such as cubic and spline interpolation, were also tested in the
training phase. First-order interpolation performs best, reducing
the mean RMSE by 2.1 mg/dl, because it captures the
linearities of the 1-D signal well.</p>
<p>[Figure 5: Predicted and raw CGM data over one day, with meal and insulin events marked.]</p>
        <p>For the first six data points, the predictions are calculated
by concatenating the last part of the training data to the front
of the testing data. As long as the trend of the concatenated
data matches the subsequent timesteps, this does not noticeably
affect the RMSE results. The length of the predictions is the
same as that of the testing dataset, as the point counts shown
in Table 1.</p>
<p>The predictions for subjects 575 and 591 perform much
worse than for the other subjects. There are two reasons. On
the one hand, there is the large gap in the training dataset, as in
subject 591. Although we use the data combination approach
to compensate for this loss and successfully reduce the mean
RMSE by around 0.2 mg/dl, the RMSE is still quite high
compared with the others. On the other hand, there is the
condition of the testing dataset. For subject 591, the CGM
testing data fluctuates a lot, with plenty of spikes, and the error
is high near the turning points of the curve. However, those
fluctuations are determined by the underlying physiology and
the patient's health, such as the plasma insulin model in
[Lehmann and Deutsch, 1992]. Moreover, subjects 570 and
563 use "Humalog" insulin while the other four subjects use
"Novalog". We found that the predictions are more accurate for
the subjects using "Humalog" insulin. Another shared feature
of these two patients is their gender. Data from a larger group
of subjects would be required to confirm and explore these
correlations.</p>
<p>Compared with existing models, this paper presents a novel
deep learning model based on CNN layers. Its performance
outperforms that of simple autoregressive models using only
the same CGM data, following the structure in [Sparacino et
al., 2007]. The RMSE results of other deep learning models,
such as [Mougiakakou et al., 2006], cannot be compared
directly, due to different subjects and datasets. Nevertheless,
the major advantages of this model are its higher efficiency,
requiring fewer global training steps and smaller weights, and
its fast algorithmic implementation with low time complexity
O(L).</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
<p>In this paper, the task of predicting the glucose level at a
30-minute PH is converted into a classification problem, and a
new model based on a modified WaveNet is developed. With
the pre-processed dataset and the causal DCNN system
architecture, the network is trained to obtain its weights. Four
channels are selected as network inputs because we find they
have strong correlations with the glucose levels.</p>
<p>The mean of the best RMSE over the six subjects is
21.7267 mg/dl, with a standard deviation of 2.5237. The model
differs from existing RNN models and outperforms many
current algorithms. The prediction performance is mainly
affected by the missing CGM values and the length of the
training sets. Integrating other data fields with biological
models is a potential approach to improve the prediction
accuracy in future work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Borovykh et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Anastasia</given-names>
            <surname>Borovykh</surname>
          </string-name>
          , Sander Bohte, and Cornelis W Oosterlee.
          <article-title>Conditional time series forecasting with convolutional neural networks</article-title>
          .
          <source>arXiv preprint arXiv:1703.04691</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
[Daneman, 2006]
          <string-name>
            <given-names>Denis</given-names>
            <surname>Daneman</surname>
          </string-name>
          .
          <article-title>Type 1 diabetes</article-title>
          .
          <source>The Lancet</source>
          ,
          <volume>367</volume>
          (
          <issue>9513</issue>
          ):
          <fpage>847</fpage>
          -
          <lpage>858</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
[Hamilton, 1994]
          <string-name>
            <given-names>James Douglas</given-names>
            <surname>Hamilton</surname>
          </string-name>
          .
          <source>Time series analysis</source>
          , volume
          <volume>2</volume>
          . Princeton university press Princeton,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
[Kingma and Ba, 2014] Diederik P Kingma and Jimmy Ba.
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>arXiv preprint arXiv:1412.6980</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
[Lehmann and Deutsch, 1992] ED Lehmann and
          <string-name>
            <given-names>T</given-names>
            <surname>Deutsch</surname>
          </string-name>
          .
          <article-title>A physiological model of glucose-insulin interaction in type 1 diabetes mellitus</article-title>
          .
          <source>Journal of biomedical engineering</source>
          ,
          <volume>14</volume>
          (
          <issue>3</issue>
          ):
          <fpage>235</fpage>
          -
          <lpage>242</lpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
[Marling and Bunescu, 2018]
          <string-name>
            <given-names>Cindy</given-names>
            <surname>Marling</surname>
          </string-name>
          and
          <string-name>
            <given-names>Razvan</given-names>
            <surname>Bunescu</surname>
          </string-name>
          .
          <article-title>The OhioT1DM dataset for blood glucose level prediction</article-title>
          .
          <source>In The 3rd International Workshop on Knowledge Discovery in Healthcare Data</source>
          , Stockholm, Sweden,
          <year>July 2018</year>
          . CEUR proceedings in press, available at http://smarthealth.cs.ohio.edu/bglp/ OhioT1DM-dataset-paper.pdf,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Mougiakakou et al.,
          <year>2006</year>
] Stavroula G Mougiakakou, Aikaterini Prountzou, Dimitra Iliopoulou, Konstantina S Nikita, Andriani Vazeou, and Christos S Bartsocas.
          <article-title>Neural network based glucose-insulin metabolism models for children with type 1 diabetes</article-title>
          .
          <source>In Engineering in Medicine and Biology Society</source>
          ,
          <year>2006</year>
          . EMBS'
          <volume>06</volume>
          . 28th Annual International Conference of the IEEE, pages
          <fpage>3545</fpage>
          -
          <lpage>3548</lpage>
          . IEEE,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Oord et al.,
          <year>2016</year>
          ] Aaron van den Oord, Nal Kalchbrenner, and
          <string-name>
            <given-names>Koray</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          .
          <article-title>Pixel recurrent neural networks</article-title>
          .
          <source>arXiv preprint arXiv:1601.06759</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Paine et al.,
          <year>2016</year>
          ] Tom Le Paine, Pooya Khorrami,
          <string-name>
            <surname>Shiyu</surname>
            <given-names>Chang</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            <given-names>Zhang</given-names>
          </string-name>
          , Prajit Ramachandran,
          <article-title>Mark A Hasegawa-Johnson, and Thomas S Huang. Fast wavenet generation algorithm</article-title>
          .
          <source>arXiv preprint arXiv:1611.09482</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Sparacino et al.,
          <year>2007</year>
          ]
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Sparacino</surname>
          </string-name>
          , Francesca Zanderigo, Stefano Corazza, Alberto Maran, Andrea Facchinetti, and
          <string-name>
            <given-names>Claudio</given-names>
            <surname>Cobelli</surname>
          </string-name>
          .
          <article-title>Glucose concentration can be predicted ahead in time from continuous glucose monitoring sensor time-series</article-title>
          .
          <source>IEEE Transactions on biomedical engineering</source>
          ,
          <volume>54</volume>
          (
          <issue>5</issue>
          ):
          <fpage>931</fpage>
          -
          <lpage>937</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
[Van Den Oord et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Aaron</given-names>
            <surname>Van Den Oord</surname>
          </string-name>
          , Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner,
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Senior</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Koray</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          .
          <article-title>Wavenet: A generative model for raw audio</article-title>
          .
          <source>arXiv preprint arXiv:1609.03499</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Zhang et al.,
          <year>1998</year>
          ]
          <string-name>
            <given-names>Guoqiang</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B Eddy</given-names>
            <surname>Patuwo</surname>
          </string-name>
          , and Michael Y Hu.
          <article-title>Forecasting with artificial neural networks:: The state of the art</article-title>
          .
          <source>International journal of forecasting</source>
          ,
          <volume>14</volume>
          (
          <issue>1</issue>
          ):
          <fpage>35</fpage>
          -
          <lpage>62</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>