DoA and ToA Estimation Method of OFDM Signal Based on
Cascaded Deep Neural Network
Chaofan Zheng 1, Shaoshuai Fan1, Hui Tian1, Bin Ren2, Ren Da 2, Zhenyu Zhang3 and Shaohui
Sun2
1
  State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and
   Telecommunications, Beijing, China
2
  State Key Laboratory of Wireless Mobile Communications, China Academy of Telecommunications Technology
    (CATT), Beijing, China
3
  School of Electronic and Information Engineering, Beihang University, Beijing, China


                Abstract
                Accurate estimation of the direction of arrival (DoA) and time of arrival (ToA) is very
                important in many scenarios such as accurate positioning. However, it is challenging in
                environments with multipath propagation and noise. This paper proposes a DoA and ToA
                estimation method for OFDM signals based on a cascaded deep neural network (DNN) with a
                uniform grid array (UGA). In the proposed method, we use the channel state information (CSI)
                matrix as the network input rather than the correlation matrix. Simulation results show that the
                trained deep neural network achieves better estimation accuracy under multipath propagation
                and noise than the conventional DoA and ToA estimation methods.

                Keywords
                Direction of arrival, time of arrival, deep learning, convolutional neural network

1. Introduction
    Direction of arrival (DoA) and time of arrival (ToA) of wireless signals are widely used in
commercial and military fields, such as indoor positioning, underwater and air target tracking and
monitoring, and intelligent robots. In these applications, it is often necessary to obtain both
DoA and ToA. Estimation of DoA and ToA is relatively straightforward under high signal-to-noise ratio
(SNR) conditions. However, in complex wireless environments where the transmitted signal is subject to
fading and interference, the SNR is low and the received signal contains few effective components, so
the estimation of ToA and DoA is extremely challenging.
    In the past few years, many physically driven methods have been proposed to estimate DoA and ToA
with high accuracy, including the matrix pencil (MP) method, the multiple signal classification (MUSIC)
algorithm, estimation of signal parameters via rotational invariance techniques (ESPRIT), and the
manifold separation technique. In [1], the array manifold matrix was constructed by using the spatial
characteristics of the uniform circular array (UCA) and the time diversity of OFDM subcarriers; a virtual
spatial smoothing method was then designed to enhance the covariance matrix of the signal, and the
MUSIC algorithm was used to estimate the DoA and ToA of the multipath signal. A 3-D matrix pencil
method was proposed in [2], which decomposes the covariance matrix of the LTE signal by singular value
decomposition and extracts DoA and ToA information from the obtained poles. In [3], an efficient
approximate maximum likelihood algorithm was proposed which alternately updates the DoA and time
domain parameters.

IPIN 2021 WiP Proceedings, November 29 -- December 2, 2021, Lloret de Mar, Spain
EMAIL: zhengchaofan@bupt.edu.cn (C. Zheng); fanss@bupt.edu.cn (S. Fan); tianhui@bupt.edu.cn (H. Tian);
renbin@datangmobile.cn (B. Ren); renda@catt.cn (R. Da); zhangzhenyu1@datangmobile.cn (Z. Zhang);
sunshaohui@datangmobile.cn (S. Sun)
ORCID: 0000-0002-0845-8644 (C. Zheng); 0000-0002-2344-9498 (S. Fan); 0000-0001-8876-1389 (H. Tian);
0000-0003-2918-0204 (B. Ren); 0000-0002-8631-9623 (R. Da); 0000-0002-1050-2501 (Z. Zhang);
0000-0003-1383-8833 (S. Sun)
             ©️ 2020 Copyright for this paper by its authors.
             Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
             CEUR Workshop Proceedings (CEUR-WS.org)
    In recent years, with the continuous progress of artificial intelligence technology, deep neural
networks (DNNs) [4] have been widely used in image processing, speech recognition, pattern recognition
and other fields [5]. In addition, research on DNNs has spread to communication areas such as signal
processing and channel estimation [6]. DNNs have several advantages: they extract features layer by
layer and combine lower layer features to form higher layer features, allowing for distributed
representation of data [7]; and their multiple hidden layers have great non-linear fitting capability,
allowing for effective mapping of the relationship between inputs and outputs. Although training a
DNN may take some time, a trained DNN computes quickly and can produce output results quickly.
Therefore, the use of DNNs for DoA and ToA estimation is an attractive option.
    DNN-based DoA and ToA estimation has been studied by many scholars. In [8], a fully connected
neural network is used for DoA estimation to verify the robustness of DNNs under different
signal-to-noise ratio (SNR) conditions. To improve the estimation accuracy, [9,10,11,12] regard the DNN
as a high-performance filter: the filtering function is realized by learning the mapping between the
clean covariance matrix and the noisy covariance matrix of a low-angle-of-arrival radar signal. In [13],
DoA estimation is modeled as an angle classification problem, and a recurrent neural network (RNN) is
used to learn the mapping between the sample covariance matrix and the angle. To increase the accuracy
of neural-network DoA estimation under different SNRs, a cascaded neural network structure was proposed
in [14]: the SNR was used as the input of the network, and the DoA network was selected according to
the SNR level. [15] proposed a deep learning-based framework for preamble detection and ToA estimation
with high accuracy under multipath and noise interference. [16] presented a learning-based algorithm
that estimates the ToA of radio frequency (RF) signals from channel frequency response (CFR)
measurements for wireless localization applications. [17] proposed a convolutional neural network
(CNN)-based method which can overcome the negative effect of false peaks in the block interleaved
frequency division multiplexing (B-IFDM) structure.
    The work above shows the effectiveness of DNNs in the estimation of DoA and ToA, and their
superiority over traditional methods in some conditions. Although much research has been done, the
joint estimation of ToA and DoA based on neural networks has received little attention. Much work has
used the covariance matrix of the received signal as the input to the neural network, but this approach
only estimates the DoA and is not very sensitive to changes in ToA. The CSI matrix, on the other hand,
is rich in information, and its features can be learned and extracted by a neural network. In this
paper, a cascaded neural network structure is proposed to estimate the DoA and ToA of OFDM signals. The
cascaded neural network consists of a filtering neural network and an estimation neural network. The
filtering neural network performs signal enhancement on low-SNR CSI matrices to reduce noise. The
estimation neural network provides high-accuracy estimation of DoA and ToA. The proposed cascaded
neural network has higher accuracy than some other physically driven and data-driven methods.
   The paper is organized as follows. Section 2 describes the signal model for the uniform grid array
(UGA) and the DNN structure. Section 3 introduces the structure and training strategy of the cascaded
neural network. Simulation parameters and results are given in Section 4, and Section 5 concludes the
paper.

2. System model
2.1. Signal model

     As shown in Fig. 1, a UGA with H antennas is used to receive OFDM signals, and the distance
between adjacent antennas is 0.5λ, where λ is the wavelength. We assume that the OFDM signal
transmitted by the source impinges on the antenna array from different directions of arrival through
the line of sight (LOS) and non-line of sight (NLOS) paths. The spatial response of the $j$-th antenna
to the $l$-th path can be expressed as:

$$a_j(\theta_l, \varphi_l) = e^{\,j 2\pi r_j \sin\varphi_l \cos(\theta_l - \gamma_j) \cdot f / c} \tag{1}$$

   where $f$ is the carrier frequency of the transmitted signal and $c$ is the speed of light. $r_j$
and $\gamma_j$ are the distance and angle between the $j$-th antenna and the origin of the coordinates,
respectively. $\theta_l$ and $\varphi_l$ are the azimuth of arrival (AoA) and zenith of arrival (ZoA)
of the incident signal, respectively. In the exponent, the leading $j$ is the imaginary unit, while
the subscript $j$ is the antenna index.
Figure 1: Configuration of the antenna array (LOS and NLOS paths; antenna spacing d)
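
As an illustration of Eq. (1), the following minimal sketch evaluates the spatial response of every antenna in a 4 × 4 grid. It is not the authors' code: the helper names, the choice of the array centre as the coordinate origin, and the example angles are assumptions made only for this sketch.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def spatial_response(r_j, gamma_j, theta_l, phi_l, f):
    """Eq. (1): response of antenna j to a path with AoA theta_l and ZoA phi_l (radians)."""
    phase = 2 * math.pi * r_j * math.sin(phi_l) * math.cos(theta_l - gamma_j) * f / C
    return cmath.exp(1j * phase)


def uga_polar_positions(rows, cols, f):
    """Polar coordinates (r_j, gamma_j) of a rows x cols grid with half-wavelength
    spacing. Assumption: the origin is placed at the array centre."""
    d = 0.5 * C / f  # half a wavelength
    positions = []
    for m in range(rows):
        for n in range(cols):
            x = (n - (cols - 1) / 2) * d
            y = (m - (rows - 1) / 2) * d
            positions.append((math.hypot(x, y), math.atan2(y, x)))
    return positions


f = 2e9  # 2 GHz carrier, as in the simulation section
positions = uga_polar_positions(4, 4, f)
# Example path: AoA = 30 degrees, ZoA = 90 degrees (illustrative values only)
a = [spatial_response(r, g, math.radians(30), math.radians(90), f) for r, g in positions]
```

Each entry of `a` is a unit-magnitude complex number, since Eq. (1) only describes a phase shift.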

    Similar to [1], we can obtain the channel state information and then construct a CSI matrix in
which the $n$-th column is the $n$-th CSI snapshot:

$$\boldsymbol{c}(n) = [\mathrm{csi}_1^1(n), \mathrm{csi}_2^1(n), \ldots, \mathrm{csi}_1^2(n), \ldots, \mathrm{csi}_K^{16}(n)] \tag{2}$$

     where $\mathrm{csi}_k^j$ is the CSI of the $k$-th subcarrier on the $j$-th antenna. We can
construct a design matrix that contains both DoA and ToA parameters by using this matrix. The channel
impulse response at the center of the array is given by [3]:
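$$h(t) = \sum_{l=1}^{L} \alpha_l \, \delta(t - \tau_l) \tag{3}$$

where $\tau_l$ is the time delay of the $l$-th path to the center of the array, and $\alpha_l$ is the
gain of the $l$-th path. The discrete Fourier transform of $h(t)$ is the channel frequency response,
and the CSI of the $k$-th subcarrier can be written as:

$$\mathrm{csi}_k = \sum_{l=1}^{L} \alpha_l e^{-j 2\pi f_k \tau_l} = \sum_{l=1}^{L} \alpha_l e^{-j 2\pi f_1 \tau_l} \cdot e^{-j 2\pi (k-1) \Delta f \tau_l} \tag{4}$$

where $f_k = f_1 + (k-1)\Delta f$ is the frequency of the $k$-th subcarrier and $\Delta f$ is the
subcarrier spacing.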

     According to the spatial response constructed in (1), the CSI of the $k$-th subcarrier on the
$j$-th antenna can be written as:

$$\mathrm{csi}_k^j = \sum_{l=1}^{L} \alpha_l e^{-j 2\pi f_1 \tau_l} \cdot a_j(\theta_l, \varphi_l) \cdot e^{-j 2\pi (k-1) \Delta f \tau_l} \tag{5}$$
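
To make Eq. (5) concrete, the sketch below sums the per-path contributions for one antenna and one subcarrier. The function name and the numeric values are hypothetical; the spatial responses are passed in as precomputed numbers so that the block stays self-contained.

```python
import cmath
import math


def csi_element(k, a_j, alphas, taus, f1, delta_f):
    """Eq. (5): CSI of subcarrier k (1-based) on one antenna.

    a_j[l]    spatial response of this antenna to path l, as in Eq. (1)
    alphas[l] complex gain of path l
    taus[l]   delay of path l in seconds
    """
    total = 0j
    for alpha, tau, a in zip(alphas, taus, a_j):
        carrier_term = cmath.exp(-2j * math.pi * f1 * tau)
        subcarrier_term = cmath.exp(-2j * math.pi * (k - 1) * delta_f * tau)
        total += alpha * carrier_term * a * subcarrier_term
    return total


# Illustrative two-path example (values are assumptions, not taken from the paper's dataset)
f1, delta_f = 2e9, 30e3
alphas = [1.0, 0.3]
taus = [20e-9, 50e-9]
a_j = [1.0 + 0j, cmath.exp(0.7j)]
value = csi_element(3, a_j, alphas, taus, f1, delta_f)
```

With a single path and a unit spatial response, the result reduces to the corresponding term of Eq. (4), which is a convenient sanity check.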

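     In the $n$-th snapshot, the $n$-th column of the CSI matrix of the $L$-path signal can be
rearranged into the following matrix:

$$\boldsymbol{c}(n) = \begin{bmatrix} c_{11} & \cdots & c_{1K} \\ \vdots & c_{kj} & \vdots \\ c_{H1} & \cdots & c_{HK} \end{bmatrix} \tag{6}$$

      $\boldsymbol{c}(n)$ is an H × K matrix, where H and K are the number of antennas and subcarriers,
respectively. $\boldsymbol{c}(n)$ is a complex matrix, but neural networks cannot handle complex
numbers directly. In order not to lose the information in the matrix, we reconstruct the matrix as
follows:

$$\boldsymbol{c_r}(n) = \begin{bmatrix} R(c_{11}) & \cdots & R(c_{1K}) \\ \vdots & \ddots & \vdots \\ R(c_{H1}) & \cdots & R(c_{HK}) \\ I(c_{11}) & \cdots & I(c_{1K}) \\ \vdots & \ddots & \vdots \\ I(c_{H1}) & \cdots & I(c_{HK}) \end{bmatrix} \tag{7}$$

where $R(\cdot)$ and $I(\cdot)$ denote the real and imaginary parts of a complex-valued entity,
respectively. Finally, the input matrix is obtained by averaging $\boldsymbol{c_r}(n)$ over all
snapshots.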
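
The preprocessing in (7) and the snapshot averaging can be sketched as follows. This is a minimal illustration on nested Python lists with made-up 2 × 2 snapshots; the function names are assumptions.

```python
def stack_real_imag(c):
    """Eq. (7): turn an H x K complex matrix into a 2H x K real matrix
    (real parts on top, imaginary parts below)."""
    real_rows = [[z.real for z in row] for row in c]
    imag_rows = [[z.imag for z in row] for row in c]
    return real_rows + imag_rows


def average_snapshots(snapshots):
    """Element-wise mean of equally sized real matrices."""
    n = len(snapshots)
    rows, cols = len(snapshots[0]), len(snapshots[0][0])
    return [[sum(s[i][j] for s in snapshots) / n for j in range(cols)]
            for i in range(rows)]


# Two toy snapshots with H = 2 antennas and K = 2 subcarriers
c1 = [[1 + 2j, 3 - 1j], [0.5j, -2 + 0j]]
c2 = [[3 + 0j, 1 + 1j], [1 - 0.5j, 2j]]
x = average_snapshots([stack_real_imag(c1), stack_real_imag(c2)])
# x has 2H = 4 rows and K = 2 columns
```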
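     For the CSI given in (6), the covariance matrix can be expressed as:

$$R_{ij} = \mathrm{cov}(c(i), c(j)^{*}) \tag{8}$$

where $c(i)$ and $c(j)$ are the $i$-th and $j$-th rows of the CSI matrix, respectively, $(\cdot)^{*}$
denotes complex conjugation, and $\mathrm{cov}(\cdot)$ is the covariance of two vectors. The average
covariance matrix over all snapshots is given by:

$$\boldsymbol{R} = \frac{1}{n} \sum_{m=1}^{n} R(m) \tag{9}$$

where $R(m)$ is the covariance matrix of the $m$-th snapshot and $n$ is the number of snapshots.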
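
One possible reading of (8) and (9) is sketched below. The text does not specify the covariance estimator, so the biased sample covariance across subcarriers used here is an assumption, as are the function names and the toy data.

```python
def row_covariance(c):
    """Eq. (8), assumed estimator: entry (i, j) is the sample covariance between
    antenna rows i and j of an H x K complex CSI matrix, taken over subcarriers."""
    k = len(c[0])
    means = [sum(row) / k for row in c]
    centred = [[z - m for z in row] for row, m in zip(c, means)]
    return [[sum(a * b.conjugate() for a, b in zip(ri, rj)) / k for rj in centred]
            for ri in centred]


def average_matrices(mats):
    """Eq. (9): element-wise mean of the per-snapshot covariance matrices."""
    n = len(mats)
    size = len(mats[0])
    return [[sum(m[i][j] for m in mats) / n for j in range(size)]
            for i in range(size)]


# Two toy snapshots with H = 2 antennas and K = 3 subcarriers
s1 = [[1 + 1j, 2 - 1j, 0j], [1j, 1 + 0j, -1 + 2j]]
s2 = [[2 + 0j, 1 + 1j, 1 - 1j], [0.5 + 0j, -1j, 2 + 1j]]
R = average_matrices([row_covariance(s1), row_covariance(s2)])
```

Under this estimator `R` is Hermitian with real, non-negative diagonal entries, which is what a subsequent eigenvalue decomposition would rely on.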


2.2.    Deep Neural Network Structure
      A convolutional neural network (CNN) is a type of DNN that has many advantages over traditional
techniques: good fault tolerance, parallel processing, and self-learning capability. It can handle
problems with complex environmental information, unclear background knowledge, and unclear inference
rules; it tolerates samples with large deficiencies and distortions; and it runs fast, adapts well, and
offers high resolution. A CNN fuses feature extraction into a multi-layer perceptron through structural
reorganization and weight reduction, omitting the complex image feature extraction process prior to
recognition.
      A CNN consists of four main components: convolutional layers, pooling layers, fully connected
layers, and an activation function for each layer.


Figure 2: Calculation process of convolution (input matrix, 3×3 filters, convolved map)

Figure 3: Calculation process of max-pooling (input matrix, 2×2 max-pool, sampled map)

      The convolutional layer is one of the key building blocks of the convolutional neural network. It
extracts features from the input by calculating the correlation between the network input and the
kernel weights. The pooling layer performs a down-sampling operation to further reduce the input
dimensions without losing too much useful information. In a fully connected layer, each neuron in one
layer is connected to all neurons in the next layer. Such a structure introduces arbitrary linear
combinations of the inputs and has powerful approximation capability. We can express these three
processes as follows:

$$\boldsymbol{Y} = H[F(\boldsymbol{X}, \boldsymbol{W}) + \boldsymbol{b}] \tag{10}$$

    where $\boldsymbol{X}$, $\boldsymbol{Y}$, $\boldsymbol{W}$ and $\boldsymbol{b}$ are the input,
output, weight, and bias, respectively. $F(\cdot)$ refers to convolution, pooling or matrix
multiplication, and $H[\cdot]$ is the activation function of the layer.
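
The layer operations in (10) can be illustrated with a small pure-Python sketch: a 3 × 3 correlation without padding, a ReLU activation, and a 2 × 2 max-pooling step. The function names and the toy input are assumptions; a real implementation would use a deep learning framework.

```python
def conv2d_valid(x, w, b=0.0):
    """Slide kernel w over x (no padding) and add bias b: F(X, W) + b in Eq. (10)."""
    kh, kw = len(w), len(w[0])
    out = []
    for i in range(len(x) - kh + 1):
        row = []
        for j in range(len(x[0]) - kw + 1):
            s = sum(x[i + u][j + v] * w[u][v] for u in range(kh) for v in range(kw))
            row.append(s + b)
        out.append(row)
    return out


def relu(m):
    """Activation H[.] of Eq. (10), here max(0, x) applied element-wise."""
    return [[max(0.0, v) for v in row] for row in m]


def max_pool(m, size=2, stride=2):
    """Keep the maximum of each size x size window."""
    return [[max(m[i + u][j + v] for u in range(size) for v in range(size))
             for j in range(0, len(m[0]) - size + 1, stride)]
            for i in range(0, len(m) - size + 1, stride)]


# Toy 6 x 6 input and a kernel that simply copies the centre element
x = [[float(i - j) for j in range(6)] for i in range(6)]
w = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
feature_map = relu(conv2d_valid(x, w))  # 4 x 4
pooled = max_pool(feature_map)          # 2 x 2
```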
3. DoA estimation with DNN
   This paper presents a detailed study of a CNN-based DoA and ToA estimation method. In this work,
a cascaded convolutional neural network is used to solve the DoA and ToA estimation problem, with
the aim of learning the mapping from the observed antenna array signal to the DoA and ToA of the
incident wave. However, the generalization capability of a neural network is limited, and its
performance degrades substantially when the SNR gap is large. To overcome this problem, a noise
filtering network is introduced to perform noise filtering at low SNR. The network structure consists
of two stages: a) the noise filtering stage and b) the estimation stage. We describe them in detail in
the following subsections.
Figure 4: Estimation process of DoA and ToA (CSI matrix, then data preprocessing, then the test
SNR < 0 dB; if yes, the noise filter network followed by the estimation network; if no, the estimation
network directly; output: DoA and ToA)
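
The decision flow of Figure 4 can be sketched as a small dispatch function. The two networks are represented by arbitrary callables, and the stand-in lambdas below are placeholders invented for this sketch, not trained models.

```python
def estimate_doa_toa(csi_matrix, snr_db, noise_filter, estimator, threshold_db=0.0):
    """Route low-SNR inputs through the noise filter before estimation (Figure 4)."""
    if snr_db < threshold_db:
        csi_matrix = noise_filter(csi_matrix)
    return estimator(csi_matrix)


# Hypothetical stand-ins that only record which path was taken
fake_filter = lambda m: {"data": m, "filtered": True}
fake_estimator = lambda m: ("filtered" if isinstance(m, dict) else "direct")

low = estimate_doa_toa([[0.1, 0.2]], snr_db=-5.0,
                       noise_filter=fake_filter, estimator=fake_estimator)
high = estimate_doa_toa([[0.1, 0.2]], snr_db=10.0,
                        noise_filter=fake_filter, estimator=fake_estimator)
```

Keeping the two networks as separate callables mirrors the fact that they are trained separately and only cascaded at inference time.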

3.1.    Noise filtering neural network
   We first need to classify the SNR of the received signal. Following [14], the distinction of SNR is
modelled as a binary classification problem. Eigenvalue decomposition is performed on the covariance
matrix in (9), from which signals with high SNR and low SNR can be distinguished.
   The noise filtering neural network is used to filter CSI matrices at low SNR, enhancing the
effective signal components in the CSI matrix. In this paper, a convolutional neural network is used
for noise filtering, and signal enhancement is accomplished by learning the mapping between the CSI
matrix under low SNR conditions and the CSI matrix under noiseless conditions. The filtering neural
network consists of a five-layer structure, containing two convolutional layers and three fully
connected layers.
   After the data preprocessing in (7), we obtain a 2H × K matrix, which is fed into the neural network.
Next, the input matrix goes through two convolutional layers and two max-pooling layers alternately.
To avoid losing features and to obtain a larger receptive field, we use zero padding and a
convolutional kernel size of 5 × 5 for feature extraction on the input matrix. The first and second
convolutional layers use 32 and 64 filters, respectively. For both max-pooling layers, we use the same
pooling size of 2 and a stride of 2. We then obtain a 64 × 0.5H × 0.25K three-dimensional feature map.
The extracted features are flattened and fed into two fully connected layers with 1024 neurons and
2H × K neurons. The final output is reshaped into a 2H × K matrix, which is the output after noise
filtering.
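
The tensor sizes quoted above can be checked with a short shape walk-through. The sketch assumes that zero padding keeps the spatial size unchanged in each convolution and that each pooling step halves both dimensions; K = 64 is a hypothetical value, since the number of subcarriers is not fixed in the text.

```python
def filter_network_shapes(H, K, filters=(32, 64), pool=2, fc_hidden=1024):
    """List (layer name, output shape) pairs for the noise filtering network."""
    h, w = 2 * H, K
    shapes = [("input", (1, h, w))]
    for idx, n_filters in enumerate(filters, start=1):
        shapes.append((f"conv{idx}", (n_filters, h, w)))  # zero padding keeps h x w
        h, w = h // pool, w // pool                        # 2 x 2 pooling, stride 2
        shapes.append((f"pool{idx}", (n_filters, h, w)))
    shapes.append(("flatten", (filters[-1] * h * w,)))
    shapes.append(("fc1", (fc_hidden,)))
    shapes.append(("fc2", (2 * H * K,)))
    shapes.append(("output", (2 * H, K)))
    return shapes


# H = 16 antennas as in the simulation; K = 64 is an assumed subcarrier count
shapes = dict(filter_network_shapes(H=16, K=64))
for name, shape in shapes.items():
    print(f"{name:8s} {shape}")
```

For these values the second pooling layer yields 64 × 8 × 16, which matches 64 × 0.5H × 0.25K.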

3.2.    Estimation network of DoA and ToA
Figure 5: Cascaded neural network structure consisting of (a) a noise filtering neural network
(Convolution1, Maxpooling1, Convolution2, Maxpooling2, three fully connected layers) and (b) an
estimation neural network (Convolution1, Maxpooling1, Convolution2, Maxpooling2, two parallel fully
connected layers, each followed by a Softmax output, one for AoA and one for ToA).

    In this part, we present the DoA and ToA estimation network. The estimation network works by
learning the mapping between the preprocessed CSI matrix and the DoA and ToA. If the obtained data
has a high SNR, DoA and ToA estimation is performed directly on this data, while data with a low SNR
is first fed into the filtering neural network and then into the estimation network. We model the DoA
and ToA estimation problem as a classification problem, where the DoA and ToA obtained for each
classification result are in the sets $\boldsymbol{\theta} = \{\theta_1, \cdots, \theta_K\}$ and
$\boldsymbol{\tau} = \{\tau_1, \cdots, \tau_L\}$, respectively.
    The configuration of the convolutional layers is similar to that of the noise filtering network. As
shown in Figure 5(b), the feature maps obtained after convolution are fed into two parallel fully
connected networks. Each parallel network contains only one input layer and one output layer, with the
same number of neurons in both input layers, 64 × 0.5H × 0.25K, and the number of neurons in the output
layer determined by the angle and time resolution, respectively. For example, if the DoA is distributed
in $[\theta_1, \theta_2]$, the number of DoA output neurons is $(\theta_2 - \theta_1)/\Delta\theta + 1$,
where $\Delta\theta$ is the angular resolution. Similarly, if the ToA is distributed in
$[\tau_1, \tau_2]$, the number of ToA output neurons is $(\tau_2 - \tau_1)/\Delta\tau + 1$. We then
pass the output through the Softmax function, and the neuron with the highest probability is used as
the final output. The Softmax function is given by:

$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{J} e^{z_j}} \tag{11}$$

where $z_i$ is the output value of the $i$-th neuron and $J$ is the total number of output neurons.
Through the Softmax function, the output values of a multi-class classifier are transformed into a
probability distribution whose entries lie in the range [0, 1] and sum to 1.
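
A minimal sketch of the output stage: counting the classes for a given range and resolution, applying the Softmax function of (11), and mapping the winning neuron back to a physical value. The helper names and the example logits are assumptions; subtracting the maximum before exponentiation is a standard numerical-stability step that does not change the result.

```python
import math


def num_classes(lo, hi, step):
    """Number of grid points on the closed interval [lo, hi] with spacing step."""
    return round((hi - lo) / step) + 1


def softmax(z):
    """Eq. (11), computed with the maximum subtracted for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def decode(z, lo, step):
    """Return the grid value of the most probable class and its probability."""
    p = softmax(z)
    idx = max(range(len(p)), key=p.__getitem__)
    return lo + idx * step, p[idx]


n_aoa = num_classes(-59, 60, 1)  # 1-degree grid covering (-60, 60]
n_toa = num_classes(11, 50, 1)   # 1-ns grid covering (10, 50]
angle, prob = decode([0.1, 2.0, -1.0], lo=-59, step=1)  # toy logits, three classes
```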
   Throughout the cascaded network, the activation function used is the ReLU function:

$$f(x) = \max(0, x) \tag{12}$$

   ReLU is a non-saturating linear unit that speeds up network training, reduces computational
complexity, is more robust to various disturbances, and alleviates the vanishing gradient problem to
some extent compared with the Tanh and Sigmoid functions.

3.3.    Training and testing strategy
    The cascaded neural network consists of two neural networks connected together, which are trained
separately. The trained networks are then cascaded to filter out noise and estimate DoA and ToA.
During training, the data is fed in batches to reduce the training burden. Each neural network was
trained for 100,000 iterations separately, where the noise filtering neural network was trained by
back-propagation to minimize the mean square error (MSE) and the estimation neural network was trained
by back-propagation to minimize the cross-entropy loss. The MSE and cross-entropy are calculated as
follows:

$$L_{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{13}$$

$$L_{cross\ entropy} = -\boldsymbol{y} \log(\hat{\boldsymbol{y}}) - (1 - \boldsymbol{y}) \log(1 - \hat{\boldsymbol{y}}) \tag{14}$$

      where $n$ is the number of output neurons of the noise filtering neural network, and $y_i$ and
$\hat{y}_i$ are the true value and the output value, respectively. In (14), $\boldsymbol{y}$ and
$\hat{\boldsymbol{y}}$ are the truth vector and the output vector, respectively.
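
The two training losses can be sketched in plain Python as below. In this sketch `y_true` is taken to be the label (a one-hot vector for the classifier) and `y_pred` the network output; summing (14) over the vector elements and clipping the predictions away from 0 and 1 are assumptions made for the illustration.

```python
import math


def mse_loss(y_true, y_pred):
    """Eq. (13): mean of squared differences over the n outputs."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    """Eq. (14), summed over the vector elements (assumed reduction)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -t * math.log(p) - (1.0 - t) * math.log(1.0 - p)
    return total


filter_loss = mse_loss([1.0, 2.0], [1.0, 4.0])
class_loss = cross_entropy_loss([0.0, 1.0, 0.0], [0.25, 0.5, 0.25])
```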
      Both neural networks use the Adam optimizer for gradient descent to update the weights. Dropout
was used after every layer to prevent overfitting and improve the stability and robustness of the
neural network. The selection of the learning rate is also very important for the training of the
neural network. If the learning rate $\eta$ is chosen to be relatively large, the weights $w$ are
adjusted more substantially during training, which speeds up the network training, but it also causes
the network to jitter frequently while searching the error surface, so that the training process may
not converge and may overshoot the optimal $w$. Conversely, a relatively small learning rate lets the
network approach the optimum steadily, but it may also fall into a local optimum. Experimentally, the
learning rate of the filtering neural network is set to 1e-3 and that of the estimation neural network
is set to 1e-4.
      The testing data is input into the trained neural network to calculate the prediction accuracy
and mean square error, so as to measure the effectiveness of the neural network. In addition, during
the testing phase, we must make sure that the data used for testing was not used for training, so that
the measured performance reflects how the neural network behaves on unseen data.

4. Simulation parameters and results
4.1. Simulation setup
   In our experiments, the proposed convolutional neural network is implemented in Python 3.5 with
TensorFlow 1.12, and the conventional correlation and MUSIC based methods are implemented by
MATLAB R2019a. All experiments are performed on a lab server with two NVIDIA GeForce GTX
TITAN Xp Graphical Processing Units (GPUs) with 24GB of memory.

4.2.    Dataset generation
    In the simulation, a 4 × 4 uniform grid array is used, with 16 single-polarized antennas evenly
distributed in the array at half-wavelength spacing. It is assumed that the signal emitted by the
source impinges on the antenna array via the direct and reflected paths, with the central frequency
set at 2 GHz, and the ratio of the variance of the power of the two paths is 10 dB. All data are
generated by simulation software rather than by direct measurements in real scenarios. The received
signal impinging on the antenna array is an OFDM signal with K subcarriers and a subcarrier spacing of
30 kHz. The CSI of the received signal can be obtained by (8), and the information matrix is calculated
from 50 snapshots of the CSI of the received signal.
    Our proposed neural network is used to estimate both the DoA and ToA of OFDM signals. DoA
contains AoA and ZoA, and this paper focuses on the estimation of AoA, with ZoA being assumed to
be a constant value. The neural network is trained by treating the parameters of the direct path as
the true output. We assume that the signal transmitted through the reflected path has an AoA 20° larger
than, and arrives 30 ns later than, the direct path. We assume that the AoA of the direct path is
uniformly distributed in (-60°, 60°] and the angular search resolution is set to 1°, giving a total of
120 AoA incident directions. For each AoA, the corresponding ToA is assumed to be uniformly distributed
in (10 ns, 50 ns] and the resolution is set to 1 ns, so there are a total of 120 × 40 combinations of
direction of arrival and time of arrival in the dataset. For each DoA and ToA combination, 90
independent noisy signal vectors, generated by adding noise to the UGA's received signal vector, are
used for training.
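
The label grid described above can be enumerated directly, which also confirms the dataset size. The variable names are assumptions; the reflected-path parameters follow the stated offsets of 20° and 30 ns.

```python
# AoA grid: (-60, 60] degrees at 1-degree resolution
aoa_grid = list(range(-59, 61))
# ToA grid: (10, 50] ns at 1-ns resolution
toa_grid = list(range(11, 51))

# Every (AoA, ToA) pair of the direct path is one class combination
labels = [(aoa, toa) for aoa in aoa_grid for toa in toa_grid]

# Reflected path: AoA 20 degrees larger, arrival 30 ns later
reflected = [(aoa + 20, toa + 30) for aoa, toa in labels]

samples_per_label = 90
total_training_samples = len(labels) * samples_per_label

print(len(aoa_grid), len(toa_grid), len(labels), total_training_samples)
```

Under these settings there are 120 × 40 = 4800 label combinations and 4800 × 90 = 432,000 noisy training samples.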

4.3.    Neural network parameters initialization
    The initial values of the network weights also have a great influence on the training of neural
networks. If the initial weights are not set properly, training may be slow, or gradients may vanish
or explode. In general, the connection weights and thresholds of the network are initialized to be
distributed in a relatively small interval with zero mean. In this paper, the weight parameters w of
the filtering and estimation networks follow a truncated Gaussian distribution with mean 0 and
standard deviations of 0.01 and 0.1, respectively, and all bias parameters b are set to 0.01 and 0.1,
respectively.
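
A truncated Gaussian initializer can be sketched with rejection sampling. The text does not state the truncation point; cutting at two standard deviations, as TensorFlow's truncated normal initializer does, is an assumption here, and the layer sizes are arbitrary illustration values.

```python
import random


def truncated_normal(std, mean=0.0, bound=2.0, rng=random):
    """Draw from N(mean, std^2), re-drawing samples farther than bound * std from the mean."""
    while True:
        v = rng.gauss(mean, std)
        if abs(v - mean) <= bound * std:
            return v


def init_layer(n_in, n_out, std, bias, rng=random):
    """Weights from the truncated Gaussian, biases set to a constant."""
    weights = [[truncated_normal(std, rng=rng) for _ in range(n_out)]
               for _ in range(n_in)]
    biases = [bias] * n_out
    return weights, biases


rng = random.Random(0)  # fixed seed for reproducibility
w_filter, b_filter = init_layer(8, 4, std=0.01, bias=0.01, rng=rng)  # filtering network
w_est, b_est = init_layer(8, 4, std=0.1, bias=0.1, rng=rng)          # estimation network
```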

4.4.    Simulation result
    First, to verify that our neural network works, we explored the variation of loss with the number of
iterations for both networks during training.


Figure 6: Loss of noise filtering neural network with the number of iterations


Figure 7: Loss of estimation neural network with the number of iterations

   Figure 6 and Figure 7 show the loss of the filtering and estimation neural networks versus the
number of iterations, respectively. The losses decrease as the number of iterations increases and
finally converge, which indicates that the neural networks can learn the mapping between the
estimation parameters and the input matrix. ToA training converges better than AoA training, as shown
in detail in the simulations below.
    To evaluate the effectiveness and robustness of our proposed convolutional neural network structure,
we compared our proposed cascaded neural network with four other methods:
    1. MUSIC-enhanced: An algorithm based on MUSIC. The time diversity of the OFDM subcarriers
        and a virtual spatial smoothing method are used to construct the correlation matrix. DoA
        and ToA estimation are then performed with the MUSIC algorithm.
    2. AML: An efficient approximate maximum likelihood algorithm for indoor localization, which
        updates the DoA and ToA parameters alternately.
    3. CNN-class: A CNN-based estimation method that first classifies the signal-to-noise ratio and
        then selectively uses two neural networks for ToA and DoA estimation.
    4. CNN-base: Estimates DoA and ToA directly through a CNN.
    The first two methods are physically driven, while the latter two and our proposed method are data
driven.
    The two evaluation metrics chosen in this paper are the mean absolute error (MAE) and the mean
squared error (MSE) of the estimates. MAE is a better reflection of the actual error in the predicted
values, and MSE indicates the accuracy of the predicted values. The MAE can be expressed as:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \tag{15}$$

The MSE can be calculated as follows:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{16}$$
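
The two metrics in (15) and (16) can be computed as in the following sketch; the function names and the toy angle values are assumptions for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Eq. (15): average magnitude of the estimation error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Eq. (16): average squared estimation error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy AoA values in degrees: true angles and hypothetical estimates
true_aoa = [30.0, -10.0]
est_aoa = [31.0, -12.0]
mae = mean_absolute_error(true_aoa, est_aoa)
mse = mean_squared_error(true_aoa, est_aoa)
```

Because the MSE squares each error, a single large miss affects it more strongly than it affects the MAE.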
      Figure 8 shows the MSE and MAE of AoA estimation for different SNRs, and it is clear from the
figure that our proposed method performs better than the other four at every SNR. The estimation
errors decrease as the SNR increases, and the performance of the neural network-based estimation
methods is comparable to that of the physically driven methods at different SNRs due to the influence
of the generalization ability of the neural networks. The classification-based CNN network has the same
structure as the network in this paper at SNR ≥ 10 dB, and both have the same MSE and MAE. When
SNR < 0 dB, the proposed method performs better than the other methods because of its noise filtering
neural network. Although it can be seen from Fig. 6 that the CSI matrix after noise filtering is
similar to the noise-free one, the estimation performance is not as good as at high SNR, due to the
inherent correlation between the matrix data and the loss of some correlation properties after
training. Nevertheless, it still performs better than the other methods.


Figure 8: AoA estimation MSE and MAE versus SNR (dB).
     Figure 9 shows the MSE and MAE of the ToA estimates for different SNRs. Our proposed method has a
higher accuracy for ToA estimation. As can be seen from the figure, the data-driven approach is much
more sensitive to changes in ToA than the physically driven approach at low SNRs. At SNR ≥ 0 dB, the
neural network completes the classification task with a resolution of 1 ns perfectly, achieving an
accuracy of 100% with no error in these cases. When SNR < 0 dB, the error of the physically driven
approach increases sharply, but the ToA estimation of the data-driven approach has some noise immunity
and still provides a relatively accurate estimate of ToA.


Figure 9: ToA estimation MSE and MAE versus SNR (dB).

   We then investigated the relationship between the estimation performance of the neural network and
the number of subcarriers.


Figure 10: AoA and ToA estimation MSE versus carrier number

    Figure 10 shows how the MSE of our proposed method varies with the number of subcarriers at
different SNRs. The estimation accuracy of both AoA and ToA improves as the number of subcarriers
increases. By (8), the size of the input matrix is H × 2K, so increasing the number of subcarriers
increases the dimension of the input matrix of the neural network. The network can then extract more
features and learn the mapping between input and output more accurately, which improves the
estimation accuracy. The number of subcarriers is therefore one of the important factors affecting the
estimation accuracy. However, if the number of subcarriers is too large, the data dimension grows
sharply, which greatly increases the training load of the neural network. The number of subcarriers
should therefore be chosen according to the demand in order to limit the training load.
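The growth of the input dimension with the number of subcarriers can be sketched as below. The sketch assumes that the factor 2 in H × 2K comes from placing the real and imaginary parts of an H × K complex CSI matrix side by side; the exact column ordering in (8) may differ. The function names, the sample matrix, and the choice H = 4 are hypothetical.

```python
def csi_to_real_input(csi):
    """Turn an H x K complex matrix into an H x 2K real one.

    Assumption: real parts fill the first K columns and imaginary parts the
    last K columns. Equation (8) of the paper may order them differently.
    """
    return [[z.real for z in row] + [z.imag for z in row] for row in csi]


def input_elements(num_rows, num_subcarriers):
    """Number of real-valued entries in an H x 2K input matrix."""
    return num_rows * 2 * num_subcarriers


# Hypothetical 2 x 2 complex CSI matrix (H = 2, K = 2), for illustration only.
csi = [[1 + 2j, 3 - 4j], [0.5j, -1 + 0j]]
real_input = csi_to_real_input(csi)

# Input size for a fixed, hypothetical H = 4 as the subcarrier count doubles.
sizes = {k: input_elements(4, k) for k in (16, 32, 64)}

print(real_input)
print(sizes)
```

For a fixed H, the number of input entries grows linearly with K, so doubling the number of subcarriers doubles the data each training sample carries. This is the trade-off between accuracy and training load described above.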

5. Conclusion
     In this paper, we propose a CNN-based deep learning method for estimating the DoA and ToA of
OFDM signals. A cascaded neural network is used to filter noise and estimate DoA and ToA. Extensive
simulation results show that the proposed CNN-based estimation method is more resistant to multipath
and noise than the conventional estimation methods, which demonstrates the potential of the
data-driven approach in parameter estimation for accurate positioning.

6. Reference
[1] L. Chen, W. Qi, P. Liu, E. Yuan, Y. Zhao and G. Ding. "Joint 2-D DoA and ToA estimation for
     multipath OFDM signals based on three antennas." IEEE Communications Letters 22.2 (2017):
     324-327.
[2] Shamaei, Kimia, Joe Khalife, and Zaher M. Kassas. "A joint TOA and DOA approach for
     positioning with LTE signals." 2018 IEEE/ION Position, Location and Navigation Symposium
     (PLANS). 2018.
[3] Wen, F., Liu, P., Wei, H., Zhang, Y., & Qiu, R. C. "Joint azimuth, elevation, and delay estimation
     for 3-D indoor localization." IEEE Transactions on Vehicular Technology 67.5 (2018): 4248-4261.
[4] Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. "Reducing the dimensionality of data with
     neural networks." science 313.5786 (2006): 504-507.
[5] Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., and Alsaadi, F. E. (2017). A survey of deep neural
     network architectures and their applications. Neurocomputing, 234, 11-26.
[6] Neumann, David, Thomas Wiese, and Wolfgang Utschick. "Learning the MMSE channel
     estimator." IEEE Transactions on Signal Processing 66.11 (2018): 2905-2917.
[7] Chen, Min, Yi Gong, and Xingpeng Mao. "Deep Neural Network for Estimation of Direction of
     Arrival With Antenna Array." IEEE Access 8 (2020): 140688-140698.
[8] Kase, Y., Nishimura, T., Ohgane, T., Ogawa, Y., Kitayama, D., & Kishiyama. "Fundamental Trial
     on DoA Estimation with Deep Learning." IEICE Transactions on Communications (2020):
     2019EBP3260.
[9] Xiang, H., Chen, B., Yang, M., Yang, T., and Liu, D. "A novel phase enhancement method for
     low-angle estimation based on supervised DNN learning." IEEE Access 7 (2019): 82329-82336.
[10] Xiang, H., Chen, B., Yang, M., Yang, T., and Liu, D. "Phase enhancement model based on
     supervised convolutional neural network for coherent DoA estimation." Applied
     Intelligence (2020): 1-12.
[11] Xiang, H., Chen, B., Yang, M., Yang, T., and Liu, D. "Improved de-multipath neural network
     models with self-paced feature-to-feature learning for doa estimation in multipath
     environment." IEEE Transactions on Vehicular Technology 69.5 (2020): 5068-5078.
[12] Xiang, H., Chen, B., Yang, M., Yang, T., and Liu, D. "Improved direction-of-arrival estimation
     method based on LSTM neural networks with robustness to array imperfections." Applied
     Intelligence (2021): 1-14.
[13] Wajid, M., Kumar, B., Goel, A., Kumar, A., and Bahl, R. "Direction of arrival estimation with
     uniform linear array based on recurrent neural network." 2019 5th international conference on
     signal processing, computing and control (ISPCC). IEEE, 2019.
[14] Guo, Y., Zhang, Z., Huang, Y., and Zhang, P. "DoA estimation method based on cascaded neural
     network for two closely spaced sources." IEEE Signal Processing Letters 27 (2020): 570-574.
[15] Sun, H., Kaya, A. O., Macdonald, M., Viswanathan, H., & Hong, M. "Deep learning based
     preamble detection and ToA estimation." 2019 IEEE Global Communications Conference
     (GLOBECOM). IEEE, 2019.
[16] Hsiao, Yao-Shan, Mingyu Yang, and Hun-Seok Kim. "Super-Resolution Time-of-Arrival
     Estimation using Neural Networks." 2020 28th European Signal Processing Conference
     (EUSIPCO). IEEE, 2021.
[17] Luo, Zhe, Tao Tao, and Jianguo Liu. "ToA Estimation Scheme Based on CNN for B-IFDM-Based
     Preambles." 2019 IEEE 89th Vehicular Technology Conference (VTC2019-Spring). IEEE, 2019.