A Novel Approach of Cognitive Base Station with Dynamic Spectrum
                            Management For High-speed Rail ∗

                       Qingting Wu, Yiming Wang, Zhijie Yin, Hongyu Deng, Cheng Wu†
                      School of Urban Rail Transportation, Soochow University, Suzhou, China


∗ Project supported by the National Nature Science Foundation of China (No. 61471252) and the Natural Science Foundation of Jiangsu Province (No. BK20130303).
† Corresponding Author: cwu@suda.edu.cn

Abstract

The fast movement of high-speed rail seriously affects the stability of vehicular wireless communication. Applying cognitive technology to individual users often brings frequent channel switching and inefficient blind learning. To address these issues, this paper proposes a novel concept, the Cognitive Base Station (CBS), which is capable of forecasting spectrum holes and assigning spectrum to individuals. We then give the model of the cognitive base station and evaluate its performance on our simulation platform in a high-speed rail environment. The experimental results further show that the model can significantly improve the performance of vehicular communication.

1       Introduction

With the development of society, the demand for rail transit is rapidly increasing. When travelling by train, passengers hope to enjoy better communication quality and faster data access. The European Rail Traffic Management System (ERTMS) is a major change in railways intended to guarantee communication; it consists of the European Train Control System (ETCS) and a mobile communications network optimized for railways called GSM-R.

GSM-R, the Global System for Mobile Communications-Railway, is used worldwide and is dedicated to providing the bidirectional radio bearer for train signaling systems. It operates in a 4 MHz band (876-880 MHz for uplink and 921-925 MHz for downlink) [Sniady and Soler, 2012]. The authorized band can be divided into 19 channels of 200 kHz width in each GSM-R group. The rail line is covered with GSM-R groups, each of which consists of many GSM-R cells. A single GSM-R cell can use only a few of the channels, in a round-robin manner, because the same channel cannot be reused by neighboring cells due to interference. Each cell is equipped with a base station. The base station is made up of a building baseband unit (BBU) and a radio remote unit (RRU). The RRU is always deployed outside along the railway and the BBU is inside. One BBU is connected to multiple RRUs. The BBU and RRU process the baseband signal and the radio frequency signal, respectively. To ensure communication between the RRU and passengers, two vehicular stations (VS) are installed on the first and last carriages of the train. The network architecture is illustrated in Fig. 1 [Zhao et al., 2013], [Tian et al., 2012]. The GSM-R system consists of base transceiver stations (BTS) along the railway lines and embedded GSM-R mobiles connected to antennas on the roof of the trains. The train has to be permanently connected to the train control center. This connection has a high priority level, and if the modem connection is lost, the train stops automatically [Dudoyer et al., 2012].

However, in high-speed railway conditions [Zhang et al., 2012], vehicular communication is often unstable and sometimes very poor [Ai et al., 2014]. Usually, when the speed reaches 350 kilometers per hour, issues such as Doppler shifts, fast cell switching and penetration loss unavoidably arise [Zhou and Ai, 2014]. Doppler shifts result from the relative motion between a vehicle and a base station. The Doppler effect becomes another pivotal factor degrading system performance, because it increases the randomness of the received signal [Liu et al., 2011], [Li and Zhao, 2012], [Dybala and Radkowski, 2013]. The high-speed operation of the train leads to fast cell switching. As a train moves across the footprint of the satellite beam, the received signal level may vary, especially towards the edge of the beam, which significantly impacts service rates and can even cause service drops [Li et al., 2013], [Alkayal and Saada, 2013]. The fully enclosed, well-sealed body structure of the high-speed train results in penetration loss. Typically, the terminals inside the train connect to the base stations along the railway tracks via wireless links, in which the large penetration loss directly degrades the communication link quality and decreases the cell coverage [Zhu et al., 2013], [Liu et al., 2012]. Furthermore, the Federal Communications Commission (FCC) released an investigation on spectrum usage in 2003. It suggested that the authorized band in the 3-6 GHz range is less than 0.5% utilized on average, and that the band below 3 GHz is less than 35% utilized [Commission and others, 2003]. Based on these observations, it is necessary to introduce a novel architecture for high-speed vehicular communication to address the issues

Figure 1: Network architecture for the high-speed rail communication system. (Labels: BBU, RRU, train, VS, R, CR.)


from individual users' high-speed movement along the rails and the inefficiency in spectrum usage.

In recent years, many researchers have used cognitive radio (CR) to improve the performance of wireless communication. The basic idea of CR networks is that unlicensed devices (also called cognitive radio users or secondary users) must vacate the spectrum band once they detect licensed devices (also known as primary users). Simon Haykin defined CR as an intelligent wireless communication system that is aware of its environment and uses the methodology of understanding-by-building to learn from the environment and adapt to statistical variations in the input stimuli [Haykin, 2005]. Letaief presented a cognitive space-time-frequency coding technique that can opportunistically adjust its coding structure by adapting itself to the dynamic spectrum environment [Letaief and Zhang, 2009]. Soyeon Kim proposed a CR operational algorithm for mobile cellular systems that is applicable to the multiple secondary user environment [Kim and Sung, 2014]. These results showed that CR technology can significantly reduce interference to licensed users while maintaining a high probability of successful transmissions in a CR ad hoc network.

There are few publications about applying CR to urban rail transit. Wu proposed a wireless cognitive model for high-speed individuals' spectrum management and showed a small performance improvement in wireless communication [Wu et al., 2015]. Although using cognitive radio in high-speed railway has improved performance, many issues remain open:

(1) Most cognitive radio users sense in the same environment, and each user is independent. They therefore compete with each other for spectrum resources, which leads to blind learning and frequent conflicts.
(2) Rail transit carries a large number of CR users. When every user senses the environment, the system incurs a heavy workload and high computational complexity.
(3) Mutual competition and cooperation between CR users interfere not only with primary users, but also with the CR users themselves and their neighbors.
(4) Spectrum holes differ from one base station to another, so spectrum handoff inevitably occurs.

To address the above issues, we propose a novel model of cognitive base station in this paper. Our proposed CBS attempts to use the authorized bands for railway without interrupting PUs. The CBS model should satisfy the following conditions:

(1) The CBS can forecast spectrum holes according to its experience and assign spectrum to individuals within its range of coverage. In this way, the computational complexity of the entire network can be reduced.
(2) Rail transit runs daily over a fixed route according to its timetable. The CBSs can take advantage of these characteristics and cooperate with each other to forecast spectrum holes along the whole route.

This paper is organized as follows. We first introduce the concept of cognitive base station and its mathematical model in Section 2. Section 3 then applies the novel CBS model
with RL into the scenario of high-speed rail, and proposes the cooperation mechanism of multiple CBS agents. The experimental simulation results are given in Section 4. We conclude this paper in Conclusion.

Figure 2: The cognitive cycle of a cognitive base station. (Elements: radio environment, spectrum sensing, spectrum decision, spectrum sharing, spectrum mobility.)

2     Cognitive Base Station Model

Our proposed CBS is deployed along the railway and works as a spectrum assigner. It learns from feedback received through interactions with an external environment and assigns spectrum to the passengers in its range of coverage. We consider each CBS to be an agent with four spectrum management functions: spectrum sensing, spectrum mobility, spectrum decision and spectrum sharing [Chkirbene and Hamdi, 2015], [Lee and Akyildiz, 2012]. Fig. 2 gives the steps of the cognitive cycle within the framework of CBS, which is formed by the spectrum-aware operations. Each CBS agent uses reinforcement learning to perform spectrum management. All of the agents can sense the environment, obtain their own current state of spectrum usage, and communicate with each other for the purpose of cooperation. Each agent then makes a decision according to its own state and the situation of the whole network, and uses spectrum mobility to choose actions. Finally, each CBS agent sends its new state to its neighboring CBS agents.

We assume that our cognitive radio network along the high-speed rail consists of a collection of CBS agents and CR user agents. Each CBS agent has its own PUs and available spectrums. The CBS agents decide on the choice of spectrum independently of the CR user agents in range. A choice of spectrum by CBS agent $i$ is essentially the choice of the frequency represented by $f^i \in F$. The CR user agents continuously monitor the spectrum that the CBS agent chooses in each time slot. We assume perfect sensing, in which the CBS agents correctly infer the presence of the PUs if the former lie within the PUs' transmission range.

  • Long-term Awareness of Spectrum Usage
    Characterizing the spectrum bands based on their activity, and in particular learning about the utilization of the channel, is a key function of the CR users. Online learning algorithms must be developed that allow the CBS agents to continuously gather information about their radio environment and construct a utilization function. Apart from simply classifying the spectrum as busy or available, it is beneficial if a probability distribution of the anticipated transmission/silent durations of the PUs can be derived. We propose a tightly integrated, reinforcement-learning-equipped link layer protocol to schedule the transmissions between CBS agents and CR user agents over time.
  • End-to-End Learning
    Distributed networks rely on multihop forwarding of packets between a source-destination pair. Each CBS agent on this path learns its own spectrum environment over time, and this information can be leveraged at the start and end points of the path to make optimal decisions regarding the spectrum choices and routing options. As an example, spectrum switching costs incurred locally at a node affect end-to-end delays. While spectrum characteristics can be locally inferred, the specific choice of the spectrum at each link to minimize intra-path switching must be undertaken at the end points of the path. We explore ways to share the learning and spectrum awareness obtained by a node with its local neighbors, and subsequently over multiple hops to the destination. The cost of this learning and the benefits are investigated as part of this project.

3     SPECTRUM MANAGEMENT BASED COGNITIVE BASE STATION

3.1    The Q-Learning
Reinforcement learning, which is inspired by psychological learning theory from biology [Waltz and Fu, 1965], enables the agent to learn behavior through trial-and-error interactions with a dynamic environment [Sutton and Barto, 1998]. The classical reinforcement learning algorithm is Q-Learning, whose process is as follows [Puterman, 1994]. At each step of interaction, the agent chooses an action according to the external environment based on its current state. As a result, the action changes the environment and the agent receives a reward. The agent needs to develop a policy that maximizes the long-run measure of reinforcement.

The classic reinforcement learning algorithm is formulated as follows. At each time $t$, the agent perceives its current state $s_t \in S$ and the set of possible actions $A_{s_t}$. The agent chooses an action $a \in A_{s_t}$ and receives from the environment a new state $s_{t+1}$ and a reward $r_{t+1}$. Based on these interactions, the reinforcement learning agent must develop a policy $\pi : S \to A$ which maximizes the long-term reward $R = \sum_t \gamma^t r_t$ for MDPs, where $0 \le \gamma \le 1$ is a discounting factor for subsequent rewards. The long-term reward is

Figure 3: The cognitive base station within the high-speed-rail transportation. (Labels: PU agents, CBS agents, CR agent.)


the expected accumulated reward that the agent expects to receive in the future under the policy, which can be specified by a value function. In this way, Q-learning can calculate an update to its expected discounted reward, $Q(s_t, a_t)$, as follows:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

where $\alpha$ is the learning rate and $\gamma$ is the discount factor such that $0 \le \gamma < 1$. The agent stores the state-action values in a table Q [Wu et al., 2010], [Jiang et al., 2011], [Bkassiny et al., 2013].

Recently, reinforcement learning has attracted increasing interest in the machine learning and artificial intelligence communities. Kadam and Srivastava applied Q-Learning to routing in a Wireless Sensor Network scenario, to route data efficiently from one source to multiple mobile sinks [Kadam and Srivastava, 2012]. It turned out that the algorithm can extend the network lifetime.

3.2   Application to Cognitive Base Station

We illustrate the high-speed railway environment with CBS agents along the way in Fig. 3. We further model a cognitive radio network as consisting of a set of Cognitive Base Stations, denoted $CBS$, a set of primary users, denoted $PU$, and a set of available frequencies, denoted $SP$. We assume that the topological structure of a given network is fixed.

Spectrum holes vary due to the behavior of PUs, which changes the environment. CBS agents can perceive the states within the environment. The state of a CBS agent is the current spectrum of its transmission. The state of the multi-agent system includes the state of every CBS agent. We therefore define the state of the system at time $t$, denoted $s_t$, as

$$s_t = (\vec{sp})_t,$$

where $\vec{sp}$ is a vector of spectrums across all agents. Here $sp^i$ is the spectrum on the $i$th agent and $sp^i \in \vec{SP}$. Normally, if there are $m$ spectrums, we can use an index to specify them. In this way, we have $\vec{SP} = \{SP^1, SP^2, \ldots, SP^m\}$.

At a particular time and in a particular state, the CBS takes an action according to its learning results, either switching channel or transmitting. At time $t$ we define $a_t = k$, where $k$ is the action that the CBS chooses at time $t$ and

$$k \in \{\text{switch to channel}_1, \text{switch to channel}_2, \ldots, \text{switch to channel}_m, \text{transmit data}\}.$$

Once the CBS agent has detected any active PU, it switches channel. We use the Q table to store state-action values. At time $t$, the state is $sp_t$ and the action is $k$, so we can calculate the value $Q(sp_t, k)$ by the above Q-learning formula. If a PU is detected, the CBS agent switches to the other available spectrum with the largest Q-value.

The reward is the estimate of spectrum usage availability on a CBS agent. Different network situations result in different rewards, as follows.

  • CR-PU interference: If a PU's activity occurs in the spectrum shared by any CR user, and in the same slot selected for transmission, then a high penalty of −15 is assigned. The intuitive meaning of this is as follows: we can avoid collisions among the CR users using the mediation of the CBS agents. However, concurrent use of the spectrum with a PU goes against the principle of protecting the licensed devices, and hence must be strictly avoided.
  • Successful Transmission: If none of the above conditions is observed to be true in the given transmission slot, then the packet is successfully transmitted from the sender to the receiver, and a reward of +5 is assigned, which is found experimentally to give the best results.
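The state, action and reward definitions above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' ns-2 implementation: the class name `CBSAgent`, the helper `reward_for`, the value `gamma=0.9`, and the rule that the reward depends on PU activity on the resulting spectrum are all assumptions made for the example. The rewards −15 and +5, the action set (switch to one of $m$ channels, or transmit) and the update rule follow the text.

```python
import random

PU_PENALTY = -15   # reward when a PU is active on the chosen spectrum (from the text)
TX_REWARD = 5      # reward for a successful transmission (from the text)


class CBSAgent:
    """Hypothetical tabular Q-learning agent; the state is the spectrum in use."""

    def __init__(self, m, alpha=0.8, gamma=0.9, epsilon=0.2, seed=0):
        self.m = m                      # number of spectrums SP^1..SP^m
        self.alpha = alpha              # learning rate
        self.gamma = gamma              # assumed value; the text only bounds gamma
        self.epsilon = epsilon          # exploration rate
        self.rng = random.Random(seed)
        # Actions 0..m-1 mean "switch to channel k"; action m means "transmit data".
        self.q = [[0.0] * (m + 1) for _ in range(m)]

    def choose_action(self, state):
        """Epsilon-greedy: explore with probability epsilon, else take the best action."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.m + 1)
        row = self.q[state]
        return max(range(self.m + 1), key=row.__getitem__)

    def next_state(self, state, action):
        """Switching moves to the chosen channel; transmitting keeps the channel."""
        return state if action == self.m else action

    def update(self, state, action, reward, new_state):
        """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        td_error = reward + self.gamma * max(self.q[new_state]) - self.q[state][action]
        self.q[state][action] += self.alpha * td_error


def reward_for(pu_active):
    """Map the sensing outcome on the resulting spectrum to the two rewards."""
    return PU_PENALTY if pu_active else TX_REWARD
```

For instance, with all Q-values initialised to zero and a learning rate of 0.8, one successful transmission raises the corresponding entry to 0.8 × 5 = 4.0, while one collision with a PU lowers the chosen entry to 0.8 × (−15) = −12.0.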
Once a primary user is detected, a harsh penalty is given. Otherwise, a positive reward is assigned. Fig. 4 illustrates the proposed process, and Algorithm 1 describes our algorithm for implementing Q-learning on a CBS agent.

Figure 4: The Q-learning process on CBS model. (Flowchart: start from the initial state and reward; if a PU is on, assign a reward of −15, otherwise assign a reward of +5; then change state.)

Algorithm 1 Pseudo code of Q-learning on CBS
  Main()
  Initialize state s_t and action a_t and their Q value;
  repeat
     Q-learning(s_t, a_t, Q)
  until all episodes are traversed

  Q-learning(s_t, a_t, Q)
  repeat
     Take action a_t, observe reward r_t, get next state s_{t+1}
     Get Q(s_t, a_t) from the Q-table;
     for all actions a* under new state s_{t+1} do
        Generate the state-action pair (s_{t+1}, a_{t+1}) from state s_{t+1} and action a*
        Get Q(s_{t+1}, a_{t+1}) from the Q-table;
     end for
     δ = r_t + γ ∗ max Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)
     ∆Q = α ∗ δ
     Q(s_t, a_t) = Q(s_t, a_t) + ∆Q
     s_t = s_{t+1}
     if random probability ≤ ε then
        a_t = random action
     else
        a_t = argmax_a Q(s_t, a)
     end if
  until s_t is terminal

4     EXPERIMENTAL SIMULATION
4.1   Experimental Design
In this section, we describe preliminary results from applying our reinforcement-learning-based approach to the cognitive radio model. Detecting the PUs correctly is a necessary prerequisite. The overall aim of our proposed learning-based approach is to allow the CBS agents to decide on an optimal choice of spectrum so that (i) PUs are not affected, and (ii) CR users share the spectrum in a fair manner. These two rules simulate the public's behavior in an urban rail transit environment. That is, those bands that are frequently occupied by licensed users are rarely utilized because of open areas or relatively closed environments, and the public can opportunistically use band resources with equal probability.

Our novel CBS network simulator within the framework of high-speed rail has been designed to investigate the effect of the proposed reinforcement learning technique on the network operation. The implemented ns-2 model is composed of several modifications to the physical, link and network layers in the form of stand-alone C++ modules. The PU Activity Block describes the activity of PUs based on the on-off model, including their transmission range, location, and spectrum band of use. The Channel Block contains a channel table with the background noise, capacity, and occupancy status. The Spectrum Sensing Block implements the energy-based sensing functionalities, and if a PU is detected, the Spectrum Management Block is notified. This in turn causes the device to switch to the next available channel and alerts the upper layers to the change of frequency. The Spectrum Sharing Block coordinates the distributed channel access and calculates the interference at any given node due to the ongoing transmissions in the network. The Cross Layer Repository facilitates information sharing between the different protocol stack layers.

We conduct our experiment in the following scenario: there are 2 trains, each carrying 21 passengers, and 5 CBS agents beside the railway. The average speed of the trains is 10 m/s. We have 10 primary users in the range of each CBS. The activity of primary users is based on the ON-OFF model, and each primary user is assigned a spectrum randomly from 5 spectrums (small network) or 10 spectrums (large network). The CBS agent senses the spectrum holes every 0.1 second and assigns available spectrum to the CR user agents. The simulation parameters are summarized in Table 1.

4.2   Experimental results
We compare the performance of our CBS with reinforcement learning (CBS-RL) scheme against the CBS with Round-Robin scheme (CBS-RR), which is a typical approach in the GSM-R system. The Round-Robin (RR) scheme employs the principle that once a spectrum is not available, the agent switches to the next channel in equal portions and in circular order, handling all switches without priority (also known as cyclic executive). This method is simple, easy to implement, and starvation-free. In our RL-based scheme, the exploration rate ε is set to 0.2, which we found experimentally to give the best results. The initial learning rate α is set to 0.8, and it is decreased by a scaling factor of 0.995 after each time slot.

Figure 5(a) shows an example of the distribution of
         Chan.5                                                                                                                                      Channel5


[Figure 5, panels (a) to (e).
 (a) An example of the distribution of spectrum occupancy on the CBS with 5 spectrums (one row per channel; x-axis: the number of epochs, 0 to 900).
 (b) Average rewards for 5 spectrum bands.
 (c) Average rewards for 10 spectrum bands.
 (d) Cumulative number of channel switches for 5 spectrum bands (y-axis: channel switches; x-axis: epoch; curves: CBS-RR, CBS-RL).
 (e) Cumulative number of channel switches for 10 spectrum bands (y-axis: channel switches; x-axis: epoch; curves: CBS-RR, CBS-RL).]

Figure 5: CBS simulations with RL and RR schemes.
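
As an illustration of the ON-OFF occupancy model behind Figure 5(a) (ON mode normally distributed with µ = 25, OFF mode exponentially distributed with a randomly generated β), the following Python sketch generates per-channel occupancy traces. It is not the simulation platform used in this paper: the standard deviation sigma_on, the sampling range for β, the reading of β as the mean OFF duration, the random initial state, and all function names are assumptions made for the example.

```python
import random


def on_off_trace(n_epochs, mu_on=25.0, sigma_on=5.0, beta_off=20.0, rng=None):
    """Return a list of 0/1 flags (1 = channel occupied), one per epoch.

    mu_on comes from the text (mu = 25); sigma_on and the default beta_off
    are assumed values, since the paper does not state them.
    """
    if rng is None:
        rng = random.Random()
    trace = []
    occupied = rng.random() < 0.5  # assumed: random initial state
    while len(trace) < n_epochs:
        if occupied:
            # ON duration: normal distribution, clipped to at least 1 epoch
            duration = max(1, round(rng.gauss(mu_on, sigma_on)))
        else:
            # OFF duration: exponential distribution; beta_off is treated
            # as the mean, so the rate passed to expovariate is 1 / beta_off
            duration = max(1, round(rng.expovariate(1.0 / beta_off)))
        trace.extend([1 if occupied else 0] * duration)
        occupied = not occupied
    return trace[:n_epochs]


def simulate_channels(n_channels=5, n_epochs=1000, seed=0):
    """Build one occupancy trace per channel, each with its own random beta."""
    rng = random.Random(seed)
    traces = {}
    for channel in range(1, n_channels + 1):
        beta = rng.uniform(5.0, 50.0)  # assumed range for the random beta
        traces[channel] = on_off_trace(n_epochs, beta_off=beta, rng=rng)
    return traces


if __name__ == "__main__":
    for channel, trace in simulate_channels().items():
        share = sum(trace) / len(trace)
        print(f"Channel {channel}: occupied in {share:.0%} of epochs")
```

Channels with a larger β stay idle longer on average, so they offer more spectrum holes to a learning agent.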
               Table 1: Simulation Parameters
    Parameters                            Values
    Topology size                         X: 7000 m, Y: 500 m
    Number of passengers                  42
    Number of primary users               50
    Number of cognitive base stations     5
    Speed                                 10 m/s
    Number of spectrums                   6
    Bandwidth                             2000000 Hz
    Simulation time                       1000 s

spectrum occupancy on the CBS with 5 spectrums. Spectrum
occupancy on the CBS follows the ON-OFF model: the ON mode
follows a normal distribution with parameter µ = 25, and the
OFF mode follows an exponential distribution with parameter β,
the value of which is randomly generated.
   Figures 5(b) and 5(c) show the average rewards received by
the CBS agent across all spectrums using the CBS-RL scheme.
Figure 5(b) shows that after learning over 1000 epochs,
Channel 5 receives the largest positive reward of approximately
+5.5, while Channels 1, 2, 3 and 4 receive rewards of
approximately −11.8, +0.7, −5.1 and +3.3, respectively. The
results indicate that our approach pushes the CBS agents to
gradually achieve higher positive rewards and choose more
suitable spectrum for their transmission. The results also
indicate that the reward tends to follow the distribution of
spectrum occupancy. A similar trend is observed in Figure 5(c),
with Channel 10 receiving the highest average reward of
approximately +5.2.
   Figures 5(d) and 5(e) show the cumulative number of channel
switches using the CBS-RL and CBS-RR schemes. Figure 5(d) shows
the average number of channel switches for the small topology.
We observe that after learning, the CBS-RL scheme tends to
decrease the number of channel switches to 5, while CBS-RR
keeps the number of channel switches at approximately 12. For
the large topology in Figure 5(e), the CBS-RL scheme reduces
the number of channel switches to 6, while CBS-RR keeps it at
approximately 23. The results indicate that our proposed CBS-RL
approach keeps the number of channel switches lower than the
CBS-RR approach and converges to an optimal solution.

5   CONCLUSIONS
To address the issues of frequent channel switches and
inefficient blind learning in high-speed rail, we propose a
novel concept of Cognitive Base Station, which has the
capability of forecasting spectrum holes and assigning spectrum
to individuals. Our simulation results show that after
autonomous learning, the CBS-RL scheme can forecast spectrum
holes. In this way, our proposed model can significantly
improve the performance of vehicular communication by
decreasing cell switching and unsuccessful transmissions.

References
[Ai et al., 2014] Bo Ai, Xiang Cheng, Thomas Kurner,
  Zhangdui Zhong, Ke Guan, Ruisi He, Lei Xiong, David W
  Matolak, David G Michelson, and Cesar Briso-Rodriguez.
  Challenges toward wireless communications for high-speed
  railway. Intelligent Transportation Systems, IEEE
  Transactions on, 15(5):2143–2158, 2014.
[Alkayal and Saada, 2013] Fisal Alkayal and Johnny Bou
  Saada. Compact three phase inverter in silicon carbide
  technology for auxiliary converter used in railway
  applications. In Power Electronics and Applications (EPE),
  2013 15th European Conference on, pages 1–10. IEEE, 2013.
[Bkassiny et al., 2013] Mario Bkassiny, Yang Li, and
  Sudharman K Jayaweera. A survey on machine-learning
  techniques in cognitive radios. Communications Surveys &
  Tutorials, IEEE, 15(3):1136–1159, 2013.
[Chkirbene and Hamdi, 2015] Zina Chkirbene and Noureddine
  Hamdi. A survey on spectrum management in cognitive radio
  networks. International Journal of Wireless and Mobile
  Computing, 8(2):153–165, 2015.
[Commission and others, 2003] Federal Communications
  Commission et al. Facilitating opportunities for flexible,
  efficient, and reliable spectrum use employing cognitive
  radio technologies. Et docket, (03-108):05–57, 2003.
[Dudoyer et al., 2012] Stephen Dudoyer, Virginie Deniau,
  Ricardo Adriano, MN Ben Slimen, Jean Rioult, Benoît
  Meyniel, and Marion Berbineau. Study of the susceptibility
  of the gsm-r communications face to the electromagnetic
  interferences of the rail environment. Electromagnetic
  Compatibility, IEEE Transactions on, 54(3):667–676, 2012.
[Dybala and Radkowski, 2013] Jacek Dybala and Stanislaw
  Radkowski. Reduction of doppler effect for the needs of
  wayside condition monitoring system of railway vehicles.
  Mechanical Systems and Signal Processing, 38(1):125–136,
  2013.
[Haykin, 2005] Simon Haykin. Cognitive radio:
  brain-empowered wireless communications. Selected Areas in
  Communications, IEEE Journal on, 23(2):201–220, 2005.
[isheng Zhao et al., 2013] isheng Zhao, Xi Li, Yi Li, and
  Hong Ji. Resource allocation for high-speed railway
  downlink mimo-ofdm system using quantum-behaved particle
  swarm optimization. In Communications (ICC), 2013 IEEE
  International Conference on, pages 2343–2347. IEEE, 2013.
[Jiang et al., 2011] Tianzi Jiang, David Grace, and Paul D
  Mitchell. Efficient exploration in reinforcement
  learning-based cognitive radio spectrum sharing.
  Communications, IET, 5(10):1309–1317, 2011.
[Kadam and Srivastava, 2012] Kaveri Kadam and Navin
  Srivastava. Application of machine learning (reinforcement
  learning) for routing in wireless sensor networks (wsns).
  In Physics and Technology of Sensors (ISPTS), 2012 1st
  International Symposium on, pages 349–352. IEEE, 2012.
[Kim and Sung, 2014] Soyeon Kim and Wonjin Sung.
  Operational algorithm for wireless communication systems
  using cognitive radio. In Communication, Networks and
  Satellite (COMNETSAT), 2014 IEEE International Conference
  on, pages 29–33. IEEE, 2014.
[Lee and Akyildiz, 2012] Won-Yeol Lee and Ian F Akyildiz.
  Spectrum-aware mobility management in cognitive radio
  cellular networks. Mobile Computing, IEEE Transactions
  on, 11(4):529–542, 2012.
[Letaief and Zhang, 2009] Khaled Ben Letaief and Wei
  Zhang. Cooperative communications for cognitive radio
  networks. Proceedings of the IEEE, 97(5):878–893, 2009.
[Li and Zhao, 2012] Jinxing Li and Youping Zhao. Radio
  environment map-based cognitive doppler spread
  compensation algorithms for high-speed rail broadband
  mobile communications. EURASIP Journal on Wireless
  Communications and Networking, 2012(1):1–18, 2012.
[Li et al., 2013] Ying Li, Lei Lei, Zhangdui Zhong, and Siyu
  Lin. Performance analysis for high-speed railway
  communication network using stochastic network calculus.
  In Wireless, Mobile and Multimedia Networks (ICWMMN
  2013), 5th IET International Conference on, pages
  100–105. IET, 2013.
[Liu et al., 2011] Qiuyan Liu, Miao Wang, and Zhangdui
  Zhong. Statistics of capacity analysis in high speed
  railway communication systems. Tamkang Journal of Science
  and Engineering, 14(3):209–215, 2011.
[Liu et al., 2012] Liu Liu, Cheng Tao, Jiahui Qiu, Houjin
  Chen, Li Yu, Weihui Dong, and Yao Yuan. Position-based
  modeling for wireless channel on high-speed railway under
  a viaduct at 2.35 ghz. Selected Areas in Communications,
  IEEE Journal on, 30(4):834–845, 2012.
[Puterman, 1994] M. L. Puterman. Markov decision
  processes. In Wiley, 1994.
[Sniady and Soler, 2012] Aleksander Sniady and José
  Soler. An overview of gsm-r technology and its
  shortcomings. In ITS Telecommunications (ITST), 2012 12th
  International Conference on, pages 626–629. IEEE, 2012.
[Sutton and Barto, 1998] R. Sutton and A. Barto.
  Reinforcement Learning: An Introduction. Bradford Books,
  1998.
[Tian et al., 2012] Lin Tian, Juan Li, Yi Huang, Jinglin Shi,
  and Jihua Zhou. Seamless dual-link handover scheme in
  broadband wireless communication systems for high-speed
  rail. Selected Areas in Communications, IEEE Journal on,
  30(4):708–718, 2012.
[Waltz and Fu, 1965] M. D. Waltz and K. S. Fu. A heuristic
  approach to reinforcement learning control systems. In
  IEEE Transactions on Automatic Control, 10:390–398, 1965.
[Wu et al., 2010] Cheng Wu, Kaushik Chowdhury, Marco Di
  Felice, and Waleed Meleis. Spectrum management of
  cognitive radio using multi-agent reinforcement learning.
  In Proceedings of the 9th International Conference on
  Autonomous Agents and Multiagent Systems: Industry
  track, pages 1705–1712. International Foundation for
  Autonomous Agents and Multiagent Systems, 2010.
[Wu et al., 2015] Cheng Wu, Yiming Wang, Xiang Qiang,
  and Zhaoyang Zhang. Adaptive spectrum management of
  cognitive radio in intelligent transportation system. In
  Applied Mechanics and Materials, volume 743, pages
  765–773. Trans Tech Publ, 2015.
[Zhang et al., 2012] Jiayi Zhang, Zhenhui Tan, Xiaoxi Yua,
  Haibo Wang, and Linwen Zhang. Review of public broadband
  access systems for high-speed railways and key
  technologies. Journal of the China Railway Society,
  34(1):46–53, 2012.
[Zhou and Ai, 2014] Yuzhe Zhou and Bo Ai. Quality of
  service improvement for high-speed railway
  communications. Communications, China, 11(11):156–167,
  2014.
[Zhu et al., 2013] Xiangqian Zhu, Shanzhi Chen, Haijing
  Hu, Xin Su, and Yan Shi. Tdd-based mobile communication
  solutions for high-speed railway scenarios. Wireless
  Communications, IEEE, 20(6):22–29, 2013.