<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Hardware acceleration for ultra-fast Neural Network training on FPGA for MRF map reconstruction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mattia Ricchi</string-name>
          <email>mattia.ricchi@phd.unipi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Pisa</institution>
          ,
          <addr-line>Largo Bruno Pontecorvo 3, 56127, Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Institute of Nuclear Physics, Division of Bologna</institution>
          ,
          <addr-line>Viale Carlo Berti Pichat 6/2, 40127, Bologna</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>Magnetic Resonance Fingerprinting (MRF) is a fast quantitative MR Imaging technique that provides multiparametric maps with a single acquisition. Neural networks (NNs) accelerate reconstruction but require significant resources for training. We propose an FPGA-based NN for real-time brain parameter reconstruction from MRF data. Training the NN takes an estimated 200 seconds, significantly faster than standard CPU-based training, which can be up to 250 times slower. This method could enable real-time brain analysis on mobile devices, revolutionising clinical decision-making and telemedicine.</p>
      </abstract>
      <kwd-group>
        <kwd>magnetic resonance fingerprinting</kwd>
        <kwd>neural network</kwd>
        <kwd>hardware acceleration</kwd>
        <kwd>FPGA</kwd>
        <kwd>real-time</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        supported by artificial intelligence (AI) in data analysis [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. A key AI application in MRI is the
      </p>
      <p>various platforms and clinical environments.</p>
      <p>The purpose of this work is the hardware programming of an FPGA-accelerated NN training
algorithm for the reconstruction of MR parameters (T1 and T2) from clinical MRF data. To test the ability
to accelerate the training process on FPGA, the original NN must first be redesigned, i.e., simplified
and quantized, to meet the available resources of the hardware accelerator. This would result in an
important reduction in training time and power consumption.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Materials and Methods</title>
      <p>
        The NN model by Barbieri et al. [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] is a feedforward network with nine fully connected layers. It uses
ReLU activations for the first eight layers and a linear activation for the output layer. The model inputs
are the real and imaginary parts of MRI signals and outputs T1 and T2 quantitative maps. Training was
supervised using the Mean Squared Error (MSE) loss function, over 500 epochs with 1000 gradient steps
each, a learning rate of 10<sup>−4</sup>, optimized with the Adam optimiser [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], implemented with Keras TensorFlow
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], taking around 16 hours on an AMD Ryzen 9 3900 CPU. To fit FPGA resources, the first two layers
were removed and the network was retrained on the original dataset of 250M MRF simulated signals.
Performance was evaluated on 5000 new synthetic signals. Quantization Aware Training (QAT) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
was applied to use lower precision (integer parameters) without degrading performance.
      </p>
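      <p>As an illustration, the simplified feedforward model described above can be sketched numerically as a plain forward pass. This is a minimal sketch under stated assumptions: the layer widths below are hypothetical placeholders (only the 16- and 32-node layers are mentioned later in the text), and the input length is illustrative.</p>

```python
# Minimal numerical sketch of the simplified feedforward model.
# Assumption: layer widths and input length are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def build_params(widths):
    """Random weights and biases for each fully connected layer."""
    return [(rng.standard_normal((n_out, n_in)) * 0.1, np.zeros(n_out))
            for n_in, n_out in zip(widths[:-1], widths[1:])]

def forward(params, x):
    # ReLU on every layer except the last, which is linear (T1, T2 output).
    for i, (w, b) in enumerate(params):
        z = w @ x + b
        x = z if i == len(params) - 1 else relu(z)
    return x

# 128 inputs (real and imaginary signal parts), seven layers, two outputs.
widths = [128, 32, 16, 32, 16, 32, 16, 2]
params = build_params(widths)
t1_t2 = forward(params, rng.standard_normal(128))
```

The seven weight matrices correspond to the nine original layers minus the two removed to fit the FPGA resources.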
      <p>
        A low-level HDL design approach was selected, in which every firmware component is written in VHDL
without any high-level synthesis support, ensuring full control and data protection through
an on-FPGA firewall security algorithm [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The ALVEO U250 FPGA board, with 1.7M LUTs, 3.4M FFs,
12k DSPs, and 2.6k BRAMs, was selected for implementation. Firstly, the behaviour function of a single
node was implemented as given in Eq. (1).
      </p>
      <p>This function was implemented once, generically, and then reused as many times as necessary to cover all
the node operations present in the NN. Proper functioning was verified by deploying 16 nodes on the
FPGA and comparing their outputs with those of the Python implementation. Secondly, the backpropagation algorithm was
implemented. As a starting point, simple stochastic gradient descent was chosen, which describes
how the parameters of the NN, i.e. weights and biases, are updated at each iteration during training,
following Eq. (2).</p>
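      <p>As a software mirror of these two VHDL building blocks, the single-node behaviour of Eq. (1) (here with a ReLU activation, as used in the hidden layers) and a plain stochastic gradient descent parameter update can be sketched as follows; the numeric values are purely illustrative.</p>

```python
# Single-node behaviour, Eq. (1): a = f(sum_i w_i * x_i + b), with f = ReLU.
def node_output(weights, inputs, bias):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(z, 0.0)  # ReLU activation

# Plain stochastic gradient descent step: each parameter moves
# against its loss gradient by the learning rate.
def sgd_step(params, grads, lr=1e-4):
    return [p - lr * g for p, g in zip(params, grads)]

# 0.5*2.0 + (-0.25)*4.0 + 0.5 = 0.5, which ReLU leaves unchanged.
a = node_output([0.5, -0.25], [2.0, 4.0], 0.5)
```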
      <p>Finally, an assessment was conducted to evaluate the resource requirements for node operations,
backpropagation, and memory storage on the FPGA, involving the necessary LUTs, DSPs, and FFs.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Results and Discussion</title>
      <disp-formula id="eq1">
        <label>(1)</label>
        <tex-math><![CDATA[a = f\left(\sum_{i=1}^{n} w_i \cdot x_i + b\right)]]></tex-math>
      </disp-formula>
      <disp-formula id="eq2">
        <label>(2)</label>
        <tex-math><![CDATA[\delta^{l} = \left(\left(w^{l+1}\right)^{T} \delta^{l+1}\right) \circ f'\!\left(z^{l}\right), \qquad \frac{\partial \mathcal{L}}{\partial w^{l}} = \delta^{l}\left(a^{l-1}\right)^{T} \quad \text{and} \quad \frac{\partial \mathcal{L}}{\partial b^{l}} = \delta^{l}]]></tex-math>
      </disp-formula>
      <p>[Table: evaluation metrics MAPE (%), MPE (%), and RMSE (ms); the table values are not recoverable from the source.]</p>
      <p>In our scenario, based on the synthesis outcomes of a single node and the backpropagation
process, a clock frequency of 200 MHz is totally feasible, with the possibility of increasing it to 250
MHz. Based on the estimate of the resources required to execute all the necessary operations,
the whole network and backpropagation algorithm cannot be implemented on the FPGA at once.
It is, however, feasible to implement 16 nodes of the second layer, together with the
backpropagation between the layers containing 16 and 32 nodes. Thus, by iterating these two blocks
multiple times in a semi-parallelised way, all the operational requirements of the network can be covered.</p>
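      <p>The semi-parallel reuse scheme above can be sketched as follows: a single 16-node hardware block is iterated as many times as needed to cover each layer. The scheduling below is a simple illustration; the actual firmware sequencing is not detailed here.</p>

```python
# Sketch of the semi-parallel reuse of a 16-node hardware block.
# Assumption: each layer of N nodes is covered in ceil(N/16) passes.
import math

HW_NODES = 16  # nodes physically instantiated on the FPGA

def passes_needed(layer_width, hw_nodes=HW_NODES):
    """Iterations of the hardware block needed to cover one layer."""
    return math.ceil(layer_width / hw_nodes)

def schedule(layer_widths):
    """Passes of the 16-node block required for each layer, in order."""
    return [passes_needed(w) for w in layer_widths]

# A 32-node layer needs two passes; a 16-node layer needs one.
plan = schedule([16, 32, 16, 32])
```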
      <p>To verify the correct implementation of the single node function in VHDL, the outputs produced on
both the FPGA and in software were compared after providing identical inputs, weights, and biases.
The comparison produced promising results, as there was no difference between the Python outputs
and those of the FPGA, indicating that the mathematical operations were correctly translated into VHDL.</p>
      <p>The estimate of the necessary FPGA resources was 145k LUTs, 5k DSPs, and 146k FFs. This implies
that the entire NN and backpropagation use 8% of the available LUTs and 40% of the available DSPs,
demonstrating that the algorithm’s implementation is entirely viable from the resource point of view.
PCI Express was chosen for communication between the PC's CPU and the FPGA, requiring
additional resources of 83k LUTs, 148k FFs, and 150 BRAMs (the internal RAM memories of the FPGA).</p>
      <p>Finally, a fairly accurate estimate of the training time can be made. Each node needs 4 clock cycles to
perform its operations. The 16 nodes implemented on the FPGA work in a semi-parallel way, resulting
in 56 clock cycles for all layers. Similarly, a single backpropagation module requires 3 clock
cycles, iterating through the entire process for a total of 104 clock cycles. With a clock frequency of 200
MHz, the clock period is 5 ns; considering the 250M training data, the total training time results in</p>
      <disp-formula id="eq3">
        <label>(3)</label>
        <tex-math><![CDATA[5\,\mathrm{ns} \cdot \left(250\,000\,000 \cdot (56 + 104)\right) = 200\,\mathrm{s}]]></tex-math>
      </disp-formula>
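      <p>The arithmetic behind this estimate follows directly from the figures above, as the short check below shows.</p>

```python
# Training-time estimate from the cycle counts and clock period.
CLOCK_PERIOD_NS = 5            # 200 MHz clock -> 5 ns period
CYCLES_FORWARD = 56            # all layers via the semi-parallel 16-node block
CYCLES_BACKPROP = 104          # one full backpropagation iteration
TRAINING_SAMPLES = 250_000_000

total_seconds = (CLOCK_PERIOD_NS * 1e-9
                 * TRAINING_SAMPLES
                 * (CYCLES_FORWARD + CYCLES_BACKPROP))
# 5 ns * 250e6 * 160 cycles = 200 s
```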
      <p>This result shows that the NN can be trained on the FPGA in less than 5 minutes, about 200 times
faster than the corresponding training on CPU. The proposed method represents a significant step towards
real-time and personalized healthcare, opening the possibility of an integrated NN hardware
accelerator for map reconstruction inside the MRI scanner.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>The author would like to thank all the people who contributed to this work: Fabrizio Alfonsi (INFN
Bologna), Camilla Marella (University of Bologna), Marco Barbieri (Stanford University), Alessandra
Retico (INFN Pisa), Leonardo Brizi (University of Bologna), Alessandro Gabrielli (University &amp; INFN
Bologna), Claudia Testa (University &amp; INFN Bologna).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Gore</surname>
          </string-name>
          , Artificial intelligence in medical imaging,
          <source>Magnetic Resonance Imaging</source>
          <volume>68</volume>
          (
          <year>2020</year>
          )
          <fpage>A1</fpage>
          -
          <lpage>A4</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0730725X19307556. doi:10.1016/j.mri.2019.12.006.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Shen</surname>
          </string-name>
          , G. Wu,
          <string-name>
            <surname>H.-I. Suk</surname>
          </string-name>
          ,
          <article-title>Deep learning in medical image analysis</article-title>
          ,
          <source>Annual Review of Biomedical Engineering</source>
          <volume>19</volume>
          (
          <year>2017</year>
          )
          <fpage>221</fpage>
          -
          <lpage>248</lpage>
          . URL: https://www.annualreviews.org/content/journals/10.1146/annurev-bioeng-071516-044442. doi:10.1146/annurev-bioeng-071516-044442.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gulani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Seiberlich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sunshine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Duerk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Griswold</surname>
          </string-name>
          , Magnetic resonance fingerprinting,
          <source>Nature</source>
          <volume>495</volume>
          (
          <year>2013</year>
          )
          <fpage>187</fpage>
          -
          <lpage>92</lpage>
          . doi:10.1038/nature11971.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Barbieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Brizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Giampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Solera</surname>
          </string-name>
          , G. Castellani,
          <string-name>
            <given-names>C.</given-names>
            <surname>Testa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Remondini</surname>
          </string-name>
          ,
          <article-title>Circumventing the curse of dimensionality in magnetic resonance fingerprinting through a deep learning approach</article-title>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Barbieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Brizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Giampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Solera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Manners</surname>
          </string-name>
          , G. Castellani,
          <string-name>
            <given-names>C.</given-names>
            <surname>Testa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Remondini</surname>
          </string-name>
          ,
          <article-title>A deep learning approach for magnetic resonance fingerprinting: Scaling capabilities and good training practices investigated by simulations</article-title>
          ,
          <source>Physica Medica</source>
          <volume>89</volume>
          (
          <year>2021</year>
          )
          <fpage>80</fpage>
          -
          <lpage>92</lpage>
          . doi:10.1016/j.ejmp.2021.07.013.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sanaullah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Alexeev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yoshii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Herbordt</surname>
          </string-name>
          ,
          <article-title>Real-time data analysis for medical diagnosis using fpga-accelerated neural networks</article-title>
          ,
          <source>BMC Bioinformatics</source>
          <volume>19</volume>
          (
          <year>2018</year>
          ). doi:10.1186/s12859-018-2505-7.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <article-title>Mri-based brain tumor segmentation using fpga-accelerated neural network</article-title>
          ,
          <source>BMC Bioinformatics</source>
          <volume>22</volume>
          (
          <year>2021</year>
          )
          <fpage>421</fpage>
          . doi:10.1186/s12859-021-04347-6.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sanaullah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Alexeev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yoshii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Herbordt</surname>
          </string-name>
          ,
          <article-title>Real-time data analysis for medical diagnosis using fpga-accelerated neural networks</article-title>
          ,
          <source>BMC Bioinformatics</source>
          <volume>19</volume>
          (
          <year>2018</year>
          ). doi:10.1186/s12859-018-2505-7.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A method for stochastic optimization</article-title>
          ,
          <source>International Conference on Learning Representations</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Abadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Brevdo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Citro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Corrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Devin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghemawat</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Harp</surname>
          </string-name>
          , G. Irving,
          <string-name>
            <given-names>M.</given-names>
            <surname>Isard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kudlur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Levenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <article-title>TensorFlow: Large-scale machine learning on heterogeneous distributed systems</article-title>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B.</given-names>
            <surname>Jacob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kligys</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Howard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Adam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kalenichenko</surname>
          </string-name>
          ,
          <article-title>Quantization and training of neural networks for efficient integer-arithmetic-only inference</article-title>
          ,
          <year>2017</year>
          . URL: https://arxiv.org/abs/1712.05877. arXiv:1712.05877.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Grossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alfonsi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Prandini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gabrielli</surname>
          </string-name>
          ,
          <article-title>A high throughput intrusion detection system (ids) to enhance the security of data transmission among research centers</article-title>
          ,
          <source>Journal of Instrumentation</source>
          <volume>18</volume>
          (
          <year>2023</year>
          )
          <article-title>C12017</article-title>
          . URL: https://dx.doi.org/10.1088/1748-0221/18/12/C12017. doi:10.1088/1748-0221/18/12/C12017.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>