<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Workshop on Connecting Education and Research Communities for an Innovative Resource Aware Society</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Prototyping and Evaluation Framework for Research on Timing-analysable Memory Hierarchies for Embedded Multicore SoCs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Florian Haas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Altmeyer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Augsburg</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>Research on memory hierarchies regarding the non-functional requirements of embedded multicore systems demands a framework that supports the prototyping and evaluation of new methods. In current multicore processors, accesses to shared resources by arbitrary tasks lead to interferences, which can result in timing violations of high-priority tasks. However, incorporating all potential interferences in the schedulability analysis leads to an enormous overestimation of the task execution times, and requires a full analysis of all tasks running on the system. Enhancements in the memory hierarchy can provide isolation that restricts potential interferences, thus improving the worst-case performance. Research on modifications to the memory hierarchy of a multicore processor therefore requires a prototyping and evaluation framework. This paper describes the design of such a framework and outlines its individual parts and their interconnections.</p>
      </abstract>
      <kwd-group>
        <kwd>parallel real-time system</kwd>
        <kwd>memory hierarchy</kwd>
        <kwd>FPGA prototyping framework</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The performance of multicore processors is strongly desired in various domains of embedded systems to satisfy the increasing demand for computational power. Complex algorithms and software systems, e. g. in autonomous driving, benefit from high-performance general-purpose shared-memory multicores. However, these processors do not meet typical real-time and safety requirements, and thus cannot be used without performance-degrading and laborious software mechanisms. Elaborate methods, such as increasingly deep memory hierarchies, have been developed to further improve the average-case performance of such processors. These, together with shared resources like last-level caches, buses, and main memory, make it extremely challenging to calculate tight worst-case execution time (WCET) bounds for the tasks in a time-critical system.</p>
      <p>
        The crucial problem is the lack of guaranteed freedom from interference between tasks that run on separate cores. Thus, an arbitrary low-priority task is able to influence the timing behaviour of another, potentially high-priority task on a different core. This can happen through accesses to shared resources, for example shared caches or the main memory [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. As a consequence, a schedulability analysis of the overall system with only minimal overestimation becomes nearly impossible for more than a few cores and deeper memory hierarchies.
      </p>
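The blow-up of interference-aware bounds can be illustrated with a toy additive model; the numbers and the function below are purely hypothetical and not taken from any of the cited analyses:

```python
# Toy model (hypothetical numbers): an interference-aware WCET bound grows
# with the core count, because every shared-memory access must be assumed
# to collide with one request from each rival core.

def wcet_bound(isolated_wcet, shared_accesses, cores, delay_per_access):
    """Additive bound: each shared access may be delayed once per rival core."""
    interference = shared_accesses * (cores - 1) * delay_per_access
    return isolated_wcet + interference

# 10k shared accesses, 40-cycle worst-case contention delay per access:
base = wcet_bound(1_000_000, 10_000, 1, 40)  # single core: no interference
quad = wcet_bound(1_000_000, 10_000, 4, 40)  # four cores
print(base, quad)  # the four-core bound more than doubles the isolated WCET
```

Even this crude model shows why accounting for every potential interference quickly dominates the bound as cores are added.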
      <p>
        The general objective of research on this topic is to facilitate predictable performance, with minimal overestimation of timing bounds, by reducing the sources of potential interferences on shared resources. Existing software-based approaches, e. g. performance counter monitors [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], or program modification during compilation, are limited, as they can either only detect excessive interferences, or must be applied to all tasks of the system. Hardware mechanisms therefore promise better leverage to control the behaviour of any task on the system. However, to research hardware-implemented methods, a proper evaluation platform is required. For example, a hardware implementation of a memory bandwidth reservation mechanism like MemGuard [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] could be evaluated and compared with other approaches. To research potential improvements of shared-resource accesses under timing constraints, a realistic model of a typical memory hierarchy is needed in the first place. Microarchitecture simulators with multicore configurations exist, but their processor-centric design supports neither a prototype implementation nor a realistic evaluation. Further, the evaluation system needs to be capable of executing realistic benchmarks, both for prototyping different ideas and for a thorough evaluation of their impact on performance.
      </p>
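The core idea behind a bandwidth reservation mechanism like MemGuard can be sketched in a few lines; this is a simplified illustration of the principle, not the actual mechanism from [3], and all names and budgets are made up:

```python
# Sketch of a MemGuard-style regulator: each core gets a budget of memory
# accesses per regulation period; once the budget is spent, further
# accesses are throttled until the next period begins.

class BandwidthRegulator:
    def __init__(self, budgets):
        self.budgets = budgets          # per-core budget (accesses/period)
        self.remaining = dict(budgets)  # remaining budget in current period

    def request(self, core):
        """Return True if the access may proceed, False if throttled."""
        if self.remaining[core] > 0:
            self.remaining[core] -= 1
            return True
        return False

    def new_period(self):
        """Periodic timer event: replenish all budgets."""
        self.remaining = dict(self.budgets)

reg = BandwidthRegulator({0: 2, 1: 1000})
print([reg.request(0) for _ in range(3)])  # [True, True, False]
```

A hardware implementation of this policy, placed at the memory controller, is exactly the kind of component the proposed framework is meant to prototype and evaluate.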
      <p>
        Previous work focused mostly on fault tolerance of parallel systems, but the research always involved shared-memory systems. Different systems have been used to evaluate the proposed methods, from software-only approaches on typical desktop and server hardware, over the Gem5 simulator, to FPGA prototypes. As a side effect of the conducted implementations and evaluations, experience with diverse platforms has been collected. The work on a software-only fault tolerance mechanism [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] showed the numerous restrictions of an unmodifiable hardware implementation. To overcome these limitations, later research was undertaken on the Gem5 microarchitecture simulator, where a customized hardware transactional memory was built into the memory hierarchy [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. However, since the simulator focuses on the detailed simulation of the processor cores themselves, it provides only a rather functional memory hierarchy with limited timing accuracy. Switching to an FPGA prototype with multiple MicroBlaze softcores [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] revealed the difficulties of integrating hardware and software parts with non-open processor cores. Overall, these experiences affirm the demand for an open system to prototype and evaluate memory hierarchies for future research ideas.
      </p>
      <p>This paper describes a prototyping and evaluation framework for embedded multicore systems, and outlines the assembly of the individual parts into a synthesizable design for both simulation and prototyping on an FPGA. The framework is based on Chipyard, which supports the design and evaluation of full-system hardware, using the Rocket Chip generator and its in-order RISC-V CPUs. The main benefit of Chipyard is the configurability and customizability of the involved modules. The interconnects could also be replaced with a NoC to research manycore systems, or a combination of both with shared-memory clusters connected through a NoC. The proposed framework will facilitate research on elements of the memory hierarchy, improving the applicability of multicore processors in embedded systems.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Requirements for Research on Timing Predictable Shared-memory Multicore Systems</title>
      <p>To approach the objective of calculating tight WCET bounds for time-sensitive tasks in shared-memory multicore systems, the potential interferences on shared resources have to be identified and measured first. While such evaluations can be performed on existing hardware, potential new methods to prevent or restrict interferences require customisable hardware components.</p>
      <p>A system that enables the modification and enhancement of individual elements in the memory hierarchy should fulfil the following requirements:
• Customisable hardware to extend or modify elements of the memory hierarchy
• Measurement of the overall performance and counting of individual accesses to shared resources
• Independence of CPU architectures
• Scalable number of processor cores
• Hardware cost estimation of extensions and customisations
• Fast feedback on the functional correctness of the implementation
• Fast and approximate evaluation of the simulated model
• Accurate full-system evaluation on an FPGA
These requirements are satisfied by our proposed framework, for which the Chipyard project provides a promising foundation. It is a natural choice, since it is built around the open RISC-V ecosystem, and allows customizing or replacing individual elements of the memory hierarchy. It further supports simulation and FPGA synthesis from identical code.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Overview of the Framework</title>
      <p>The evaluation framework builds upon existing open-source projects that have been developed
in recent years around the prevalent RISC-V architecture.</p>
      <sec id="sec-4-1">
        <title>3.1. Chipyard</title>
        <p>
          Chipyard [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] simplifies the process of designing full-system hardware by integrating all
necessary parts from CPU cores to supplementing logic to connect the devices of an FPGA evaluation
board. Fig. 1 illustrates the individual parts of Chipyard: Processor cores can be created for
example with the Rocket Chip Generator, which generates configurable and customizable cores
that implement the RISC-V instruction set, either in-order Rocket cores, or the more complex
and powerful out-of-order BOOM cores. Beside the L1 caches provided by the Rocket Chip
Generator, secondary level caches and diferent kinds of interconnecting buses can be generated.
There is also code provided to connect to and communicate with peripheral devices like UART
and JTAG.
        </p>
        <p>The generated Verilog code can be further compiled with Verilator for a simulation of the overall system, or with FireSim, which additionally allows simulating DDR3 main memory.</p>
        <p>[Fig. 1: Parts of Chipyard: the Rocket Chip Generator, peripherals, and FPGA shells feed into a Verilator/FireSim simulation or an FPGA prototype.]</p>
        <p>Alternatively, individual FPGA shells wrap the code with a harness that connects the units of the SoC to the I/O pins of a concrete FPGA, to build a prototype running on an FPGA evaluation board. Such a prototype is able to communicate with the built-in peripheral devices like UART and JTAG, as well as with the off-chip DDR memory.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Rocket Chip Generator</title>
        <p>
          The Rocket Chip Generator [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] produces designs of an SoC with multiple processor cores, a memory hierarchy, and interconnects. Fig. 2 depicts a generated chip with four processor tiles, each consisting of an in-order Rocket RISC-V core and L1 instruction and data caches, L2 cache banks with the memory bus, and additional buses for peripheral devices, DMA devices, and control units like the boot ROM and interrupt controllers. All processor tiles and all individual buses are connected through a shared system bus, which is typically implemented as a crossbar, but can also be configured as a ring bus.
        </p>
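Arbitration on such a shared system bus can be sketched abstractly; the round-robin arbiter below is purely illustrative and not taken from the Rocket Chip code:

```python
# Illustrative round-robin arbiter for a shared bus: when several tiles
# request the bus in the same cycle, the grant rotates past the previously
# granted tile so that no tile is starved. A crossbar, by contrast, can
# serve requests to distinct L2 banks in parallel.

def round_robin_arbiter(requests, last_grant, n_tiles):
    """Grant the first requesting tile after the previously granted one."""
    for offset in range(1, n_tiles + 1):
        tile = (last_grant + offset) % n_tiles
        if tile in requests:
            return tile
    return None  # no tile is requesting this cycle

# Tiles 0, 2, and 3 request; tile 0 was granted last, so tile 2 wins now.
print(round_robin_arbiter({0, 2, 3}, last_grant=0, n_tiles=4))  # 2
```

The choice between crossbar and ring bus mentioned above trades such parallel service against hardware cost, which the framework's cost estimation is meant to quantify.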
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Memory Hierarchy Evaluation Framework</title>
        <p>A common objective of research on memory hierarchies for real-time systems is to reduce interferences on shared resources. From this, the main elements of the system under evaluation are derived: All units that control access to shared resources, like the peripheral bus or the L2 cache, are of interest, as well as the private L1 caches that are connected to the shared system bus. In Fig. 3, these elements are shown below the processor cores, which are not of special interest for interference analysis. All accesses to shared resources that originate in the cores have to pass through the L1 instruction or data caches, which can control the communication.</p>
        <p>The prototyping flow from implementing a design of one or more specific parts of the memory hierarchy, through code generation, to simulation or evaluation is depicted in Fig. 4. Unit tests can provide fast checks of the functional correctness of the implemented or modified mechanisms. After passing these tests, Verilog code is generated, which can be simulated with Verilator to test the design with a set of benchmarks. The simulation provides fast feedback on the behaviour of the system, to compare different potential implementations before running the full evaluation of the synthesised bitstream on the FPGA. The evaluation of the design on the FPGA provides accurate timing measurements of the individual tasks, and a trace log of accesses to shared resources. These results make it possible to quantify the improvements of the implemented memory hierarchy modifications, and enable the detection of timing violations or forbidden interferences.</p>
        <p>[Fig. 2: A generated Rocket Chip with four Rocket tiles, each with a core and L1I/L1D caches, connected through the system bus to two L2 banks, the peripheral bus, the control bus (boot ROM, interrupts), and the front and memory buses with DMA.]</p>
        <p>[Fig. 3: The elements under evaluation: cores with L1I/L1D caches, the system bus, the L2 cache, and the peripheral bus.]</p>
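Post-processing such a trace log can be sketched as a small filter; the log format, window concept, and function name below are hypothetical, since the paper does not specify the trace format:

```python
# Hypothetical sketch: scan an FPGA trace log of shared-resource accesses
# and flag accesses by other cores that fall into a time window reserved
# for a high-priority task -- i.e. forbidden interferences.

def forbidden_accesses(trace, window, hp_core):
    """trace: (cycle, core, resource) tuples; window: (start, end) cycles
    during which only hp_core may touch shared resources."""
    start, end = window
    return [(t, c, r) for (t, c, r) in trace
            if start <= t <= end and c != hp_core]

trace = [(100, 0, "L2"), (150, 1, "DRAM"), (300, 1, "L2")]
print(forbidden_accesses(trace, window=(0, 200), hp_core=0))
# core 1 touched DRAM inside the reserved window -> flagged
```

In the real framework, such checks would run over the trace produced by the measurement facilities in the memory hierarchy components.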
        <p>The possibility to connect a debugger to the simulation, as well as to the system on the
FPGA, facilitates the detection of implementation faults, and provides detailed insight into the
behaviour of the system under specific circumstances when needed.</p>
        <p>With the feedback loop between design and simulation, available computational capacity can be leveraged to compare numerous different design variations, and to select a few designs of interest for the full evaluation on the FPGA.</p>
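This feedback loop amounts to a design-space sweep; the metric and parameters below are made up for illustration only:

```python
# Sketch of the simulate-and-compare loop: score every design variant
# with a cheap simulated metric, then keep only the best few candidates
# for the expensive full FPGA evaluation.

def sweep(variants, simulate, keep=2):
    """Rank design variants by a simulated cost (lower is better)."""
    ranked = sorted(variants, key=simulate)
    return ranked[:keep]

# Example: pick L2 configurations minimising a made-up contention score.
variants = [{"l2_banks": 1}, {"l2_banks": 2}, {"l2_banks": 4}]
score = lambda v: 100 / v["l2_banks"]  # fewer banks -> more contention
print(sweep(variants, score))  # the 4-bank and 2-bank designs survive
```

Each surviving variant would then be synthesised to a bitstream and evaluated on the FPGA for accurate timing measurements.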
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Conclusion &amp; Future Work</title>
      <p>This paper described the design of a prototyping and evaluation framework for research on memory hierarchies, a step towards the overall objective of enabling high-performance multicore processors in embedded real-time systems. The framework is built upon existing open-source projects around the RISC-V architecture, connecting the different tools together. It integrates all the required steps: automatically generating the Verilog code, compiling and running the simulation, synthesising the bitstream and programming the FPGA with it, and running the evaluation.</p>
      <p>The next step is to implement the basic tool chain for automatic unit tests, code generation, simulation, and synthesis. Afterwards, measurement facilities have to be added to the individual components of the memory hierarchy to evaluate the behaviour of the system under parallel workloads. Such workloads first have to be identified based on use cases from different industries, and reconstructed with a set of different benchmarks.</p>
      <p>Based upon the proposed framework, research on new approaches for controlling interferences on shared resources within shared-memory multicores can take off.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is partially supported by the CERCIRAS COST Action no. CA19135 funded by COST.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] C. Maiza, H. Rihani, J. M. Rivas, J. Goossens, S. Altmeyer, R. I. Davis, A Survey of Timing Verification Techniques for Multi-Core Real-Time Systems, ACM Comput. Surv. 52 (2019). doi:10.1145/3323212.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] J. Freitag, S. Uhrig, T. Ungerer, Virtual timing isolation for mixed-criticality systems, in: Euromicro Conference on Real-Time Systems (ECRTS), 2018, pp. 13:1-13:23. doi:10.4230/LIPIcs.ECRTS.2018.13.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] H. Yun, G. Yao, R. Pellizzoni, M. Caccamo, L. Sha, MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isolation in Multi-core Platforms, in: Real-Time and Embedded Technology and Applications Symposium (RTAS), 2013, pp. 55-64. doi:10.1109/RTAS.2013.6531079.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] F. Haas, S. Weis, T. Ungerer, G. Pokam, Y. Wu, Fault-Tolerant Execution on COTS Multi-core Processors with Hardware Transactional Memory Support, in: Architecture of Computing Systems (ARCS), 2017, pp. 16-30. doi:10.1007/978-3-319-54999-6_2.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] F. Haas, Fault-tolerant Execution of Parallel Applications on x86 Multi-core Processors with Hardware Transactional Memory, PhD thesis, Universität Augsburg, 2019. URL: https://opus.bibliothek.uni-augsburg.de/opus4/59566.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] R. Amslinger, S. Weis, C. Piatka, F. Haas, T. Ungerer, Redundant Execution on Heterogeneous Multi-cores Utilizing Transactional Memory, in: Architecture of Computing Systems (ARCS), 2018, pp. 155-167. doi:10.1007/978-3-319-77610-1_12.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] C. Piatka, R. Amslinger, F. Haas, S. Weis, S. Altmeyer, T. Ungerer, Investigating transactional memory for high performance embedded systems, in: Architecture of Computing Systems (ARCS), 2020, pp. 97-108. doi:10.1007/978-3-030-52794-5_8.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] R. Amslinger, C. Piatka, F. Haas, S. Weis, T. Ungerer, S. Altmeyer, Hardware multiversioning for fail-operational multithreaded applications, in: International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2020, pp. 20-27. doi:10.1109/SBAC-PAD49847.2020.00014.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] A. Amid, D. Biancolin, A. Gonzalez, D. Grubb, S. Karandikar, H. Liew, A. Magyar, H. Mao, A. Ou, N. Pemberton, P. Rigge, C. Schmidt, J. Wright, J. Zhao, Y. S. Shao, K. Asanović, B. Nikolić, Chipyard: Integrated Design, Simulation, and Implementation Framework for Custom SoCs, IEEE Micro 40 (2020) 10-21. doi:10.1109/MM.2020.2996616.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] K. Asanovic, R. Avizienis, J. Bachrach, S. Beamer, D. Biancolin, C. Celio, H. Cook, D. Dabbelt, J. Hauser, A. Izraelevitz, et al., The Rocket Chip Generator, Technical Report UCB/EECS-2016-17, EECS Department, University of California, Berkeley, 2016.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>