<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Hardware Usage Improvement for Small Data Problem Solving by Deep Learning Methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Iurii Krak</string-name>
          <email>yuri.krak@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olexander Barmak</string-name>
          <email>lexander.barmak@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vladislav Kuznetsov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Serhii Kondratiuk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleg Stelia</string-name>
          <email>oleg.stelya@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Veda Kasianiuk</string-name>
          <email>veda.kasianiuk@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Glushkov Cybernetics Institute</institution>
          ,
          <addr-line>Kyiv, 40, Glushkov ave., 03187</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Khmelnytskyi National University</institution>
          ,
          <addr-line>11, Institutes str., Khmelnytskyi, 29016</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>Kyiv, 64/13, Volodymyrska str., 01601</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This article addresses the cost of deep learning computations, in particular the price of hardware and electricity, which affects not only the cost of research but also the repeatability and consistency of experiments. We propose an approach intended for demanding computations such as small data machine learning with different neural network constructs, including RCNNs and DCGANs, which are known to be very time- and memory-consuming on certain data. The improvement rests on our own approaches to data processing and on comparing hardware, both central processing units (CPUs) and graphics processing units (GPUs), using well-known benchmarking techniques. We use this straightforward approach in our experiments to assess hypotheses about computational devices and algorithms, using proven datasets that are available online. This, in turn, helped us to infer device properties from their behavior in experiments and hence to improve the productivity and quality of our experiments. For our experiments, we used a range of libraries, namely scikit-learn, TensorFlow, PyTorch and DirectML, on the desktop and laptop computing devices with CPUs and GPUs that we had at hand. In the study, we identified the main differences in the behavior of CPUs and GPUs on the available hardware in one particular situation: the training of deep neural networks. Our findings allow us, with some confidence, to choose appropriate devices for deep learning tasks based on their complexity.</p>
      </abstract>
      <kwd-group>
        <kwd>Deep learning</kwd>
        <kwd>training</kwd>
        <kwd>hardware</kwd>
        <kwd>network architecture</kwd>
        <kwd>PyTorch</kwd>
        <kwd>TensorFlow</kwd>
        <kwd>DirectML</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In recent years, a number of new approaches to data processing have been developed, including, for
instance, new deep learning methods as well as new computational devices. This development has
benefited large enterprises and big international scientific institutions in particular. However,
it does not fit all tasks equally well. Individual researchers have found that a
large-scale (“big”) problem, which needs a lot of resources, may require a different set of software
than a small-scale (“small”) problem, which is more intricate and hence more applied.
One can formulate this as follows: the methods used for large jobs (large
datasets, data with high variety, volume and velocity, known as “big data”) differ considerably from the
methods or approaches needed for solving much smaller problems (so-called “small data”) [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ]. As a
result, individual researchers, before turning to cloud solutions or investing in more
expensive hardware, make use of all the devices they have at hand, such as gaming graphics
cards, data center cards and the built-in video cards of laptops [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Consequently, there has to be
a stage of computation cost estimation before upgrading to more advanced hardware. One
may also argue that much more advanced hardware is not always useful
or economically feasible: graphics cards for large workloads (for big data centers, for
instance) may be overkill for a simple job, while regular low-tier consumer-grade hardware
may not cope with intermediate jobs, since at some steps of the calculation
memory bottlenecks may appear and, in turn, degrade performance.
      </p>
      <p>
        Why is this so important? In 2020-2021, because of the COVID-19 pandemic,
there was a significant shortage of semiconductors, in particular of the consumer-grade graphics cards
so widely used in research. This had drastic effects on the global supply chain, not only for advanced
semiconductors but for other types of hardware as well, which in turn made this hardware very expensive
and raised the cost of research too. The availability of low-end or entry-level hardware was also questionable.
While desktop processors were not hit that hard (no more than a 30% increase in
price), graphics processors were hugely affected: their prices rose twofold or
more. This created a dilemma: on the one hand the cards were available, on the other hand they cost a lot of money,
so a researcher had to set priorities: either study the problem in detail at a large scale or
postpone the research. One may argue that some graphics cards, such as
cards for data centers and businesses (Nvidia Quadro and AMD FirePro series), were less affected by the price change and one
could get hold of one to continue the research; however, these were likely to be temporary solutions. After
a few years of technological advances and the restoration of the global supply chain, these effects were
largely mitigated. As of today, the problem has been partly resolved. Firstly, there are now three
companies that make graphics cards for computers, Nvidia, AMD and Intel, whose products are available for
research purposes. Secondly, there are now many proprietary and open-source libraries that
make it possible to use the graphics cards of any of the three vendors on the market, for instance libraries
such as CUDA [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] (Nvidia), Microsoft DirectML [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] (AMD and mobile devices), Intel oneAPI [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
(Intel graphics cards and computing devices). Also, as a promising feature, multi-core processors
made by two of these vendors are widely available, which gives hope of using them in
specific cases. However, we must not underestimate the possibility that such a disruption may occur in the
global supply chain again. Even now, there are conceivable situations in which this problem may recur
on a larger scale because of economic or geopolitical problems. So the presence of three
vendors is good, but we have to ask: what would we do if a similar problem happened again?
In that case, individual researchers should consider solutions
that help them utilize outdated hardware instead of investing in new hardware when that is not
feasible. Looking at this problem in retrospect, we want to address it
again: when a researcher has trouble buying hardware for their research because of
external factors, they have to choose either to apply an ineffective solution or to refit
or fine-tune the hardware for a specific job (scaling down the machine learning model or using it for tasks
like small data learning), so that the research can continue in place without being affected
by external factors.
      </p>
      <p>We believe that the solution that makes one adapt is the better one.</p>
      <p>Therefore, the purpose of this investigation is to find a solution to this hypothetical problem. For this
purpose, we formulate and solve the following set of problems:
• to propose an idea for testing and comparing the performance of individual devices;
• to propose tasks for comparing (benchmarking) devices in deep learning tasks;
• to find the lower and upper limits (performance margin) of the use of such devices;
• to find an opportunity to use more computing power for a more complex task;
• to provide a simple solution for selecting the appropriate device for the job.</p>
      <p>Following these problems, the paper is organized as follows: Section 2 discusses
related works and possible solutions, Section 3 describes the benchmarking experiments,
Section 4 addresses problems that may occur in the experiments, Section 5 discusses the results
of our experimental study and Section 6 summarizes the key insights of the study.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works and Proposed Solution</title>
      <p>This section discusses the small data problem and approaches to processing such data with the range of
hardware available to data analysts and scientists in the field of artificial intelligence.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1. The small data problem and the solution</title>
      <p>
        If we compare the availability of high-performance computing solutions in the early days of computing
with today, we see several shifts in the computing paradigm (centralized, decentralized, remote
and local), which have changed the field considerably. Nowadays, users can choose whether to
use remote solutions, such as cloud services, or local solutions based on a desktop computer or a
small computational server. The benefit of the latter is obvious: full control over the
experiment, since all available resources are dedicated to a specific computational task. However,
despite this wide range of possibilities, researchers sometimes have to guess the setup when
building an experimental system for a specific purpose [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. To avoid guessing, there must be a measure
for estimating the capabilities of the current system. Using well-known methods, one can identify the key
relationships between the resources used on a small-scale and on a big-scale task, based on test data.
      </p>
      <p>Sometimes the tasks to be solved lie between small-scale tasks
and much bigger tasks that have to be run on dedicated servers or cloud solutions. We
call these tasks “small data solving” tasks, since they lie behind cutting-edge solutions (big models)
and leave a margin of the unknown: to obtain the parameters of the desired system,
it is sometimes better to use a small data task to estimate the capabilities and to find the strengths and
weaknesses of the chosen solution.</p>
      <p>Since we mainly discuss commonly available desktop and laptop solutions here, we focus on
the capabilities of their architecture: the CPU, the GPU (if available) and the system memory.</p>
      <p>
        To figure out the unknown part of a given small data solving task, we think the better way is to use
commonly used datasets related to image recognition and machine learning, together with hardware
that has proven suitable for such a task [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. With this approach, we can be assured
not only that the proposed hardware meets our needs in terms of efficiency, power consumption
and cost, but also that it suits the given small data solving task, which is discussed further
in the paper using the given hardware and the proposed methods. To this end, we also propose to compare
the capabilities of the processors with those of the graphics processing units available in our laboratory, so that
we can draw key insights from this comparison as well.
      </p>
    </sec>
    <sec id="sec-4">
      <title>2.2. The general idea to estimate the efficiency of a computational device</title>
      <p>
        We want to point out that although many solutions are available online, we cannot simply
rely on them or on their metrics. For instance, there are plenty of benchmarks for estimating the
performance of a computer system in one specific task
(for instance, calculating the lighting in a 3D scene, encoding video, or
evaluating a complex mathematical expression or formula [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10-12</xref>
        ]); in general, these solutions
give only a rough estimate of performance [
        <xref ref-type="bibr" rid="ref13 ref14 ref15">13-15</xref>
        ]. For obvious reasons, performance in
3D rendering does not translate directly to solving a system of linear equations or to
training a deep convolutional network. For these purposes, we must develop our own benchmarks in order to
be more certain about our assumptions. We must also be aware of the utilization of the system, both
underutilization and overload, since either may affect the benchmarking results and
our assumptions about the efficiency of a given computational device.
      </p>
      <p>
        However, such an estimate may be enough to judge whether a device is suitable for the specific tasks we need. For
these purposes, we focus on different machine learning models as well as on data dimensionality
reduction methods, such as singular value decomposition [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], t-distributed stochastic neighbor embedding [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]
and autoencoders. We also consider widening our experiments to compare the behavior of different
devices on deep neural network constructs, so that we can gather the most valuable information about the behavior
of our test systems under different conditions. We hope this will be helpful for specific tasks related
to small data problem solving. Using our estimates, we can check whether we are using proper hardware for
a given task or whether it has to be upgraded. We also consider techniques for enhancing our data or
tweaking the hyperparameters of the methods we use, which may be helpful for dimensionality
reduction, data classification and clustering tasks. These tasks are tightly connected to small data and its
processing, as we discuss in more detail further in the paper.
      </p>
    </sec>
    <sec id="sec-5">
      <title>3. Experiments with CPU devices and general approach for task solving</title>
      <p>This section mostly discusses experiments with general-purpose CPU and GPU devices
for home desktop machines on different machine learning tasks, in particular the training
of deep neural network models, using the equipment available in our laboratory.</p>
    </sec>
    <sec id="sec-6">
      <title>3.1. The general scheme of an experiment</title>
      <p>To conduct an experiment, we propose an approach that lets us study different
devices carefully and make a rough estimate of their performance. To do so, we rely on a controlled
environment, which consists of a local or remote desktop machine (depending on the task) and provides
all the controls necessary to measure resource utilization, using either the standard task manager (or
device monitor) or device-specific software (for instance, a GPU resource monitor).
This data lets us establish resource usage before and under load, so that
our experiment is less likely to be affected by background tasks.</p>
      <p>
        To perform a data analysis task, we carefully load the data into our environment. The key
is to keep the data within the system memory (RAM, video RAM, or RAM and VRAM
combined), so that we are not affected by reading the data from disk and can focus on RAM-VRAM
transfers, which are likely faster than using the page file. The data are pre-extracted features generated
by a cascading model, which includes singular value decomposition (SVD) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], t-distributed stochastic neighbor
embedding (t-SNE) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], and K-means clustering [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. This allows us to focus solely on
execution time and on the factors that may affect it in deep neural network tasks, such as the initialization of
the network, its training, and its testing on the test data.
      </p>
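      <p>A minimal sketch of how the three phases named above (initialization, training and testing) could be timed separately, assuming Python's standard time.perf_counter clock. The helper time_phases and the three workload functions are hypothetical stand-ins introduced for illustration only; they are not the code used in the experiments.</p>
      <preformat>
```python
import time


def time_phases(phases):
    """Run each (name, callable) pair once and return elapsed seconds per phase."""
    timings = {}
    for name, run in phases:
        start = time.perf_counter()
        run()
        timings[name] = time.perf_counter() - start
    return timings


# Hypothetical stand-ins for the real workloads (assumptions, not the paper's code).
def init_network():
    return [[0.0] * 64 for _ in range(64)]


def train_network():
    return sum(i * i for i in range(200_000))


def test_network():
    return sum(i for i in range(50_000))


timings = time_phases([
    ("initialization", init_network),
    ("training", train_network),
    ("testing", test_network),
])
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```
      </preformat>
      <p>In a real run, the stand-ins would be replaced by the actual model construction, fitting and evaluation calls, with the data already held in memory so that disk reads do not distort the measured times.</p>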
      <p>To document an experiment, we follow the resource monitor as well as the
execution in the Jupyter environment: the console of the Jupyter server (it prints timestamps when the
work stopped and an autosave was created) and the Jupyter notebook output.
In this way, we can control the flow of the experiment and detect whether the Python kernel
has stopped or whether the processor or the GPU is throttling. In theory, based solely on the utilization graph, we
can identify the boundaries of each load cycle, either the next epoch or the loading of a new batch of
data. This may also help to determine the load (in %) and to identify underload or overload.</p>
      <p>After the experiment, we look not only at the numerical data but also at general performance, noting
in our workbook any issues with a device or algorithm and why they may have
happened. We then analyze the behavior of that device a few more times to find
possible causes of such behavior. We also consider scaling the task, which may help to identify problems.</p>
      <p>Since our test systems consist of different devices, we try to benefit from their features if they
are available. We also analyze the behavior of the system, e.g. the performance rate per iteration.</p>
      <p>Since our scheme covers both processors and GPU devices, we use it as a general template and
adapt it to the properties of each device, so that the procedures do not differ significantly. In general, this is
quite a straightforward approach, which differs only slightly between processors and GPU devices.</p>
    </sec>
    <sec id="sec-7">
      <title>3.2. The theoretical and practical performance of processors and GPUs</title>
      <p>Let us discuss in more detail the experimental setup we used to test the different processors available
in our laboratory. It comprised a set of desktop and laptop computers based on Intel and
AMD processors, namely AMD A10-9620P, Intel Core i5-6600K, AMD Ryzen 5 3600X and AMD Ryzen
7 4800H, running Windows 10 and Linux (Ubuntu 18 and openSUSE 15).</p>
      <p>We decided to set a lower bound of 16 GB of RAM for our systems, because
some of our experiments exceeded the 8 GB of memory in most of the systems we had at hand.
We therefore focused our experiments on the specific pieces of hardware that satisfied this minimal
requirement (system memory size). This requirement stems from the fact that we tested our equipment
against different algorithms, such as deep neural networks, some of which were computationally intensive and
needed more than 8 GB of memory. The intent of our experiments was not to prefer one device
or vendor over another but to explore their capabilities in the context of different
state-of-the-art deep learning methods. We hope that, after some tweaks, some of these devices and
systems could pass most of the experiments in the future.</p>
      <p>
        In our test environment, we tested two computers with AMD Zen 2 [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] processors:
one desktop processor (AMD Ryzen 5 3600X) and one processor for mobile systems (AMD Ryzen 7 4800H). The
specifications of the processors are given in Table 1; note that we also used a performance metric
based on the well-known PassMark benchmark and its PassMark score [20].
      </p>
      <p>The reason for using this benchmark is quite straightforward: the test includes a range of different
tests that exercise different capabilities of the test system, and the score is averaged over these tests. This gives
us confidence when comparing the benchmark performance with our experimental results
for deep learning (e.g. training, testing and initialization).</p>
      <p>To perform our tests, we use verified and well-known software and libraries that are
widely used in the scientific and data analysis community. We rely on a Jupyter test environment based
on the Anaconda 3.7 Python distribution in all of our environments (Windows and Ubuntu). For
deep learning tasks, we use TensorFlow [21]. We also tried CUDA acceleration on
the test systems where it was available (the desktop machine had a GPU, but it was disabled).</p>
      <p>Since our intent is to check the validity of known benchmarks, we ran our own
experiments to benchmark our laboratory equipment. We also hope that this will give us a better
understanding of the results (Table 2).</p>
      <p>The superscript numbers in Table 2 denote the following: 1 stands for the Windows test environment, 2 for
the Ubuntu test environment and 3 for an environment with CUDA enabled. Two numbers, for
instance 1 and 3, mean that the Windows environment was used with CUDA enabled.</p>
      <p>During our experiments, we found that the hardware may behave differently from what is
generally expected in such tasks. Firstly, we found that the choice of OS may give an advantage, but
for deep neural networks it can be considered insignificant, around 1-2% for testing and training,
which are the most time-consuming tasks, so the choice of OS is mostly a matter of personal preference.</p>
      <p>However, when we tested two systems, the system with the
laptop graphics card revealed a problem: although GPUs are generally considered superior
to CPUs, in this experiment the GPU had a significant disadvantage
compared with the desktop processor. This needed clarification. We therefore
decided to run a series of extra experiments to explain why the graphics card did
not perform as expected from the general performance of such devices in deep learning tasks.</p>
    </sec>
    <sec id="sec-8">
      <title>4. The problem with graphics card performance and resolution</title>
      <p>This section discusses the problem of finding bottlenecks and situations in which GPUs are not an
appropriate device for a deep learning task, and hence of finding the area of maximal efficiency
within the memory and calculation range that gives the desired level of GPU efficiency.</p>
    </sec>
    <sec id="sec-9">
      <title>4.1. Solving the Problem: Libraries, Devices, and GPU Experiments</title>
      <p>In the previous experiments, we had some problems using a laptop
graphics card (RTX 2060 mobile) on our custom dataset. Since we were unable to track down the
cause of the problem, we decided to perform a controlled experiment that excludes some of the conditions that
may affect it. To do so, we decided to use different neural network libraries
instead of the one we had used before (TensorFlow with CUDA) and to run our tasks on a desktop system based on the previously tested
AMD Ryzen 5 3600X with a desktop AMD Radeon RX 6500 XT GPU. We also decided
to rely on publicly available datasets so that the outcome would not be obscured by the results we had obtained on our own data.
However, for experimental purposes, we may try to use this data for other tasks.</p>
      <p>Here we used TensorFlow and PyTorch with the DirectML adapter (backend), which
allows us to use them with our Radeon RX 6500 XT GPU. While this may not be a fair comparison with the
Nvidia RTX 2060 Mobile, our goal is to find possible caveats or performance losses like those we
faced with that GPU, which would likely occur with other GPUs in the same circumstances. So here we
test the GPU against the data, not GPU against GPU.</p>
      <p>To estimate the capabilities of our test system, we test a fixed number of neural
network architectures: shallow networks, deep autoencoders, convolutional and recurrent convolutional
networks, and generative adversarial networks. This allows us to test the hardware against different
architectures one by one, to estimate the efficiency of these devices and to find the effective
range of usage of the CPU or GPU used in the experiments below.</p>
      <p>We also propose an experiment which may, in theory, allow us to find the performance curve of a
graphics card in comparison with the CPU we are currently using. The suggested approach is quite simple
and straightforward: to simulate linear dimensionality reduction by introducing lambda functions
into the optimization procedure of an autoencoder [22]. We hope that this simple task will help us to
resolve the key problems we faced in previous experiments.</p>
    </sec>
    <sec id="sec-10">
      <title>4.2. Experiments with deep networks and tiny datasets</title>
      <p>We tested various models, including ordinary networks, convnets and autoencoders, on tiny datasets
(Fashion-MNIST [23] and MNIST digits [24]) but found that the graphics cards were not useful
in this scenario. Because of the low graphics card performance, we decided to study autoencoders, which
are more flexible than other models. Overall, our findings hint that using graphics cards for
the analysis of these datasets may not be as effective as we thought before, and that other approaches,
such as exploring more sophisticated models, may give the additional performance we are targeting here.</p>
    </sec>
    <sec id="sec-11">
      <title>4.3. Experiments with autoencoders and the different lambda functions</title>
      <p>The first idea that makes autoencoders more flexible is the possibility of forward and reverse
transformations. Since we can change the transformation and construct the best one among (in fact
infinitely many) candidates, we can also pass a lambda function to the model. Such a function describes, for example, a
property of the weights or a target function. By changing this function, it is possible to increase the overall
efficiency and to obtain several realizations of the same model, which may use fewer resources of the graphics card
(memory, cores, etc.).</p>
      <p>We propose to use lambda functions that calculate the degree of orthogonality of the
weights, the norm of the matrices, and the independence of the weights. If we use all these
functions, we obtain a linear autoencoder [25], which resembles the singular value decomposition.
If we use none of them, we obtain a non-linear autoencoder, which produces a transformation into a
reduced dimension. The performance with different functions is shown in Table 3.</p>
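      <p>The exact form of these functions is not given in the text. As a hedged illustration, the sketch below assumes two common choices: an orthogonality penalty equal to the squared Frobenius norm of W W^T - I, and the plain Frobenius norm of a weight matrix. The function names are hypothetical, matrices are plain Python lists of rows, and the independence term is left out because its definition is not stated.</p>
      <preformat>
```python
import math


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def frobenius_norm(a):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in a for x in row))


def orthogonality_penalty(w):
    """Squared Frobenius norm of W W^T - I; zero when the rows of W are orthonormal."""
    gram = matmul(w, transpose(w))
    size = len(gram)
    return sum(
        (gram[i][j] - (1.0 if i == j else 0.0)) ** 2
        for i in range(size)
        for j in range(size)
    )


# Illustrative 2x3 encoder weights (made-up values).
w_orthonormal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w_skewed = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]

print(orthogonality_penalty(w_orthonormal))
print(orthogonality_penalty(w_skewed))
print(frobenius_norm(w_skewed))
```
      </preformat>
      <p>In a training setup, terms of this kind would typically be added to the reconstruction loss with a weighting factor, which pushes the learned projection towards an orthonormal basis similar to the one produced by the singular value decomposition.</p>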
      <p>Here we have interesting results: with a non-linear (ordinary) autoencoder,
the processor (here an AMD Ryzen 5 3600X) is up to 3 times more efficient than
the graphics card (here an AMD Radeon RX 6500 XT). In contrast, when more lambda
functions are used, the difference is reversed: the graphics card is more than five times more efficient than the
processor. We can also see that, up to a certain point, the calculation times of the graphics card stayed practically
the same.</p>
      <p>This suggests that there is a performance bottleneck and that once the card passes this
bottleneck, performance increases. It also gives the most plausible explanation for our Nvidia RTX
2060 mobile card and why it showed no advantage in the earlier experiment. To verify our hypotheses,
additional experiments are needed. We now know that simple jobs do not need a graphics
card, but for more time-consuming jobs GPUs start to give the best results. We
think these results could be improved further. To confirm this, experiments must show when
the cards are most efficient and whether there is another boundary beyond which the efficiency of the card
drops, which would give us an interval of effectiveness.</p>
    </sec>
    <sec id="sec-12">
      <title>4.4. Experiment with convolution networks for human face recognition</title>
      <p>The experiment described here is dedicated to human face recognition. We use a dataset available
online, FER2013 [26], and a convolutional network that classifies facial expressions.
Since we are looking for performance rather than accuracy here, this work is a check of the
devices. To verify our work, we used the AMD Radeon resource monitor [27] (Figure 1) and the
Anaconda console, which logs the necessary information (Figure 2).</p>
      <sec id="sec-12-1">
        <title>After the experiment, we obtained some important results (Figure 3).</title>
        <p>This test showed that timing matters: with each iteration, the overall efficiency increases.
According to the AMD monitor, approximately 1.2 GB of VRAM was used, holding the weights of our network
and the data. Comparing the results of the first and second epochs, we find that sending the data from
RAM to video memory takes approximately 80 seconds. For small datasets the computation times are
comparable, but as the task becomes more complex, the performance gain may be less prominent. For a
single iteration, the performance ratio between the graphics card and the processor was approximately
4.5 to 1; for larger numbers of iterations, the ratio could reach 20 to 1. This demonstrates that
graphics cards also spend significant time loading and processing the data, a cost that weighs most on
short runs. However, the overall time also grows considerably as the number of iterations
increases.</p>
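        <p>The relation between the one-time transfer cost and the per-iteration ratio can be illustrated
with a simple amortization model. The sketch below is only an illustration and not the measurement code
used in the experiment. It assumes that the transfer cost is paid once before training; the function
names <monospace>gpu_speedup</monospace> and <monospace>break_even_epochs</monospace> and the per-epoch
times (400 s on the processor, 20 s on the card) are hypothetical, and only the 80-second transfer time
is taken from the text above.</p>
        <preformat>
```python
def gpu_speedup(epochs: int, cpu_epoch_s: float, gpu_epoch_s: float,
                transfer_s: float) -> float:
    """Ratio of total CPU time to total GPU time for a given number of epochs.

    Assumes the host-to-device transfer cost is paid once, before training.
    """
    cpu_total = epochs * cpu_epoch_s
    gpu_total = transfer_s + epochs * gpu_epoch_s
    return cpu_total / gpu_total


def break_even_epochs(cpu_epoch_s: float, gpu_epoch_s: float,
                      transfer_s: float) -> float:
    """Number of epochs after which the GPU run becomes faster overall."""
    if gpu_epoch_s >= cpu_epoch_s:
        return float("inf")  # the card never catches up
    return transfer_s / (cpu_epoch_s - gpu_epoch_s)


# Hypothetical per-epoch times; only the 80 s transfer cost comes from the text.
CPU_EPOCH_S, GPU_EPOCH_S, TRANSFER_S = 400.0, 20.0, 80.0

for n in (1, 5, 50):
    ratio = gpu_speedup(n, CPU_EPOCH_S, GPU_EPOCH_S, TRANSFER_S)
    print(f"{n:3d} epochs: speedup {ratio:.1f}x")
```
</preformat>
        <p>Under these assumed values the ratio starts at 4.0 for a single epoch and approaches the limit
of 20 (the ratio of the per-epoch times) as the number of epochs grows, which is qualitatively the
behavior described above.</p>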
        <p>By studying the library calls in the Jupyter console (in particular, the calls made during
initialization, as shown in Figure 2), one can see how the DirectML interface is used to handle tensors
on the AMD architecture. To move data to and from the AMD GPU, the adapter relies on DirectML and the
DirectX 12 library to transform the tensors, which are not directly compatible between AMD and Nvidia
hardware because of their different memory structures. Having identified this issue, we can better
prepare the further experiments.</p>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>Additional experiments: if there is not enough memory</title>
      <p>Several studies have reported that Recurrent Convolutional Neural Networks (RCNNs) and Deep
Convolutional Generative Adversarial Networks (DCGANs) require a significant amount of memory for
training because of their complex architectures and large numbers of parameters. We therefore conducted
an experiment focusing on these memory-intensive networks. To obtain consistent results, we used the
system with the same processor and video card, but a different implementation of the networks: the
PyTorch library with DirectML. Two datasets were processed with different techniques. For image
segmentation on the Cityscapes dataset, we employed an RCNN network, which effectively segmented
images of pedestrians. For image generation, we used the DCGAN model.</p>
      <p>In both experiments, there was indeed not enough memory.</p>
      <p>As a solution, we suggest using the graphics card and the processor at the same time. The approach
works as follows: the model is loaded into both video memory and system memory, the loss is calculated,
and the card is then used, step by step, to calculate the gradient. Although we would prefer to
calculate the entire network at once, we faced memory limitations, with memory usage reaching
approximately 3.8 GB of video memory and 12.8 GB of (shared) system memory. As Figure 4 shows,
processing such a large chunk of memory while using all of the VRAM was not an easy task.</p>
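      <p>One way to picture the split between video memory and system memory is a greedy placement of
network blocks under a fixed VRAM budget. The following sketch is a hypothetical illustration and not
the code used in the experiment: the block names and sizes are invented, the helper
<monospace>place_layers</monospace> is our own, and only the 3.8 GB budget echoes the figure reported
above.</p>
      <preformat>
```python
from typing import Dict, List, Tuple


def place_layers(layers: List[Tuple[str, float]],
                 vram_budget_gb: float) -> Dict[str, List[str]]:
    """Assign blocks to video memory in order while they fit into the budget.

    Blocks that do not fit are kept in (shared) system memory.
    """
    placement: Dict[str, List[str]] = {"vram": [], "system": []}
    used_gb = 0.0
    for name, size_gb in layers:
        if used_gb + size_gb > vram_budget_gb:
            placement["system"].append(name)
        else:
            placement["vram"].append(name)
            used_gb += size_gb
    return placement


# Invented block sizes (GB), for illustration only.
example_layers = [("block_1", 2.5), ("block_2", 1.0),
                  ("block_3", 1.5), ("block_4", 0.2)]

plan = place_layers(example_layers, vram_budget_gb=3.8)
print(plan)
```
</preformat>
      <p>With these invented sizes, the third block does not fit into the budget and stays in system
memory, while the small fourth block still fits on the card. A real implementation would also have to
account for activations and gradients, which this sketch ignores.</p>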
      <p>Another effect that we did not expect: the graphics card, when used together with the processor,
improved the results by a factor of 2.74 compared with the processor alone. This means that the card
can still be useful when the work has to be completed. However, the video memory bottleneck remains a
serious problem, because huge chunks of memory have to be transferred after each iteration or epoch.
The results also indicate that the network or the data needs fine-tuning.</p>
    </sec>
    <sec id="sec-14">
      <title>5. Discussion</title>
      <p>During the experimental runs under different conditions, we identified several issues worth
discussing. First and foremost, we must make sure that we use proper devices and techniques to solve
our tasks. This can be achieved through various tweaks, optimizations or other approaches related to
the data, the specifications of the test system, or the algorithms and their particular
implementations.</p>
      <p>With such an approach, we can be confident not only in our results but also that the performance
is adequate in terms of run time and energy efficiency.</p>
      <p>As we said in the introductory sections, it is very important to consider which task has to be
solved: simply training a network, or doing so efficiently. Second, scaling the tasks up and down may
reveal whether the hardware in use meets our needs; complex tasks sometimes need a very expensive
setup, whereas for others, such as “toy datasets”, one can rely on a processor.</p>
      <p>We think that finding a solution is a deliberate process that has to take many factors into
consideration. First, one has to identify the problematic parts of the system, those that may not work
properly or that at least behave differently from what one would expect. With this knowledge, we can
be confident about the test system and can predict its behavior if it is scaled up or down.</p>
      <p>We also see that scaling up and down does not yield proportional gains and losses in performance.
The process is in fact non-linear, and the desired performance can be expected only under certain
circumstances and conditions. For instance, if we scale down the dimensions of the network according
to the size of the input vector, some information may be lost, since an architecture of a given size
cannot represent all the features. Hence, it is a good idea to follow the guidelines for the data and
the network that have been tested by many researchers in the area.</p>
      <p>We must also consider the time needed to obtain the results. Although there are general rules for
how many iterations are needed to get a proper result on a specific piece of hardware, this time may
turn out to be higher than we initially estimated.</p>
      <p>The same applies to relatively fast tasks, whose purpose may be to establish the most appropriate
scenario and hyperparameters in a series of tests.</p>
      <p>We also found it very helpful to conduct the experiments in a controlled environment, where we can
be sure that no external factors (e.g., user applications) affect the experiment and make it
incorrect. This environment must also allow one to identify problems and fix them on the fly,
optimizing for speed, accuracy or price, including that of the hardware.</p>
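      <p>In software, a controlled measurement can be approximated by repeating each run, discarding
warm-up iterations and reporting a robust statistic. The sketch below uses only the Python standard
library; the helper name <monospace>time_runs</monospace>, the number of repeats and the toy workload
are our assumptions and are not part of the original experiments.</p>
      <preformat>
```python
import statistics
import time
from typing import Callable, Dict


def time_runs(task: Callable[[], object], repeats: int = 5,
              warmup: int = 1) -> Dict[str, float]:
    """Time a callable several times and summarize the wall-clock results."""
    for _ in range(warmup):
        task()  # warm-up runs are executed but not measured
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }


def toy_workload() -> int:
    """Stand-in for one training iteration (hypothetical)."""
    return sum(i * i for i in range(100_000))


summary = time_runs(toy_workload, repeats=5, warmup=1)
print(summary)
```
</preformat>
      <p>A large gap between the minimum and maximum times hints at interference from other processes, in
which case the run is better repeated in a cleaner environment.</p>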
      <p>Overall, we see the task of improving hardware usage for a specific deep learning task as a kind
of “optimization”, in which we optimize not just an algorithm but its behavior in a test environment,
in order to enhance the overall efficiency of the system as a whole. If we carefully identify the key
parameters of our setup that affect efficiency, we are likely to use the setup in a smart way, so that
it meets our needs in terms of both model performance and resource optimization.</p>
    </sec>
    <sec id="sec-15">
      <title>6. Conclusion</title>
      <sec id="sec-15-1">
        <title>Overall, the main results of the article can be summarized as follows.</title>
        <p>We have found that one should not rely on a specific architecture alone, but should study its
strengths and weaknesses and, based on that study, decide which architecture is more appropriate for
the task. We found, for instance, that the hardware needed for complex computer vision tasks, such as
RCNN, is quite different from what is needed to train a network on the MNIST dataset.</p>
        <p>The process of tweaking is not straightforward, but it is rewarding in terms of new knowledge
about the capabilities of particular hardware on specific tasks such as image recognition. However, we
also found some drawbacks of modern neural networks for computer vision, which can sometimes be at a
disadvantage compared to classic computer vision methods.</p>
        <p>Thus, to unleash the full power of these novel methods, one may need a much more advanced setup
or, if that is not possible, scale down the data, with the known negative effects on accuracy.</p>
        <p>It is also worth noting that the competition between vendors is intense, since the computational
capabilities of certain devices may be on par with similar ones from other vendors. We also found that
devices with the same scores under ideal conditions, such as computer benchmarks, may behave quite
differently in other tasks. For instance, the difference between the desktop and laptop processors of
the same architecture (Zen 2) turned out to be much bigger than we had expected; we believe, though,
that desktop processors of a similar class and performance would have shown much more consistent and
predictable results.</p>
        <p>To really achieve the research goals, one ideally needs a set of different devices dedicated to
particular tasks, for instance processors for some tasks and graphics cards for others, or even uses
them in conjunction. Such an approach may allow us to exploit the strengths of the different devices
and mitigate their weaknesses, balancing efficiency and performance.</p>
        <p>We believe that researchers should be open-minded to new ideas and try different methods and
laboratory equipment to achieve their research goals. This suggestion may be helpful in our future
experiments dedicated to computer vision, if we decide to extend the results and use other devices for
more complex tasks and, as a result, to improve our current results.
</p>
        <sec>
          <title>7. References</title>
          <p>[20] Y. Wang, V. Lee, G.-Y. Wei, D. Brooks, Predicting new workload or CPU performance by
analyzing public datasets, ACM transactions on architecture and code optimization, 15 4 (2019)
1-21. doi:10.1145/3284127</p>
          <p>[21] M. Ramchandani, et al., Survey: tensorflow in machine learning, Journal of physics:
conference series, 2273 1 (2022) 1-12. doi:10.1088/1742-6596/2273/1/012008</p>
          <p>[22] O. Fontenla-Romero, B. Pérez-Sánchez, B. Guijarro-Berdiñas, LANN-SVD: a non-iterative
SVD-based learning algorithm for one-layer neural networks, IEEE transactions on neural
networks and learning systems, 29 8 (2018) 3900-3905. doi:10.1109/tnnls.2017.2738118</p>
          <p>[23] A. S. Henrique, et al., Classifying Garments from Fashion-MNIST Dataset Through CNNs,
Advances in science, technology and engineering systems journal, 6 1 (2021) 989-994.
doi:10.25046/aj0601109</p>
          <p>[24] A. Baldominos, Y. Saez, P. Isasi, A survey of handwritten character recognition with MNIST
and EMNIST, Applied sciences, 9 15 (2019) 3169-3185. doi:10.3390/app9153169</p>
          <p>[25] H. Bourlard, S. H. Kabil, Autoencoders reloaded, Biological cybernetics, 2022.
doi:10.1007/s00422-022-00937-6</p>
          <p>[26] P. Giannopoulos, I. Perikos, I. Hatzilygeroudis, Deep learning approaches for facial emotion
recognition: A case study on FER-2013, Advances in hybridization of intelligent methods.
Cham, 85 (2018) 1-16. doi:10.1007/978-3-319-66790-4_1</p>
          <p>[27] J. Peddie, The GPU environment-software extensions and custom features, in: The history of
the GPU - eras and environment. Cham, 2022, pp. 251-281. doi:10.1007/978-3-031-13581-1_7</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Adedigba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Adeshina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Aibinu</surname>
          </string-name>
          ,
          <article-title>Performance evaluation of deep learning models on mammogram classification using small dataset</article-title>
          ,
          <source>Bioengineering</source>
          ,
          <volume>9</volume>
          <issue>4</issue>
          (
          <year>2022</year>
          )
          <fpage>161</fpage>
          -
          <lpage>181</lpage>
          . doi:10.3390/bioengineering9040161
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>McBride</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Persson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Reichmanis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Grover</surname>
          </string-name>
          ,
          <article-title>Solving materials' small data problem with dynamic experimental databases</article-title>
          ,
          <source>Processes</source>
          ,
          <volume>6</volume>
          <issue>7</issue>
          (
          <year>2018</year>
          )
          <fpage>79</fpage>
          -
          <lpage>96</lpage>
          . doi:10.3390/pr6070079
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          , et al.,
          <article-title>Scene optimization of GPU-based back-projection algorithm</article-title>
          ,
          <source>The journal of supercomputing</source>
          ,
          <volume>79</volume>
          (
          <year>2022</year>
          )
          <fpage>4192</fpage>
          -
          <lpage>4214</lpage>
          . doi:10.1007/s11227-022-04785-w
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nangla</surname>
          </string-name>
          ,
          <article-title>GPU Programming using NVIDIA CUDA</article-title>
          .
          <source>International journal for research in applied science and engineering technology</source>
          ,
          <volume>6</volume>
          <issue>6</issue>
          (
          <year>2018</year>
          )
          <fpage>79</fpage>
          -
          <lpage>84</lpage>
          . doi:10.22214/ijraset.2018.6016
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Raman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hoder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bisson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Branscombe</surname>
          </string-name>
          ,
          <source>Azure AI services at scale for cloud, mobile, and edge: building intelligent apps with Azure cognitive services and machine learning</source>
          .
          <publisher-name>O'Reilly Media, Incorporated</publisher-name>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Krainiuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Goli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. R.</given-names>
            <surname>Pascuzzi</surname>
          </string-name>
          ,
          <article-title>OneAPI open-source math library interface</article-title>
          , in:
          <source>International workshop on performance, portability and productivity in HPC (P3HPC)</source>
          , St. Louis, MO, USA, November 14-19 2021,
          <year>2021</year>
          , pp.
          <fpage>22</fpage>
          -
          <lpage>32</lpage>
          . doi:10.1109/p3hpc54578.2021.00006.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y. V.</given-names>
            <surname>Krak</surname>
          </string-name>
          ,
          <article-title>Dynamics of manipulation robots: Numerical-analytical method of formation and investigation of computational complexity</article-title>
          ,
          <source>Journal of Automation and Information Sciences</source>
          ,
          <volume>31</volume>
          <issue>3</issue>
          (
          <year>1999</year>
          )
          <fpage>121</fpage>
          -
          <lpage>128</lpage>
          . doi:10.1615/JAutomatInfScien.v31.i1-3.170
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rhim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ha</surname>
          </string-name>
          ,
          <article-title>S3NAS: fast hardware-aware neural architecture search methodology</article-title>
          ,
          <source>IEEE transactions on computer-aided design of integrated circuits and systems</source>
          ,
          <volume>41</volume>
          <issue>11</issue>
          (
          <year>2021</year>
          )
          <fpage>4826</fpage>
          -
          <lpage>4836</lpage>
          . doi:10.1109/tcad.2021.3134843
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Alkabani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>El-Ghazawi</surname>
          </string-name>
          ,
          <article-title>Towards energy-quality scaling in deep neural networks</article-title>
          ,
          <source>IEEE design &amp; test</source>
          ,
          <volume>38</volume>
          <issue>4</issue>
          (
          <year>2021</year>
          )
          <fpage>27</fpage>
          -
          <lpage>36</lpage>
          . doi:10.1109/mdat.2019.2952328
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>I. G.</given-names>
            <surname>Kryvonos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. V.</given-names>
            <surname>Krak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. V.</given-names>
            <surname>Barmak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Ternov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          ,
          <article-title>Information technology for the analysis of mimic expressions of human emotional states</article-title>
          ,
          <source>Cybernetics and Systems Analysis</source>
          ,
          <volume>51</volume>
          <issue>1</issue>
          (
          <year>2015</year>
          )
          <fpage>25</fpage>
          -
          <lpage>33</lpage>
          . doi:10.1007/s10559-015-9693-1
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y. V.</given-names>
            <surname>Krak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Barmak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Baraban</surname>
          </string-name>
          ,
          <article-title>Usage of NURBS-approximation for construction of spatial model of human face</article-title>
          ,
          <source>Journal of Automation and Information Sciences</source>
          ,
          <volume>43</volume>
          <issue>2</issue>
          (
          <year>2011</year>
          )
          <fpage>71</fpage>
          -
          <lpage>81</lpage>
          . doi:10.1615/JAutomatInfScien.v43.i2.70
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>I. G.</given-names>
            <surname>Kryvonos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. V.</given-names>
            <surname>Krak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. V.</given-names>
            <surname>Barmak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. O.</given-names>
            <surname>Bagriy</surname>
          </string-name>
          ,
          <article-title>New tools of alternative communication for persons with verbal communication disorders</article-title>
          ,
          <source>Cybernetics and Systems Analysis</source>
          ,
          <volume>52</volume>
          <issue>5</issue>
          (
          <year>2016</year>
          )
          <fpage>665</fpage>
          -
          <lpage>673</lpage>
          . doi:10.1007/s10559-016-9869-3
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H. H.</given-names>
            <surname>Holm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Brodtkorb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Saetra</surname>
          </string-name>
          ,
          <article-title>Performance and energy efficiency of CUDA and OpenCL for GPU computing using Python</article-title>
          ,
          <source>Parallel computing: technology trends</source>
          ,
          <volume>36</volume>
          (
          <year>2020</year>
          )
          <fpage>593</fpage>
          -
          <lpage>604</lpage>
          . doi:10.3233/apc200089.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>W.</given-names>
            <surname>Varghese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bermbach</surname>
          </string-name>
          , et al.,
          <article-title>A survey on edge performance benchmarking</article-title>
          .
          <source>ACM computing surveys</source>
          ,
          <volume>54</volume>
          <issue>3</issue>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>33</lpage>
          . doi:10.1145/3444692.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Thomson</surname>
          </string-name>
          ,
          <article-title>Modelling VM latent characteristics and predicting application performance using semi-supervised non-negative matrix factorization</article-title>
          ,
          in:
          <source>2020 IEEE 13th international conference on cloud computing (CLOUD)</source>
          , Beijing, 19-23 October 2020, (
          <year>2020</year>
          ), pp.
          <fpage>470</fpage>
          -
          <lpage>474</lpage>
          . doi:10.1109/cloud49709.2020.00069
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <article-title>Parallel tensor decomposition with distributed memory based on hierarchical singular value decomposition</article-title>
          ,
          <source>Concurrency and computation: practice and experience</source>
          ,
          <year>2021</year>
          , doi:10.1002/cpe.6656
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Balamurali</surname>
          </string-name>
          ,
          <article-title>T-Distributed stochastic neighbor embedding</article-title>
          ,
          <source>Encyclopedia of mathematical geosciences</source>
          . Cham,
          <year>2021</year>
          . doi:10.1007/978-3-030-26050-7_446-1
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Alian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Al-Naymat</surname>
          </string-name>
          ,
          <article-title>Questions clustering using canopy-K-means and hierarchical-Kmeans clustering</article-title>
          ,
          <source>International journal of information technology</source>
          ,
          <volume>14</volume>
          (
          <year>2022</year>
          )
          <fpage>3793</fpage>
          -
          <lpage>3802</lpage>
          . doi:10.1007/s41870-022-01012-w
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D.</given-names>
            <surname>Suggs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subramony</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bouvier</surname>
          </string-name>
          ,
          <article-title>The AMD “zen 2” processor</article-title>
          ,
          <source>IEEE micro</source>
          ,
          <volume>40</volume>
          <issue>2</issue>
          (
          <year>2020</year>
          )
          <fpage>45</fpage>
          -
          <lpage>52</lpage>
          . doi:10.1109/mm.2020.2974217
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>