<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Systematic Mapping Study on Use of Pre-Trained Open Machine Learning Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Riku Alho</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mikko Raatikainen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jukka K. Nurminen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lalli Myllyaho</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucy Ellen Lwakatare</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science</institution>
          ,
          <addr-line>PL 68 (Pietari Kalmin katu 5)</addr-line>
          ,
          <institution>University of Helsinki</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Accurate understanding of pre-trained open source machine learning models, their frameworks, and their datasets can help software engineers simplify application development, reduce its costs, and improve its quality in different domains. This paper investigates how pre-trained Open Machine Learning (ML) models, their frameworks, and datasets are shared and used in different domains. A systematic mapping study is used to identify published studies. Statistical and qualitative results are formed from 499 studies which provide sufficient information regarding the use of open source pre-trained models, frameworks, and datasets. Based on a relatively large sample, the reviewed 499 studies provide a listing of Open ML models, frameworks, and datasets used in research as well as their relative popularity. The selected studies covered a large number of different domains, which saw benefits ranging from a minor decline to a moderate improvement when compared to previously used state-of-the-art machine learning methods. Most of the models in the studies were used under the TensorFlow framework with ImageNet as the dataset. The majority of studies were made in laboratory environments. Pre-trained Open ML models show positive promise for improvement in machine learning. Additional diversity of available open source models pre-trained with different datasets would improve this effect. More comparable studies are needed, especially from industry, that use and apply open source machine learning and report their context, methodology, and performance comprehensively.</p>
      </abstract>
      <kwd-group>
<kwd>Machine Learning</kwd>
        <kwd>Open Source</kwd>
        <kwd>Reuse</kwd>
        <kwd>Systematic Literature Review</kwd>
        <kwd>Systematic Mapping Study</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Although there are important differences, machine learning system developers can learn a lot from traditional software engineering [1]. Both begin their task by familiarizing themselves with the problem domain. Software engineers explore existing and similar solutions, software, and databases, whereas machine learning engineers explore available machine learning options, models, frameworks, and datasets for the problem domain.</p>
      <p>The goal of this study is to assess the shared usage, adoptability, and evolvability of pre-trained open source machine learning models in different application domains. The study is carried out as a systematic mapping study [3], now an established research method in computer science to systematically collect an overview of the research state of the art.</p>
<p>Traditionally, many technologies related to machine learning have been hidden behind technology industry walls. Only recently have we seen a large and systematic introduction of multiple new open source machine learning technologies. Such shared off-the-shelf pre-trained open source machine learning models, frameworks, and datasets can provide competitive state-of-the-art capabilities in terms of performance, cost-effectiveness, and adaptability in different application domains when applied to new machine learning problems. This may provide affordable new avenues for software engineers, researchers, students, businesses, and private enthusiasts to help reap the benefits of available data without requiring them to invest work in reinventing the wheel [2].</p>
      <p>This review paper is structured as follows. First, Section 2 introduces the terminology and background of this study. In Section 3, the research questions and applied research method are presented. The results are introduced and analysed in Section 4. The analysed results and findings are discussed in Section 5. Finally, Section 6 concludes the paper.</p>
      <p>2. Background</p>
      <p>To properly understand the terminology and background for this study, we briefly describe the different terms and fields related to it.</p>
      <p>2.1. Basic concepts</p>
      <p>Large amounts of training or input data can be challenging to acquire. Transfer learning can be used to mitigate these challenges [9]. Using a pre-trained model taught with large amounts of data that even slightly overlaps the targeted domain may achieve comparable or even better results than using only the small datasets available to the targeted domain in question.</p>
      <p>Specifically, in this paper, we use the term Open ML
model for pre-trained open-source machine learning
models that can be reused as such or after retraining as
components in other systems.</p>
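<p>To make this reuse idea concrete, the following toy sketch (our own illustration, not from any reviewed study) treats a fixed feature extractor as the frozen "pre-trained" component and fits only a small task-specific head on new data; in practice a real Open ML model such as VGG-16 would play the extractor's role.</p>

```python
# Toy sketch of reusing a pre-trained component. The fixed feature
# extractor below is a stand-in for a large pre-trained network whose
# weights are reused as-is; only a small task-specific head is trained
# on the new domain's data (hypothetical example, not from the paper).

def pretrained_features(x):
    # Frozen "pre-trained" component: in practice, a network trained
    # on a large dataset such as ImageNet. Its "weights" never change.
    return (x, x * x)

def train_head(samples, labels, lr=0.1, epochs=200):
    # Fit only a tiny perceptron head on top of the frozen features.
    w0 = w1 = b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f0, f1 = pretrained_features(x)
            pred = 1 if w0 * f0 + w1 * f1 + b > 0 else 0
            err = y - pred
            w0 += lr * err * f0
            w1 += lr * err * f1
            b += lr * err
    return w0, w1, b

def predict(params, x):
    w0, w1, b = params
    f0, f1 = pretrained_features(x)
    return 1 if w0 * f0 + w1 * f1 + b > 0 else 0

# Small labelled target-domain dataset: label 1 iff |x| is large.
xs = [-3.0, -1.0, 0.0, 1.0, 3.0]
ys = [1, 0, 0, 0, 1]
params = train_head(xs, ys)
```

<p>Only the three head parameters are fitted here; the extractor stays untouched, which is what allows a small target-domain dataset to suffice.</p>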
      <sec id="sec-1-1">
<title>Artificial Intelligence</title>
        <p>Artificial Intelligence (AI) consists of all technical aspects that aim to get computers to imitate intelligent behaviour observed in humans [4]. This includes machine learning, natural language processing (NLP), language synthesis, computer vision, robotics, sensor analysis, optimization, and simulation.</p>
<p>A subset of AI is Machine Learning (ML), which consists of techniques that enable computers to change their functionality based on given information (e.g., sensor data), thus improving their behaviour to achieve the goal [5]. ML techniques include decision trees, neural networks, support vector machines, and many more.</p>
<p>Neural Networks (NNs) are a part of ML. They are computer programs inspired by biological neural network processes [6]. These consist of perceptrons, convolutional neural networks, recurrent neural networks, Boltzmann machines, deep neural networks, and many more. Basic NNs with one to a few layers of neurons usually require user assistance in forming classification classes.</p>
        <p>2.3. Frameworks</p>
        <p>The majority of Open ML models are used inside dedicated frameworks, such as Caffe [10], Keras [11], Weka [12], PyTorch [13], TensorFlow [14] or MatConvNet [15]. Frameworks work as the interface between an ML model, users, and hardware and can thus affect how hardware calculations and values are given to the model during training and use.</p>
<p>Deep Neural Networks (DNNs) are under the NN category. They are neural networks which consist of multiple layers, providing them the ability to form new classification classes.</p>
        <p>2.4. Datasets</p>
<p>Machine learning can be categorized into supervised learning, unsupervised learning, and reinforcement learning [7]. Supervised learning utilizes training data for classification and regression. Unsupervised learning constructs predictions of classification based on the given input data. Reinforcement learning uses trial and error based on an oracle, such as a repeatable simulation or a game, to find the optimal outputs.</p>
        <p>Most off-the-shelf open ML models offered by different frameworks are made available pre-trained on a certain dataset. There are many different datasets, with scales ranging from tens of thousands of entries to a billion. Datasets such as ImageNet [16], Places365 [17], CIFAR [18] and Pascal VOC [19] are offered in pre-trained open ML models. The benefit of pre-trained models is that training a model comprehensively on a large dataset takes a very long time and a lot of computing resources.</p>
<p>For instance, one study [20] reports that "it takes a Nvidia M40 GPU 14 days to finish just one 90-epoch ResNet-50 training execution on the ImageNet-1k dataset". Transfer learning makes it possible to train useful models with just a fraction of the original computing effort and with only a small amount of training data. Off-the-shelf pre-trained machine learning models have gained academic interest, especially after the ImageNet Large Scale Visual Recognition Challenge 2014 (ILSVRC 2014) [16], which provided a noticeable leap in the performance of Open ML models used for image classification.</p>
        <p>Open source refers to a computer program for which the source code is available to the general public for use or modification from its original design [8]. Open source code is a collaborative effort where programmers improve upon the source code and share the changes within the community. Code is released with a license specifying the conditions under which others may download, use, modify, and publish their versions to the community. This view of open source is not restricted to any particular license.</p>
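<p>A back-of-the-envelope calculation makes the scale of this saving concrete. The 14 GPU-days / 90 epochs figure is the one quoted from [20]; the fine-tuning workload below (5 epochs on a dataset one hundredth the size, with cost taken as naively proportional to epochs times data) is our own illustrative assumption, not a measurement.</p>

```python
# Rough cost comparison: full pre-training vs. fine-tuning a reused model.
# The full-training figures are the ones quoted from [20]; the fine-tuning
# workload is a hypothetical assumption for illustration only.

GPU_HOURS_FULL = 14 * 24   # 14 GPU-days for 90-epoch ResNet-50 on ImageNet-1k
EPOCHS_FULL = 90

FT_EPOCHS = 5              # assumed fine-tuning epochs
FT_DATA_FRACTION = 0.01    # assumed target-domain dataset size vs. ImageNet-1k

hours_per_epoch = GPU_HOURS_FULL / EPOCHS_FULL
ft_hours = hours_per_epoch * FT_EPOCHS * FT_DATA_FRACTION  # cost ~ epochs x data
speedup = GPU_HOURS_FULL / ft_hours

print(f"{hours_per_epoch:.2f} h/epoch, fine-tuning ~{ft_hours * 60:.0f} min, "
      f"~{speedup:.0f}x cheaper")
```

<p>Under these assumptions, reuse turns a two-week training run into minutes of fine-tuning, which is the economic argument behind sharing pre-trained models.</p>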
        <sec id="sec-1-1-1">
          <title>2.2. Open ML Models</title>
          <p>ML models are computer programs or components that
have formed statistical and mathematical insights from
data, such as a trained neural network. Although ML
models usually refer to something already trained, they
have also been used to refer to untrained, manually tuned,
or default-valued programs. Statistical and mathematical
insights can be formed with machine learning or manual
tuning.</p>
<p>Training an effective ML model requires large amounts of training or input data, which can be challenging to acquire.</p>
        </sec>
        <sec id="sec-1-1-2">
          <title>2.5. Raw and Tuned models</title>
<p>Off-the-shelf open ML models have been used as templates for modifications and changes that may alter the resource cost, performance, and accuracy of models on the same task. In this paper, we use the term raw model for a reused pre-trained ML model from an open-source provider. We use the term tuned model for those pre-trained ML models that have had their parameters manually tuned or changed, their structure modified, or that have been amalgamated together for the same task.</p>
        </sec>
        <sec id="sec-1-1-3">
          <title>2.6. Related Studies</title>
<p>Amalgamations may consist of structural merging or even majority voting between multiple models of the same type trained with different subsets of the same retraining dataset.</p>
          <p>Table 1: Electronic sources searched, and the numbers of papers found and finally included.</p>
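<p>The majority-voting amalgamation described above can be sketched as follows; the threshold "models" are trivial stand-ins, invented for illustration, for same-type models hypothetically retrained on different subsets of the same dataset.</p>

```python
# Sketch of a majority-voting amalgamation: several models of the same
# type (here trivial stand-in classifiers) vote on each input, and the
# most common label wins. Purely illustrative, not from a reviewed study.
from collections import Counter

def make_threshold_model(threshold):
    # Stand-in for one retrained model variant: classifies by a threshold.
    return lambda x: 1 if x > threshold else 0

# Three variants of the "same" model, differing slightly as if each had
# been retrained on a different subset of the dataset.
ensemble = [make_threshold_model(t) for t in (0.4, 0.5, 0.6)]

def majority_vote(models, x):
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]
```

<p>For an input of 0.55, two of the three variants vote 1, so the amalgamated prediction is 1 even though one variant disagrees.</p>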
<p>Electronic source | Number of papers found (duplicates) | Number of selected results per search:</p>
          <p>SpringerLink Journals: 854 (76), 392 selected. IEEE Xplore: 233 (0), 107 selected. Total: 1087 (76), 499 selected.</p>
        </sec>
      </sec>
      <sec id="sec-1-2">
        <title>We are not familiar with other systematically conducted</title>
<p>reviews directly related to shared Open ML usage. The closest related study was done by Nguyen et al. [21] in the form of an expertly opinioned and observed survey. It provides information regarding different statistically popular Open ML models, frameworks, and hardware. In addition, there are also lists of open-source ML libraries, such as the one curated by The Institute for Ethical Machine Learning, available on GitHub [22].</p>
        <p>• RQ3: What evidence is available on the performance and evolvability of pre-trained Open ML model solutions?</p>
        <p>3.3. Search strategy</p>
        <p>The automatic search was conducted by executing search strings on the search engines of the following digital libraries:</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Research approach</title>
      <p>The systematic mapping study as a form of a
systematic review is a well-defined research method to identify,
analyze, and synthesize all relevant studies regarding a
particular research question or topic area [23, 3]. The
systematic mapping study method was chosen for this paper
because it aims at a holistic, credible, and fair overview
of studies on shared pre-trained Open ML model usage.</p>
      <sec id="sec-2-1">
        <title>3.1. Protocol</title>
        <p>An important step when performing a systematic review
is the development of a protocol. The protocol specifies
all steps performed during the review, increasing its rigor
and reliability. The protocol was constructed following
the systematic review guidelines [24]. The protocol used
in this study was also inspired and adapted from the
procedure introduced by Mahdavi-Hezavehi et al. [25] in
their review.</p>
        <p>The procedures start with the research question
definition, search strategy identification, and search scope
selection. After that, study inclusion and exclusion
criteria were formed based on the research questions. An
empirical data extraction form was created based on the
research questions. The data collection was conducted
by filling out the data extraction form from the analyzed
studies found and included in searches.</p>
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Research questions</title>
        <sec id="sec-2-2-1">
          <title>This study covers the following research questions:</title>
<p>• RQ1: What solutions are used for shared pre-trained Open ML models?
• RQ2: How does research compare different Open ML models, datasets, and frameworks?</p>
          <p>The following search strings were used:
• IEEE Xplore: ("machine learning" OR "Deep learning") AND ("pre-trained model" OR "pre-trained models")
• SpringerLink: ("machine learning" OR "Deep learning") AND ("pre-trained model" OR "pre-trained models")</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>The following study inclusion criteria were used for</title>
<p>the inclusion of the papers:
• I1: The paper experiments with the usage of pre-trained ML models. Experiments are required to collect information in order to analyse solutions, adoptions, and evolvability.</p>
<p>The following study exclusion criteria were used:
• E1: The paper does not feature the usage of pre-trained Open ML models. If the focus of a paper was on other than Open ML models, the paper was excluded.
• E2: The Open ML model used in the paper is presented as a novel one. The model is not yet shared if it is novel.
• E3: The paper is an editorial, technical report, position paper, abstract, keynote, opinion, tutorial summary, panel discussion, or a book chapter.
• E4: The paper is grey literature. Grey literature is argued to be of lower quality than papers published in journals and conferences, as it usually is not thoroughly peer-reviewed [26].</p>
        </sec>
        <sec id="sec-2-2-3">
          <title>The numbers of papers found and included during the</title>
<p>search phase are shown in Table 1. The publication date of searched papers was limited to 2013-2020. On SpringerLink, the search was limited to journal articles due to a high number of conference papers (over 1900) requiring identification. We decided to prefer journals over conference papers for their more meticulous peer-review process compared to the shorter review time of conference papers. IEEE Xplore yielded only 41 journal articles in total, so we decided to include conference papers in order to provide a more comprehensive sample.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>3.4. Data Extraction</title>
<p>Data was extracted using the data extraction form (Table 2). For the evidence levels (F10), the classification system proposed by Alves et al. [27] was used, consisting of six levels:
• 1. No evidence.
• 2. Evidence obtained from demonstration or working out toy examples.
• 3. Evidence obtained from expert opinions or observations.
• 4. Evidence obtained from academic studies (e.g., controlled lab experiments).</p>
<p>• 5. Evidence obtained from industrial studies (i.e., studies done in industrial environments, e.g., causal case studies).
• 6. Evidence obtained from industrial application (i.e., actual use of a method in industry).</p>
        <p>Table 5: Tasks addressed by the studies and the number of papers (%): Image classification 395 (79.2%); Image object detection 159 (31.9%); Text classification 48 (9.5%); Video object detection 14 (2.8%); Video classification 13 (2.6%); Face recognition 13 (2.6%); Image translation 9 (1.8%); Text detection 7 (1.4%); Data mining 5 (1.0%); Speech recognition 4 (0.8%); Music generation 3 (0.6%); Texture categorization 1 (0.2%); Human activity recognition 1 (0.2%).</p>
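<p>For data extraction tooling, the six-level scale above can be encoded as a simple lookup; the function below is our own illustration for tagging studies during extraction (F10), not part of the classification by Alves et al. [27].</p>

```python
# Lookup encoding of the six evidence levels used for field F10.
EVIDENCE_LEVELS = {
    1: "No evidence",
    2: "Demonstration or toy examples",
    3: "Expert opinions or observations",
    4: "Academic studies (e.g., controlled lab experiments)",
    5: "Industrial studies (e.g., causal case studies)",
    6: "Industrial application (actual use of a method in industry)",
}

def evidence_label(level):
    # Returns the description for a level, or "Unknown" for invalid input.
    return EVIDENCE_LEVELS.get(level, "Unknown")
```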
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Results and analysis</title>
      <sec id="sec-3-1">
        <title>This section first gives an overview of the identified stud</title>
        <p>ies and extracted information. After that, the research
questions are answered by representing the extracted
data and summarizing the data as an answer to each
question.</p>
        <sec id="sec-3-1-1">
          <title>4.1. Results overview and demographics</title>
          <p>After performing the search and selection described
above in Section 3, we included 499 papers in the data
analysis.</p>
<p>Figure 2: Citation counts of the included studies (x-axis: number of citations, y-axis: number of papers).</p>
<p>The number of papers published each year between January 2013 and October 2020 is shown in Figure 1. The first papers started to appear only in 2016, and the highest number of studies was published in 2020. Figure 1 also indicates incremental interest following the ILSVRC 2014 competition, taking into account writing and publishing delays of over a year. In particular, the increase in interest has been exponential rather than linear over recent years.</p>
          <p>Of the domains, medical was addressed by 19.4% (i.e., 97 papers) of the studies and text classification by 9.5%. The rest of the domains had less than 15 studies addressing them. Some studies addressed more than one domain, so the total number of papers in the table is more than the amount reviewed. In summary, the domains related to images or videos are clearly the most prevalent.</p>
<p>The number of papers at different evidence levels is shown in Table 3. Almost all papers (97.8%, i.e., 488 papers) provided Level 4 evidence (academic studies) of their findings. The few remaining papers present Level 2 (demonstration) or Level 5 (industrial study) evidence. As this review focused on existing Open ML models, it is unsurprising that at least Level 4 is achieved. However, there need to be more practice-oriented studies at Levels 5 and 6. All studies provide mostly academic or industrial-level research, but most do not offer enough comparative evidence to adopt their used models. Only a few studies critically examined the potential influence of different actors, such as the researchers' bias, sponsors, and the quality of the tests used to validate their study.</p>
          <p>4.2. RQ1: Open ML models, datasets, and frameworks</p>
          <p>To answer this research question, the data of F6 (used models), F7 (used datasets), and F8 (used frameworks) were analyzed from the data extraction form and summarized in what follows. Because some studies used more than one model in their comparisons, the total numbers of papers are more than the amount reviewed. Table 6 presents the pre-trained Open ML models used by the studies and the number of studies applying each pre-trained model. In total, 149 different Open ML models were identified. The most popular pre-trained Open ML model, VGG-16, is used by 168 (33.7%) studies.</p>
<p>The next most popular, AlexNet and ResNet-50, are included in 100 (20.2%) and 99 (19.8%) studies, respectively. The majority of the models have less than six studies using them. Table 7 shows the datasets used for training and testing in the studies. A total of 49 different datasets were identified; in 114 studies, the dataset was not specified. ImageNet is used by 60.7% of the studies (i.e., 303 papers). In contrast, the second most popular, MS COCO and Google News Word2Vec, are included in a significantly smaller number of studies, i.e., 19 studies each.</p>
          <p>Figure 2 shows the citation counts for the studies. As can be seen, the lowest and highest citation counts are 0 and 1641, respectively. 464 papers (around 93.0%) have a citation count in the range of 0–20, and 35 papers (7.0%) have high citation counts in the range of 21–126. A few significant outliers were S111 (review of deep learning for time series classification), S203 (new simple approach for batch normalization), and S31 (new edge detection algorithm), with citation counts of 281, 697, and 1641, respectively.</p>
<p>The domains and tasks addressed by the studies are shown in Tables 4 and 5. 79.2% of the studies (i.e., 395 papers) addressed image classification, while the second most popular class was image object detection with 31.9% of the studies (i.e., 159 papers). Biology was addressed by 24.0% of the studies.</p>
          <p>The majority of the datasets have less than three studies using them. 114 studies do not mention their datasets explicitly, and thus they could not be extracted. Table 8 lists the frameworks used in the studies. We identified in total 37 different frameworks, and the framework is not specified in 153 studies. 21.8% of the studies (i.e., 109 studies) used the TensorFlow framework. The second most often used is Keras, with 16.2% of the studies (i.e., 81 studies). Over half of the frameworks have less than four studies using them. 30.7% of the studies did not explicitly mention their frameworks and were unresolvable.</p>
          <p>4.2.1. Summary to RQ1</p>
          <p>There is a large diversity in shared Open ML model usage across studies, although VGG-16 stood out as the most popular. Many different Open ML models have been used in different studies. A large part of the variety appears due to the multiple application domains addressed, as seen in Table 4, and their requirements for models with specialized application domain capabilities, such as word relation recognition (Parsey McParseface, word2vec) and music recognition (MidiNet, etc.). With frameworks and especially datasets, we see less disparity between the studies. Especially in the case of datasets, although there is a larger number of different datasets than Open ML models, ImageNet stands out as dominant, appearing in 303 (60.7%) studies, and most datasets appear in at most a couple of studies. This lack of disparity may be due to a lack of interest in and usefulness of the less widely appearing datasets, or the significant time and work effort required to train new models based on them.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>4.3. RQ2: Comparisons of open ML models, datasets, and frameworks</title>
<p>In addition to the data analyzed in RQ1 above, the number of different Open ML models that were compared within each study was analysed in order to form a conceivable adoption preference based on the comparison. In Table 9, the number of Open ML models which are explicitly compared by studies is listed. Many studies that do not compare Open ML models with other open models, however, make a comparison to differently licensed ones, such as copyrighted or proprietary models and frameworks without openly available source code. In total, 59.6% of studies compare their results to other Open ML models, and 35.8% compare to more than one Open ML model. 40.4% of the papers do not compare their results to other Open ML models.</p>
          <p>4.3.1. Summary to RQ2</p>
          <p>The number of Open ML models used per study was counted to see how well they are represented between comparisons. Around six out of ten studies compared Open ML models to other Open ML models, which can be considered quite a large amount. The numbers of dataset and framework comparisons were also counted, but they provided significantly less instrumental results. Only one-tenth of the studies compared the results of different datasets, and slightly over one-tenth different frameworks. Those that did compare datasets and frameworks were also mainly limited to comparing only two.</p>
<p>Likewise, an analysis was carried out on how many different datasets were compared in each study. Table 10 lists the number of datasets that were compared in the studies. 10.4% of the studies (i.e., 52 papers) compare the use of other datasets, and 15 of them to more than one. As also seen in Table 10, at least 66.1% of the papers do not compare their results to other datasets. Due to the unresolved datasets used by 22.8% of the studies, an unspecified category was added to the table.</p>
          <p>However, unlike Open ML models, datasets and frameworks have only a few dominant designs that are widely applied (cf. RQ1). It should also be taken into account that the dataset and framework results are inaccurate because many studies do not explicitly mention what was used by name. The limited number of contrastive studies did not offer enough information for reliable results on domain-specific or general Open ML model adoption. The scale of dataset and framework adoption is also unclear.</p>
<p>Finally, Table 11 lists the number of frameworks compared in the studies. 15.0% of the studies (i.e., 75 studies) compare their results with other frameworks, and 13 of them to more than one. As also visible in Table 11, at least 54.5% of the studies do not compare their results with other frameworks. The results are not very accurate due to the unresolved frameworks used by nearly a third (30.5%) of the studies. These were categorized as unspecified in the table.</p>
          <p>However, the scientific literature does provide some evidence on the popular adoption rates of certain Open ML models, such as VGG-16 and AlexNet, frameworks such as TensorFlow, and datasets like ImageNet.</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>4.4. RQ3: Evidence available on the performance and evolvability of pre-trained Open ML model solutions</title>
<p>The studies used different evaluation approaches to evaluate the raw and tuned models.</p>
<p>The model type used for performance measurement studies is shown in Table 12. 272 studies (53.8%) give results for raw model performance that only use transfer learning. 112 studies (22.1%) provided results for models that were ensembled, modified, or tuned by researchers. Tuning ranges from individual value changes to layer replacement. 122 (24.1%) studies provided both raw and tuned performance results. The performance measurement results provided by the studies are not directly comparable and are thus not listed.</p>
          <p>The studies could not directly be used to evaluate the evolvability of models, but gave positive promise for their evolvability.</p>
          <p>5. Discussion</p>
          <p>This section summarizes and discusses the main findings, limitations of the review, and threats to validity.</p>
          <p>5.1. Main findings</p>
<p>Table 13 shows the performance of pre-trained Open ML model solutions described in the studies when the studies compare their results to previous state-of-the-art solutions, such as NN, SVM, and constant feature classification algorithms. Improvement in Table 13 consists of studies describing at least one of the Open ML models outperforming the state-of-the-art solutions. Partial improvement consists of studies describing Open ML models being competitive and partially outperforming in specific categories, such as lower computational cost without much performance disadvantage compared to others. Decline consists of studies where Open ML models performed worse than other solutions. As seen from Table 13, the majority of studies, 74.1% (i.e., 370 papers), report minor to moderate improvement compared to previous methods. 24 (4.8%) studies found a decline compared to state-of-the-art performance when using Open ML models. 108 (21.0%) studies do not show improvement, lack comparison, or have mixed results when using Open ML models.</p>
<p>The main goal of this study was to investigate how pre-trained Open ML models, frameworks, and datasets are shared and used in different domains through a systematic mapping study research method. Based on a relatively large sample, the reviewed 499 studies provide a listing of Open ML models, frameworks, and datasets used in research as well as their relative popularity. The studies cover many different domains, which saw benefits ranging from a minor decline to a moderate improvement compared to previously used machine learning methods. This indicated that pre-trained Open ML</p>
        </sec>
        <sec id="sec-3-1-4">
          <title>5.3. Reflection of review</title>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>To objectively evaluate a systematic review, Kitchenham et al. proposed four quality questions for systematic reviews [27]:</title>
        <p>Are inclusion and exclusion criteria described? It is
considered that this review meets this criterion as it explicitly</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <sec id="sec-4-1">
        <title>This work was partly funded by Business Finland under</title>
<p>IML4E and IVVES of the ITEA programme.</p>
        <p>models and frameworks show positive promise for improvement in machine learning. Most of the models in the studies were used under the TensorFlow framework with ImageNet as the pre-training dataset. Most studies were academic, and only a few industrial studies were identified. More industrial-level studies need to be reviewed in order to have more reliable and accurate representations of Open ML model performance in the real world.</p>
<p>A suggestion for the future is to increase the coverage of studies and modify the review inclusion criteria for study extraction when assessing the usage of shared pre-trained Open ML models. Another, more conclusive option is to prototype and create a constantly updated open curated database of results for different Open ML models running on different frameworks using different pre-training datasets. These configurations would then be run and tested on different domain datasets. The different possible combinations and results could then be calculated on a cloud platform, with the only requirement being to insert the new model, dataset, or framework addition through a curated application form.</p>
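<p>Such a curated results database could, for example, key each entry by its model, framework, and dataset configuration; the record layout and field names below are purely hypothetical, and the example value is a placeholder rather than a reported result.</p>

```python
# Hypothetical record layout for the proposed curated results database.
# Field names are illustrative assumptions, not from the paper.
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class BenchmarkRecord:
    model: str             # e.g. "VGG-16"
    framework: str         # e.g. "TensorFlow"
    pretrain_dataset: str  # dataset the shared model was pre-trained on
    target_dataset: str    # domain dataset the configuration was tested on
    metric: str            # e.g. "top-1 accuracy"
    value: float           # measured result for this configuration

# Placeholder entry (the 0.93 value is invented for illustration).
record = BenchmarkRecord("VGG-16", "TensorFlow", "ImageNet",
                         "CIFAR-10", "top-1 accuracy", 0.93)
```

<p>Making the record immutable and keyed by its configuration would let new submissions from the curated application form be deduplicated and compared directly.</p>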
      </sec>
    </sec>
  </body>
  <back>
  </back>
</article>