<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Systematic and open exploration of FaaS and Serverless Computing research</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mohammed Al-Ameen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Josef Spillner</string-name>
          <email>josef.spillner@zhaw.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Zurich University of Applied Sciences, School of Engineering, Service Prototyping Lab (blog.zhaw.ch/splab)</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Sharjah, OpenUAE research group</institution>
          ,
          <country country="AE">United Arab Emirates</country>
        </aff>
      </contrib-group>
      <fpage>30</fpage>
      <lpage>35</lpage>
      <abstract>
        <p>Research interest in Function-as-a-Service (FaaS) development, execution and ecosystems is growing. Consequently, an increasing body of literature focusing on FaaS and cloud services is evolving. While the field is still young, we propose a community-maintained and curated open dataset which uniquely and unambiguously references relevant articles in order to derive comparable bibliometric data and statistics. The dataset supports the generation of knowledge about the evolving history, research trends and significance. This survey enablement paper introduces the 60-article dataset, explains the governance model and benefits, and shows first insights derived by a literature analysis. We argue that along with accelerating technological trends, fresh research method flavours assist in faster and more comprehensive knowledge exploration and dissemination.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Due to the popularity of the term serverless computing, but also due to increasing discussions of appropriate architectures and support services beyond FaaS, we decided to name the dataset accordingly. This work is not a survey, but rather an enablement for future surveys and systematic literature reviews (SLRs) with anticipated high quality, consistency and comparability. Its extensible tree structure will not be sufficient for all use cases, including graph analysis which would require directed graphs with annotations or ontology representations, but it also does not preclude the production of such enhanced representations. Moreover, besides the literature view, it will allow for different views including one on FaaS technologies which will raise interest in derivative works with software and cloud engineers in industry.</p>
      <p>Governance. Our dataset is published [Spi18] and updated semi-regularly or upon request, each time with a versioned Digital Object Identifier (DOI), in the Serverless Computing community at Zenodo, an open science tool for scholarly processes and research outputs in the form of digital artefacts. Any researcher can suggest a new version and, as long as the changes are only additive or corrective in nature, the upload will be accepted by the dataset maintainer. Any contributor qualifies as co-maintainer in order to ensure the long-term distribution of maintenance tasks. The precise extent of governance remains unknown for now due to a lack of comparable datasets, but upon being informed about this model, for instance at ESSCA 2018, many researchers in the field have signalled interest and support.</p>
      <p>Population. To ensure quality publications with comparable results, it is mandatory that publications appear indexed in the DBLP computer science bibliography in order to qualify. Moreover, the publication must be found with a DBLP title keyword search for the terms serverless application, serverless computing, serverless in general, function-as-a-service, lambda or cloud function, or, despite the absence of a title match, revolve closely around these topics. Terms at risk of overgeneralisation, such as serverless and even more so lambda, similarly require brief title or even article reading to decide on the eligibility for inclusion in a manual and potentially error-prone post-filtering step.</p>
      <p>Further indexation services and keywords can be agreed on by the community dataset maintainers as the technology evolves, not just to capture more works, but also to subdivide sets of works into specialised subsets. Similarly, manual overrides are possible when works or entire collections of works are inadvertently missing from or misrepresented in DBLP, which is known to still occur sometimes despite best efforts to prevent mistakes [RH11].</p>
      <p>The dataset is populated in a structured way, starting with manually assigned unique and consecutively increasing identifiers and associated unique DOIs, if available, captured in a first file. A script then fetches bibliographic details and amends the metadata in a second file. For preprints without an assigned DOI, a manual addition is possible. Further metadata is added manually to two additional files, one matching the publications structure and one orthogonally capturing technology aspects. The resulting data structure with its metadata attributes appears as follows:</p>
      <preformat>id / DOI
  → {title,author,journal,year}      (automatically populated)
  → {countries,institutions,...}     (manually annotated)
  → {technologies,open source,...}   (manually annotated)</preformat>
      <p>Most manual curation steps could potentially be automated or semi-automated by querying indexation services and performing semantic NLP on the full-text works. However, at the time of writing and due to the still small number of works, a manual process has been chosen to keep the initial effort low and stimulate early community involvement. Furthermore, automation is non-trivial due to the need to disambiguate, in a context-aware way, the pure mention of terms from their detailed study.</p>
      <p>Representation. The data is represented as four structured and extensible JSON files which are ordered either manually or, in the case of the bibliographic information file, automatically through the maintenance scripts. Automatic ordering eases maintenance by suppressing diff noise, but is not always easy to implement due to lexicographic versus numeric ordering (e.g. 1, 10, 2). The files encompass around 1330 lines containing 970 key-value assignments.</p>
      <p>In order to gain insight into all works, the dataset contains the provision to store the PDF files of all publications in a subfolder. We have assembled this companion dataset and will use it in this article exemplarily, but for copyright and licencing reasons are not able to distribute it publicly. The companion dataset combines 560 pages of research communication on FaaS-related topics and has a cumulative size of 43 MB.</p>
      <p>Verification. In order to ensure a high quality and validity of the dataset, we run the included consistency checking scripts and we manually verify the completeness with external samples. Concretely, we perform a cross-check with the previous proceedings of the International Workshop on Serverless Computing (WoSC) in conjunction with a DBLP countercheck and with Google Scholar to detect whether all relevant publications have been included in the dataset. As a result, we found an additional paper not matching any search terms, one more matching lambda which slipped through the manual post-filtering, as well as three more papers which were not originally available on DBLP but had already been indexed at cross-check time. All five papers are already included with the dataset.</p>
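      <p>The ordering issue noted under Representation and the consistency checks mentioned under Verification can be illustrated with a short Python sketch. It sorts identifier keys numerically rather than lexicographically and reports identifiers that lack entries in the other files. The in-memory dictionaries, their key names, the example DOIs and the function names are assumptions made for illustration; they do not reproduce the actual file layout or the scripts shipped with the dataset.</p>
      <preformat>
```python
import json


def numeric_key_order(keys):
    """Order string identifiers by numeric value (1, 2, 10), not as text (1, 10, 2)."""
    return sorted(keys, key=int)


def dump_ordered(mapping):
    """Serialise a mapping with numerically ordered keys to keep diffs small."""
    ordered = {key: mapping[key] for key in numeric_key_order(mapping)}
    return json.dumps(ordered, indent=2, ensure_ascii=False)


def missing_ids(identifiers, *annotation_files):
    """Return, per annotation file, the identifiers that have no entry in it."""
    return [numeric_key_order(set(identifiers) - set(annotations))
            for annotations in annotation_files]


# Hypothetical miniature of the four files, keyed by consecutive identifier.
identifiers = {"1": "10.1000/example-a", "2": None, "10": "10.1000/example-b"}
bibliography = {"10": {"year": 2018}, "1": {"year": 2016}, "2": {"year": 2017}}
literature = {"1": {"countries": ["CH"]}, "2": {"countries": ["US"]}}
technologies = {"1": {"technologies": ["AWS Lambda"]}, "2": {}, "10": {}}

print(sorted(identifiers))             # lexicographic: ['1', '10', '2']
print(numeric_key_order(identifiers))  # numeric: ['1', '2', '10']
print(missing_ids(identifiers, bibliography, literature, technologies))
```
      </preformat>
      <p>In this made-up example the second annotation file lacks an entry for identifier 10, which is the kind of gap a consistency check would be expected to flag.</p>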
      <p>More importantly, we found proceedings of one of the WoSC editions which are distributed via the ACM Digital Library but, for unknown reasons, not yet available in DBLP. We have marked these papers as potential additions. Finally, a trivial search on Google Scholar revealed no additional articles. In summary, our approach to systematically finding credible quality articles about FaaS is working but assumes a timely indexing into DBLP and still causes occasional omissions.</p>
      <p>Exploitation. By not only sharing bibliometric and content data, but also scripts to produce associated figures, our reusable dataset leads to standardised visuals which allow for comparison across published works. All figures in this article have been produced by the scripts contained in the dataset package with some manual additions based on generated numbers and we suggest future publications on the same topic do the same. Among possible exploitation routes are state-of-the-art sections in research proposals and papers, as well as detailed surveys and systematic literature reviews.</p>
      <p>Key Metrics. Covering the years 2016, 2017 and most of 2018, the serverless literature dataset contains a total of 60 articles. Of those, 45 have a DOI assigned but 15 have not. The publications per year are shown in Table 1. The growth is remarkable; while focused researchers have been able to maintain an unassisted overview for the first two years, a systematic collection has become indispensable for any further holistic view on the field.</p>
      <p>The most successful and growing search term is serverless with 27 occurrences of which two thirds are complemented with computing. On two occasions, the corresponding title also mentions faas which also appears in four other titles. All other search terms only occur individually or in single pairings. A Venn diagram showing the matching keyword relations is shown in Fig. 1.</p>
      <p>The ratio of academic to pure industrial to mixed academic-industrial research is 36 : 7 : 17, with 87% of works involving academic institutions. On average, each publication involves authors from 1.8 institutions, with 68 institutions (subsuming all sub-units such as different research groups or labs) being involved in total. Fig. 2 contains the visual overview about the institution types.</p>
      <p>Figure 2: Overlap of type of author institution</p>
      <p>The most active countries in absolute terms of institutions publishing are the US (26), Switzerland (7) and Canada (4) followed by Germany, Spain, Colombia and Austria (all 3). In total, research on FaaS is documented to happen in 21 countries across six continents. Fig. 3 gives a geographical overview about the countries with publishing activity.</p>
      <p>Finally, the selection of publication paths shows some interesting characteristics in Table 2. There is a mix of publishing through professional societies (P; 31 or 52%), open proceedings via arXiv or USENIX (O; 17 or 28%) and commercial publishers (C; 5 or 8%).</p>
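      <p>The search-term matches and the publication-path shares reported in the key metrics could be derived along the following lines. This is a sketch under assumptions: the term list follows the Population description, whereas the example titles, the function names and the simple substring matching are hypothetical and merely stand in for the DBLP title keyword search and the dataset scripts.</p>
      <preformat>
```python
from collections import Counter

# Search terms as named in the Population description.
SEARCH_TERMS = ("serverless application", "serverless computing", "serverless",
                "function-as-a-service", "lambda", "cloud function")


def matched_terms(title):
    """Return the set of search terms occurring in a lower-cased title."""
    lowered = title.lower()
    return frozenset(term for term in SEARCH_TERMS if term in lowered)


def term_combinations(titles):
    """Count how often each combination of terms matches (Venn diagram regions)."""
    return Counter(matched_terms(title) for title in titles)


def share(count, total):
    """Percentage share rounded to whole percent."""
    return round(100 * count / total)


# Hypothetical titles, not taken from the dataset. The second one shows why
# an overly general term such as lambda needs manual post-filtering.
titles = ["Serverless Computing with Cloud Functions",
          "A Lambda Calculus Primer",
          "Costs of Function-as-a-Service Platforms"]
for combination, count in term_combinations(titles).items():
    print(sorted(combination), count)

# Publication paths reported in the text for the 60 articles.
for label, count in {"P": 31, "O": 17, "C": 5}.items():
    print(label, share(count, 60), "%")
```
      </preformat>
      <p>With the counts from the text, the rounding reproduces the stated 52%, 28% and 8%.</p>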
    </sec>
    <sec id="sec-2">
      <title>Content Analysis</title>
      <p>This analysis builds on the private companion dataset containing all PDF representations of the articles. A script converts these PDFs into text files and concatenates the output into one large file (2.2 MB) which can be uploaded to a word cloud service to turn the list into sets with specified cardinality of occurrence. The set of words with 50 occurrences or more is contained in the dataset to allow for tracking of trends, encompassing 693 top words. Of interest is that unambiguous subject-specific words such as cold start/coldstart, handler(s) and stateless do appear in the top words list but with fewer than 200 occurrences each, i.e. below the top 20% of the list.</p>
      <p>The word cloud services furthermore produce a visual representation of the term frequency. Fig. 4 shows an exemplary unfiltered result over all top words.</p>
      <p>It is evident that terms such as function(s) (2936 times), serverless (1517 times), Lambda (1081 times), time (853 times) and FaaS (409 times) appear very often, but so do stopwords such as can or use which are currently not yet filtered automatically and will be subject to increased automation in future versions of the curation scripts. Nevertheless, the word cloud reveals researcher concerns about specific technologies and characteristics, including considerations of data, messages, containers, execution and requests. For example, in the first half of the documents, Lambda (328 times) occurred more often than serverless (310 times), whereas serverless leads over all documents, signalling a decline in the prominence of Lambda. Still, in terms of concrete implementations or services mentioned, Lambda leads ahead of Functions (402 times), which in the capitalised form presumably refers to Google, Microsoft and IBM offerings, OpenWhisk (285 times) and OpenLambda (88 times).</p>
    </sec>
    <sec id="sec-3">
      <title>Technology Analysis</title>
      <p>In the dataset, we aggregated information about the technologies prominently referenced in the studies and experiments. In total, 22 different technologies including FaaS runtimes, tools and commercial services could be identified of which 13 are available under open source and free software licences. Among the 22, AWS Lambda (23), OpenWhisk (8), Google Cloud Functions (7), Azure Functions and IBM Cloud Functions (both 5) are the technologies most reported on, followed by a long tail of others which raise less interest with researchers.</p>
      <p>[Figure residue: technology names as axis ticks, commercial offerings marked with $; axis labels: system/framework under observation, runtime system under observation]</p>
      <p>Fig. 5 shows their distribution across the covered years. Interestingly, one of the early works by Lynn et al. states that AWS is by far the dominating research platform [LRLE17]. While AWS still dominates, the field is now more diverse, although many of the new contenders are research prototypes not offered for commercial service. All commercial offerings are marked with $ in the figure, and Lambda now accounts for slightly more than half of their coverage. Hence, Lynn’s statement is still correct for research on public FaaS when considering a relative majority.</p>
      <p>New FaaS technologies appear with high frequency and often with widely disseminated public announcements. This includes recent additions such as Knative and Qinling in 2018. This raises the question of relevance for researchers: Apart from clearly technology-independent research, sometimes technical aspects require the focus on one particular implementation. Which one to choose, then?</p>
      <p>In Fig. 6 we show how the current focus on technologies in research publications on serverless computing and FaaS-related topics mismatches the apparent needs of developers, based on a systematic developer survey conducted in 2017/2018 [LWSH18]. The brighter bars correspond to all participants, including experienced and prospective serverless developers, whereas the darker bars only include developers who have used serverless offerings in the past; the differences between both are insignificant. AWS Lambda and Microsoft Azure Functions dominate in production and hence are currently about 6% and 7% underreported on, respectively. In contrast, Apache OpenWhisk received a lot of attention but is not used a lot in practice, leading to around 12% overreporting. However, the validity of these numbers may be limited by a number of factors, including the recent rebranding of IBM Bluemix OpenWhisk to IBM Cloud Functions, and the inclusion of general cloud providers with strong PaaS but no dedicated FaaS offering such as Heroku and Digital Ocean. Still, we believe that it is valuable to continue tracking the (mis)match over time.</p>
      <p>In contrast to the literature dataset, the survey is a one-time snapshot with a decreasing applicability to the evolving field of serverless computing and applications. Therefore, to maintain the insights into the mismatch, a recurring mixed-methods study or at least a recurring survey regarding the technologies will have to be conducted in the future, while some metrics can be derived from recurring industry surveys such as the one from CNCF which in its recent edition mentions among the installable platforms Kubeless with 42%, OpenWhisk with 25% and OpenFaaS with 20% [Bar18].</p>
      <p>We have assembled an evolvable dataset to track the publicly available research communications on FaaS-related topics [Spi18]. The numerical and visual analysis of this dataset, assisted by associated scripts, gives insight into research actors, topics, trends and mismatches. We invite all experts on FaaS topics to collaboratively maintain future revisions of the dataset which will serve as a substantial foundation for future surveys and comparison articles. By increasingly applying data analytics methods, we expect to gain more insight over time as the dataset grows, including a mapping of technology popularity over a multi-year timeframe. Moreover, we envision value-added services exploiting the dataset, such as a FaaS solution recommender service, to appear as prototypes or even as commercial solutions on the cloud market.</p>
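      <p>As one example of the data analytics mentioned here, the word-frequency step of the content analysis can be approximated locally. The sketch below is an assumption-laden stand-in for the external word cloud service: it tokenises a text, drops stopwords, and keeps the words that reach a minimum number of occurrences (50 in the dataset). The tokenisation rule, the stopword list, the function name and the sample text are hypothetical.</p>
      <preformat>
```python
import re
from collections import Counter

# Hypothetical stopword list; the text notes that stopwords such as
# "can" or "use" are not yet filtered automatically.
STOPWORDS = frozenset({"can", "use", "the", "a", "of", "and"})


def top_words(text, min_occurrences=50, stopwords=STOPWORDS):
    """Count case-sensitive word tokens and keep those at or above the threshold."""
    tokens = re.findall(r"[A-Za-z][A-Za-z-]*", text)
    counts = Counter(token for token in tokens if token.lower() not in stopwords)
    return [(word, n) for word, n in counts.most_common() if n >= min_occurrences]


# Tiny made-up sample; the threshold is lowered to 2 to suit its size.
sample = "Lambda functions can use Lambda triggers and serverless functions"
print(top_words(sample, min_occurrences=2))
```
      </preformat>
      <p>Counting is kept case-sensitive on purpose, since the analysis distinguishes the capitalised Functions from the generic function(s).</p>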
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [RH11]
          <string-name>
            <given-names>Florian</given-names>
            <surname>Reitz</surname>
          </string-name>
          and
          <string-name>
            <given-names>Oliver</given-names>
            <surname>Hoffmann</surname>
          </string-name>
          .
          <article-title>Did they notice? - A case-study on the community contribution to data quality in DBLP</article-title>
          .
          <source>In Research and Advanced Technology for Digital Libraries - International Conference on Theory and Practice of Digital Libraries, TPDL</source>
          <year>2011</year>
          , Berlin, Germany, September 26-28,
          <year>2011</year>
          . Proceedings, pages
          <fpage>204</fpage>
          -
          <lpage>215</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Spi18]
          <string-name>
            <given-names>Josef</given-names>
            <surname>Spillner</surname>
          </string-name>
          . Serverless Literature Dataset.
          <source>Zenodo</source>
          (10.5281/zenodo.1175423),
          <year>February 2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [VGO+16]
          <string-name>
            <given-names>Mario</given-names>
            <surname>Villamizar</surname>
          </string-name>
          , Oscar Garces, Lina Ochoa, Harold E. Castro, Lorena Salamanca, Mauricio Verano, Rubby Casallas, Santiago Gil, Carlos Valencia, Angee Zambrano, and
          <string-name>
            <given-names>Mery</given-names>
            <surname>Lang</surname>
          </string-name>
          .
          <article-title>Infrastructure Cost Comparison of Running Web Applications in the Cloud Using AWS Lambda and Monolithic and Microservice Architectures</article-title>
          .
          <source>In IEEE/ACM 16th International Symposium on Cluster, Cloud and Grid Computing, CCGrid</source>
          <year>2016</year>
          , Cartagena, Colombia, May 16-19,
          <year>2016</year>
          , pages
          <fpage>179</fpage>
          -
          <lpage>182</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Bar18]
          <string-name>
            <given-names>Kaitlyn</given-names>
            <surname>Barnard</surname>
          </string-name>
          .
          <source>CNCF Survey: Use of Cloud Native Technologies in Production Has Grown Over 200%</source>
          . online: https://www.cncf.io/blog/2018/08/29/cncf-survey-use-of-cloud-native-technologies-in-production-has-grown-over-200-percent/,
          <year>August 2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [LRLE17]
          <string-name>
            <given-names>Theo</given-names>
            <surname>Lynn</surname>
          </string-name>
          , Pierangelo Rosati, Arnaud Lejeune, and
          <string-name>
            <given-names>Vincent C.</given-names>
            <surname>Emeakaroha</surname>
          </string-name>
          .
          <article-title>A preliminary review of enterprise serverless cloud computing (function-as-a-service) platforms</article-title>
          .
          <source>In IEEE International Conference on Cloud Computing Technology and Science, CloudCom</source>
          <year>2017</year>
          , Hong Kong, December 11-14
          ,
          <year>2017</year>
          , pages
          <fpage>162</fpage>
          -
          <lpage>169</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [LWSH18]
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Leitner</surname>
          </string-name>
          , Erik Wittern, Josef Spillner, and
          <string-name>
            <given-names>Waldemar</given-names>
            <surname>Hummer</surname>
          </string-name>
          .
          <article-title>A mixed-method empirical study of Function-as-a-Service software development in industrial practice</article-title>
          .
          <source>PeerJ Preprints</source>
          <volume>6</volume>
          :e27005v1 (https://doi.org/10.7287/peerj.preprints.27005v1)
          ,
          <year>June 2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[RH11]</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>