<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Summary: A domain specific-model and DevOps approach for big data analytics architectures</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Camilo Castellanos</string-name>
          <email>cc.castellanos87@uniandes.edu.co</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos A. Varela</string-name>
          <email>cvarela@cs.rpi.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dario Correal</string-name>
          <email>dcorreal@uniandes.edu.co</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science Rensselaer Polytechnic Institute</institution>
          ,
          <addr-line>Troy, NY</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ECSA'21: European Conference on Software Architecture</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>System Engineering and Computing Department, Universidad de Los Andes</institution>
          ,
          <addr-line>Bogota</addr-line>
          ,
          <country country="CO">Colombia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>Big data analytics (BDA) applications use machine learning algorithms to extract insights from large, fast, and heterogeneous data sources. Software engineering challenges for BDA applications include ensuring performance levels of data-driven algorithms even in the presence of large data volume, velocity, and variety (3Vs). BDA software complexity frequently leads to delayed deployments and challenging performance assessments. This paper proposes domain-specific modeling (DSM) and DevOps practices to design, deploy, and monitor performance metrics for BDA architectures. Our proposal includes a design process and a framework to define architectural inputs and functional and deployment views via integrated high-level abstractions to monitor the achievement of quality scenarios. We evaluate our approach with four use cases from different domains. Our results show shorter deployment and monitoring times and a higher gain factor per iteration than similar approaches.</p>
      </abstract>
      <kwd-group>
        <kwd>ios</kwd>
        <kwd>performance monitoring</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Big data analytics (BDA) applications use machine learning (ML) algorithms to extract valuable
insights from large, fast, and heterogeneous data. These BDA applications require complex
software design, development, and deployment to deal with the big data characteristics of volume,
variety, and velocity (3Vs) while maintaining expected performance levels. However, the complexity
involved in BDA application development frequently leads to delayed deployments [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and
hinders performance monitoring (e.g., throughput or latency) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In addition, heavy workloads
that involve batch processing over big data demand high scalability and fault tolerance to meet
deadlines. A software architecture specifies the system’s structures and their relationships to
achieve expected quality properties. BDA solution development exhibits a costly and
error-prone transition between development and production environments, given the lack of
techniques and tools to enable articulation and integration [
        <xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>
        ]. Despite the growing interest
in big data adoption, real deployments are still scarce (“Deployment Gap” phenomenon) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Use the original publication when citing this work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        We propose ACCORDANT (An exeCutable arChitecture mOdel foR big Data ANalyTics), a
DevOps and domain-specific modeling (DSM) approach to develop, deploy, and monitor BDA solutions, bridging the gap
between the analytics and IT domains. This paper highlights the contributions of this proposal,
presented in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. ACCORDANT supports designing BDA applications driven by quality
scenarios (QS) and by functional and deployment views. A QS specifies a quality attribute requirement
for a software artifact to support design and quality assessment. The functional view defines
the architectural elements that deliver the application’s functionality. The deployment view
describes how software is assigned to the technology infrastructure. Our deployment strategy
incorporates containers to promote portability and continuous deployment.
      </p>
      <p>ACCORDANT is validated with four use cases from different domains by designing functional
and deployment models and assessing performance QS. The aim is to reduce the time
spent on design, deployment, and QS monitoring of BDA solutions. In summary, the contributions of
this paper are: i) A DSM framework to formalize and accelerate the development and deployment
of BDA solutions by iteratively specifying functional and deployment views aligned to QS.
ii) Three integrated DSLs to specify architectural inputs, component-connector models, and
deployments. iii) A containerization approach to promote automated delivery and performance
metrics monitoring aligned to QS. iv) The evaluation of this proposal with four use cases from
diverse domains, using different deployment strategies and QS.</p>
      <p>The rest of this paper is organized as follows. Section 2 presents our methodology and
proposal overview. Section 3 presents evaluation and discusses the results. Finally, Section 4
summarizes the conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. ACCORDANT: A DevOps and Domain Specific Model Approach for BDA</title>
      <p>Our proposal comprises a design and deployment method and a DSM framework. It
covers architectural inputs, containerization, and serverless deployments. Fig. 1 depicts the
ACCORDANT process. The steps performed using the ACCORDANT modeling framework
are framed in solid lines, while the steps performed with external tools are framed in dotted lines.
The ACCORDANT process is iterative and composed of seven steps: 1) Business goals (1.1) and
QS (1.2) are defined. 2) The data scientist develops data transformations and analytics models
and exports them as Predictive Model Markup Language (PMML) files. 3) The IT architect
designs the software architecture using the ACCORDANT metamodel in the Functional Viewpoint
(FV) and Deployment Viewpoint (DV). The FV model uses PMML models to specify the
software behavior. 4) The FV and DV models are interwoven to obtain an integrated model. 5) Code
is generated from the integrated model. 6) The generated code is executed to provision
infrastructure and install the software. 7) QS are monitored in operation to be validated.</p>
      <p>
        Architectural Inputs: The architecture design is driven by predefined quality scenarios (QS)
according to architecture design methods such as Attribute-Driven Design (ADD) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. These
QS are achieved through design decisions compiled in well-known catalogs of architectural
patterns and tactics. Both QS and tactics are inputs to the architecture design. Therefore, we
include these initial building blocks in the ACCORDANT metamodel along with other concepts
defined in ADD. The main input building blocks are grouped in the architectural input package
(InputPackage), which contains the elements required to start the architectural design: Quality
Scenario (QScenario), Analyzed QS (AnalyzedQS), SensitivityPoint, and Tactic.
      </p>
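      <p>As an illustration only, the input building blocks named above could be represented as plain data classes. The sketch below is written in Python rather than in the ACCORDANT metamodel's own notation. The class names follow the text, but every attribute (for example <monospace>attribute</monospace>, <monospace>response_measure</monospace>, <monospace>target</monospace>), the associations between classes, and the sample values are assumptions made for this example and are not taken from the metamodel.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tactic:
    """A design decision taken from a catalog of architectural tactics."""
    name: str


@dataclass
class SensitivityPoint:
    """A place in the design where a decision affects a quality scenario."""
    name: str
    tactics: List[Tactic] = field(default_factory=list)  # assumed association


@dataclass
class QScenario:
    """A quality attribute requirement (all fields are hypothetical)."""
    name: str
    attribute: str          # e.g. "performance"
    response_measure: str   # e.g. "latency"
    target: float           # threshold, assumed here to be in seconds


@dataclass
class AnalyzedQS:
    """Links a quality scenario to the sensitivity points that address it."""
    scenario: QScenario
    sensitivity_points: List[SensitivityPoint] = field(default_factory=list)


@dataclass
class InputPackage:
    """Groups the elements required to start the architectural design."""
    scenarios: List[QScenario] = field(default_factory=list)
    analyzed: List[AnalyzedQS] = field(default_factory=list)
    tactics: List[Tactic] = field(default_factory=list)


# Hypothetical example values, not taken from the paper's use cases.
qs = QScenario("QS1", "performance", "latency", 3.0)
tactic = Tactic("introduce concurrency")
point = SensitivityPoint("estimator-parallelism", [tactic])
inputs = InputPackage([qs], [AnalyzedQS(qs, [point])], [tactic])
```
]]></preformat>
      <p>Keeping the scenario, its analysis, and the chosen tactics in one package makes it possible to trace a monitored metric back to the design decision that was meant to satisfy it.</p>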
      <p>
        The functional viewpoint (FV) specifies analytics pipelines, including ingestion,
preparation, analysis, and exporting components. The FV describes the functional requirements of the BDA
solution, and its constructs are described in a technology-neutral way. The FV is expressed as a
component-connector structure. Sensitivity points, as architectural inputs, can be associated
with components and connectors to represent where architectural decisions regarding the QS are made.
Component metaclasses are specialized into Ingestor, Transformer, Estimator, and Sink. Estimator
and Transformer are software component realizations of PMML models and data transformations,
respectively, and the PMML file defines their behavior. A Component exposes required and
provided Ports. Connector metaclasses transfer data or control flow among components through
their Roles. The Connector types are defined based on the classification proposed by Taylor et al.
in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]: Procedure Call, Event, Stream, Adaptor, Distributor, and Arbitrator.
      </p>
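      <p>To make the component-connector structure more concrete, the following Python sketch models a small linear pipeline with the four component specializations and the six connector types listed above. It is a simplification under stated assumptions: Ports and Roles are collapsed into a <monospace>source</monospace> and a <monospace>target</monospace> reference, the <monospace>pmml_file</monospace> attribute and the helper <monospace>pipeline_order</monospace> are hypothetical, and the file names are placeholders.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ConnectorType(Enum):
    """Connector types following the classification cited in the text."""
    PROCEDURE_CALL = "ProcedureCall"
    EVENT = "Event"
    STREAM = "Stream"
    ADAPTOR = "Adaptor"
    DISTRIBUTOR = "Distributor"
    ARBITRATOR = "Arbitrator"


@dataclass
class Component:
    """Technology-neutral functional element of the pipeline."""
    name: str


@dataclass
class Ingestor(Component):
    pass


@dataclass
class Transformer(Component):
    pmml_file: str  # hypothetical: PMML document defining the transformation


@dataclass
class Estimator(Component):
    pmml_file: str  # hypothetical: PMML document defining the model


@dataclass
class Sink(Component):
    pass


@dataclass
class Connector:
    kind: ConnectorType
    source: Component  # stands in for the provided port and its role
    target: Component  # stands in for the required port and its role


def pipeline_order(connectors: List[Connector]) -> List[str]:
    """Return component names in the order a linear chain visits them."""
    if not connectors:
        return []
    order = [connectors[0].source.name]
    previous = connectors[0].source
    for connector in connectors:
        if connector.source is not previous:
            raise ValueError("connectors do not form a linear chain")
        order.append(connector.target.name)
        previous = connector.target
    return order


# Placeholder pipeline: ingest -> prepare -> predict -> export.
ingest = Ingestor("ingest")
prepare = Transformer("prepare", "prepare.pmml")
predict = Estimator("predict", "model.pmml")
export = Sink("export")
connectors = [
    Connector(ConnectorType.STREAM, ingest, prepare),
    Connector(ConnectorType.PROCEDURE_CALL, prepare, predict),
    Connector(ConnectorType.EVENT, predict, export),
]
print(pipeline_order(connectors))  # ['ingest', 'prepare', 'predict', 'export']
```
]]></preformat>
      <p>Nothing in this model names a runtime technology, which reflects the technology-neutral intent of the FV: the same pipeline description can later be bound to different deployments.</p>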
      <p>The deployment viewpoint (DV) integrates DevOps practices encompassing
containerization and infrastructure as code (IaC). The DV specifies how software artifacts (components
and connectors) are deployed on a set of computing nodes. The DV metamodel comprises Pod,
ExposedPort, and Deployment metaclasses to operationalize BDA applications in a specific
technology. Notably, an FV model can be deployed in different DV models, either to use
a different strategy or to test the fulfillment of predefined QScenarios. The DV contains Devices,
Services, Deployments, serverless environments (ServerlessEnv), and Artifacts.</p>
      <p>Once the PMML, FV, and DV models are designed and integrated, code is generated by
means of model-to-text transformations. Code generation is twofold: software code and infrastructure
(IaC) code. In the last step, the performance metrics of the BDA application are gathered and
evaluated against the QS. This process can take several iterations, and this is the whole cycle
that we expect to accelerate using ACCORDANT.</p>
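      <p>A model-to-text transformation of the infrastructure kind can be sketched with a text template that is filled from a deployment model. The Python example below is a minimal illustration, not the ACCORDANT generator. The attributes of <monospace>Pod</monospace>, <monospace>ExposedPort</monospace>, and <monospace>Deployment</monospace>, the function <monospace>generate_iac</monospace>, and the image name are assumptions, and the Kubernetes-style manifest is chosen only as an example output format because the text above does not name the target technology.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from string import Template


@dataclass
class ExposedPort:
    number: int


@dataclass
class Pod:
    name: str
    image: str          # hypothetical container image with the generated software
    port: ExposedPort


@dataclass
class Deployment:
    pod: Pod
    replicas: int


# Template for the generated text; the target format is an assumption.
MANIFEST = Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $name
spec:
  replicas: $replicas
  selector:
    matchLabels:
      app: $name
  template:
    metadata:
      labels:
        app: $name
    spec:
      containers:
        - name: $name
          image: $image
          ports:
            - containerPort: $port
""")


def generate_iac(deployment: Deployment) -> str:
    """Render the deployment model into infrastructure-as-code text."""
    pod = deployment.pod
    return MANIFEST.substitute(
        name=pod.name,
        image=pod.image,
        port=pod.port.number,
        replicas=deployment.replicas,
    )


# Placeholder model element, not taken from the paper's use cases.
estimator = Deployment(
    Pod("estimator", "example/estimator:0.1", ExposedPort(8080)),
    replicas=2,
)
manifest = generate_iac(estimator)
print(manifest)
```
]]></preformat>
      <p>Because the manifest is derived from the model, changing the deployment strategy between iterations means editing the model and regenerating, rather than editing infrastructure scripts by hand.</p>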
    </sec>
    <sec id="sec-4">
      <title>3. Evaluation</title>
      <p>
        Our experimentation compares development and deployment time per iteration using
ACCORDANT and two other frameworks: FastScore and SpringXD. We chose these frameworks
because they are the closest to our approach among the related work detailed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and they
support portable analytics models. We validated our proposal through four use cases: UC1)
Transport delay prediction, UC2) Near mid-air collision detection, UC3) Near mid-air collision
risk analysis, and UC4) El Niño/Southern Oscillation cycles.
      </p>
      <p>To compare ACCORDANT, SpringXD, and FastScore, we measured the time invested in
the development and deployment phases per use case. These phases are performed iteratively, since
improvements and refinements are made in each iteration until the QS are achieved. Therefore, we
measured the time invested in each iteration. We also calculated the gain factor, a metric that
estimates the cumulative average of the time reduction ratio for a use case uc using
framework f. We define the gain factor as a way to measure the incremental improvement that
high-level abstractions bring when evolving an application.</p>
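      <p>The exact formula for the gain factor is given in the original publication and is not reproduced here. As one plausible reading of "cumulative average of time reduction ratio", the Python sketch below takes the relative time reduction between each pair of consecutive iterations and averages those ratios. The function name <monospace>gain_factor</monospace> and the sample times are hypothetical, and the numbers are not measured data.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Sequence


def gain_factor(times: Sequence[float]) -> float:
    """Average relative time reduction between consecutive iterations.

    Assumed definition: for iteration times t_1..t_n, the reduction ratio of
    iteration i is (t_{i-1} - t_i) / t_{i-1}, and the gain factor is the mean
    of those ratios.
    """
    if len(times) < 2:
        raise ValueError("at least two iterations are needed")
    if any(t <= 0 for t in times):
        raise ValueError("iteration times must be positive")
    ratios = [(prev - curr) / prev for prev, curr in zip(times, times[1:])]
    return sum(ratios) / len(ratios)


# Hypothetical iteration times in minutes for one use case and framework.
times = [100.0, 50.0, 25.0]
print(gain_factor(times))  # 0.5
```
]]></preformat>
      <p>Under this reading, a gain factor of 0.5 means that each iteration takes, on average, half the time of the one before it, and a negative value would indicate that iterations are getting slower.</p>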
      <p>To design, develop, and deploy the four use cases, we followed the ACCORDANT process
shown in Fig. 1. The development time using ACCORDANT is higher (between
23% and 47%) compared to SpringXD and FastScore, but the deployment time is lower (between
50% and 81%). The higher development time can be explained by the effort required to specify
ACCORDANT models. The largest time differences arise in UC2, which is the most complex
pipeline. The high-level reuse of previous architectural decisions reduced development time, as
shown by the decrease between use cases and the growing gain factor across iterations. These
results suggest that ACCORDANT is most suitable for applications with multiple iterations or
for subsequent applications, where reusing architectural models reduces development times.</p>
      <p>ACCORDANT’s gain factor was higher for all use cases in the development phase, which
suggests that the high-level abstractions help reduce development time across
consecutive iterations. The highest gain factor was 0.46 in UC3, that is, a 46% reduction in
development time between iterations. The greatest gain factor difference over the other approaches was
0.13, also in UC3. Regarding the deployment gain factor (Fig. 2b), ACCORDANT also exhibited
the highest gain factor, by a wider margin, reaching up to 0.75 in UC4; that is, each deployment iteration
reduces the time by 75% compared to the previous one. The gain factor in the deployment phase
is greater for ACCORDANT because IaC generation is not offered by the other approaches.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Conclusions</title>
      <p>The results show that ACCORDANT can accelerate the iterative development and deployment
phases. The greatest time reduction was observed in the deployment phase, reaching up to 81%
compared to other approaches. In contrast, the development times with ACCORDANT
were longer. Despite the longer development time, deployment time is significantly reduced
by aligning QS, FV, and DV. ACCORDANT’s gain factor was higher, which implies
a greater time reduction in each iteration. However, some limitations emerged from
experimentation. The development phase is slower than with the other approaches because the current
ACCORDANT prototype requires additional manual coding. ACCORDANT requires a detailed
design initially, which pays off in consecutive iterations. Thus, ACCORDANT is most
suitable for applications with multiple iterations. Finally, our approach exploits the reuse of
architectural decisions and models. Hence, first-time or one-time applications may not
benefit from our proposal.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Castellanos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Varela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Correal</surname>
          </string-name>
          ,
          <article-title>ACCORDANT: A domain specific-model and DevOps approach for big data analytics architectures</article-title>
          ,
          <source>Journal of Systems and Software</source>
          <volume>172</volume>
          (
          <year>2021</year>
          )
          <elocation-id>110869</elocation-id>
          . doi:https://doi.org/10.1016/j.jss.2020.110869.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.-M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kazman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Haziyev</surname>
          </string-name>
          ,
          <article-title>Agile Big Data Analytics for Web-Based Systems: An Architecture-Centric Approach</article-title>
          ,
          <source>IEEE Transactions on Big Data</source>
          <volume>2</volume>
          (
          <year>2016</year>
          )
          <fpage>234</fpage>
          -
          <lpage>248</lpage>
          .
          doi:10.1109/TBDATA.2016.2564982.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Castellanos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Varela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. d. P.</given-names>
            <surname>Villamil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Correal</surname>
          </string-name>
          ,
          <article-title>A survey on big data analytics solutions deployment</article-title>
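          , in:
          <string-name>
            <given-names>T.</given-names>
            <surname>Bures</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Duchien</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Inverardi</surname>
          </string-name>
          (Eds.),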
          <source>Software Architecture</source>
          , Springer International Publishing, Cham,
          <year>2019</year>
          , pp.
          <fpage>195</fpage>
          -
          <lpage>210</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ranjan</surname>
          </string-name>
          ,
          <article-title>Streaming big data processing in datacenter clouds</article-title>
          ,
          <source>IEEE Cloud Computing</source>
          <volume>1</volume>
          (
          <year>2014</year>
          )
          <fpage>78</fpage>
          -
          <lpage>83</lpage>
          .
          doi:10.1109/MCC.2014.22.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wegener</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rüping</surname>
          </string-name>
          ,
          <article-title>On reusing data mining in business processes-a pattern-based approach</article-title>
          , in: International Conference on Business Process Management, Springer,
          <year>2010</year>
          , pp.
          <fpage>264</fpage>
          -
          <lpage>276</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.-M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schütz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kazman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Matthes</surname>
          </string-name>
          ,
          <article-title>How Lufthansa Capitalized on Big Data for Business Model Renovation</article-title>
          ,
          <source>MIS Quarterly Executive</source>
          <volume>1615</volume>
          (
          <year>2017</year>
          )
          <fpage>299</fpage>
          -
          <lpage>320</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Castellanos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Correal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-D.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <article-title>Executing Architectural Models for Big Data Analytics</article-title>
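          , in:
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Cuesta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Garlan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pérez</surname>
          </string-name>
          (Eds.),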
          <source>Software Architecture</source>
          , Springer International Publishing, Cham,
          <year>2018</year>
          , pp.
          <fpage>364</fpage>
          -
          <lpage>371</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Wojcik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bachmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Clements</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Merson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wood</surname>
          </string-name>
          ,
          <article-title>Attribute-driven design (ADD), version 2.0</article-title>
          , Technical Report
          , Carnegie Mellon University-SEI,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R. N.</given-names>
            <surname>Taylor</surname>
          </string-name>
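          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Medvidovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Dashofy</surname>
          </string-name>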
          ,
          <source>Software Architecture: Foundations, theory and practice</source>
          , John Wiley and Sons, Inc,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>