<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Big Data Management Challenges in SUPERSEDE</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergi Nadal</string-name>
          <email>snadal@essi.upc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Abelló</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oscar Romero</string-name>
          <email>oromero@essi.upc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jovan Varga</string-name>
          <email>jvarga@essi.upc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universitat Politècnica de Catalunya, BarcelonaTech</institution>
          ,
          <addr-line>Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>The H2020 SUPERSEDE (www.supersede.eu) project aims
to support decision-making in the evolution and adaptation
of software services and applications by exploiting end-user
feedback and runtime data, with the overall goal of
improving the end-users' quality of experience (QoE). QoE
is defined as the overall performance of a system from the
point of view of its users, and must take into account both the feedback
and the runtime data gathered. End-user feedback is extracted
from online forums, app stores, social networks and novel
direct feedback channels, which connect software applications
and service users to developers. Runtime data is primarily
gathered by monitoring environmental sensors,
infrastructures and usage logs. Hereafter, we discuss our solutions for
the main data management challenges in SUPERSEDE.</p>
    </sec>
    <sec id="sec-2">
      <title>2. CHALLENGES</title>
    </sec>
    <sec id="sec-3">
      <title>2.1 Big Data Governance</title>
      <p>One well-known problem of NoSQL repositories is the lack
of semantics caused by their schemaless nature. Without a
schema, the system cannot know which data are
stored or how they interrelate. Data analysts are therefore
burdened with data management tasks, such as understanding
the specific structure of the data and parsing it, before they can write their
analytical pipelines. In SUPERSEDE, this is even more challenging,
as the project aims to perform integrated analysis over multiple,
evolving and heterogeneous data sources. This is a challenge that
current Big Data technologies fail to address.</p>
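      <p>As a minimal sketch of this burden, assume three schemaless records arriving from different sources. The records and their field names are hypothetical and chosen only for illustration; they are not SUPERSEDE data. Before any analysis can start, the analyst has to discover which fields exist and which types they carry:</p>
      <preformat>
```python
import json

# Hypothetical records from heterogeneous, evolving sources.
# Field names are illustrative assumptions, not SUPERSEDE's actual data.
RAW = [
    '{"user": "u1", "rating": 4, "text": "Nice app"}',
    '{"user": "u2", "stars": 5}',
    '{"sensor": "cpu", "value": 0.73, "ts": 1460000000}',
]


def infer_structure(lines):
    """Map each observed field name to the set of value types seen for it."""
    structure = {}
    for line in lines:
        record = json.loads(line)
        for field, value in record.items():
            structure.setdefault(field, set()).add(type(value).__name__)
    return structure


structure = infer_structure(RAW)
for field in sorted(structure):
    print(field, sorted(structure[field]))
```
      </preformat>
      <p>Note that nothing in the recovered structure says that the hypothetical fields rating and stars denote the same concept; that knowledge is exactly the kind of semantics a schemaless store does not keep.</p>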
      <p>
        Big Data ecosystems demand complex metadata
governance processes spanning all data management
phases, from ingestion to analysis [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Semantic Web
technologies have proven to be a valid asset for this purpose.
The Resource Description Framework (RDF) allows concepts and
their relationships to be defined flexibly in the form of a
semantic graph. Furthermore, it can leverage the Linked
Data initiative to (a) reuse existing vocabularies, (b) make
data self-descriptive, and (c) publish such data to facilitate
on-the-fly data crossing [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In SUPERSEDE, an
integration-oriented RDF graph is used to represent and integrate the
data related to monitoring and user feedback, and to
cross it with contextual data from the use cases. The
analytical processes that support decision making are also
represented on top of these concepts.
      </p>
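      <p>To illustrate the idea of a semantic graph, the following sketch stores RDF-style triples (subject, predicate, object) in a plain Python set and answers a simple pattern query. It is not the SUPERSEDE graph: the example.org namespace and the terms Feedback, Monitor and provides are hypothetical, and a real deployment would more likely rely on an RDF store than on an in-memory set.</p>
      <preformat>
```python
# Hypothetical namespace and terms, used for illustration only.
SUP = "http://example.org/supersede#"
# The standard rdf:type predicate from the RDF vocabulary.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each triple is (subject, predicate, object), as in RDF.
graph = {
    (SUP + "Feedback", RDF_TYPE, SUP + "Concept"),
    (SUP + "Monitor", RDF_TYPE, SUP + "Concept"),
    (SUP + "appStoreReviews", SUP + "provides", SUP + "Feedback"),
    (SUP + "forumPosts", SUP + "provides", SUP + "Feedback"),
    (SUP + "usageLogs", SUP + "provides", SUP + "Monitor"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Which sources provide data about the Feedback concept?
feedback_sources = [
    t[0] for t in match(graph, p=SUP + "provides", o=SUP + "Feedback")
]
print(feedback_sources)
```
      </preformat>
      <p>Because sources and concepts live in the same graph, adding a new data source in this sketch only requires adding triples, not changing a fixed schema.</p>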
    </sec>
    <sec id="sec-4">
      <title>2.2 Big Data Architectures</title>
      <p>
        The λ-architecture [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is currently the most widespread
reference architecture for scalable and fault-tolerant Big
Data processing. While it succeeds at managing massive
amounts of data (i.e., in the Batch layer), as well as
near-real-time data streams (i.e., in the Speed layer), it has two
main drawbacks. First, it completely overlooks semantics,
as discussed above, because it uses NoSQL technologies as its
baseline components. Second, it is vaguely defined, which
hinders its instantiation.
      </p>
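      <p>The split between the two layers can be sketched in a few lines of Python. This is a toy, in-memory illustration of the general pattern under our own assumptions (events carrying a hypothetical app field, and a view that simply counts events per application); it is neither the reference design nor SUPERSEDE's instantiation.</p>
      <preformat>
```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute the view from the full master dataset."""
    return Counter(event["app"] for event in master_dataset)


class SpeedView:
    """Speed layer: incrementally count events not yet covered by a batch run."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        self.counts[event["app"]] += 1


def query(batch, speed, app):
    """Answer a query by merging the batch view with the real-time view."""
    return batch[app] + speed.counts[app]


# Hypothetical events; the "app" field is an illustrative assumption.
master = [{"app": "a"}, {"app": "a"}, {"app": "b"}]
batch = batch_view(master)

speed = SpeedView()
speed.update({"app": "a"})  # arrives after the last batch run

total_a = query(batch, speed, "a")
print(total_a)
```
      </preformat>
      <p>Nothing in this sketch records what an event means or where it comes from, which mirrors the first drawback noted above.</p>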
      <p>These drawbacks call for two refinements. (i) Refining the λ-architecture, by defining its components
as well as their interconnections, would facilitate its
instantiation and allow a simpler deployment of SUPERSEDE’s
Big Data ecosystem. (ii) To accommodate the governance
requirements, metadata should be treated as a first-class
citizen throughout the data management processes.</p>
    </sec>
    <sec id="sec-5">
      <title>3. PARTICIPATION BENEFITS</title>
      <p>Our objective is twofold. First, we aim to present
our approach to tackling the challenges described above.
Second, by leading a round table, we aim to discuss the
pros and cons of this and other solutions pursued by
researchers in similar settings.</p>
    </sec>
    <sec id="sec-6">
      <title>4. ACKNOWLEDGEMENTS</title>
      <p>This work has been partly supported by the SUPERSEDE
project, funded by the European Union’s Information and
Communication Technologies Programme (H2020) under
grant agreement number 644018.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .
          <article-title>Linked Data - The Story So Far</article-title>
          .
          <source>Int. J. Semantic Web Inf. Syst.</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Kandogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Schwarz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Terrizzano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Christodoulakis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <article-title>LabBook: Metadata-driven Social Collaborative Data Analysis</article-title>
          .
          In
          <source>IEEE Big Data</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Marz</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Warren</surname>
          </string-name>
          .
          <source>Big Data: Principles and Best Practices of Scalable Realtime Data Systems</source>
          .
          <publisher-name>Manning</publisher-name>
          ,
          <edition>1st edition</edition>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>