<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>G. Arcidiacono</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>E. W. De Luca</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>F. Fallucchi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Pieroni</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Innovation &amp; Information Engineering (DIIE) Guglielmo Marconi University</institution>
          ,
          <addr-line>Via Plinio 44, Rome 00193</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we give an overview of the current research in Big Data and Digital Curation with a focus on Lean Six Sigma and discuss how this methodology can help the Digital Curation lifecycle. In particular, the application of the Lean Six Sigma methodology is presented and discussed with a special focus on the selection, preservation, maintenance, collection and archiving of digital information, the so-called Big Cultural Data. The aim of our work is to present a methodology for the Digital Curation lifecycle, asserting that all the actions belonging to Data Curation may be performed and optimized by using the DMAIC (Define, Measure, Analyze, Improve, Control) phases of Lean Six Sigma.</p>
      </abstract>
      <kwd-group>
        <kwd>Lean Six Sigma</kwd>
        <kwd>Big Data</kwd>
        <kwd>Digital Curation</kwd>
        <kwd>Digital Humanities</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The vast amount of data in the field of cultural heritage often makes it difficult for
the interested person to retrieve the desired information. It has thus become imperative
to introduce automatic methods that increase the relevance of hits through semantic
search. This can be achieved with the use of the Semantic Web. However, the various
meta-languages that have already been adopted in the Semantic Web are usually not
compatible with each other.</p>
      <p>Furthermore, Digital Curation generally refers to the process of establishing and
developing long-term repositories of digital assets for research purposes. Enterprises are
starting to use Digital Curation to improve the quality of information and data
within their operational and strategic processes.</p>
      <p>One of the biggest challenges is to extract the relevant information from the huge
amount of data available in the digital world. In this paper, we give an overview of
the current research in Big Data and Digital Curation with a focus on Lean Six Sigma
(LSS) and discuss how this methodology can help the Digital Curation lifecycle.</p>
    </sec>
    <sec id="sec-2">
      <title>Literature Review and research scope</title>
      <p>
        The connection between Lean Six Sigma (LSS) and Big Data has been increasingly
established through the contribution of LSS to accelerating the extraction of key
insights from Big Data, while highlighting how Big Data can bring new light and
innovation to projects requiring the use of Lean Six Sigma
        <xref ref-type="bibr" rid="ref7 ref8">(Fogarty 2015)</xref>
        . Big Data
have been given different definitions
        <xref ref-type="bibr" rid="ref5 ref9">(Franks, 2012; Dumbill, 2012; Gobble, 2013)</xref>
        ;
however, Big Data is commonly understood as the increased amount of data generated
through the internet and through the use of wireless devices,
leading to the view that Big Data is the next step in innovation
        <xref ref-type="bibr" rid="ref9">(Gobble, 2013)</xref>
        .
      </p>
      <p>
        While the connection between LSS, Big Data and Digital Humanities Research is
still under preliminary debate, the connection between LSS, Big Data and
manufacturing has been established and discussed in previous research. In fact, as
regards manufacturing systems, the influence of the onset of Big Data and Big Data
analytics has been well identified, as well as the relationship of IoT (Bi &amp; Cochran
2014; Bi, Xu, &amp; Wang 2014) and Big Data analytics (Mo &amp; Li 2015), which
facilitated information visibility, and increased the level of automation in design and
manufacturing engineering (Wang &amp; Alexander 2015). However,
        <xref ref-type="bibr" rid="ref12">Parker (2014)</xref>
        commented that only about 10% of the value potential of the information collected
was actually utilized to enhance the level of management productivity. This is because
the architecture of high-volume manufacturing systems may differ from that used for
low-volume products, and there may be great differences in initial system maturity (Bi 2011).
      </p>
    </sec>
    <sec id="sec-3">
      <title>Metadata, Digital Curation and Digital Humanities</title>
      <p>Metadata is used to describe objects or processes regardless of the domain. A large
number of metadata standards and formats exist. The cultural heritage domain
relies on standards such as EAD (Encoded Archival Description), EAC-CPF
(Encoded Archival Context - Corporate Bodies, Persons, and Families), CIDOC-CRM
(CIDOC Conceptual Reference Model) and METS (Metadata Encoding and
Transmission Standard) to describe specific objects. EAD is an XML-based standard
for representing the structure of archival finding aids. Developed in 1993, it is based
on ISAD(G) (General International Standard Archival Description). EAD is widely
used in the USA and is increasingly being implemented in Europe. It is expected that most
of the data collected by the project will be in EAD. EAC-CPF was developed as a
supplement to EAD and was introduced in 2008. This standard focuses on the description
of persons, families and institutions that are associated with the creation, preservation
or use of an archive, or that are related to it in any other way. CIDOC-CRM stands for the
Conceptual Reference Model of the International Documentation Committee (CIDOC) of the
International Council of Museums. The goal of this data model is the sharing and
integration of heterogeneous data sets between different systems and disciplines of the
cultural heritage sector, primarily museums but also archives. Semantic definitions are
proposed for the transformation of distributed information into comprehensive global
resources. METS is an XML-based format for the descriptive, administrative and
structural metadata of a digital collection. Here, the digital objects are in the
foreground.</p>
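      <p>As a minimal sketch of how an XML-based EAD description could be processed automatically, the following Python fragment reads the basic descriptive fields from a heavily simplified, hypothetical EAD-style record. The element names (ead, archdesc, did, unitid, unittitle, unitdate) follow common EAD usage, but the sample record, the omission of namespaces and the function name extract_description are assumptions made purely for illustration.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD-style finding aid (no namespace,
# no header); real finding aids are considerably richer.
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did>
      <unitid>COLL-001</unitid>
      <unittitle>Letters of an example family</unittitle>
      <unitdate>1890-1920</unitdate>
    </did>
  </archdesc>
</ead>
"""


def extract_description(ead_xml: str) -> dict:
    """Return the basic descriptive fields of the top-level <did> block."""
    root = ET.fromstring(ead_xml.strip())
    did = root.find("./archdesc/did")
    if did is None:
        # No descriptive identification block found.
        return {}
    fields = ("unitid", "unittitle", "unitdate")
    # findtext returns None for a missing child; normalise that to "".
    return {name: (did.findtext(name) or "").strip() for name in fields}


record = extract_description(SAMPLE_EAD)
print(record)
```
]]></preformat>
      <p>The resulting dictionary is one possible intermediate form from which records in other formats could later be aligned; it is not a prescribed mapping.</p>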
      <p>To represent and use the metadata provided by different institutions in different
metadata formats, a top-level exchange format is needed. Additionally, we need
automatic processes to support data curation. To use such automatic processes in
Digital Humanities, we first need to understand or define ‘Digital
Humanities’. This research field builds a bridge between the information sciences
and the various humanities disciplines.</p>
      <p>
        However, openness was always associated with a need for introspection and some
tentative boundary definitions
        <xref ref-type="bibr" rid="ref13">(Risam 2015)</xref>
        . This research domain is defined
dynamically in the negotiation of these tensions, as discussed by several Digital
Humanities scholars (Unsworth 2002; Svensson 2009; Rockwell 2011). In this paper,
we understand Digital Humanities as an application scenario for Lean Six Sigma and its use
with Big Data.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Lean Six Sigma and Big Data</title>
      <p>
        Big Data processing can be successfully integrated with Lean Six Sigma (LSS),
as LSS is a strategy that addresses entire process systems and aims to reduce
non-value-adding activities by processing large amounts of data. Ideally, LSS can be
implemented to optimize the performance of a wide range of systems, from the least to
the most complex, even when limited resources are available, so that those same
limited resources are spent as productively as possible, such as within hospitals
        <xref ref-type="bibr" rid="ref3 ref4">(Arcidiacono, Wang, &amp; Yang 2015)</xref>
        . This is because LSS isolates the main critical
stages and features of the whole process in sub-phases: problems are broken down
into smaller areas, to make process knowledge more accessible and to solve process
issues with surgically precise actions
        <xref ref-type="bibr" rid="ref1 ref2">(Arcidiacono, Costantino, &amp; Yang, 2016)</xref>
        .
      </p>
      <p>Axiomatic Design (AD) is the tool to design the LSS training and process
management model, because it provides relevant criteria to critically analyse design.</p>
      <p>
        Most significantly within the scope of this research, AD is a flexible tool
that can be effectively applied to a wide range of contexts and scenarios in which
process improvement and optimization are required
        <xref ref-type="bibr" rid="ref3 ref3 ref4 ref4">(Arcidiacono, Giorgetti, &amp;
Pugliese, 2015; Arcidiacono &amp; Placidoli, 2015)</xref>
        . The DMAIC (Define, Measure,
Analyze, Improve, Control) cycle could be applied as the methodology framing the entire
optimization process, to determine the dependence of a system and process reliability
        <xref ref-type="bibr" rid="ref1 ref2">(Arcidiacono &amp; Bucciarelli, 2016)</xref>
        . Therefore, LSS could be effectively implemented
in Digital Humanities Research, specifically within Digital Curation, to isolate
relevant data, avoid data obsolescence, and enhance data availability and high-quality
research, by optimizing data extraction and its related functions.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Our Approach: the use of LSS Methodology in Digital Curation</title>
      <p>
        The exploration and definition of the context and boundaries of the
Big Data Digital Humanities Research area continue to be regarded as an unsolved
and complex problem. There are studies in the literature
        <xref ref-type="bibr" rid="ref10">(Kaplan, 2015)</xref>
        that attempt to
represent Big Data Research in Digital Humanities as a structured field, by proposing
a division into three concentric areas of study: Big Cultural Data, Digital Culture and
Digital Experiences. The author aims to show that this huge amount
of information can be organized as a structured field and, consequently, can be
characterized by common methodologies.
      </p>
      <p>The goal of this paper, instead, is to investigate the application of
well-known methodologies of process improvement and process optimization, such as the
Lean Six Sigma methodology, to the Digital Curation aspects of the Digital
Humanities, where Digital Curation consists of the selection, preservation, maintenance,
collection and archiving of digital information, with particular focus on the so-called
Big Cultural Data. In other words, Digital Curation involves maintaining, preserving
and adding value to digital data throughout its lifecycle. The active management of
data reduces risks to its long-term value and mitigates the threat of digital
obsolescence. Meanwhile, curated data in digital repositories may be shared among a
wider research community, increasing the intrinsic value of the Cultural Data, as
shown in Figure 1.</p>
      <p>Furthermore, Data Curation enhances the long-term value of existing data by
making it available for further high quality research. On the other hand, Cultural
humanists, involved in the Digital Curation aspects, are increasingly engaged with
curating and making accessible the digital materials.</p>
      <p>
        As noted above, the Lean Six Sigma methodology is successfully and widely used in
many areas such as government, industry, healthcare, and education. The
methodology is based on the DMAIC approach, a data-driven quality
strategy, which can be used as an instrument during the extraction, analysis and sorting
of the data
        <xref ref-type="bibr" rid="ref7 ref8">(Fogarty, 2015)</xref>
        . DMAIC is an acronym for the five phases that
make up the optimization process: Define the problem, the improvement
activity, the opportunity for improvement, the project goals, and the customer
(internal and external) requirements; Measure process performance; Analyze the
process to determine root causes of variation and poor performance (defects); Improve
process performance by addressing and eliminating the root causes; and Control the
improved process and future process performance.
      </p>
      <p>Note 1: The source of Fig. 1 is the blog article “Digital Curation: putting the pieces together” by Sue Waters, available at
http://suewaters.com/author/suewaters/. [Last accessed 10th November, 2016]</p>
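      <p>As an illustration of what the Measure phase might look like in a curation setting, the sketch below computes defects per million opportunities (DPMO), a standard Six Sigma metric, for a small set of metadata records. Treating each missing required field as a defect, as well as the field list, the sample records and the function names, are assumptions made for this example only.</p>
      <preformat><![CDATA[
```python
# Hypothetical list of fields that every curated record is expected to carry.
REQUIRED_FIELDS = ("title", "creator", "date", "format")


def count_defects(record: dict) -> int:
    """Count required fields that are missing or empty in one record."""
    return sum(1 for field in REQUIRED_FIELDS if not record.get(field))


def dpmo(records: list) -> float:
    """Defects per million opportunities over a batch of records.

    Each required field of each record counts as one opportunity.
    """
    opportunities = len(records) * len(REQUIRED_FIELDS)
    if opportunities == 0:
        raise ValueError("at least one record is needed to measure DPMO")
    defects = sum(count_defects(record) for record in records)
    return defects / opportunities * 1_000_000


# Invented sample data: three records, twelve opportunities, three defects.
sample_records = [
    {"title": "Map of Rome", "creator": "Unknown", "date": "1748", "format": "image/tiff"},
    {"title": "Letter", "creator": "", "date": "1901", "format": "application/pdf"},
    {"title": "Photograph", "creator": "Studio X", "date": None, "format": ""},
]

print(dpmo(sample_records))
```
]]></preformat>
      <p>A measurement of this kind could serve as the baseline against which the Improve and Control phases are later assessed; which defects matter for a given collection would have to be settled in the Define phase.</p>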
      <p>The processes of extracting, analysing and sorting the data make it possible to predict
future trends and to achieve advantages in all environments. Lean Six Sigma is a complex
methodology through which a rigorous organization is able to observe and mitigate the
errors and deviations occurring in its operations by applying strict rules. This
paper aims to initiate a new line of research that investigates how
methodologies such as LSS may automate and optimize functions such as the
categorization, classification, clustering and digitization of Big Cultural Data. This new
line of research should first study the meaning of Data Curation: which
actions are to be performed, and which of them may be automated and supported by digital
instruments, such as LSS. The first action, as shown in Figure 2, consists of describing
and representing the information: appropriate standards should be used to
describe metadata, so that it can be controlled over the long term.</p>
      <p>Furthermore, all metadata and associated digital material should be represented in
appropriate formats. The second action consists in building a preservation strategy:
this action is important to plan for preservation throughout the data lifecycle.</p>
      <p>Collaborating, supervising, and participating are the actions to be performed in
order to supervise data creation activities and to assist in the creation of the standards
to be used. Finally, curating and preserving represent the actions to be performed to
take into account the managerial and administrative aspects.</p>
      <p>Note 2: “Research Data Management and Curation Services Framework”, Oxford University Digital Repository,
http://oxdrrc.blogspot.it/2008/12/research-data-management-services.html</p>
    </sec>
    <sec id="sec-6">
      <title>Final Remarks and Conclusions</title>
      <p>In this paper we have given a research overview on Big Data and Digital Curation
and we have proposed the application of well-known methodologies of process
improvement and process optimization, such as LSS methodology, to the Digital
Curation aspects of the Digital Humanities. Thus, LSS is applied to the selection,
preservation, maintenance, collection and archiving of digital information, with
particular focus on the so-called Big Cultural Data. Furthermore, we have discussed
how this methodology may help the Digital Curation lifecycle, asserting that all the
actions belonging to the Data Curation may be performed and optimized by using
DMAIC phases of LSS.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Arcidiacono</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Bucciarelli</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>TRIZ: Engineering Methodologies to Improve the Process Reliability</article-title>
          . Quality and Reliability Engineering International Journal,
          <volume>32</volume>
          (
          <issue>7</issue>
          ):
          <fpage>2537</fpage>
          -
          <lpage>2547</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Arcidiacono</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Costantino</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>The AMSE Lean Six Sigma Governance Model</article-title>
          .
          <source>International Journal of Lean Six Sigma</source>
          ,
          <volume>7</volume>
          (
          <issue>3</issue>
          ):
          <fpage>233</fpage>
          -
          <lpage>266</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Arcidiacono</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giorgetti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Pugliese</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Axiomatic Design to improve PRM airport assistance</article-title>
          .
          In:
          <source>Proceedings of ICAD 2015, 9th International Conference on Axiomatic Design</source>
          , edited by M. K. Thompson, D. Matt, A. Giorgetti, N. P. Suh, and P. Citti,
          <fpage>106</fpage>
          -
          <lpage>111</lpage>
          . Red Hook: Curran Associates.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Arcidiacono</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Placidoli</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Reality and illusion in Virtual Studios: Axiomatic Design applied to television recording</article-title>
          ,
          In:
          <source>Proceedings of ICAD 2015, 9th International Conference on Axiomatic Design</source>
          , edited by M. K. Thompson, D. Matt, A. Giorgetti, N. P. Suh, and P. Citti,
          <fpage>137</fpage>
          -
          <lpage>142</lpage>
          . Red Hook: Curran Associates.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Dumbill</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>What is Big Data? An Introduction to the Big Data Landscape</article-title>
          .
          <source>O'Reilly Strata</source>
          . http://strata.oreilly.com/2012/01/what-is-big-data.html.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Fogarty</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Lean Six Sigma and Big Data: Continuing to Innovate and Optimize Business Processes</article-title>
          .
          <source>Journal of Management and Innovation;</source>
          <volume>1</volume>
          (
          <issue>2</issue>
          ):
          <fpage>2</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Fogarty</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Lean Six Sigma and Data Analytics: Integrating Complementary Activities</article-title>
          .
          <source>Global Journal of Advanced Research</source>
          ; Vol.
          <volume>2</volume>
          , Issue
          <issue>2</issue>
          .
        </mixed-citation>
        <mixed-citation>
          <string-name>
            <surname>Franks</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <source>Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics</source>
          . New York: Wiley.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Gobble</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Big Data: The Next Big Thing in Innovation</article-title>
          . Research and Technology Management;
          <volume>56</volume>
          (
          <issue>1</issue>
          ):
          <fpage>64</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Kaplan</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>A map for big data research in digital humanities</article-title>
          .
          <source>Front. Digit. Humanit.</source>
          <volume>2</volume>
          :
          <fpage>1</fpage>
          . doi: 10.3389/fdigh.2015.00001
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Mo</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Research of Big Data Based on the Views of Technology and Application</article-title>
          .
          <source>American Journal of Industrial and Business Management</source>
          ;
          <volume>5</volume>
          :
          <fpage>192</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Parker</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Big Data and Analytics in Manufacturing: Driving New Levels of Management productivity</article-title>
          .
          <source>IDC Manufacturing Insights; MI250786</source>
          . Available from: http://www.tcs.com/SiteCollectionDocuments/White%20Papers/Big-Data-AnalyticsManufacturing-0914-1.pdf
        </mixed-citation>
        <mixed-citation>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Alexander</surname>
            ,
            <given-names>C. A.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Big Data in Design and Manufacturing Engineering</article-title>
          .
          <source>American Journal of Engineering and Applied Sciences</source>
          ;
          <volume>8</volume>
          (
          <issue>2</issue>
          ):
          <fpage>223</fpage>
          -
          <lpage>232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Risam</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Beyond the Margins: Intersectionality and the Digital Humanities</article-title>
          . http://www.digitalhumanities.org/dhq/vol/9/2/000208/000208.html#transformdh2012
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Svensson</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Humanities computing as digital humanities</article-title>
          .
          <source>Digital Humanities Quarterly</source>
          <volume>3</volume>
          :
          <issue>3</issue>
          . http://www.digitalhumanities.org/dhq/vol/3/3/000065/000065.html
        </mixed-citation>
        <mixed-citation>
          <string-name>
            <surname>Unsworth</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>What is humanities computing and what is it not?</article-title>
          In
          <source>Jahrbuch für Computerphilologie</source>
          , Vol.
          <volume>4</volume>
          , edited by G. Braungart, K. Eibl, and F. Jannidis,
          <fpage>71</fpage>
          -
          <lpage>84</lpage>
          . Paderborn: Mentis Verlag.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>