<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data Integration for the Media Value Chain</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Henning Agt-Rickauer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jörg Waitelonis</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tabea Tietz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harald Sack</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hasso Plattner Institute</institution>
          ,
          <addr-line>Prof.-Dr.-Helmert-Str. 2-3, 14482 Potsdam</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>yovisto GmbH</institution>
          ,
          <addr-line>August-Bebel-Str. 26-53, 14482 Potsdam</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>With the switch from analog to digital technology, large amounts of data are created throughout the entire process of production, distribution, and archival of film and TV programs. Besides recorded and processed audiovisual information, each single step of the production process and, furthermore, the entire media value chain creates new metadata that is administrated and put into relation with already existing metadata mandatory for the management of these processes. Due to competing standards as well as proprietary and incompatible interfaces of the applied software tools, a significant amount of this metadata cannot be reused and is not available for subsequent steps in the process chain. As a consequence, most of this valuable information has to be recreated at considerable cost in each single step of media production, distribution, and archival. Currently, there is no generally accepted or commonly used metadata exchange format applied throughout the media value chain. At the same time, the market for media production companies has changed dramatically, with the internet becoming the preferred distribution channel for all media content. The limited budgets available to media production companies today put additional pressure on working in a cost- and time-efficient way and not wasting resources on the costly re-engineering of lost metadata. The dwerft project aims to apply Linked Data principles to all metadata exchange throughout the media value chain [4]. Starting with the very first idea for a script, all metadata is converted according to either existing or newly developed ontologies so that it can be reused in subsequent steps of the media value chain. Thus, metadata collected during media production becomes a valuable asset not only in each step from pre- to post-production, but also in distribution and archival.
This paper presents results of the dwerft project on the successful integration of a set of film production tools based on the Linked Production Data Cloud, a technology platform for the film and TV industry that enables interoperability between the software used in production, distribution, and archival of audiovisual content.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The core of the dwerft project is the Linked Production Data Cloud (LPDC),
a technology platform for the film and television industry that allows lossless
interoperability between software and hardware tools used in production,
distribution, and archiving of audiovisual content. Based on Linked Open Data
principles [
        <xref ref-type="bibr" rid="ref1">1</xref>
], the LPDC stores and publishes semantic metadata originating
from different subtasks of the film production process under a unified
ontology schema. Fig. 1 provides an overview of the LPDC and connected production
tools of an example showcase. The key components of the LPDC are: an
extensible vocabulary for metadata storage, a set of pre-defined converters for RDF
data generation, a framework to develop customized converters, a tool to
manage inserts and updates of RDF data including versioning, and a triplestore for
RDF data management and querying.
      </p>
      <p>
        The Film Ontology (http://filmontology.org) vocabulary was designed in collaboration with
domain experts to create a suitable terminology describing the different tasks of
media production and all associated metadata. The ontology schema is capable
of representing film scripts (e.g., scenes, scene content, characters, sets, etc.),
production planning metadata (e.g., film crew, departments, cast, filming
locations, shooting schedule, used equipment, etc.), on-set information (e.g., shots,
takes, and associated clips), post production metadata (e.g., timecodes, codecs,
resolutions, and formats of recorded and further processed clips), as well as
metadata for quality assessment of archived audiovisual material (e.g., surface
damages, splices, bulges, glued areas, etc.). Wherever possible, already existing
vocabularies have been reused, mapped, and interlinked, e.g., the Broadcast
Metadata Exchange Format (BMF, https://www.irt.de/en/activities/production/bmf.html), EBUCore (https://tech.ebu.ch/MetadataEbuCore), and the DBpedia Ontology (http://mappings.dbpedia.org/server/ontology/classes/). The
collaborative design of the Film Ontology was carried out with WebProtégé [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Currently, the vocabulary is being further extended with rights management
information, film editing metadata (e.g., cut information), and technical metadata of
rendered movie containers for delivery and distribution (e.g., Material Exchange
Format (MXF)). None of the participating software applications was originally
capable of importing, exporting, or processing RDF data. First, a set of
customized converters was developed to transform proprietary metadata produced
by the tools into RDF representations conforming to the Film Ontology. The
analysis of the production workflows has shown that most of the created
production metadata is encoded in XML and CSV formats. Therefore, the dwerft
tools converter framework has been developed to efficiently create customized
CSV/XML-to-RDF converters (available at https://github.com/yovisto/dwerft). The framework includes predefined converters
for a set of film production applications as well as a generic CSV/XML-to-RDF
converter that allows creating the required transformations for custom metadata
based on lightweight mapping definitions.
      </p>
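      <p>The idea of a lightweight mapping definition driving a generic CSV-to-RDF conversion can be sketched as follows. This is only a minimal illustration in Python; the column names, the <code>MAPPING</code> dictionary format, and the resource URIs are hypothetical assumptions, not the actual dwerft framework's mapping language.</p>

```python
import csv
import io

# Hypothetical lightweight mapping: CSV column name -> RDF property URI.
# The real dwerft framework defines mappings in its own format.
MAPPING = {
    "scene_number": "http://filmontology.org/ontology/sceneNumber",
    "set_name": "http://filmontology.org/ontology/setName",
}
BASE = "http://filmontology.org/resource/scene/"
TYPE = "http://filmontology.org/ontology/Scene"


def csv_to_ntriples(csv_text, id_column="scene_number"):
    """Convert CSV rows to N-Triples using the column-to-property mapping."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row[id_column]}>"
        # Type the resource, then emit one triple per mapped, non-empty column.
        triples.append(
            f"{subject} <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <{TYPE}> ."
        )
        for column, prop in MAPPING.items():
            if row.get(column):
                triples.append(f'{subject} <{prop}> "{row[column]}" .')
    return "\n".join(triples)


csv_data = "scene_number,set_name\n12,Tempelhofer Feld\n"
print(csv_to_ntriples(csv_data))
```

      <p>A real converter would additionally handle datatypes, escaping, and cross-references between resources, but the shape of the transformation is the same: each row becomes a typed resource, each mapped column a property assertion.</p>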
      <p>RDF metadata generated by the different converters is stored in an RDF
triplestore and can be queried via SPARQL. As a proof of concept, semantic metadata
originating from a test film production at the Tempelhofer Feld in Berlin is
available for further use (http://filmontology.org/resource/DWERFT) and can be searched (http://filmontology.org/search/).</p>
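      <p>A query against such a triplestore might look as follows. The SPARQL query is embedded in a Python string for illustration; the class and property names (<code>foo:Scene</code>, <code>foo:setName</code>) merely suggest the style of the Film Ontology and are not taken from its actual schema.</p>

```python
# Hypothetical SPARQL query: list all scenes together with their sets.
# The prefix, class, and property names are illustrative assumptions only.
QUERY = """
PREFIX foo: <http://filmontology.org/ontology/>

SELECT ?scene ?setName WHERE {
    ?scene a foo:Scene ;
           foo:setName ?setName .
}
ORDER BY ?scene
"""

print(QUERY.strip())
```
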
      <p>
        In a setting where data from heterogeneous sources is transformed, aggregated,
and stored in a triplestore, it is essential to manage updates of the data. In our
approach, we have integrated the linked data versioning system TailR [
        <xref ref-type="bibr" rid="ref3">3</xref>
]. RDF
data generated by converters is first uploaded to TailR. When the original data is
changed and converted again – as often happens during filming, e.g., when
dialogs are adapted to match the intention of the director or the preferences of
an actor – TailR stores each version and generates RDF diffs. These are used
to derive the respective SPARQL INSERT and DELETE statements to update the
RDF data in the RDF store accordingly.
      </p>
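      <p>The derivation of update statements from an RDF diff can be illustrated with a simplified sketch that operates on sets of N-Triples statements. TailR's actual diff representation and the dwerft update logic may differ; this only shows the principle of turning a version difference into SPARQL updates.</p>

```python
def diff_to_sparql(old_triples, new_triples):
    """Compute an RDF diff between two versions (given as sets of N-Triples
    statements without the trailing ' .') and derive a SPARQL update."""
    removed = sorted(set(old_triples) - set(new_triples))
    added = sorted(set(new_triples) - set(old_triples))
    updates = []
    if removed:
        updates.append("DELETE DATA { %s }" % " . ".join(removed))
    if added:
        updates.append("INSERT DATA { %s }" % " . ".join(added))
    return ";\n".join(updates)


# Example: a dialog line was changed between two converted script versions
# (subject/property URIs are hypothetical).
v1 = {'<s:scene12> <p:dialog> "Hello Berlin"'}
v2 = {'<s:scene12> <p:dialog> "Good morning Berlin"'}
print(diff_to_sparql(v1, v2))
```
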
    </sec>
    <sec id="sec-2">
      <title>Integrated Film Production Applications</title>
      <p>An exemplary set of tools, representative for the different stages pre-production,
planning, shooting, post-production, distribution and archiving, was chosen,
analyzed with respect to interoperability and connected to the Linked Production
Data Cloud. DramaQueen10 is a script writing software to develop, visualize,
and analyze stories. It allows working from the first idea to the final script using
predefined formatting, storylines, characters, outline, synopsis, and story charts.
DramaQueen is a Java based standalone application and uses a proprietary data
format based on XML to store script projects. PreProducer (http://www.preproducer.com/index.html) is a film production
management software to support the complete preproduction planning process.
It features general project management, script analysis, management of crew,
cast, inventory, and filming locations, development of shooting schedules,
budgeting and financial calculations. PreProducer is a web-based application and
offers partial export and import based on XML documents via a REST API.
LockitScript (http://lockitnetwork.com/home/) is a mobile web application used during film shooting. It supports
the script supervisor in overseeing the continuity of the movie and in keeping track of
the daily progress. It also manages the linking of scenes and takes to filmed clips
and uses a special hardware device to directly synchronize camera data with its
backend. LockitScript offers limited export facilities for daily reports and
camera metadata in the web interface. AVID Log Exchange (ALE, see http://www.avid.com/en/media-composer/features) is a file format
used by various cameras and post-production tools (e.g., Arri Alexa, AVID
Media Composer, DaVinci Resolve, Silverstack) to exchange metadata about filmed
movie clips. The integration of ALE is challenging, because each tool defines
custom columns in the CSV format. While the previously described tools primarily
produce metadata, the distribution phase of a film production usually requires
metadata of all steps of the production process. Two tools already benefit from
the early availability of semantic metadata using SPARQL queries: rightsmap (http://www.recoupmentpro.de/),
a licence management solution for film and TV productions, and the
"Medienbegleitkarte" (MBK), a metadata set based on the Broadcast Metadata Exchange
Format (BMF) that is mandatory for delivery to German public-service TV
broadcasters. Finally, media condition analysis tools by the German Broadcasting Archive
directly insert analysis reports as RDF data into the LPDC.
</p>
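      <p>The column problem of the ALE integration can be made concrete with a simplified parser. This sketch assumes a tab-delimited ALE file with Heading, Column, and Data sections; real ALE files exported by different tools vary in their exact layout and, as noted above, in which columns they define.</p>

```python
def parse_ale(text):
    """Parse a simplified ALE file: a 'Column' section names the
    (tool-specific) columns, a 'Data' section holds tab-separated clip rows."""
    columns, rows, section = [], [], None
    for line in text.splitlines():
        if line in ("Heading", "Column", "Data"):
            section = line  # section marker line
            continue
        if not line.strip():
            continue
        if section == "Column":
            columns = line.split("\t")
        elif section == "Data":
            # Zip the tool-specific column names onto each clip row.
            rows.append(dict(zip(columns, line.split("\t"))))
    return rows


ale = (
    "Heading\nFIELD_DELIM\tTABS\n\n"
    "Column\nName\tStart\tEnd\n\n"
    "Data\nA001_C001\t01:00:00:00\t01:00:10:00\n"
)
print(parse_ale(ale))
```

      <p>Because the column names are read from the file itself, a converter built this way can tolerate custom columns; mapping those columns onto ontology properties is then the per-tool part of the integration.</p>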
    </sec>
    <sec id="sec-3">
      <title>Conclusion and Outlook</title>
      <p>With the dwerft project and the LPDC framework, a first subset of
applications and tools has been integrated for lossless metadata exchange in the media
production cycle. Metadata from media production and archival thus becomes
a valuable asset that enables better search and retrieval, e.g., for video-on-demand
platforms, where it can also be used to support content-based
recommendation and customized advertising.</p>
      <p>Acknowledgement: This work has been funded by the German Government,
Federal Ministry of Education and Research under project number 03WKCJ4D.
</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>Linked Data: Evolving the Web Into a Global Data Space</article-title>
          . Synthesis Lectures on Web Engineering Series. Morgan &amp; Claypool,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>M.</given-names>
            <surname>Horridge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tudorache</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Nuylas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vendetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Noy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Musen</surname>
          </string-name>
          .
          <article-title>WebProtégé: A Collaborative Web-Based Platform for Editing Biomedical Ontologies</article-title>
          . Bioinformatics, page btu256,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>P.</given-names>
            <surname>Meinhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Knuth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Sack</surname>
          </string-name>
          .
          <article-title>TailR: A Platform for Preserving History on the Web of Data</article-title>
          .
          <source>In Proc. of the 11th Int. Conf. on Semantic Systems</source>
          , pages
          <fpage>57</fpage>
          -
          <lpage>64</lpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>H.</given-names>
            <surname>Sack</surname>
          </string-name>
          .
          <article-title>From Script Idea to TV Rerun: The Idea of Linked Production Data in the Media Value Chain</article-title>
          .
          <source>In Proc. of the 24th Int. Conf. on World Wide Web Companion, WWW '15 Companion</source>
          , pages
          <fpage>719</fpage>
          -
          <lpage>720</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>