Data Integration for the Media Value Chain

    Henning Agt-Rickauer1 , Jörg Waitelonis2 , Tabea Tietz1 , and Harald Sack1
    1
        Hasso Plattner Institute, Prof.-Dr.-Helmert-Str. 2-3, 14482 Potsdam, Germany
                              {firstname.lastname}@hpi.de,
           2
             yovisto GmbH, August-Bebel-Str. 26-53, 14482 Potsdam, Germany
                                    joerg@yovisto.com


1       Introduction
With the switch from analog to digital technology the entire process of produc-
tion, distribution, and archival of a film and tv program large amounts of data
are created. Besides recorded and processed audiovisual information, in each sin-
gle step of the production process and furthermore throughout the entire media
value chain new metadata is created, administrated, and put into relation with
already existing metadata mandatory for the management of these processes.
Due to competing standards as well as to proprietary and incompatible inter-
faces of the applied software tools, a significant amount of this metadata cannot
be reused and is not available for subsequent steps in the process chain. As a
consequence most of this valuable information has to be costly recreated in each
single step of media production, distribution, and archival. Currently, there is no
generally accepted nor commonly used metadata exchange format that is applied
throughout the media value chain. But, also the market for media production
companies has changed dramatically towards the internet as being the preferred
distribution channel for all media content. Today’s available limited budget for
media production companies puts additional pressure to work in a cost and time
efficient way and not to waste resources due to the necessity of costly reengineer-
ing of lost metadata. The dwerft project aims to apply Linked Data principles
for all metadata exchange through all steps of the media value chain [4]. Start-
ing with the very first idea for a script, all metadata is converted according to
either existing or newly developed ontologies to be reused in subsequent steps of
the media value chain. Thus, metadata collected during the media production
becomes a valuable asset not only for each step from pre- to postproduction, but
also in distribution and archival.
     This paper presents results of the dwerft project about the successful integra-
tion of a set of film production tools based on the Linked Production Data Cloud,
a technology platform for the film and tv industry to enable software interoper-
ability used in production, distribution, and archival of audiovisual content.


2       Linked Production Data Cloud
The core of the dwerft project is the Linked Production Data Cloud (LPDC),
a technology platform for the film and television industry that allows lossless
interoperability between software and hardware tools used in production, dis-
tribution, and archiving of audiovisual content. Based on Linked Open Data
principles [1] the LPDC stores and publishes semantic metadata originating
from different subtasks of the film production process under a unified ontol-
ogy schema. Fig. 1 provides an overview of the LPDC and connected production
tools of an example show case. The key components of the LPDC are: an exten-
sible vocabulary for metadata storage, a set of pre-defined converters for RDF
data generation, a framework to develop customized converters, a tool to man-
age inserts and updates of RDF data including versioning, and a triplestore for
RDF data management and querying.


Fig. 1. Data integration use case for tools and applications in the media value chain


    The Film Ontology3 vocabulary was designed in collaboration with do-
main experts to create a suitable terminology describing the different tasks of
media production and all associated metadata. The ontology schema is capable
of representing film scripts (e.g., scenes, scene content, characters, sets, etc.),
production planning metadata (e.g., film crew, departments, cast, filming loca-
tions, shooting schedule, used equipment, etc.), on-set information (e.g., shots,
takes, and associated clips), post production metadata (e.g., timecodes, codecs,
resolutions, and formats of recorded and further processed clips), as well as
metadata for quality assessment of archived audiovisual material (e.g., surface
damages, splices, bulges, glued areas, etc.). Where ever possible, already existing
vocabularies have been reused, mapped, and interlinked, such as e.g., Broadcast
Metadata Exchange Format (BMF)4 , EBUcore5 , or DBpedia Ontology6 . The
collaborative design of the Film Ontology was carried out with WebProtégé [2].
Currently, the vocabulary is further extended with rights management informa-
tion, film editing metadata (e.g., cut information), and technical metadata of
rendered movie containers for delivery and distribution (e.g., Material Exchange
Format (MXF)). None of the participating software applications was originally
capable of importing, exporting, or processing RDF data. First, a set of cus-
3
  http://filmontology.org
4
  https://www.irt.de/en/activities/production/bmf.html
5
  https://tech.ebu.ch/MetadataEbuCore
6
  http://mappings.dbpedia.org/server/ontology/classes/
tomized converters was developed to transform proprietary metadata produced
by the tools into RDF representations conforming to the Film Ontology. The
analysis of the production workflows has shown that most of the created pro-
duction metadata is encoded in XML and CSV formats. Therefore, the dwerft
tools converter framework has been developed to efficiently create customized
CSV/XML-to-RDF converters7 . The framework includes predefined converters
for a set of film production applications as well as a generic CSV/XML-to-RDF
converter that allows to create the required transformations on custom metadata
based on lightweight mapping definitions.
    RDF Metadata generated by different converters is stored in a RDF triple-
store and can be queried via SPARQL. As a proof of concept, semantic metadata
originating from a test film production at the Tempelhofer Feld in Berlin is avail-
able for further use8 and can be searched9 .
    In a setting where data from heterogenous sources is transformed, aggregated,
and stored in a triplestore, it is essential to manage updates of the data. In our
approach, we have integrated the linked data versioning system TailR [3]. RDF
data generated by converters is first uploaded to TailR. In case the original data is
changed and converted again – as it usually often happens, as e.g., during filming,
when changes are made in dialogs to adapt them according to the intention of
the director or the preferences of an actor – , TailR stores each version and
generates RDF diffs. These are used to derive respective SPARQL insert and
delete statements in order to update the RDF data in the RDF store accordingly.


3    Integrated Film Production Applications

An exemplary set of tools, representative for the different stages pre-production,
planning, shooting, post-production, distribution and archiving, was chosen, an-
alyzed with respect to interoperability and connected to the Linked Production
Data Cloud. DramaQueen 10 is a script writing software to develop, visualize,
and analyze stories. It allows working from the first idea to the final script using
predefined formatting, storylines, characters, outline, synopsis, and story charts.
DramaQueen is a Java based standalone application and uses a proprietary data
format based on XML to store script projects. PreProducer 11 is a film production
management software to support the complete preproduction planning process.
It features general project management, script analysis, management of crew,
cast, inventory, and filming locations, development of shooting schedules, bud-
geting and financial calculations. PreProducer is a web-based application and
offers partial export and import based on XML documents via a REST API.
LockitScript 12 is a mobile web application used during film shooting. It supports
7
   The dwerft tools framework is available at https://github.com/yovisto/dwerft
8
   http://filmontology.org/resource/DWERFT
 9
   http://filmontology.org/search/
10
   http://dramaqueen.info/about-en/?lang=en
11
   http://www.preproducer.com/index.html
12
   http://lockitnetwork.com/home/
the script supervisor to oversee the continuity of the movie and keeps track of
the daily progress. It also manages the linking of scenes and takes to filmed clips
and uses a special hardware device to directly synchronize camera data with its
backend. LockitScript offers limited export facilities for daily reports and cam-
era metadata in the web interface. AVID Log Exchange (ALE)13 is a file format
used by various cameras and post-production tools (e.g., Arri Alexa, AVID Me-
dia Composer, DaVinci Resolve, Silverstack) to exchange metadata about filmed
movie clips. The integration of ALE is challenging, because each tool defines cus-
tom columns in the CSV format. While the previously described tools primarily
produce metadata, the distribution phase of a film production usually requires
metadata of all steps of the production process. Two tools already benefit from
the early availability of semantic metadata using SPARQL queries: rightsmap 14 ,
a licence management solution for film and tv productions, and the "Medienbe-
gleitkarte" (MBK), a metadata set based on the Broadcast Metadata Exchange
Format (BMF) mandatory for delivery at German public-service tv broadcast-
ers. Finally, media condition analysis tools by the German Broadcasting Archive
directly insert analysis reports as RDF data into the LPDC.


4      Conclusion and Outlook
With the dwerft project and the LPDC framework a first subset of applica-
tions and tools has been integrated for lossless metadata exchange in the media
production cycle. Metadata from media production and archival thus become
a valuable asset used to enable better search and retrieval as e.g. for video on
demand platforms, where it can also be used to support content-based recom-
mendation and customized advertising.
 Acknowledgement: This work has been funded by the German Government,
Federal Ministry of Education and Research under project number 03WKCJ4D.


References
1. T. Heath and C. Bizer. Linked Data: Evolving the Web Into a Global Data Space.
   Synthesis Lectures on Web Engineering Series. Morgan & Claypool, 2011.
2. M. Horridge, T. Tudorache, C. Nuylas, J. Vendetti, N. F. Noy, and M. A. Musen.
   Webprotege: a collaborative web based platform for editing biomedical ontologies.
   Bioinformatics, page btu256, 2014.
3. P. Meinhardt, M. Knuth, and H. Sack. Tailr: a platform for preserving history on
   the web of data. In Proc.s of the 11th Int. Conf. on Semantic Systems, pages 57–64.
   ACM, 2015.
4. H. Sack. From Script Idea to TV Rerun: The Idea of Linked Production Data in the
   Media Value Chain. In Proc. of the 24th Int. Conf. on World Wide Web Companion,
   WWW ’15 Companion, pages 719–720, 2015.

13
     http://www.avid.com/en/media-composer/features (Log and track metadata)
14
     http://www.recoupmentpro.de/